Title: | Uganda Time Series Database API |
---|---|
Description: | An R API providing easy access to a relational database with macroeconomic, financial and development related time series data for Uganda. Overall more than 5000 series at varying frequency (daily, monthly, quarterly, annual in fiscal or calendar years) can be accessed through the API. The data is provided by the Bank of Uganda, the Ugandan Ministry of Finance, Planning and Economic Development, the IMF and the World Bank. The database is being updated once a month. |
Authors: | Sebastian Krantz [aut, cre] |
Maintainer: | Sebastian Krantz <[email protected]> |
License: | GPL-3 |
Version: | 0.2.3 |
Built: | 2024-11-17 03:09:29 UTC |
Source: | https://github.com/cran/ugatsdb |
An R API providing easy access to a relational database with macroeconomic, financial and development related time series data for Uganda. Overall more than 5000 series at varying frequency (daily, monthly, quarterly, annual in fiscal or calendar years) can be accessed through the API. The data is provided by the Bank of Uganda, the Ugandan Ministry of Finance, Planning and Economic Development, the IMF and the World Bank. The database is being updated once a month.
Functions to retrieve tables identifying the data
datasources()
datasets()
series()
Function to retrieve the data from the database
Functions to reshape data and add temporal identifiers
long2wide()
wide2long()
expand_date()
Function to export wide format data to Excel
Helper functions (useful esp. for common Excel formats)
Global Macros with core ID variables in the database
Function to renew database connection without reloading the package
library(ugatsdb) library(magrittr) # Pipe %>% operators library(xts) # Time series class and pretty plots # Plotting daily IFEM Buying and Selling Rates from the Bank of Uganda get_data("BOU_E", c("E_IFEM_B", "E_IFEM_S"), from = 2020) %>% as.xts %>% plot library(dygraphs) # Same generating a dynamic chart get_data("BOU_E", c("E_IFEM_B", "E_IFEM_S"), from = 2020) %>% as.xts %>% dygraph # Static plot but with legend showing variable labels get_data("BOU_E", c("E_IFEM_B", "E_IFEM_S"), from = 2020, wide = FALSE) %>% long2wide(names_from = "Label") %>% as.xts %>% plot(legend.loc = "topleft")
library(ugatsdb) library(magrittr) # Pipe %>% operators library(xts) # Time series class and pretty plots # Plotting daily IFEM Buying and Selling Rates from the Bank of Uganda get_data("BOU_E", c("E_IFEM_B", "E_IFEM_S"), from = 2020) %>% as.xts %>% plot library(dygraphs) # Same generating a dynamic chart get_data("BOU_E", c("E_IFEM_B", "E_IFEM_S"), from = 2020) %>% as.xts %>% dygraph # Static plot but with legend showing variable labels get_data("BOU_E", c("E_IFEM_B", "E_IFEM_S"), from = 2020, wide = FALSE) %>% long2wide(names_from = "Label") %>% as.xts %>% plot(legend.loc = "topleft")
The macro .IDvars
contains the string c("DSID", "Series")
denoting
variables that uniquely identify series in the database.
Note that the series code contained in "Series" alone is not sufficient to uniquely identify a series as some
series are recorded with the same code in multiple datasets (mostly either the same data aggregated at a different frequency, or a different collection of indicators).
For example goods exports with the series code "EX_G" are recorded in the datasets "BOU_MMI", "BOU_MMI_A" (annual data), and "BOU_MMI_FY" (fiscal year data).
The macro .Tvars
contains the string
c("Date", "Year", "Quarter", "FY", "QFY", "Month", "Day")
denoting temporal identifiers generated by expand_date
.
The "Date" variable is sufficient to uniquely identify a point in time in the database.
.IDvars .Tvars
.IDvars .Tvars
.IDvars .Tvars
.IDvars .Tvars
This function pulls and return a table called 'DATASET' from the database.
datasets(ordered = TRUE)
datasets(ordered = TRUE)
ordered |
logical. |
The 'DATASET' table gives information about the different datasets read into the database from various sources. It provides a unique id for each dataset, the frequency of data, the minimum and maximum time coverage, when the dataset was last updated, a description, the source (matching the 'Source' column in the 'DATASOURCE' table), and an (optional) url providing direct access to the raw data.
A data.table
with information about the available datasets in the database.
datasets()
datasets()
This function pulls and returns a table called 'DATASOURCE' from the database.
datasources(ordered = TRUE)
datasources(ordered = TRUE)
ordered |
logical. |
The 'DATASOURCE' table gives information about the various sources / providers of data in this database, including the source website, the number of datasets available from the source, a description of the source and the way data is accessed from the source.
A data.table
with information about the sources of data in the database.
datasources()
datasources()
This function expands a date column and generates additional temporal identifiers from it (such as the year, month, quarter, fiscal year etc.).
expand_date( x, gen = c("Year", "Quarter", "FY", "QFY", "Month"), origin = "1899-12-30", keep.date = TRUE, remove.missing.date = TRUE, sort = TRUE, as.factor = TRUE, ... )
expand_date( x, gen = c("Year", "Quarter", "FY", "QFY", "Month"), origin = "1899-12-30", keep.date = TRUE, remove.missing.date = TRUE, sort = TRUE, as.factor = TRUE, ... )
x |
either a vector of class 'Date', or coercible to date using |
gen |
character. A vector of identifiers to generate from |
origin |
character / Date. Passed to |
keep.date |
logical. |
remove.missing.date |
logical. |
sort |
logical. |
as.factor |
|
... |
not used. |
A data.table
containing the computed identifiers as columns. See Examples.
# First a basic example x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month") expand_date(x) expand_date(x, gen = c("Year", "Month"), keep.date = FALSE) # Now using the API expand_date(get_data("BOU_CPI")) # Getting Monthly CPI data from the Bank of Uganda # Same thing get_data("BOU_CPI", expand.date = TRUE)
# First a basic example x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month") expand_date(x) expand_date(x, gen = c("Year", "Month"), keep.date = FALSE) # Now using the API expand_date(get_data("BOU_CPI")) # Getting Monthly CPI data from the Bank of Uganda # Same thing get_data("BOU_CPI", expand.date = TRUE)
This is the main function of the package to retrieve data from the database. It constructs an SQL query which is sent to the database and returns the data as a data.table
in R.
get_data( dsid = NULL, series = NULL, from = NULL, to = NULL, labels = TRUE, wide = TRUE, expand.date = FALSE, ordered = TRUE, return.query = FALSE, ... )
get_data( dsid = NULL, series = NULL, from = NULL, to = NULL, labels = TRUE, wide = TRUE, expand.date = FALSE, ordered = TRUE, return.query = FALSE, ... )
dsid |
character. (Optional) id's of datasets matching the 'DSID' column of the 'DATASET' table (retrieved using |
series |
character. (Optional) codes of series matching the 'Series' column of the 'Series' table (retrieved using |
from |
set the start time of the data retrieved by either supplying a start date, a date-string of the form |
to |
same as |
labels |
logical. |
wide |
logical. |
expand.date |
logical. |
ordered |
logical. |
return.query |
logical. |
... |
further arguments passed to |
If labels = FALSE
, the 'SERIES' table is not joined to the 'DATA' table, and ordered = TRUE
will order datasets and series retrieved in alphabetic order.
If labels = TRUE
data is ordered by series and date within each dataset, preserving the order of columns in the dataset. If multiple datasets are received they are ordered alphabetically according to the 'DSID' column.
It is possible query multiple series from multiple datasets e.g. get_data(c("DSID1", "DSID2"), c("SERFROM1", "SERFROM2"))
etc., but care needs to be taken that the series queried do not occur in both datasets (see .IDvars
, and check using series(c("DSID1", "DSID2"))
).
Series from datasets at different frequencies can be queried, but, if wide = TRUE
, this will result in missing values for all but the first observations per period in the lower frequency series.
A data.table
with the result of the query.
long2wide
, expand_date
, ugatsdb
# Return monthly macroeconomic indicators from the year 2000 onwards get_data("BOU_MMI", from = 2000, wide = FALSE) # Return wide format with date expanded get_data("BOU_MMI", from = 2000, expand.date = TRUE) # Same thing in multiple steps (with additional customization options): library(magrittr) # Pipe %>% operators get_data("BOU_MMI", from = 2000, wide = FALSE) %>% long2wide %>% expand_date # Getting a single series get_data("BOU_MMI", "M2", 2000) # Getting High-Frequency activity indicators from BoU and Revenue & Expense from MoFPED get_data(c("BOU_MMI", "MOF_TOT", "WB_WDI"), c("CIEA", "BTI", "REV_GRA", "EXP_LEN")) # Getting daily interest rates and plotting library(xts) # Time series class get_data("BOU_I", from = 2018, wide = FALSE) %>% long2wide(names_from = "Label") %>% as.xts %>% plot(legend.loc = "topleft")
# Return monthly macroeconomic indicators from the year 2000 onwards get_data("BOU_MMI", from = 2000, wide = FALSE) # Return wide format with date expanded get_data("BOU_MMI", from = 2000, expand.date = TRUE) # Same thing in multiple steps (with additional customization options): library(magrittr) # Pipe %>% operators get_data("BOU_MMI", from = 2000, wide = FALSE) %>% long2wide %>% expand_date # Getting a single series get_data("BOU_MMI", "M2", 2000) # Getting High-Frequency activity indicators from BoU and Revenue & Expense from MoFPED get_data(c("BOU_MMI", "MOF_TOT", "WB_WDI"), c("CIEA", "BTI", "REV_GRA", "EXP_LEN")) # Getting daily interest rates and plotting library(xts) # Time series class get_data("BOU_I", from = 2018, wide = FALSE) %>% long2wide(names_from = "Label") %>% as.xts %>% plot(legend.loc = "topleft")
This function automatically reshapes long (stacked) raw data from the API (get_data(..., wide = FALSE)
) to a wide format where each variable has its own column.
long2wide( data, id_cols = intersect(.Tvars, names(data)), names_from = "Series", values_from = "Value", labels_from = if (any(names(data) == "Label")) "Label" else NULL, expand.date = FALSE, ... )
long2wide( data, id_cols = intersect(.Tvars, names(data)), names_from = "Series", values_from = "Value", labels_from = if (any(names(data) == "Label")) "Label" else NULL, expand.date = FALSE, ... )
data |
raw data from the API: A long format data frame where all values are stacked in a value column. |
id_cols |
character. Temporal identifiers of the data. By default all variables in |
names_from |
character. The column containing the series codes. These will become the names of the new columns in the wider data format. |
values_from |
character. The column containing the data values. |
labels_from |
character. The column containing the labels describing the series. |
expand.date |
logical. |
... |
further arguments passed to |
A data.table
with the reshaped data.
wide2long
, wide2excel
, ugatsdb
# Return monthly macroeconomic indicators from the year 2000 onwards long2wide(get_data("BOU_MMI", from = 2000, wide = FALSE))
# Return monthly macroeconomic indicators from the year 2000 onwards long2wide(get_data("BOU_MMI", from = 2000, wide = FALSE))
This function coerces date strings i.e. "YYYY-MM-DD"
or "YYYY-MM"
, years e.g. 2015
(numeric or character),
year-quarters e.g. "2015Q1"
or "2015-Q1"
, year-months e.g. "2015M01"
or "2015-M01"
, fiscal years e.g. "1997/98"
or numeric values representing dates (e.g. previously imported Excel date) to a regular R date.
make_date(x, end = FALSE, origin = "1899-12-30")
make_date(x, end = FALSE, origin = "1899-12-30")
x |
a character date string |
end |
logical. |
origin |
a date or date-string that can be used as reference for converting numeric values to dates. The default corresponds to dates generated in Excel for Windows. See |
A Date
vector.
make_date("2011-05") make_date(2011) make_date("2011/12") make_date("2011/12", end = TRUE) make_date("2011Q1") make_date("2011Q1", end = TRUE)
make_date("2011-05") make_date(2011) make_date("2011/12") make_date("2011/12", end = TRUE) make_date("2011Q1") make_date("2011Q1", end = TRUE)
This function pulls and returns a table called 'SERIES' from the database.
series(dsid = NULL, dataset.info = TRUE, ordered = TRUE, return.query = FALSE)
series(dsid = NULL, dataset.info = TRUE, ordered = TRUE, return.query = FALSE)
dsid |
character. (Optional) id's of datasets matching the 'DSID' column of the 'DATASET' table (retrieved using |
dataset.info |
logical. |
ordered |
logical. |
return.query |
logical. |
The 'SERIES' table gives information about all of the time series in the database. Each series is given a code which is however not unique across datasets (see .IDvars
).
Each series also has a label describing the series. Further information recorded are the minimum and maximum time coverage, and (optionally) a separate series source and url.
By default dataset.info = TRUE
and the frequency of the data, the date when the dataset containing the series was last updated, the dataset and data source are added to the
'SERIES' table from the 'DATASET' table.
If dataset.info = FALSE
, the 'DATASET' table is not joined to the 'SERIES' table, and ordered = TRUE
only orders the series within each dataset to maintain the column order of series in the source data.
In that case the datasets are returned in alphabetic order of 'DSID', not the order in which they were entered into the 'DATASET' table.
A data.table
with information about the available time series in the database.
# By default returns all series with additional information series() # Raw series table series(dataset.info = FALSE) # Only series in the Monthly Macroeconomic Indicators of the BoU series("BOU_MMI")
# By default returns all series with additional information series() # Raw series table series(dataset.info = FALSE) # Only series in the Monthly Macroeconomic Indicators of the BoU series("BOU_MMI")
This function is called by wide2excel
with option transpose = TRUE
to generate a row-based tabular data format from a wide data frame in R that is suitable for exporting to Excel.
transpose_wide(data, date.format = "%d/%m/%Y")
transpose_wide(data, date.format = "%d/%m/%Y")
data |
a wide format data frame where each column is a variable and the first variable uniquely identifies the data. |
date.format |
a format for date columns which is passed to |
A transposed data frame or data.table
(the class of the input is preserved).
transpose
, long2wide
, wide2excel
, ugatsdb
transpose_wide(get_data("BOU_CPI"))
transpose_wide(get_data("BOU_CPI"))
This function terminates an existing connection to the database server and attempts to reconnect to it.
It is now somewhat redundant by the safe query mechanism introduced in v0.2.1 of the package, where each query is evaluated inside
tryCatch
and the database connection is renewed if the query fails. This function can still be used to manually
renew the database connection.
ugatsdb_reconnect()
ugatsdb_reconnect()
ugatsdb_reconnect()
ugatsdb_reconnect()
This function exports a wide format dataset to a column- (default) or row-oriented Excel format.
wide2excel(data, ..., transpose = FALSE, transpose.date.format = "%d/%m/%Y")
wide2excel(data, ..., transpose = FALSE, transpose.date.format = "%d/%m/%Y")
data |
a wide dataset from |
... |
further arguments to |
transpose |
logical. If |
transpose.date.format |
argument passed to |
transpose_wide
, write_xlsx
, ugatsdb
## Not run: # Getting macroeconomic indicators from Bank of Uganda in fiscal years data <- get_data("BOU_MMI_FY", from = "2000/01") # Saving to different Excel formats wide2excel(data, "BOU_MMI_FY.xlsx") wide2excel(data, "BOU_MMI_FY.xlsx", transpose = TRUE) # Saving to alternative path wide2excel(data, "C:/Users/.../BOU_MMI_FY.xlsx") ## End(Not run)
## Not run: # Getting macroeconomic indicators from Bank of Uganda in fiscal years data <- get_data("BOU_MMI_FY", from = "2000/01") # Saving to different Excel formats wide2excel(data, "BOU_MMI_FY.xlsx") wide2excel(data, "BOU_MMI_FY.xlsx", transpose = TRUE) # Saving to alternative path wide2excel(data, "C:/Users/.../BOU_MMI_FY.xlsx") ## End(Not run)
This function automatically reshapes wide (column-based) data into a long format akin to the format of the raw data coming from the database (get_data(..., wide = FALSE)
).
wide2long( data, id_cols = intersect(.Tvars, names(data)), to_value = setdiff(names(data), id_cols), variable_name = "Series", value_name = "Value", label_name = "Label", na.rm = TRUE, ... )
wide2long( data, id_cols = intersect(.Tvars, names(data)), to_value = setdiff(names(data), id_cols), variable_name = "Series", value_name = "Value", label_name = "Label", na.rm = TRUE, ... )
data |
a wide format data frame where all series have their own column. |
id_cols |
character. Temporal identifiers of the data. By default all variables in |
to_value |
character. The names of all series to be stacked into the long format data frame. |
variable_name |
character. The name of the variable to store the names of the series. |
value_name |
character. The name of the variable to store the data values. |
label_name |
character. The name of the variable to store the series labels. |
na.rm |
logical. |
... |
further arguments passed to |
A data.table
with the reshaped data.
# Return monthly macroeconomic indicators from the year 2000 onwards data <- get_data("BOU_MMI", from = 2000) wide2long(data)
# Return monthly macroeconomic indicators from the year 2000 onwards data <- get_data("BOU_MMI", from = 2000) wide2long(data)