Title: | South Africa Macroeconomic Database API |
---|---|
Description: | An R API providing access to a relational database with macroeconomic time series data for South Africa, obtained from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a weekly basis via the EconData <https://www.econdata.co.za/> platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University. |
Authors: | Sebastian Krantz [aut, cre] |
Maintainer: | Sebastian Krantz <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0 |
Built: | 2025-01-20 03:24:48 UTC |
Source: | https://github.com/cran/samadb |
An R API providing access to a relational database with public macroeconomic data for South Africa, obtained from from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a regular basis via the EconData (https://www.econdata.co.za/) platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University.
Functions and data providing information about the available data
sm_datasources()
sm_datasets()
sm_series()
Function to retrieve the data from the database
Functions to reshape data and add temporal identifiers
sm_pivot_wider()
sm_pivot_longer()
sm_expand_date()
Function to export wide format data to Excel
Helper functions to convert inputs to R dates and transpose the data
Global Macros with core ID variables in the database
The macro .SAMADB_ID
contains the string c("dsid", "series")
denoting
the names of ID variables that identify the cross-sectional dimension in the database. All series codes are unique across datasets.
The macro .SAMADB_T
contains the string
c("date", "year", "quarter", "month", "day")
denoting temporal identifiers generated by sm_expand_date
.
The "date"
variable is sufficient to uniquely identify a point in time in the database.
Each value in the database is uniquely identified by dsid, series and date.
.SAMADB_ID .SAMADB_T
.SAMADB_ID .SAMADB_T
.SAMADB_ID .SAMADB_T
.SAMADB_ID .SAMADB_T
This function coerces date strings i.e. "YYYY-MM-DD"
or "YYYY-MM"
, years e.g. 2015
(numeric or character),
year-quarters e.g. "2015Q1"
or "2015-Q1"
, year-months e.g. "2015M01"
or "2015-M01"
or numeric values representing dates (e.g. previously imported Excel date) to a regular R date.
sm_as_date(x, end = FALSE, origin = "1899-12-30")
sm_as_date(x, end = FALSE, origin = "1899-12-30")
x |
a character date string |
end |
logical. |
origin |
a date or date-string that can be used as reference for converting numeric values to dates. The default corresponds to dates generated in Excel for Windows. See |
A Date
vector.
sm_as_date("2011-05") sm_as_date(2011) sm_as_date("2011Q1") sm_as_date("2011Q1", end = TRUE) sm_as_date("2011M2") sm_as_date("2011M2", end = TRUE)
sm_as_date("2011-05") sm_as_date(2011) sm_as_date("2011Q1") sm_as_date("2011Q1", end = TRUE) sm_as_date("2011M2") sm_as_date("2011M2", end = TRUE)
This is the main function of the package to retrieve data from the database.
sm_data( dsid = NULL, series = NULL, from = NULL, to = NULL, freq = NULL, labels = TRUE, wide = TRUE, expand.date = FALSE, ordered = TRUE, return.query = FALSE, ... )
sm_data( dsid = NULL, series = NULL, from = NULL, to = NULL, freq = NULL, labels = TRUE, wide = TRUE, expand.date = FALSE, ordered = TRUE, return.query = FALSE, ... )
dsid |
character. (Optional) id's of datasets matching the 'dsid' column of the 'DATASET' table (retrieved using |
series |
character. (Optional) codes of series matching the 'series' column of the 'SERIES' table (retrieved using |
from |
set the start time of the data retrieved by either supplying a start date, a date-string of the form |
to |
same as |
freq |
character. Return only series at a certain frequency. Allowed are values |
labels |
logical. |
wide |
logical. |
expand.date |
logical. |
ordered |
logical. |
return.query |
logical. |
... |
further arguments passed to |
Series from datasets at different frequencies can be queried, but, if wide = TRUE
, this will result in missing values in the lower frequency series.
A data.table
with the result of the query.
sm_pivot_wider
, sm_expand_date
, samadb
# Return all electricity indicators from 2000 sm_data("ELECTRICITY", from = 2000)
# Return all electricity indicators from 2000 sm_data("ELECTRICITY", from = 2000)
This function pulls and return a table called 'DATASET' from the database.
sm_datasets(ordered = TRUE)
sm_datasets(ordered = TRUE)
ordered |
logical. |
The 'DATASET' table gives information about the different datasets fetched from different providers at regular intervals. It provides a unique id for each dataset, the frequency of data, the number of records (datapoints) in each dataset, the minimum and maximum time coverage, when the dataset was last updated, and information about the data source, provider, and method of data access.
A data.table
with information about the available datasets in the database.
sm_datasources
, sm_series
, samadb
sm_datasets()
sm_datasets()
This function pulls and returns a table called 'DATASOURCE' from the database.
sm_datasources(ordered = TRUE)
sm_datasources(ordered = TRUE)
ordered |
logical. |
The 'DATASOURCE' table gives information about the sources of data in this database, including the source website, and the number of datasets available from the source.
A data.table
with information about the sources of data in the database.
sm_datasets
, sm_series
, samadb
sm_datasources()
sm_datasources()
This function expands a date column and generates additional temporal identifiers from it (year, month, quarter, day).
sm_expand_date( x, gen = c("year", "quarter", "month"), origin = "1899-12-30", keep.date = TRUE, remove.missing.date = TRUE, sort = TRUE, as.factor = TRUE, name = "date", ... )
sm_expand_date( x, gen = c("year", "quarter", "month"), origin = "1899-12-30", keep.date = TRUE, remove.missing.date = TRUE, sort = TRUE, as.factor = TRUE, name = "date", ... )
x |
either a vector of class 'Date', or coercible to date using |
gen |
character. A vector of identifiers to generate from |
origin |
character / Date. Passed to |
keep.date |
logical. |
remove.missing.date |
logical. |
sort |
logical. |
as.factor |
|
name |
character. The name of the date variable to expand. |
... |
not used. |
A data.table
containing the computed identifiers as columns. See Examples.
# First a basic example x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month") sm_expand_date(x) sm_expand_date(x, gen = .SAMADB_T[-1L], keep.date = FALSE) # Now using the API sm_expand_date(sm_data("BUSINESS_CYCLES")) # Same thing sm_data("BUSINESS_CYCLES", expand.date = TRUE)
# First a basic example x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month") sm_expand_date(x) sm_expand_date(x, gen = .SAMADB_T[-1L], keep.date = FALSE) # Now using the API sm_expand_date(sm_data("BUSINESS_CYCLES")) # Same thing sm_data("BUSINESS_CYCLES", expand.date = TRUE)
This function automatically reshapes wide (column-based) data into a long format akin to the format of the raw data coming from the database (sm_data(..., wide = FALSE)
).
Internally it uses melt
from data.table.
sm_pivot_longer( data, id_cols = intersect(.SAMADB_T, names(data)), to_value = setdiff(names(data), id_cols), variable_name = "series", value_name = "value", label_name = "label", na.rm = TRUE, variable.factor = TRUE, label.factor = TRUE, ... )
sm_pivot_longer( data, id_cols = intersect(.SAMADB_T, names(data)), to_value = setdiff(names(data), id_cols), variable_name = "series", value_name = "value", label_name = "label", na.rm = TRUE, variable.factor = TRUE, label.factor = TRUE, ... )
data |
a wide format data frame where all series have their own column. |
id_cols |
character. Temporal identifiers of the data. By default all variables in |
to_value |
character. The names of all series to be stacked into the long format data frame. |
variable_name |
character. The name of the variable to store the names of the series. |
value_name |
character. The name of the variable to store the data values. |
label_name |
character. The name of the variable to store the series labels. |
na.rm |
logical. |
variable.factor , label.factor
|
logical. |
... |
further arguments passed to |
A data.table
with the reshaped data.
# Return all electricity indicators from the year 2000 onwards data <- sm_data("ELECTRICITY", from = 2000) sm_pivot_longer(data)
# Return all electricity indicators from the year 2000 onwards data <- sm_data("ELECTRICITY", from = 2000) sm_pivot_longer(data)
This function automatically reshapes long (stacked) raw data from the API (sm_data(..., wide = FALSE)
) to a wide format where each variable has its own column.
Internally it uses pivot
from collapse.
sm_pivot_wider( data, id_cols = intersect(.SAMADB_T, names(data)), names_from = "series", values_from = "value", labels_from = if (any(names(data) == "label")) "label" else NULL, expand.date = FALSE, ... )
sm_pivot_wider( data, id_cols = intersect(.SAMADB_T, names(data)), names_from = "series", values_from = "value", labels_from = if (any(names(data) == "label")) "label" else NULL, expand.date = FALSE, ... )
data |
raw data from the API: A long format data frame where all values are stacked in a value column. |
id_cols |
character. Temporal identifiers of the data. By default all variables in |
names_from |
character. The column containing the series codes. These will become the names of the new columns in the wider data format. |
values_from |
character. The column containing the data values. |
labels_from |
character. The column containing the labels describing the series. |
expand.date |
logical. |
... |
further arguments passed to |
A data.table
with the reshaped data.
# Return all electricity indicators from the year 2000 onwards sm_pivot_wider(sm_data("ELECTRICITY", from = 2000, wide = FALSE))
# Return all electricity indicators from the year 2000 onwards sm_pivot_wider(sm_data("ELECTRICITY", from = 2000, wide = FALSE))
This function pulls the 'SERIES' table from the database, providing information about the time series in the database. Each series is given a code which unique across datasets.
sm_series( dsid = NULL, series = NULL, dataset.info = FALSE, ordered = TRUE, return.query = FALSE )
sm_series( dsid = NULL, series = NULL, dataset.info = FALSE, ordered = TRUE, return.query = FALSE )
dsid |
character. (Optional) id's of datasources matching the 'dsid' column of the 'DATASET' table (retrieved using |
series |
character. (Optional) codes of series for which information in to be returned. If 'dsid' is also specificed, the two are combined using SQL 'OR' i.e. these series are retrieved in addition to all series matched through 'dsid'. |
dataset.info |
logical. |
ordered |
logical. |
return.query |
logical. |
Each series is given a code which is unique across datasets. Each series also has a label describing the series. Further information recorded are the series frequency, unit, whether it was seasonally adjusted, number of observations, minimum and maximum date, and (optionally) topic, alternative code provided by the data source (data retrieved from EconData uses EconData codes as series codes, so the 'src_code' field gives the codes used by the SARB or STATSSA), or further comments on the series.
A data.table
with information about the available series in the database.
sm_datasources
, sm_datasets
, sm_data
, samadb
# By default returns all series sm_series() # Adding information about the dataset and provider sm_series(dataset.info = TRUE) # Only series in the QB sm_series("QB")
# By default returns all series sm_series() # Adding information about the dataset and provider sm_series(dataset.info = TRUE) # Only series in the QB sm_series("QB")
This function is called by sm_write_excel
with option transpose = TRUE
to generate a row-based tabular data format from a wide data frame in R that is suitable for exporting to Excel.
sm_transpose(data, date.format = "%d/%m/%Y")
sm_transpose(data, date.format = "%d/%m/%Y")
data |
a wide format data frame where each column is a variable and the first variable uniquely identifies the data. |
date.format |
a format for date columns which is passed to |
A transposed data frame or data.table
(the class of the input is preserved).
transpose
, sm_pivot_wider
, sm_write_excel
, samadb
sm_transpose(sm_data("ELECTRICITY"))
sm_transpose(sm_data("ELECTRICITY"))
This function exports a wide format dataset to a column- (default) or row-oriented Excel format.
sm_write_excel( data, ..., transpose = FALSE, transpose.date.format = "%d/%m/%Y" )
sm_write_excel( data, ..., transpose = FALSE, transpose.date.format = "%d/%m/%Y" )
data |
a wide dataset from |
... |
further arguments to |
transpose |
logical. If |
transpose.date.format |
argument passed to |
Writes an Excel file to the specified path (no return value).
sm_transpose
, write_xlsx
, samadb
## Not run: # Getting electricity indicators from 2000 data <- sm_data("ELECTRICITY", from = 2000) # Saving to different Excel formats sm_write_excel(data, "ELECTRICITY.xlsx") sm_write_excel(data, "ELECTRICITY.xlsx", transpose = TRUE) # Saving to alternative path sm_write_excel(data, "C:/Users/.../ELECTRICITY.xlsx") ## End(Not run)
## Not run: # Getting electricity indicators from 2000 data <- sm_data("ELECTRICITY", from = 2000) # Saving to different Excel formats sm_write_excel(data, "ELECTRICITY.xlsx") sm_write_excel(data, "ELECTRICITY.xlsx", transpose = TRUE) # Saving to alternative path sm_write_excel(data, "C:/Users/.../ELECTRICITY.xlsx") ## End(Not run)