Package 'samadb' reference manual

Title:	South Africa Macroeconomic Database API
Description:	An R API providing access to a relational database with macroeconomic time series data for South Africa, obtained from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a weekly basis via the EconData <https://www.econdata.co.za/> platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University.
Authors:	Sebastian Krantz [aut, cre]
Maintainer:	Sebastian Krantz <[email protected]>
License:	GPL-3
Version:	0.3.0
Built:	2025-03-21 03:32:58 UTC
Source:	https://github.com/cran/samadb

South Africa Macroeconomic Database API

Description

An R API providing access to a relational database with public macroeconomic data for South Africa, obtained from from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a regular basis via the EconData (https://www.econdata.co.za/) platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University.

Functions

Functions and data providing information about the available data

sm_datasources()
sm_datasets()
sm_series()

Function to retrieve the data from the database

sm_data()

Functions to reshape data and add temporal identifiers

sm_pivot_wider()
sm_pivot_longer()
sm_expand_date()

Function to export wide format data to Excel

sm_write_excel()

Helper functions to convert inputs to R dates and transpose the data

sm_as_date()
sm_transpose()

Global Macros with core ID variables in the database

.SAMADB_ID
.SAMADB_T

Global Identifier Macros

Description

The macro .SAMADB_ID contains the string c("dsid", "series") denoting the names of ID variables that identify the cross-sectional dimension in the database. All series codes are unique across datasets.

The macro .SAMADB_T contains the string c("date", "year", "quarter", "month", "day") denoting temporal identifiers generated by sm_expand_date. The "date" variable is sufficient to uniquely identify a point in time in the database.

Each value in the database is uniquely identified by dsid, series and date.

Usage

.SAMADB_ID
.SAMADB_T
.SAMADB_ID
.SAMADB_T

Examples

.SAMADB_ID
.SAMADB_T


.SAMADB_ID
.SAMADB_T

Coerce Vectors to Dates

Description

This function coerces date strings i.e. "YYYY-MM-DD" or "YYYY-MM", years e.g. 2015 (numeric or character), year-quarters e.g. "2015Q1" or "2015-Q1", year-months e.g. "2015M01" or "2015-M01" or numeric values representing dates (e.g. previously imported Excel date) to a regular R date.

Usage

sm_as_date(x, end = FALSE, origin = "1899-12-30")
sm_as_date(x, end = FALSE, origin = "1899-12-30")

Arguments

`x`	a character date string `"YYYY-MM-DD"` or `"YYYY-MM"`, year-quarter `"YYYYQN"` or `"YYYY-QN"`, year-month `"YYYYMNN"` or `"YYYY-MNN"` or calendar year `YYYY` (numeric or character), or a numeric value corresponding to a date that can be passed to `as.Date.numeric`.
`end`	logical. `TRUE` replaces missing time information with a period end-date which is the last day of the period. `FALSE` replaces missing month and day information with `"-01"`, so the year date is the 1st of January, and for months / quarters the 1st day of the month / quarter.
`origin`	a date or date-string that can be used as reference for converting numeric values to dates. The default corresponds to dates generated in Excel for Windows. See `as.Date.numeric`.

Value

A Date vector.

Examples

sm_as_date("2011-05")
sm_as_date(2011)
sm_as_date("2011Q1")
sm_as_date("2011Q1", end = TRUE)
sm_as_date("2011M2")
sm_as_date("2011M2", end = TRUE)

sm_as_date("2011-05")
sm_as_date(2011)
sm_as_date("2011Q1")
sm_as_date("2011Q1", end = TRUE)
sm_as_date("2011M2")
sm_as_date("2011M2", end = TRUE)

Retrieve Data from the Database

Description

This is the main function of the package to retrieve data from the database.

Usage

sm_data(
  dsid = NULL,
  series = NULL,
  from = NULL,
  to = NULL,
  freq = NULL,
  labels = TRUE,
  wide = TRUE,
  expand.date = FALSE,
  ordered = TRUE,
  return.query = FALSE,
  ...
)
sm_data(
  dsid = NULL,
  series = NULL,
  from = NULL,
  to = NULL,
  freq = NULL,
  labels = TRUE,
  wide = TRUE,
  expand.date = FALSE,
  ordered = TRUE,
  return.query = FALSE,
  ...
)

Arguments

`dsid`	character. (Optional) id's of datasets matching the 'dsid' column of the 'DATASET' table (retrieved using `sm_datasets()`). If used, all series from the dataset are returned, in addition to any other series selected with `series`.
`series`	character. (Optional) codes of series matching the 'series' column of the 'SERIES' table (retrieved using `sm_series()`). If 'dsid' is also specificed, the two are combined using SQL 'OR' i.e. these series are retrieved in addition to all series matched through 'dsid'.
`from`	set the start time of the data retrieved by either supplying a start date, a date-string of the form `"YYYY-MM-DD"` or `"YYYY-MM"`, year-quarters of the form `"YYYYQN"` or `"YYYY-QN"`, or a numeric year `YYYY` (numeric or character). These expressions are converted to a regular date by `sm_as_date`.
`to`	same as `from`: to set the time period until which data is retrieved. For expressions that are not full "YYYY-MM-DD" dates, the last day of the period is chosen.
`freq`	character. Return only series at a certain frequency. Allowed are values `"D"` (Daily), `"W"` (Weekly), `"M"` (Monthly), `"Q"` (Quarterly), `"A"` (Annual), `"AF"` (Fiscal Years), matching the 'freq' column in the 'SERIES' table (retrieved using `sm_series()`).
`labels`	logical. `TRUE` will also return labels (series descriptions) along with the series codes.
`wide`	logical. `TRUE` calls `sm_pivot_wider` on the result. `FALSE` returns the data in a long format without missing values (suitable for `ggplot2`).
`expand.date`	logical. `TRUE` will call `sm_expand_date` on the result.
`ordered`	logical. `TRUE` orders the result by 'date' and, if `!is.null(dsid)`, `labels = TRUE` or `!is.null(freq)`, by series, maintaining a fixed order of series. `FALSE` does not explicitly order the result, to the benefit of faster query execution.
`return.query`	logical. `TRUE` will not query the database but instead return the constructed SQL query as a character string (for debugging purposes).
`...`	further arguments passed to `sm_pivot_wider` (if `wide = TRUE`) or `sm_expand_date` (if `expand.date = TRUE`), no conflicts between these two.

Details

Series from datasets at different frequencies can be queried, but, if wide = TRUE, this will result in missing values in the lower frequency series.

Value

A data.table with the result of the query.

Examples


# Return all electricity indicators from 2000
sm_data("ELECTRICITY", from = 2000)


# Return all electricity indicators from 2000
sm_data("ELECTRICITY", from = 2000)

Retrieve Datasets Table

Description

This function pulls and return a table called 'DATASET' from the database.

Usage

sm_datasets(ordered = TRUE)
sm_datasets(ordered = TRUE)

Arguments

ordered

logical. TRUE orders the result in the order data was entered into the database, while FALSE returns the result in a random order, to the benefit of faster query execution.

Details

The 'DATASET' table gives information about the different datasets fetched from different providers at regular intervals. It provides a unique id for each dataset, the frequency of data, the number of records (datapoints) in each dataset, the minimum and maximum time coverage, when the dataset was last updated, and information about the data source, provider, and method of data access.

Value

A data.table with information about the available datasets in the database.

Examples


sm_datasets()

sm_datasets()

Retrieve Data Sources Table

Description

This function pulls and returns a table called 'DATASOURCE' from the database.

Usage

sm_datasources(ordered = TRUE)
sm_datasources(ordered = TRUE)

Arguments

ordered

logical. TRUE orders the result in the order data was entered into the database, while FALSE returns the result in a random order, to the benefit of faster query execution.

Details

The 'DATASOURCE' table gives information about the sources of data in this database, including the source website, and the number of datasets available from the source.

Value

A data.table with information about the sources of data in the database.

Examples


sm_datasources()

sm_datasources()

Generate Temporal Identifiers from a Date Column

Description

This function expands a date column and generates additional temporal identifiers from it (year, month, quarter, day).

Usage

sm_expand_date(
  x,
  gen = c("year", "quarter", "month"),
  origin = "1899-12-30",
  keep.date = TRUE,
  remove.missing.date = TRUE,
  sort = TRUE,
  as.factor = TRUE,
  name = "date",
  ...
)
sm_expand_date(
  x,
  gen = c("year", "quarter", "month"),
  origin = "1899-12-30",
  keep.date = TRUE,
  remove.missing.date = TRUE,
  sort = TRUE,
  as.factor = TRUE,
  name = "date",
  ...
)

Arguments

`x`	either a vector of class 'Date', or coercible to date using `as.Date`, or a data frame / list containing with a date-column called `name`.
`gen`	character. A vector of identifiers to generate from `x`. The possible identifiers are found in `.SAMADB_T`.
`origin`	character / Date. Passed to `as.Date`: for converting numeric `x` to date. The default reflects converting date-numbers from Excel for Windows.
`keep.date`	logical. `TRUE` will keep the date variable in the resulting dataset, `FALSE` will remove the date variable in favor of the generated identifiers.
`remove.missing.date`	logical. `TRUE` will remove missing values in `x`. If `x` is a dataset, rows missing the date variable will be removed.
`sort`	logical. `TRUE` will sort the data by the date column.
`as.factor`	`TRUE` will generate quarters and months as factor variables. It is also possible to use `as.factor = "ordered"` to generate ordered factors. `FALSE` will generate quarters and months as integer variables.
`name`	character. The name of the date variable to expand.
`...`	not used.

Value

A data.table containing the computed identifiers as columns. See Examples.

Examples

# First a basic example
x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month")
sm_expand_date(x)
sm_expand_date(x, gen = .SAMADB_T[-1L], keep.date = FALSE)

# Now using the API
sm_expand_date(sm_data("BUSINESS_CYCLES"))

# Same thing
sm_data("BUSINESS_CYCLES", expand.date = TRUE)


# First a basic example
x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month")
sm_expand_date(x)
sm_expand_date(x, gen = .SAMADB_T[-1L], keep.date = FALSE)

# Now using the API
sm_expand_date(sm_data("BUSINESS_CYCLES"))

# Same thing
sm_data("BUSINESS_CYCLES", expand.date = TRUE)

Reshape Column-Based Data to Long Format

Description

This function automatically reshapes wide (column-based) data into a long format akin to the format of the raw data coming from the database (sm_data(..., wide = FALSE)). Internally it uses melt from data.table.

Usage

sm_pivot_longer(
  data,
  id_cols = intersect(.SAMADB_T, names(data)),
  to_value = setdiff(names(data), id_cols),
  variable_name = "series",
  value_name = "value",
  label_name = "label",
  na.rm = TRUE,
  variable.factor = TRUE,
  label.factor = TRUE,
  ...
)
sm_pivot_longer(
  data,
  id_cols = intersect(.SAMADB_T, names(data)),
  to_value = setdiff(names(data), id_cols),
  variable_name = "series",
  value_name = "value",
  label_name = "label",
  na.rm = TRUE,
  variable.factor = TRUE,
  label.factor = TRUE,
  ...
)

Arguments

`data`	a wide format data frame where all series have their own column.
`id_cols`	character. Temporal identifiers of the data. By default all variables in `.SAMADB_T` are selected.
`to_value`	character. The names of all series to be stacked into the long format data frame.
`variable_name`	character. The name of the variable to store the names of the series.
`value_name`	character. The name of the variable to store the data values.
`label_name`	character. The name of the variable to store the series labels.
`na.rm`	logical. `TRUE` will remove all missing values from the long data frame.
`variable.factor`, `label.factor`	logical. `TRUE` will code the "series" and "label" columns as factors, which is more memory efficient.
`...`	further arguments passed to `melt`.

Value

A data.table with the reshaped data.

Examples


# Return all electricity indicators from the year 2000 onwards
data <- sm_data("ELECTRICITY", from = 2000)
sm_pivot_longer(data)

# Return all electricity indicators from the year 2000 onwards
data <- sm_data("ELECTRICITY", from = 2000)
sm_pivot_longer(data)

Reshape Long API Data to Column-Based Format

Description

This function automatically reshapes long (stacked) raw data from the API (sm_data(..., wide = FALSE)) to a wide format where each variable has its own column. Internally it uses pivot from collapse.

Usage

sm_pivot_wider(
  data,
  id_cols = intersect(.SAMADB_T, names(data)),
  names_from = "series",
  values_from = "value",
  labels_from = if (any(names(data) == "label")) "label" else NULL,
  expand.date = FALSE,
  ...
)
sm_pivot_wider(
  data,
  id_cols = intersect(.SAMADB_T, names(data)),
  names_from = "series",
  values_from = "value",
  labels_from = if (any(names(data) == "label")) "label" else NULL,
  expand.date = FALSE,
  ...
)

Arguments

`data`	raw data from the API: A long format data frame where all values are stacked in a value column.
`id_cols`	character. Temporal identifiers of the data. By default all variables in `.SAMADB_T` are selected.
`names_from`	character. The column containing the series codes. These will become the names of the new columns in the wider data format.
`values_from`	character. The column containing the data values.
`labels_from`	character. The column containing the labels describing the series.
`expand.date`	logical. `TRUE` will call `sm_expand_date` on the data after reshaping.
`...`	further arguments passed to `pivot` or `sm_expand_date`.

Value

A data.table with the reshaped data.

Examples


# Return all electricity indicators from the year 2000 onwards
sm_pivot_wider(sm_data("ELECTRICITY", from = 2000, wide = FALSE))

# Return all electricity indicators from the year 2000 onwards
sm_pivot_wider(sm_data("ELECTRICITY", from = 2000, wide = FALSE))

Retrieve Series Table

Description

This function pulls the 'SERIES' table from the database, providing information about the time series in the database. Each series is given a code which unique across datasets.

Usage

sm_series(
  dsid = NULL,
  series = NULL,
  dataset.info = FALSE,
  ordered = TRUE,
  return.query = FALSE
)
sm_series(
  dsid = NULL,
  series = NULL,
  dataset.info = FALSE,
  ordered = TRUE,
  return.query = FALSE
)

Arguments

`dsid`	character. (Optional) id's of datasources matching the 'dsid' column of the 'DATASET' table (retrieved using `sm_datasets()`) for which series information is to be returned.
`series`	character. (Optional) codes of series for which information in to be returned. If 'dsid' is also specificed, the two are combined using SQL 'OR' i.e. these series are retrieved in addition to all series matched through 'dsid'.
`dataset.info`	logical. `TRUE` returns additional information from the 'DATASET' table (the dataset name, when data was last updated, the source id and the data provider and access mode).
`ordered`	logical. `TRUE` returns the series in a fixed order, while `FALSE` returns the result in a random order, to the benefit of faster query execution.
`return.query`	logical. `TRUE` will not query the database but instead return the constructed SQL query as a character string (for debugging purposes).

Details

Each series is given a code which is unique across datasets. Each series also has a label describing the series. Further information recorded are the series frequency, unit, whether it was seasonally adjusted, number of observations, minimum and maximum date, and (optionally) topic, alternative code provided by the data source (data retrieved from EconData uses EconData codes as series codes, so the 'src_code' field gives the codes used by the SARB or STATSSA), or further comments on the series.

Value

A data.table with information about the available series in the database.

Examples


# By default returns all series
sm_series()

# Adding information about the dataset and provider
sm_series(dataset.info = TRUE)

# Only series in the QB
sm_series("QB")

# By default returns all series
sm_series()

# Adding information about the dataset and provider
sm_series(dataset.info = TRUE)

# Only series in the QB
sm_series("QB")

Transpose a Wide Dataset to a Row-Based Format

Description

This function is called by sm_write_excel with option transpose = TRUE to generate a row-based tabular data format from a wide data frame in R that is suitable for exporting to Excel.

Usage

sm_transpose(data, date.format = "%d/%m/%Y")
sm_transpose(data, date.format = "%d/%m/%Y")

Arguments

`data`	a wide format data frame where each column is a variable and the first variable uniquely identifies the data.
`date.format`	a format for date columns which is passed to `format.Date`. When transposing wide, dates are converted to character. The default R YYYY-MM-DD format for dates is often not recognized by Excel. By default dates are transformed to DD/MM/YYYY format which Excel (UK English) recognizes. Putting `FALSE` here does not transform dates into another format.

Value

A transposed data frame or data.table (the class of the input is preserved).

Examples


sm_transpose(sm_data("ELECTRICITY"))

sm_transpose(sm_data("ELECTRICITY"))

Export Wide Data to Excel

Description

This function exports a wide format dataset to a column- (default) or row-oriented Excel format.

Usage

sm_write_excel(
  data,
  ...,
  transpose = FALSE,
  transpose.date.format = "%d/%m/%Y"
)
sm_write_excel(
  data,
  ...,
  transpose = FALSE,
  transpose.date.format = "%d/%m/%Y"
)

Arguments

`data`	a wide dataset from `sm_data` or reshaped to a wide format with `sm_pivot_wider`.
`...`	further arguments to `write_xlsx`. As a minimum a path needs to be supplied that ends with the name of the Excel file. See Examples.
`transpose`	logical. If `TRUE`, the result is returned in a row-oriented Excel format. The default is column oriented (same as the dataset in R).
`transpose.date.format`	argument passed to `sm_transpose`, setting the format of date columns when data is transposed.

Value

Writes an Excel file to the specified path (no return value).

Examples

## Not run: 
# Getting electricity indicators from 2000
data <- sm_data("ELECTRICITY", from = 2000)

# Saving to different Excel formats
sm_write_excel(data, "ELECTRICITY.xlsx")
sm_write_excel(data, "ELECTRICITY.xlsx", transpose = TRUE)

# Saving to alternative path
sm_write_excel(data, "C:/Users/.../ELECTRICITY.xlsx")

## End(Not run)

## Not run: 
# Getting electricity indicators from 2000
data <- sm_data("ELECTRICITY", from = 2000)

# Saving to different Excel formats
sm_write_excel(data, "ELECTRICITY.xlsx")
sm_write_excel(data, "ELECTRICITY.xlsx", transpose = TRUE)

# Saving to alternative path
sm_write_excel(data, "C:/Users/.../ELECTRICITY.xlsx")

## End(Not run)

Package 'samadb'

Help Index

South Africa Macroeconomic Database API

Description

Functions

Global Identifier Macros

Description

Usage

See Also

Examples

Coerce Vectors to Dates

Description

Usage

Arguments

Value

See Also

Examples

Retrieve Data from the Database

Description

Usage

Arguments

Details

Value

See Also

Examples

Retrieve Datasets Table

Description

Usage

Arguments

Details

Value

See Also

Examples

Retrieve Data Sources Table

Description

Usage

Arguments

Details

Value

See Also

Examples

Generate Temporal Identifiers from a Date Column

Description

Usage

Arguments

Value

See Also

Examples

Reshape Column-Based Data to Long Format

Description

Usage

Arguments

Value

See Also

Examples

Reshape Long API Data to Column-Based Format

Description

Usage

Arguments

Value

See Also

Examples

Retrieve Series Table

Description

Usage

Arguments

Details

Value

See Also

Examples

Transpose a Wide Dataset to a Row-Based Format

Description

Usage

Arguments

Value

See Also

Examples

Export Wide Data to Excel

Description

Usage