lsData and readData: discover and load raw study data

parse.R provides two lightweight utilities for getting raw data into the pipeline. lsData() lists the files under the raw data directory (all of them, or only those matching a pattern such as CSV files) and returns a named character vector of paths, while readData() loads a single CSV file into a tibble. Together they feed the first two targets in _targets.R.

lsData()

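lsData <- function(path = "data/raw", ...) {
  #' List Data
  #'
  #' List all data files within the `path` directory
  #'
  #' @param path Path to the raw data directory, "data/raw" by default
  #' @param ... Additional arguments passed to `list.files()`, e.g. `pattern`
  #' @return A named character vector with the relative path of each dataset

  # Needs %>% and set_names() attached (e.g. from magrittr or purrr).
  # Names are built by dropping everything up to the last underscore,
  # directory components, and the file extension.
  filepath <- list.files(path, full.names = TRUE, recursive = TRUE, ...) %>%
    set_names(gsub(x = ., ".*_|\\w+/|\\.\\w*", ""))

  return(filepath)
}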

Parameters

path

character

default: "data/raw"

Path to the raw data directory. Passed directly to list.files() as the first argument.

...

any

Additional arguments passed to list.files(). Use pattern to filter files by extension or name.
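Because list.files() interprets pattern as a regular expression rather than a shell glob, an anchored expression is the most predictable way to select by extension. The sketch below uses a scratch directory and made-up file names (both are illustrative assumptions, not part of the project) to contrast a loose pattern with an anchored one.

```r
# Build a scratch directory with hypothetical file names
tmp <- file.path(tempdir(), "lsdata-pattern-demo")
dir.create(tmp, showWarnings = FALSE)
file.create(file.path(tmp, c("visits.csv", "csv_notes.txt", "readme.md")))

# A bare "csv" matches anywhere in the file name
loose <- list.files(tmp, pattern = "csv")

# Escaping the dot and anchoring at the end keeps only .csv files
strict <- list.files(tmp, pattern = "\\.csv$")

print(loose)
print(strict)
```

The same expression can be forwarded through lsData(), for example lsData(pattern = "\\.csv$").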

Returns

A named character vector of relative file paths, each including the path prefix. Names are derived from each path with gsub(x = ., ".*_|\\w+/|\\.\\w*", ""), which strips everything up to the last underscore, any leading directory components, and the file extension. For example, a file at data/raw/data.csv gets the name "data".
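To see how the naming expression behaves, it can be applied to a few paths directly. This is a minimal sketch in base R; the paths are hypothetical and chosen only to exercise each branch of the regex.

```r
# Same expression lsData() uses to build names
name_rx <- ".*_|\\w+/|\\.\\w*"

# Hypothetical paths, one per branch of the regex
paths <- c(
  "data/raw/data.csv",          # plain file
  "data/raw/2021_survey.csv",   # underscore prefix
  "data/raw/site/visits.tsv"    # nested directory
)

stems <- gsub(name_rx, "", paths)
print(stems)
# stems is c("data", "survey", "visits")

# Files that differ only before the last underscore collapse to one name
dupes <- gsub(name_rx, "", c("data/raw/a_labs.csv", "data/raw/b_labs.csv"))
has_collision <- anyDuplicated(dupes) > 0
print(has_collision)
# has_collision is TRUE
```

One consequence worth keeping in mind is that names are not guaranteed to be unique, so lookups such as raws[["data"]] assume only one file reduces to that name.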

Usage

raws <- lsData(pattern = "*csv")
# Returns a named character vector: c(data = "data/raw/data.csv")

In _targets.R:

raws <- lsData(pattern = "*csv")

lsData() is called at the top level of _targets.R — outside the list() of targets — so the file paths are resolved at pipeline definition time.

readData()

readData <- function(fpath, ...) {
  #' Read Data Frame
  #'
  #' Read external tabular data as a tidy data frame
  #'
  #' @param fpath Path name of the file to parse
  #' @inheritDotParams readr::read_csv
  #' @return A tidy data frame

  tbl <- readr::read_csv(fpath, ...)

  return(tbl)
}

Parameters

fpath

character

required

Full path to the CSV file to read. Typically one of the values from lsData().

...

any

Additional arguments passed to readr::read_csv(). Use these to control column type guessing, locale, encoding, or to skip rows.

Returns

A tibble (tidy data frame) with column names taken from the CSV header row. By default, readr guesses column types from the first 1000 rows; the guess_max argument changes that limit.

Usage

tbl <- readData("data/raw/data.csv")

In _targets.R:

tar_target(tbl, readData(fpath))

fpath is defined in the preceding tar_target(fpath, raws[["data"]], format = "file") target, which tells targets to track the file for changes.
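Put together, the pieces described so far suggest a _targets.R along the following lines. This is a sketch, not the project's actual file: the location of parse.R and the package that supplies %>% and set_names() are assumptions.

```r
# _targets.R (sketch)
library(targets)
library(magrittr)  # assumed source of %>% and set_names() used by lsData()

# Assumed location of parse.R; adjust to the project's layout
source("R/parse.R")

# Resolved when the pipeline is defined, before any target runs
raws <- lsData(pattern = "*csv")

list(
  # format = "file" makes targets track the file's contents for changes
  tar_target(fpath, raws[["data"]], format = "file"),
  # Re-runs whenever the tracked file changes
  tar_target(tbl, readData(fpath))
)
```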

readData() delegates entirely to readr::read_csv(), which assumes UTF-8 input unless a different locale is supplied and infers column types automatically. If a column is read as the wrong type, pass an explicit col_types through the ... argument.
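As an illustration of overriding type inference, the sketch below writes a small hypothetical CSV to a temporary file and reads it with explicit column types. The file contents and column names are invented for the example, and readData() is redefined locally only so the snippet runs on its own.

```r
# Local copy of readData() so the sketch is self-contained
readData <- function(fpath, ...) {
  readr::read_csv(fpath, ...)
}

# Hypothetical CSV whose id column carries leading zeros
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "001,3.5", "002,4.0"), csv_path)

# Declare the types up front instead of relying on guessing
tbl <- readData(
  csv_path,
  col_types = readr::cols(
    id = readr::col_character(),
    score = readr::col_double()
  )
)

print(tbl$id)
```

Declaring id as character keeps the leading zeros intact regardless of what the guesser would have chosen.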
