Before you run tar_make(), you must supply a single CSV file containing the extracted study data. The pipeline reads this file as its only external input — every downstream model, plot, and report derives from it. This page describes the required directory layout, the expected column format, and the variable transformations that clean() applies before any statistical modeling begins.

Directory structure

Place your extracted study data at exactly the following path relative to the project root:
data/
├── raw/
│   └── data.csv    ← place your extracted study data here
└── processed/      ← secondary products generated by the pipeline
The data/raw/ directory is scanned by lsData() at pipeline startup. The file must be named data.csv. The data/processed/ directory is reserved for any intermediate or feature-engineered outputs generated during the run.
data.csv must be placed in data/raw/ before you call tar_make(). If the file is missing, the fpath target will error and the entire pipeline will halt at the first step.
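
A quick pre-flight check can catch a missing input before the pipeline starts. The sketch below uses a hypothetical helper, check_input(), which is not part of the project; it assumes only the data/raw/data.csv path described above.

```r
# Hypothetical helper (not part of the pipeline): confirm that the input file
# exists at data/raw/data.csv relative to a given project root.
check_input <- function(root = ".") {
  fpath <- file.path(root, "data", "raw", "data.csv")
  if (!file.exists(fpath)) {
    stop("Input file not found: ", fpath, call. = FALSE)
  }
  invisible(fpath)
}
```

Calling check_input() from the project root before tar_make() stops with a readable message if the file is absent.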

Input data format

Your data.csv must contain the following columns. Column names are matched exactly as they appear in the source code; extra columns are ignored.
| Column | Type | Notes |
| --- | --- | --- |
| Author | string | Study author name(s) |
| Patient's age | string | Age descriptor; copied as-is into the Age variable |
| Year of Publication | numeric | Four-digit year; cast to integer by clean() |
| Prevalence | string or numeric | Proportion of inappropriate use; accepts comma (,) or dot (.) as decimal separator |
| Sample size | numeric | Total number of patients; cast to integer |
| Inappropriate indication | numeric | Count of patients with inappropriate indication; cast to integer |
| Continent | string | Must contain "Asia", "Europe", or "North America" as a substring; anything else maps to "Other" |
| Setting | string | Must contain "Hospital" as a substring to map to "Hospital Setting"; otherwise "Other" |
| Guideline | string | "Yes" maps to "Followed Guideline(s)"; any other value maps to "No Guideline" |
| JBI_Classification | string | Methodological quality classification; used as-is in subgroup and regression models |
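
Because column names are matched exactly, it can help to confirm that none are missing before running the pipeline. The following is a minimal sketch with a hypothetical helper, missing_columns(), that is not part of the project; the column names are taken from the table above.

```r
# Column names copied from the input format table.
required_cols <- c(
  "Author", "Patient's age", "Year of Publication", "Prevalence",
  "Sample size", "Inappropriate indication", "Continent", "Setting",
  "Guideline", "JBI_Classification"
)

# Hypothetical helper: return the required columns that are absent from tbl.
missing_columns <- function(tbl) {
  setdiff(required_cols, names(tbl))
}

# When inspecting the file with base R, check.names = FALSE keeps names such
# as "Sample size" unaltered on import:
# tbl <- read.csv("data/raw/data.csv", check.names = FALSE)
# missing_columns(tbl)
```

An empty result means every required column is present; any names returned need to be added or corrected in data.csv.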

Data cleaning

The clean() function in src/R/clean.R applies all variable transformations in a single dplyr::mutate() call. The cleaned data frame is stored as the tbl_clean target and is the direct input to every modeling target.
clean <- function(tbl, ...) {
  #' Clean the Data Frame
  #'
  #' Clean the data frame by standardizing the variables.
  #'
  #' @param tbl A data frame object
  #'
  #' @return A data frame object

  res <- tbl |>
    dplyr::mutate(
      "Age" = `Patient's age`,
      "Year" = as.integer(`Year of Publication`),
      "Prevalence" = gsub(x = Prevalence, ",", ".") |> as.numeric(),
      "Sample_size" = as.integer(`Sample size`),
      "Inappropriate_indication" = as.integer(`Inappropriate indication`),
      "Continent" = dplyr::case_when(
        grepl(x = Continent, "Asia") ~ "Asia",
        grepl(x = Continent, "Europe") ~ "Europe",
        grepl(x = Continent, "North America") ~ "North America",
        .default =  "Other"
      ) |> factor(levels = c("North America", "Asia", "Europe", "Other")),
      "Setting" = dplyr::case_when(
        grepl(x = Setting, "Hospital") ~ "Hospital Setting",
        .default = "Other"
      ) |> factor(levels = c("Hospital Setting", "Other")),
      "use_guideline" = ifelse(
        `Guideline` == "Yes", "Followed Guideline(s)", "No Guideline"
      ),
      "logit_prevalence" = log(Prevalence / (1 - Prevalence)),
      "var_logit_prevalence" = 1 / (Sample_size * logit_prevalence)
    ) %>%
    set_names(names(.) |> make.names())

  return(res)
}
Each transformation is explained below.

Age standardization

"Age" = `Patient's age`
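The backtick-quoted column Patient's age is copied into a new column with the syntactically valid R name Age. No type conversion is applied; the value is carried forward as-is for use in subgroup analysis. Because mutate() adds columns rather than renaming them, the original column is kept and becomes Patient.s.age during column name sanitization.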

Year as integer

"Year" = as.integer(`Year of Publication`)
Year of Publication is cast to integer, dropping any trailing decimals that may result from spreadsheet export. Year is used as a continuous covariate in the multivariable meta-regression.

Prevalence normalization

"Prevalence" = gsub(x = Prevalence, ",", ".") |> as.numeric()
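Some studies report prevalence with a comma decimal separator (e.g., 0,42). gsub() replaces all commas with dots before as.numeric() coerces the result. The value is expected to be a proportion between 0 and 1; clean() does not rescale percentages, and entries that cannot be parsed as numbers become NA.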
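
The effect of this step can be reproduced on a small vector. The input values below are made up for illustration.

```r
# Made-up prevalence entries mixing comma and dot decimal separators.
raw <- c("0,42", "0.42", "0,5")

# Same expression as in clean(): swap commas for dots, then coerce to numeric.
prevalence <- as.numeric(gsub(x = raw, ",", "."))
prevalence
```

All three entries parse to ordinary numeric proportions, regardless of which separator the source used.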

Sample size as integer

"Sample_size" = as.integer(`Sample size`)
The total patient count is stored as an integer. It is used both as a covariate in the multivariable meta-regression and in the variance computation below.

Continent classification

"Continent" = dplyr::case_when(
  grepl(x = Continent, "Asia") ~ "Asia",
  grepl(x = Continent, "Europe") ~ "Europe",
  grepl(x = Continent, "North America") ~ "North America",
  .default =  "Other"
) |> factor(levels = c("North America", "Asia", "Europe", "Other"))
Free-text continent values are collapsed to four levels using substring matching. The reference level for modeling is "North America" (the first level of the factor).
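
Two properties of substring matching are worth keeping in mind: grepl() is case-sensitive, and the first matching condition wins. The sketch below re-expresses the case_when() logic in base R for illustration only; classify_continent() is a hypothetical name and the inputs are made up.

```r
# Base-R re-expression of the case_when() logic (illustration only).
# Conditions are tested in the same order: Asia, Europe, North America.
classify_continent <- function(x) {
  out <- ifelse(grepl("Asia", x), "Asia",
         ifelse(grepl("Europe", x), "Europe",
         ifelse(grepl("North America", x), "North America", "Other")))
  factor(out, levels = c("North America", "Asia", "Europe", "Other"))
}

x <- classify_continent(c("Asia", "Europe and Asia", "north america", "Africa"))
as.character(x)  # "Europe and Asia" matches "Asia" first; lowercase falls to "Other"
levels(x)[1]     # reference level used for modeling
```

If a study spans several continents or uses different capitalization, adjust the value in data.csv so that it matches the intended level.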

Setting dichotomization

"Setting" = dplyr::case_when(
  grepl(x = Setting, "Hospital") ~ "Hospital Setting",
  .default = "Other"
) |> factor(levels = c("Hospital Setting", "Other"))
Any value containing "Hospital" maps to "Hospital Setting"; all others become "Other". The factor reference level is "Hospital Setting".

Guideline use categorization

"use_guideline" = ifelse(
  `Guideline` == "Yes", "Followed Guideline(s)", "No Guideline"
)
The Guideline column is recoded to a descriptive label. Rows whose value is exactly "Yes" (the comparison is case-sensitive) are tagged "Followed Guideline(s)"; any other non-missing value is tagged "No Guideline", and a missing value stays NA.
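
The recoding can be checked on a few made-up values, including a lowercase entry and a missing one.

```r
# Made-up Guideline entries: exact match, other values, and a missing value.
guideline <- c("Yes", "No", "yes", "Unclear", NA)

# Same expression as in clean().
use_guideline <- ifelse(
  guideline == "Yes", "Followed Guideline(s)", "No Guideline"
)
use_guideline
```

Only the exact string "Yes" yields "Followed Guideline(s)"; the NA entry propagates as NA because the comparison itself is NA.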

Logit prevalence

"logit_prevalence" = log(Prevalence / (1 - Prevalence))
The logit transformation stabilizes variance and is the effect measure used throughout the meta-analysis. Values of exactly 0 or 1 will produce -Inf or Inf; verify that no study reports a prevalence at the boundary before running the pipeline.
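
A boundary check along these lines can be run on the Prevalence column before calling tar_make(). This is a minimal sketch with made-up values; logit() is a local helper defined here, not a function from the pipeline.

```r
# Local helper mirroring the expression in clean().
logit <- function(p) log(p / (1 - p))

# Made-up proportions, including both boundary values.
prevalence <- c(0, 0.25, 0.5, 1)
logit(prevalence)  # the first and last entries are -Inf and Inf

# Positions of studies whose prevalence sits at or beyond the boundary.
at_boundary <- which(prevalence <= 0 | prevalence >= 1)
at_boundary
```

Any positions returned identify rows in data.csv that need attention before modeling.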

Variance of logit prevalence

"var_logit_prevalence" = 1 / (Sample_size * logit_prevalence)
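This quantity is passed to the meta-analysis functions as the vi (variance) argument, where it serves as the within-study variance of the logit-transformed prevalence. Because the denominator includes logit_prevalence, the result is negative whenever Prevalence is below 0.5 and infinite when Prevalence is exactly 0.5, so inspect the computed values before interpreting the models.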
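
For comparison, the sketch below evaluates the expression used in clean() alongside the standard delta-method variance of a logit-transformed proportion, 1 / (n * p * (1 - p)). The inputs are made up, and the names v_pipeline and v_delta are illustrative only.

```r
# Made-up study: prevalence 0.3 in a sample of 100 patients.
p <- 0.3
n <- 100
logit_p <- log(p / (1 - p))

# Expression used in clean(): 1 / (Sample_size * logit_prevalence).
v_pipeline <- 1 / (n * logit_p)

# Standard delta-method variance of the logit of a proportion.
v_delta <- 1 / (n * p * (1 - p))

c(pipeline = v_pipeline, delta_method = v_delta)
```

For this input the two quantities differ in both sign and magnitude, which is worth knowing when reading the resulting model weights.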

Column name sanitization

set_names(names(.) |> make.names())
make.names() is applied to all column names in the final step, replacing spaces and special characters with dots so every column is a syntactically valid R name (e.g., Patient's age → Patient.s.age).
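
The renaming can be previewed by calling make.names() directly on a few of the input column names.

```r
# A few column names from the input format, as they appear in data.csv.
cols <- c(
  "Patient's age", "Year of Publication", "Sample size", "JBI_Classification"
)

# Spaces and apostrophes become dots; underscores are already valid.
sanitized <- make.names(cols)
sanitized
```

Names that are already syntactically valid, such as JBI_Classification, pass through unchanged.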
