The config.yaml file is the single source of truth for every parameter in the BDB-Genomics ATAC-seq pipeline. Every tool path, resource limit, QC threshold, and reference genome pointer is declared here. Snakemake rules are intentionally stateless — they read from this config at runtime and contain no hard-coded values. This separation means you can tune, extend, or override any aspect of the pipeline without ever touching a rule file.
The config uses YAML anchors (&NAME) and aliases (*NAME) to centralise all reference file paths. A path is declared exactly once and then referenced everywhere it is needed. Changing a genome build is therefore a single-line edit.
Downstream tool blocks dereference these anchors with the * alias syntax:
bowtie2:
  params:
    index: *BOWTIE2_INDEX      # expands to "data/reference/index/genome"

remove_blacklist_reads:
  input:
    blacklist: *BLACKLIST      # expands to "data/reference/ENCODE_blacklist.bed"
All seven anchor names — &GENOME_FA, &GENOME_SIZES, &BOWTIE2_INDEX, &CHROMAP_INDEX, &BLACKLIST, &ANNOTATION_GTF, and &MOTIF_DB — are validated by validate_config.py at startup. A missing or inaccessible path fails the run before the DAG is built.
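As a rough illustration of this kind of fail-fast check, the sketch below tests whether each reference path can be read. It is not the contents of validate_config.py: the function name `find_missing_references` and the `PREFIX_KEYS` set are hypothetical, and it assumes the Bowtie2 index entry is a filename prefix shared by several index files rather than a file in its own right.

```python
import glob
import os

# Assumption: index entries such as the Bowtie2 index are filename
# prefixes (genome.1.bt2, genome.2.bt2, ...), not single files.
PREFIX_KEYS = {"BOWTIE2_INDEX"}


def find_missing_references(paths, prefix_keys=PREFIX_KEYS):
    """Return a sorted list of anchor names whose path cannot be read.

    `paths` maps an anchor name (without the leading &) to its path.
    """
    missing = []
    for name, path in paths.items():
        if name in prefix_keys:
            # At least one file must exist that starts with "<prefix>."
            found = bool(glob.glob(glob.escape(path) + ".*"))
        else:
            found = os.path.isfile(path) and os.access(path, os.R_OK)
        if not found:
            missing.append(name)
    return sorted(missing)
```

A caller could abort before any DAG construction whenever the returned list is non-empty, for example by raising `SystemExit` with the offending anchor names.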
Pipeline modality. Accepted values: "bulk" or "scatac". Can be overridden at runtime with the ATAC_MODE environment variable without editing this file.
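The precedence described here can be sketched as follows. This is a minimal illustration, not the pipeline's own code: `resolve_mode` is a hypothetical helper, and the only names taken from this page are the ATAC_MODE variable and the two accepted values.

```python
import os

ACCEPTED_MODES = ("bulk", "scatac")


def resolve_mode(config_value, environ=None):
    """Return the modality, letting ATAC_MODE override the config value."""
    environ = os.environ if environ is None else environ
    mode = environ.get("ATAC_MODE", config_value)
    if mode not in ACCEPTED_MODES:
        raise ValueError(f"mode must be one of {ACCEPTED_MODES}, got {mode!r}")
    return mode
```

Passing the environment as an argument keeps the function easy to test; in normal use it falls back to `os.environ`.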
Every tool block in the config follows a uniform five-field schema. This consistency makes the config self-documenting and allows the validation script to discover required keys automatically by scanning rule files.
<tool_name>:
  input: ...        # source directory or file path(s)
  output: ...       # destination directory
  params: ...       # tool-specific flags and thresholds
  threads: ...      # CPU allocation (positive integer)
  resources:        # scheduler resource requests
    mem_mb: ...     # memory in megabytes (positive integer)
    time: ...       # wall-clock limit in minutes (positive integer)
fastp is the only tool without an input key — raw FASTQ paths are resolved dynamically from the sample sheet at runtime.
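One way to check a tool block against this schema is sketched below. The helper names are hypothetical and the real validation script may work differently (this page says it discovers required keys by scanning rule files); the sketch only encodes the five fields, the positive-integer constraints, and the fastp exception stated above.

```python
REQUIRED_FIELDS = ("input", "output", "params", "threads", "resources")
NO_INPUT_TOOLS = {"fastp"}  # raw FASTQ paths come from the sample sheet


def _is_positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def check_tool_block(name, block):
    """Return a list of human-readable problems for one tool block."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field == "input" and name in NO_INPUT_TOOLS:
            continue
        if field not in block:
            problems.append(f"{name}: missing '{field}'")
    if "threads" in block and not _is_positive_int(block["threads"]):
        problems.append(f"{name}: 'threads' must be a positive integer")
    resources = block.get("resources")
    if isinstance(resources, dict):
        for key in ("mem_mb", "time"):
            if not _is_positive_int(resources.get(key)):
                problems.append(f"{name}: 'resources.{key}' must be a positive integer")
    elif "resources" in block:
        problems.append(f"{name}: 'resources' must be a mapping")
    return problems
```

An empty list means the block conforms; collecting every problem rather than stopping at the first lets a user fix a config in one pass.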
Snakemake supports layered config loading: values in a second --configfile argument override matching keys in the first. This is useful for parameter sweeps, CI testing, or per-project adjustments without modifying the canonical config.yaml.
Example: Loosening QC Gate Thresholds for Synthetic Data
When running the pipeline on synthetic or downsampled CI data, FRiP scores and TSS enrichment values will be well below production thresholds. Create an override file that redefines only these gates, and pass it as the second --configfile argument so the main config stays unaltered.
Only keys present in the override file are replaced. All other keys — including reference paths and tool parameters — remain exactly as declared in config.yaml. Never commit a relaxed override file as the default config.
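The override behaviour can be pictured as a recursive dictionary merge. The sketch below is an illustration of that idea, not Snakemake's implementation, and it assumes nested mappings are merged key by key. The key names `qc_gates`, `min_frip`, and `min_tss_enrichment` and all threshold values are hypothetical placeholders; the real gate names are the ones declared in config.yaml.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which keys from `override` replace matching keys in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of replacing wholesale
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys and values, for illustration only.
base = {
    "qc_gates": {"min_frip": 0.2, "min_tss_enrichment": 5.0},
    "bowtie2": {"threads": 8},
}
override = {"qc_gates": {"min_frip": 0.01}}

merged = deep_merge(base, override)
```

Here only `min_frip` changes in `merged`; `min_tss_enrichment` and the `bowtie2` block carry over from `base`, and `base` itself is left unmodified.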
config.yaml ships with a boilerplate template block at the bottom. To add a new tool, copy the block, replace the placeholder names, then add the matching rule file and its include: directive to the Snakefile:
template_category:
  template_tool:
    input: "results/preprocessing/fastp"
    output: "results/template_category/template_tool"
    params:
      message: "This is a boilerplate template."
    threads: 1
    resources:
      mem_mb: 1000
      time: 10