BDB ATAC-seq Pipeline: Full config.yaml Schema Reference

The config.yaml file is the single source of truth for the BDB-Genomics ATAC-seq framework. Every tool, path, resource limit, and QC threshold is declared here; Snakemake rules are stateless wrappers that read from this config at runtime. YAML anchors (e.g., &GENOME_FA, *GENOME_FA) centralise reference file paths so that changing one entry propagates everywhere automatically. Every tool block follows a uniform schema - input → output → params → threads → resources - making it straightforward to add new stages without touching existing ones.
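
The dotted keys used throughout this reference (for example fastp.params.trim_front1) correspond to nested YAML mappings. The sketch below shows one way a rule could resolve such a key from the parsed config. The helper name get_setting and the inline dictionary are illustrative assumptions, not part of the pipeline.

```python
from functools import reduce

# Hypothetical excerpt of the parsed config.yaml, using the fastp defaults from this page.
config = {
    "fastp": {
        "output": "results/preprocessing/fastp",
        "params": {"trim_front1": 5, "trim_front2": 5, "length_required": 30},
        "threads": 4,
        "resources": {"mem_mb": 8000, "time": 120},
    }
}


def get_setting(cfg, dotted_key):
    """Walk a nested mapping using a dotted key such as 'fastp.resources.mem_mb'."""
    try:
        return reduce(lambda node, part: node[part], dotted_key.split("."), cfg)
    except (KeyError, TypeError) as exc:
        # Report the full dotted key so a typo is easy to spot.
        raise KeyError(f"config key not found: {dotted_key}") from exc


trim = get_setting(config, "fastp.params.trim_front1")
mem = get_setting(config, "fastp.resources.mem_mb")
```

Failing loudly on a missing key is a design choice for the sketch: a silent fallback could hide a misspelled tool block.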

Raw FASTQ paths are not declared directly in config.yaml. They are resolved dynamically from the sample sheet defined at global.samples (a TSV file with columns sample, fastq_r1, fastq_r2, replicate, condition).

Global / Project Metadata

global

Controls pipeline-wide behaviour, modality selection, and all shared reference file paths.

Purpose: Declares the analysis mode, sample sheet location, and YAML anchors for genome references used by every downstream tool.

global.mode

string

default:"bulk"

Pipeline modality. Use "bulk" for standard bulk ATAC-seq or "scatac" for single-cell ATAC-seq. Can also be overridden at runtime with the ATAC_MODE environment variable.
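
A minimal sketch of the override order described above, assuming the environment variable takes precedence over the config value. The function name resolve_mode and the validation step are hypothetical; the pipeline's own implementation may differ.

```python
import os

VALID_MODES = ("bulk", "scatac")


def resolve_mode(config, environ=os.environ):
    """Return the pipeline mode, letting ATAC_MODE override global.mode."""
    # An unset or empty ATAC_MODE falls back to the config, then to "bulk".
    mode = environ.get("ATAC_MODE") or config.get("global", {}).get("mode", "bulk")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {VALID_MODES}")
    return mode
```

Passing the environment as an argument keeps the function easy to test without touching the real process environment.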

global.samples

string

default:"data/fastp/samples.tsv"

Path to the TSV sample sheet. Required columns: sample, fastq_r1, fastq_r2, replicate, condition.
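
The sample sheet can be checked before a run with a few lines of standard-library code. This is only a sketch: the function name read_sample_sheet and the example rows and paths are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "fastq_r1", "fastq_r2", "replicate", "condition")


def read_sample_sheet(handle):
    """Parse a tab-separated sample sheet and verify the required columns exist."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing columns: {', '.join(missing)}")
    return list(reader)


# Made-up two-sample sheet; real FASTQ paths come from your project.
example = io.StringIO(
    "sample\tfastq_r1\tfastq_r2\treplicate\tcondition\n"
    "ctrl_1\tdata/raw/ctrl_1_R1.fastq.gz\tdata/raw/ctrl_1_R2.fastq.gz\t1\tcontrol\n"
    "treat_1\tdata/raw/treat_1_R1.fastq.gz\tdata/raw/treat_1_R2.fastq.gz\t1\ttreated\n"
)
samples = read_sample_sheet(example)

# Map each sample name to its (R1, R2) FASTQ pair.
fastqs = {row["sample"]: (row["fastq_r1"], row["fastq_r2"]) for row in samples}
```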

global.references.genome_fa

string

default:"data/reference/genome.fa"

Path to the reference genome FASTA file. Anchored as &GENOME_FA; referenced by bowtie2, picard, TOBIAS, chromVAR, footprinting, and peak annotation rules.

global.references.genome_sizes

string

default:"data/reference/genome.chrom.sizes"

Chromosome sizes file (two-column TSV: chrom, length). Anchored as &GENOME_SIZES; used by bedGraphToBigWig, TOBIAS, and chromVAR.

global.references.bowtie2_index

string

default:"data/reference/index/genome"

Bowtie2 index prefix (without .bt2 extension). Anchored as &BOWTIE2_INDEX.

global.references.chromap_index

string

default:"data/reference/chromap/genome.index"

Chromap binary index file for scATAC-seq alignment. Anchored as &CHROMAP_INDEX.

global.references.blacklist

string

default:"data/reference/ENCODE_blacklist.bed"

ENCODE blacklist BED file for filtering artefact regions. Anchored as &BLACKLIST.

global.references.annotation_gtf

string

default:"data/reference/annotation.gtf"

Gene annotation GTF file used for TSS enrichment calculation and peak annotation. Anchored as &ANNOTATION_GTF.

global.references.motif_db

string

default:"data/motifs/jaspar_vertebrates.meme"

MEME-format motif database (e.g., JASPAR vertebrates). Anchored as &MOTIF_DB; used by HOMER, TOBIAS, and chromVAR.

Stage 1 - Preprocessing

fastp

Purpose: Adapter trimming and quality filtering of raw paired-end FASTQ files.

Output: results/preprocessing/fastp/

fastp.output

string

default:"results/preprocessing/fastp"

Directory for trimmed FASTQ files and fastp JSON reports.

fastp.params.trim_front1

integer

default:"5"

Number of bases to trim from the 5′ end of Read 1.

fastp.params.trim_front2

integer

default:"5"

Number of bases to trim from the 5′ end of Read 2.

fastp.params.length_required

integer

default:"30"

Minimum read length after trimming; shorter reads are discarded.

fastp.threads

integer

default:"4"

CPU threads allocated to fastp.

fastp.resources.mem_mb

integer

default:"8000"

Memory limit in megabytes.

fastp.resources.time

integer

default:"120"

Wall-clock time limit in minutes.

fastqc

Purpose: Post-trimming quality control report generation.

Input: results/preprocessing/fastp/ (R1 and R2 trimmed reads)

Output: results/preprocessing/fastqc/

fastqc.threads

integer

default:"4"

CPU threads allocated to FastQC.

fastqc.resources.mem_mb

integer

default:"2000"

Memory limit in megabytes.

fastqc.resources.time

integer

default:"30"

Wall-clock time limit in minutes.

Stage 2 - Alignment

bowtie2

Purpose: Paired-end alignment of trimmed reads to the reference genome (bulk ATAC-seq mode).

Input: results/preprocessing/fastp/

Output: results/alignment/bowtie2/

bowtie2.params.index

string

default:"*BOWTIE2_INDEX"

Bowtie2 index prefix, resolved from the global.references.bowtie2_index anchor.

bowtie2.params.sensitive

string

default:"--very-sensitive"

Alignment sensitivity flag passed directly to bowtie2.

bowtie2.threads

integer

default:"8"

CPU threads allocated to bowtie2.

bowtie2.resources.mem_mb

integer

default:"16000"

Memory limit in megabytes.

bowtie2.resources.time

integer

default:"240"

Wall-clock time limit in minutes.

chromap (scATAC-seq)

Purpose: Fast single-cell ATAC-seq read alignment using the Chromap aligner (--preset atac). Only active when global.mode = "scatac".

Input: results/preprocessing/fastp/

Output: results/alignment/chromap/

chromap.params.index

string

default:"*CHROMAP_INDEX"

Chromap binary index, resolved from the global.references.chromap_index anchor.

chromap.params.preset

string

default:"atac"

Chromap preset. Always "atac" for ATAC-seq data.

chromap.threads

integer

default:"16"

CPU threads allocated to Chromap.

chromap.resources.mem_mb

integer

default:"32000"

Memory limit in megabytes.

chromap.resources.time

integer

default:"120"

Wall-clock time limit in minutes.

Stage 3 - Post-Alignment Processing

samtools_sort

Purpose: Coordinate-sorts the raw BAM produced by Bowtie2.

Input: results/alignment/bowtie2/

Output: results/post_alignment/samtools_sort/

samtools_sort.threads

integer

default:"4"

CPU threads.

samtools_sort.resources.mem_mb

integer

default:"8000"

Memory limit (MB).

samtools_sort.resources.time

integer

default:"120"

Time limit (minutes).

mitoATAC_calculate

Purpose: Calculates mitochondrial read fractions before deduplication for QC reporting.

Input: results/post_alignment/samtools_sort/

Output: results/post_alignment/mito-ATAC/

mitoATAC_calculate.params.mito_chr

string

default:"chrMT"

Chromosome name for the mitochondrial contig. Common alternatives: chrM, MT.
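
The contig name matters because the fraction is computed against an exact match. The sketch below assumes per-contig mapped read counts (such as those reported by samtools idxstats) are already available as a dictionary; the function name mito_fraction and the counts are hypothetical.

```python
def mito_fraction(mapped_per_contig, mito_chr="chrMT"):
    """Fraction of mapped reads assigned to the mitochondrial contig."""
    if mito_chr not in mapped_per_contig:
        # A mismatch such as chrM vs chrMT would otherwise silently report 0.
        raise KeyError(f"contig {mito_chr!r} not found; check mito_chr against the BAM header")
    total = sum(mapped_per_contig.values())
    return mapped_per_contig[mito_chr] / total if total else 0.0


# Made-up counts for a reference that names the mitochondrial contig chrM.
counts = {"chr1": 600, "chr2": 300, "chrM": 100}
fraction = mito_fraction(counts, mito_chr="chrM")
```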

mitoATAC_calculate.threads

integer

default:"2"

CPU threads.

mitoATAC_calculate.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

mitoATAC_calculate.resources.time

integer

default:"30"

Time limit (minutes).

samtools_fixmate

Purpose: Fills in mate-score tags required by samtools markdup.

Input: results/post_alignment/samtools_sort/

Output: results/post_alignment/samtools_fixmate/

samtools_fixmate.threads

integer

default:"2"

CPU threads.

samtools_fixmate.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

samtools_fixmate.resources.time

integer

default:"30"

Time limit (minutes).

samtools_markdup

Purpose: Marks (and optionally removes) PCR duplicates.

Input: results/post_alignment/samtools_fixmate/

Output: results/post_alignment/samtools_markdup/

samtools_markdup.params.remove_duplicates

boolean

default:"false"

When false, duplicates are marked but retained. Set to true to remove them entirely.

samtools_markdup.threads

integer

default:"4"

CPU threads.

samtools_markdup.resources.mem_mb

integer

default:"8000"

Memory limit (MB).

samtools_markdup.resources.time

integer

default:"120"

Time limit (minutes).

samtools_index_post_markdup

Purpose: Indexes the deduplicated BAM so Picard and downstream rules can random-access it.

Input: results/post_alignment/samtools_markdup/

Output: results/post_alignment/samtools_index/post_markdup/

samtools_index_post_markdup.threads

integer

default:"2"

CPU threads.

samtools_index_post_markdup.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

samtools_index_post_markdup.resources.time

integer

default:"30"

Time limit (minutes).

remove_mito_reads

Purpose: Removes mitochondrial reads from the deduplicated BAM using exact chromosome matching.

Input: results/post_alignment/samtools_markdup/

Output: results/post_alignment/remove_mito_reads/

remove_mito_reads.params.mito_chr

string

default:"chrMT"

Mitochondrial contig name to exclude. Must match the contig name in the BAM header.

remove_mito_reads.threads

integer

default:"2"

CPU threads.

remove_mito_reads.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

remove_mito_reads.resources.time

integer

default:"30"

Time limit (minutes).

samtools_index

Purpose: Indexes the mitochondrial-free sorted BAM.

Input: results/post_alignment/remove_mito_reads/

Output: results/post_alignment/samtools_index/

samtools_index.threads

integer

default:"2"

CPU threads.

samtools_index.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

samtools_index.resources.time

integer

default:"30"

Time limit (minutes).

samtools_view

Purpose: Filters reads by MAPQ and SAM flags, retaining only high-quality properly paired alignments.

Input: results/post_alignment/remove_mito_reads/

Output: results/post_alignment/samtools_view/

samtools_view.params.MAPQ

integer

default:"30"

Minimum mapping quality score. Reads below this threshold are discarded.

samtools_view.params.flags

integer

default:"3844"

Bitwise SAM flag filter (-F). The default 3844 (4 + 256 + 512 + 1024 + 2048) excludes unmapped, not-primary, QC-failing, duplicate, and supplementary alignments. It does not include the mate-unmapped bit (8).
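
A -F mask can be decoded against the flag bits defined in the SAM specification. The sketch below lists which bits are set in a mask and tests whether a read would be excluded; the function names are illustrative only.

```python
# Flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "not_primary",
    0x200: "fails_qc",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flags(value):
    """Return the names of all flag bits set in value."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def is_excluded(read_flag, exclude_mask=3844):
    """Mimic samtools view -F: drop the read if any masked bit is set."""
    return bool(read_flag & exclude_mask)


mask_bits = decode_flags(3844)
```

Note that with this mask a read flagged as a duplicate (bit 1024) is dropped even when samtools_markdup only marked it.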

samtools_view.threads

integer

default:"2"

CPU threads.

samtools_view.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

samtools_view.resources.time

integer

default:"30"

Time limit (minutes).

remove_blacklist_reads

Purpose: Removes reads overlapping ENCODE blacklist regions using bedtools intersect.

Input: results/post_alignment/samtools_view/ + global.references.blacklist

Output: results/post_alignment/samtools_view/ (clean BAM replaces the filtered BAM in-place by convention)

remove_blacklist_reads.threads

integer

default:"2"

CPU threads.

remove_blacklist_reads.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

remove_blacklist_reads.resources.time

integer

default:"30"

Time limit (minutes).

samtools_index_post_filter

Purpose: Indexes the blacklist-filtered BAM. The .bai index file is placed alongside the BAM in the same directory.

Input / Output: results/post_alignment/samtools_view/

samtools_index_post_filter.threads

integer

default:"2"

CPU threads.

samtools_index_post_filter.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

samtools_index_post_filter.resources.time

integer

default:"30"

Time limit (minutes).

tn5_shift

Purpose: Applies the Tn5 transposase insertion bias shift (+4 bp on the forward strand, −5 bp on the reverse strand) using alignmentSieve.

Input: results/post_alignment/samtools_view/

Output: results/post_alignment/tn5_shift/ (shifted BAM + index)
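
To make the offsets concrete, here is a simplified sketch of the shift applied to a single fragment's coordinates. It is not how alignmentSieve is implemented (that tool operates on BAM records); the function name tn5_shift and the coordinates are assumptions for illustration.

```python
def tn5_shift(start, end, forward_offset=4, reverse_offset=-5):
    """Shift a fragment: +4 bp at the forward-strand end, -5 bp at the reverse-strand end."""
    new_start = start + forward_offset
    new_end = end + reverse_offset
    if new_end <= new_start:
        raise ValueError("fragment is too short to shift")
    return new_start, new_end


# A made-up 200 bp fragment becomes 191 bp after shifting.
shifted = tn5_shift(1000, 1200)
```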

tn5_shift.threads

integer

default:"4"

CPU threads.

tn5_shift.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

tn5_shift.resources.time

integer

default:"60"

Time limit (minutes).

samtools_stats

Purpose: Generates comprehensive alignment statistics used by the QC gate (mapping rate, duplicate rate).

Input: results/post_alignment/remove_mito_reads/

Output: results/post_alignment/samtools_stats/

samtools_stats.threads

integer

default:"2"

CPU threads.

samtools_stats.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

samtools_stats.resources.time

integer

default:"30"

Time limit (minutes).

Stage 4 - Metrics & QC

fragment_size_analysis

Purpose: Analyses nucleosomal banding patterns from Picard insert-size metrics (sub-nucleosomal <200 bp, mono-nucleosomal 200-400 bp, di-nucleosomal >400 bp).

Input: results/metrics_qc/picard/CollectInsertSizeMetrics/

Output: results/metrics_qc/fragment_size_analysis/

fragment_size_analysis.params.min_length

integer

default:"30"

Minimum fragment length to include in the analysis.

fragment_size_analysis.params.max_length

integer

default:"1000"

Maximum fragment length to include in the analysis.
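
The size classes named in the purpose statement, combined with the min_length and max_length bounds, can be sketched as a small classifier. Whether the 200 bp and 400 bp boundaries are inclusive is an assumption here, and classify_fragment is a hypothetical name rather than a function from the pipeline's script.

```python
from collections import Counter


def classify_fragment(length, min_length=30, max_length=1000):
    """Assign a fragment length to a nucleosomal class, or None if out of range."""
    if length < min_length or length > max_length:
        return None
    if length < 200:
        return "sub_nucleosomal"
    if length <= 400:
        return "mono_nucleosomal"
    return "di_nucleosomal"


# Made-up fragment lengths; out-of-range values are dropped before counting.
lengths = [20, 75, 150, 210, 390, 450, 1200]
summary = Counter(c for c in map(classify_fragment, lengths) if c is not None)
```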

fragment_size_analysis.params.max_fragment

integer

default:"1000"

Upper x-axis limit for fragment size histogram plots.

fragment_size_analysis.threads

integer

default:"4"

CPU threads.

fragment_size_analysis.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

fragment_size_analysis.resources.time

integer

default:"60"

Time limit (minutes).

tss_enrichment

Purpose: Calculates TSS enrichment score by computing normalised signal across ±2 kb windows around annotated transcription start sites.

Input: Tn5-shifted BAM + index (results/post_alignment/tn5_shift/)

Output: results/metrics_qc/tss_enrichment/

Script: rules/scripts/tss_enrichment.R

tss_enrichment.params.annotation

string

default:"*ANNOTATION_GTF"

Gene annotation GTF, resolved from global.references.annotation_gtf.

tss_enrichment.params.upstream

integer

default:"2000"

Base pairs upstream of the TSS to include in the enrichment window.

tss_enrichment.params.downstream

integer

default:"2000"

Base pairs downstream of the TSS to include in the enrichment window.
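
The upstream and downstream values only differ in effect when they are asymmetric, because "upstream" depends on the gene's strand. The sketch below shows a strand-aware window calculation; it assumes the script treats strand this way, and tss_window is a hypothetical helper, not a function from rules/scripts/tss_enrichment.R.

```python
def tss_window(tss, strand, upstream=2000, downstream=2000):
    """Return (start, end) of the window around a TSS, respecting strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return max(start, 0), end  # clamp at the chromosome start
```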

tss_enrichment.threads

integer

default:"4"

CPU threads.

tss_enrichment.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

tss_enrichment.resources.time

integer

default:"60"

Time limit (minutes).

picard - alignment_metrics

Purpose: Collects alignment summary metrics (total reads, mapped rate, strand balance) from the deduplicated BAM.

Input: results/post_alignment/samtools_markdup/

Output: results/metrics_qc/picard/CollectAlignmentSummaryMetrics/

picard.alignment_metrics.params.reference_genome

string

default:"*GENOME_FA"

Reference FASTA for validation, resolved from global.references.genome_fa.

picard.alignment_metrics.params.validation_stringency

string

default:"LENIENT"

Picard validation stringency. LENIENT suppresses non-critical warnings from legacy BAM files.

picard.alignment_metrics.threads

integer

default:"4"

CPU threads.

picard.alignment_metrics.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

picard.alignment_metrics.resources.time

integer

default:"60"

Time limit (minutes).

picard - insert_metrics

Purpose: Collects insert-size distribution metrics used for fragment size analysis and nucleosome banding QC.

Input: results/post_alignment/samtools_markdup/

Output (metrics + histogram): results/metrics_qc/picard/CollectInsertSizeMetrics/

picard.insert_metrics.params.M

float

default:"0.05"

Minimum fraction of reads required for a histogram data point to be plotted.

picard.insert_metrics.params.validation_stringency

string

default:"LENIENT"

Picard validation stringency.

picard.insert_metrics.threads

integer

default:"4"

CPU threads.

picard.insert_metrics.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

picard.insert_metrics.resources.time

integer

default:"60"

Time limit (minutes).

cross_correlation

Purpose: Computes NSC (Normalized Strand Cross-correlation) and RSC (Relative Strand Cross-correlation) coefficients via phantompeakqualtools. ENCODE-compliant strand-shift QC.

Input: results/post_alignment/samtools_view/

Output: results/metrics_qc/cross_correlation/

cross_correlation.params.num_threads

integer

default:"4"

Threads passed internally to the R-based cross-correlation tool.

cross_correlation.params.max_range

integer

default:"150"

Maximum strand-shift range (bp) to evaluate for the cross-correlation curve.
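
For orientation, the two coefficients are ratios of three points on the cross-correlation curve: the value at the estimated fragment length, the value at the read-length ("phantom") peak, and the curve minimum. The sketch below applies the standard definitions to made-up values; nsc_rsc is a hypothetical helper and does not call phantompeakqualtools.

```python
def nsc_rsc(cc_fragment, cc_phantom, cc_min):
    """Compute NSC and RSC from three cross-correlation values."""
    nsc = cc_fragment / cc_min                          # fragment peak over background
    rsc = (cc_fragment - cc_min) / (cc_phantom - cc_min)  # fragment peak relative to phantom peak
    return nsc, rsc


# Made-up curve values chosen for easy arithmetic.
nsc, rsc = nsc_rsc(cc_fragment=0.75, cc_phantom=0.5, cc_min=0.25)
```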

cross_correlation.threads

integer

default:"4"

CPU threads.

cross_correlation.resources.mem_mb

integer

default:"8000"

Memory limit (MB).

cross_correlation.resources.time

integer

default:"120"

Time limit (minutes).

preseq

Purpose: Estimates library complexity and predicts yield at higher sequencing depths using a curve extrapolation model.

Input: results/post_alignment/remove_mito_reads/

Output: results/reporting_qc/preseq/

preseq.threads

integer

default:"1"

CPU threads (single-threaded tool).

preseq.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

preseq.resources.time

integer

default:"60"

Time limit (minutes).

qualimap_bamqc

Purpose: Generates comprehensive BAM QC metrics including coverage uniformity, GC content, and read distribution.

Input: results/post_alignment/samtools_markdup/

Output: results/reporting_qc/qualimap/

qualimap_bamqc.threads

integer

default:"4"

CPU threads.

qualimap_bamqc.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

qualimap_bamqc.resources.time

integer

default:"60"

Time limit (minutes).

QC Gate

qc_gate

Purpose: Automated biological checkpoint. Evaluates four gated metrics - FRiP, TSS enrichment, mapping rate, and duplicate rate - and writes per-sample pass/fail files that downstream rules depend on.

Inputs:

FRiP file: results/peak_calling/frip_calculation/
TSS enrichment file: results/metrics_qc/tss_enrichment/
Samtools stats: results/post_alignment/samtools_stats/

Output: results/qc_gate/

qc_gate.params.min_frip

float

default:"0.2"

Minimum FRiP (Fraction of Reads in Peaks). ENCODE minimum recommendation.

qc_gate.params.min_tss_enr

float

default:"7.0"

Minimum TSS Enrichment score. Signal at TSSes relative to background.

qc_gate.params.min_mapping_rate

float

default:"80.0"

Minimum percentage of properly paired reads (from samtools stats).

qc_gate.params.max_duplicate_rate

float

default:"20.0"

Maximum duplicate rate percentage - (reads_duplicated / total_reads) × 100.
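
Taken together, the four thresholds amount to a simple per-sample check. The sketch below assumes the comparisons are inclusive (a value equal to a threshold passes) and that the metrics have already been parsed into a dictionary; the names qc_gate and duplicate_rate are hypothetical, and the real rule may report results differently.

```python
# Default thresholds from qc_gate.params.
THRESHOLDS = {
    "min_frip": 0.2,
    "min_tss_enr": 7.0,
    "min_mapping_rate": 80.0,
    "max_duplicate_rate": 20.0,
}


def duplicate_rate(reads_duplicated, total_reads):
    """(reads_duplicated / total_reads) x 100, as defined above."""
    return 100.0 * reads_duplicated / total_reads if total_reads else 0.0


def qc_gate(metrics, thresholds=THRESHOLDS):
    """Return (passed, per-metric results) for one sample."""
    checks = {
        "frip": metrics["frip"] >= thresholds["min_frip"],
        "tss_enrichment": metrics["tss_enrichment"] >= thresholds["min_tss_enr"],
        "mapping_rate": metrics["mapping_rate"] >= thresholds["min_mapping_rate"],
        "duplicate_rate": metrics["duplicate_rate"] <= thresholds["max_duplicate_rate"],
    }
    return all(checks.values()), checks


# Made-up metrics for one sample.
sample = {
    "frip": 0.35,
    "tss_enrichment": 9.2,
    "mapping_rate": 95.0,
    "duplicate_rate": duplicate_rate(150, 1000),
}
passed, details = qc_gate(sample)
```

Returning the per-metric results alongside the overall verdict makes it easy to report which threshold a failing sample missed.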

qc_gate.threads

integer

default:"1"

CPU threads.

qc_gate.resources.mem_mb

integer

default:"1000"

Memory limit (MB).

qc_gate.resources.time

integer

default:"10"

Time limit (minutes).

Stage 5 - Visualization

bedtools_genomecov

Purpose: Converts the Tn5-shifted BAM to a raw bedGraph coverage track.

Input: results/post_alignment/tn5_shift/

Output: results/visualization/bedtools_genomecov/

bedtools_genomecov.params.extra

string

default:"-bg"

Extra flags passed to bedtools genomecov. -bg produces bedGraph format.

bedtools_genomecov.threads

integer

default:"4"

CPU threads.

bedtools_genomecov.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

bedtools_genomecov.resources.time

integer

default:"60"

Time limit (minutes).

sorted_bedgraph

Purpose: Sorts the bedGraph file by chromosome and coordinate (required by bedGraphToBigWig).

Input: results/visualization/bedtools_genomecov/

Output: results/visualization/sorted_bedgraph_file/

sorted_bedgraph.threads

integer

default:"4"

CPU threads.

sorted_bedgraph.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

sorted_bedgraph.resources.time

integer

default:"60"

Time limit (minutes).

bigwig

Purpose: Converts the sorted bedGraph to a binary BigWig file for genome browser visualisation.

Input: results/visualization/sorted_bedgraph_file/

Output: results/visualization/bigwig/

bigwig.params.genome

string

default:"*GENOME_SIZES"

Chromosome sizes file, resolved from global.references.genome_sizes.

bigwig.threads

integer

default:"4"

CPU threads.

bigwig.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

bigwig.resources.time

integer

default:"60"

Time limit (minutes).

normalized_coverage

Purpose: Generates CPM-normalised BigWig tracks for cross-sample comparability using bamCoverage.

Input: results/post_alignment/tn5_shift/

Output: results/visualization/normalized_coverage/

normalized_coverage.params.method

string

default:"CPM"

Normalisation method passed to bamCoverage. Options: CPM, RPKM, BPM, RPGC, None.
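
For the default method, CPM scales each bin's read count by the number of mapped reads in millions. The sketch below applies that formula to made-up numbers; cpm is an illustrative helper and does not call bamCoverage.

```python
def cpm(reads_in_bin, total_mapped_reads):
    """Counts per million: reads in the bin divided by mapped reads in millions."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_bin * 1_000_000 / total_mapped_reads


# 50 reads in a bin from a library of 10 million mapped reads.
value = cpm(50, 10_000_000)
```

Because the result depends only on library size, two samples sequenced to different depths can be compared bin by bin.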

normalized_coverage.threads

integer

default:"4"

CPU threads.

normalized_coverage.resources.mem_mb

integer

default:"8000"

Memory limit (MB).

normalized_coverage.resources.time

integer

default:"120"

Time limit (minutes).

correlation_analysis

Purpose: Computes inter-sample Pearson/Spearman correlations across BigWig tracks using deepTools multiBigwigSummary + plotCorrelation.

Input: results/visualization/bigwig/

Output: results/visualization/correlation_analysis/

correlation_analysis.params.bin_size

integer

default:"1000"

Genomic bin size in base pairs used for signal summarisation before correlation.

correlation_analysis.threads

integer

default:"4"

CPU threads.

correlation_analysis.resources.mem_mb

integer

default:"8000"

Memory limit (MB).

correlation_analysis.resources.time

integer

default:"120"

Time limit (minutes).

heatmap

Purpose: Generates read-density heatmaps centred on filtered peak regions using deepTools computeMatrix + plotHeatmap.

Inputs:

results/peak_calling/filtered_peaks/
results/visualization/bigwig/

Outputs:

Plot: results/visualization/heatmap/plot/
Matrix: results/visualization/heatmap/matrix/
Regions: results/visualization/heatmap/

heatmap.params.color

string

default:"coolwarm"

Colormap name for the heatmap. Any matplotlib colormap is accepted.

heatmap.params.upstream

integer

default:"3000"

Base pairs upstream of peak centre to include.

heatmap.params.downstream

integer

default:"3000"

Base pairs downstream of peak centre to include.

heatmap.threads

integer

default:"8"

CPU threads.

heatmap.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

heatmap.resources.time

integer

default:"240"

Time limit (minutes).

Stage 6 - Peak Calling

macs2

Purpose: Calls chromatin accessibility peaks from the Tn5-shifted BAM in paired-end BAM mode.

Input: results/post_alignment/tn5_shift/

Output: results/peak_calling/macs2_peakcall/

macs2.params.genome_size

string

default:"hs"

Effective genome size. Use "hs" (human), "mm" (mouse), "ce" (C. elegans), "dm" (Drosophila), or a numeric value.

macs2.params.qvalue

float

default:"0.01"

Minimum q-value (FDR) threshold for peak calling.

macs2.params.nomodel

string

default:"--nomodel"

Disables MACS2’s shift model estimation. Required for ATAC-seq data.

macs2.params.format

string

default:"BAMPE"

Input format flag. BAMPE instructs MACS2 to use both mates of a paired-end BAM.

macs2.threads

integer

default:"8"

CPU threads.

macs2.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

macs2.resources.time

integer

default:"240"

Time limit (minutes).

blacklist_filter

Purpose: Removes peaks overlapping ENCODE blacklist regions from the MACS2 narrowPeak output.

Input: results/peak_calling/macs2_peakcall/

Output: results/peak_calling/filtered_peaks/

blacklist_filter.params.blacklist

string

default:"*BLACKLIST"

Blacklist BED file, resolved from global.references.blacklist.

blacklist_filter.threads

integer

default:"4"

CPU threads.

blacklist_filter.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

blacklist_filter.resources.time

integer

default:"60"

Time limit (minutes).

frip_calculation

Purpose: Computes the FRiP score - fraction of total reads overlapping filtered peak regions.

Inputs:

Filtered peaks: results/peak_calling/filtered_peaks/
Shifted BAM: results/post_alignment/tn5_shift/

Output: results/peak_calling/frip_calculation/
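
Conceptually, FRiP is the number of reads falling inside any peak divided by the total number of reads. The sketch below reduces each read to a single position and uses half-open peak intervals; both simplifications, and the function name frip, are assumptions. The pipeline's rule works on the BAM and BED files directly.

```python
def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside at least one peak.

    read_positions: list of (chrom, pos); peaks: list of (chrom, start, end), half-open.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = sum(
        any(start <= pos < end for start, end in by_chrom.get(chrom, ()))
        for chrom, pos in read_positions
    )
    return in_peaks / len(read_positions) if read_positions else 0.0


# Made-up reads and a single peak: two of four reads fall inside it.
reads = [("chr1", 150), ("chr1", 500), ("chr2", 10), ("chr1", 199)]
score = frip(reads, [("chr1", 100, 200)])
```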

frip_calculation.threads

integer

default:"4"

CPU threads.

frip_calculation.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

frip_calculation.resources.time

integer

default:"15"

Time limit (minutes).

peak_annotation

Purpose: Annotates filtered peaks with genomic features (promoter, intron, exon, intergenic) using HOMER or ChIPseeker.

Input: results/peak_calling/filtered_peaks/

Output: results/peak_calling/peak_annotation/

peak_annotation.params.gff

string

default:"*ANNOTATION_GTF"

Annotation GTF resolved from global.references.annotation_gtf.

peak_annotation.params.genome

string

default:"*GENOME_FA"

Reference FASTA resolved from global.references.genome_fa.

peak_annotation.threads

integer

default:"8"

CPU threads.

peak_annotation.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

peak_annotation.resources.time

integer

default:"240"

Time limit (minutes).

motif_analysis

Purpose: Performs de novo and known motif enrichment analysis in filtered peaks using HOMER.

Input: results/peak_calling/filtered_peaks/

Output: results/peak_calling/motif_analysis/

motif_analysis.params.motif_db

string

default:"*MOTIF_DB"

MEME-format motif database, resolved from global.references.motif_db.

motif_analysis.params.genome_assembly

string

default:"*GENOME_FA"

Reference genome FASTA or assembly name.

motif_analysis.threads

integer

default:"8"

CPU threads.

motif_analysis.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

motif_analysis.resources.time

integer

default:"240"

Time limit (minutes).

idr

Purpose: Computes Irreproducible Discovery Rate between biological replicates to assess peak reproducibility.

Outputs:

IDR peaks: results/peak_calling/idr/idr_peaks/
Optimal peaks: results/peak_calling/idr/optimal_peaks/
Plots: results/peak_calling/idr/plots/

idr.params.idr_threshold

float

default:"0.05"

IDR significance threshold. Peaks below this value are considered reproducible.

idr.params.rank_column

string

default:"score"

Column used to rank peaks before IDR analysis.

idr.threads

integer

default:"4"

CPU threads.

idr.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

idr.resources.time

integer

default:"60"

Time limit (minutes).

consensus_peaks

Purpose: Merges peaks across all samples into a single non-redundant consensus peak set.

Outputs:

Consensus BED: results/peak_calling/consensus_peaks/
Sample count matrix: results/peak_calling/consensus_peaks/

consensus_peaks.params.min_samples

integer

default:"2"

Minimum number of samples in which a peak must be called to be retained in the consensus set.

consensus_peaks.params.merge_distance

integer

default:"100"

Maximum distance in base pairs between peaks to be merged into one consensus region.
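
One plausible reading of how min_samples and merge_distance interact is sketched below: peaks from all samples are pooled, neighbours within merge_distance are merged, and a merged region is kept only if enough distinct samples contributed to it. The pipeline's actual algorithm is not documented here, so treat consensus_peaks and its logic as an assumption.

```python
def consensus_peaks(peaks_by_sample, min_samples=2, merge_distance=100):
    """Merge nearby peaks across samples and keep regions seen in enough samples."""
    pooled = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set of contributing samples]
    for chrom, start, end, sample in pooled:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= merge_distance:
            last[2] = max(last[2], end)  # extend the current region
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [(c, s, e, len(names)) for c, s, e, names in merged if len(names) >= min_samples]


# Made-up peaks from three samples.
example = {
    "a": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "b": [("chr1", 250, 300)],
    "c": [("chr2", 100, 200)],
}
consensus = consensus_peaks(example)
```

In this example only the chr1 region spanning 100 to 300 survives, because it is the only one supported by two samples.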

consensus_peaks.threads

integer

default:"4"

CPU threads.

consensus_peaks.resources.mem_mb

integer

default:"8000"

Memory limit (MB).

consensus_peaks.resources.time

integer

default:"120"

Time limit (minutes).

count_peaks

Purpose: Counts reads in consensus peak regions across all samples to build a count matrix for DESeq2 differential accessibility analysis.

Input: results/post_alignment/tn5_shift/

Output: results/peak_calling/count_peaks/

count_peaks.threads

integer

default:"2"

CPU threads.

count_peaks.resources.mem_mb

integer

default:"2000"

Memory limit (MB).

count_peaks.resources.time

integer

default:"30"

Time limit (minutes).

differential_accessibility

Purpose: DESeq2-based differential chromatin accessibility analysis between conditions, with volcano, MA, and PCA plots.

Outputs:

Results table: results/peak_calling/differential_accessibility/
Plots: results/peak_calling/differential_accessibility/plots/

differential_accessibility.params.fdr_threshold

float

default:"0.05"

Adjusted p-value (FDR) threshold for significance.

differential_accessibility.params.log2fc_threshold

float

default:"1.0"

Minimum absolute log₂ fold-change for a peak to be considered differentially accessible.
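
The two thresholds are typically applied together when selecting significant peaks from a results table. A minimal sketch, assuming an adjusted p-value strictly below the FDR threshold and an absolute log2 fold-change at or above the cutoff; is_differential is a hypothetical name, and missing adjusted p-values (reported as NA by DESeq2) are represented as None.

```python
def is_differential(padj, log2fc, fdr_threshold=0.05, log2fc_threshold=1.0):
    """True if a peak passes both the FDR and the fold-change thresholds."""
    if padj is None:  # peaks without an adjusted p-value are never called significant
        return False
    return padj < fdr_threshold and abs(log2fc) >= log2fc_threshold


# Made-up results rows: (peak id, padj, log2 fold-change).
rows = [("peak_1", 0.001, 2.3), ("peak_2", 0.2, 3.0), ("peak_3", 0.01, -1.5), ("peak_4", None, 4.0)]
significant = [peak for peak, padj, lfc in rows if is_differential(padj, lfc)]
```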

differential_accessibility.threads

integer

default:"8"

CPU threads.

differential_accessibility.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

differential_accessibility.resources.time

integer

default:"240"

Time limit (minutes).

tobias

Purpose: Full TOBIAS pipeline (ATACorrect → ScoreBigwig → BINDetect) for bias-corrected transcription factor footprinting and differential TF binding analysis.

Input: results/post_alignment/samtools_view/

Outputs:

Bias-corrected BigWig: results/peak_calling/tobias/corrected_bw/
Footprint scores: results/peak_calling/tobias/footprint_bw/
BINDetect results: results/peak_calling/tobias/bindetect/

tobias.params.genome_fa

string

default:"*GENOME_FA"

Reference FASTA.

tobias.params.genome_sizes

string

default:"*GENOME_SIZES"

Chromosome sizes file.

tobias.params.blacklist

string

default:"*BLACKLIST"

Blacklist BED file.

tobias.params.motif_db

string

default:"*MOTIF_DB"

MEME motif database.

tobias.params.conditions

string

default:"condition"

Column name from the sample sheet used to group samples for comparative BINDetect analysis.

tobias.threads

integer

default:"8"

CPU threads.

tobias.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

tobias.resources.time

integer

default:"240"

Time limit (minutes).

footprinting

Purpose: HINT-ATAC footprinting via the RGT toolkit for detection of TF-bound regulatory elements.

Input: results/post_alignment/samtools_view/

Outputs:

Footprint BED: results/peak_calling/footprinting/footprints/
Plots: results/peak_calling/footprinting/plots/

footprinting.params.organism

string

default:"hg38"

Organism genome assembly identifier (e.g., hg38, mm10).

footprinting.threads

integer

default:"8"

CPU threads.

footprinting.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

footprinting.resources.time

integer

default:"240"

Time limit (minutes).

chromvar_analysis

Purpose: chromVAR motif accessibility deviation analysis - computes per-cell or per-sample TF activity scores from the shifted BAM.

Input: results/post_alignment/tn5_shift/

Outputs:

Deviation scores: results/peak_calling/chromvar/deviations/
Bias-corrected scores: results/peak_calling/chromvar/bias_corrected/
Plots: results/peak_calling/chromvar/plots/

chromvar_analysis.params.motif_db

string

default:"*MOTIF_DB"

MEME motif database.

chromvar_analysis.params.genome_fa

string

default:"*GENOME_FA"

Reference FASTA.

chromvar_analysis.params.genome_sizes

string

default:"*GENOME_SIZES"

Chromosome sizes file.

chromvar_analysis.threads

integer

default:"8"

CPU threads.

chromvar_analysis.resources.mem_mb

integer

default:"16000"

Memory limit (MB).

chromvar_analysis.resources.time

integer

default:"240"

Time limit (minutes).

scATAC-seq - ArchR & Cicero

archr

Purpose: Single-cell ATAC-seq analysis using ArchR: Arrow file creation, doublet detection and filtering, iterative LSI dimensionality reduction, UMAP clustering, and marker gene identification.

Input: results/alignment/chromap/

Outputs:

Arrow files: results/scatac/archr/arrow/
Filtered Arrow files: results/scatac/archr/filtered_arrow/
Fragments: results/scatac/archr/fragments/
Clusters: results/scatac/archr/clusters/
Markers: results/scatac/archr/markers/
Doublets report: results/scatac/archr/doublets/
Plots: results/scatac/archr/plots/
QC report: results/scatac/archr/qc_report/

archr.params.min_tss

float

default:"4.0"

Minimum TSS enrichment score for a cell barcode to be retained.

archr.params.min_frags

integer

default:"1000"

Minimum fragment count per cell barcode.

archr.params.max_frags

integer

default:"100000"

Maximum fragment count per cell barcode (doublet ceiling).
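
The three per-barcode cutoffs above (min_tss, min_frags, max_frags) combine into a single keep-or-drop decision. The sketch below assumes inclusive bounds and is written in Python purely for illustration; ArchR itself applies these filters in R, and keep_barcode is a hypothetical name.

```python
def keep_barcode(tss_enrichment, n_frags, min_tss=4.0, min_frags=1000, max_frags=100000):
    """True if a cell barcode passes the TSS enrichment and fragment count filters."""
    return tss_enrichment >= min_tss and min_frags <= n_frags <= max_frags


# Made-up barcodes: (barcode, TSS enrichment, fragment count).
barcodes = [("AAAC", 8.1, 5200), ("AAAG", 2.5, 4000), ("AACT", 9.0, 250000), ("AAGT", 4.0, 1000)]
kept = [bc for bc, tss, frags in barcodes if keep_barcode(tss, frags)]
```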

archr.params.tsse_method

string

default:"ArchR"

TSS enrichment calculation method.

archr.params.doublet_threshold

float

default:"0.2"

Doublet enrichment score cutoff for doublet removal.

archr.params.clustering_resolution

float

default:"0.8"

Seurat-style Leiden clustering resolution. Higher values produce more clusters.

archr.params.dims_to_use

string

default:"1:30"

LSI dimensions to use for UMAP embedding and clustering.

archr.params.force_dim_reduction

boolean

default:"true"

Forces recalculation of dimensionality reduction even if cached results exist.

archr.threads

integer

default:"16"

CPU threads.

archr.resources.mem_mb

integer

default:"64000"

Memory limit (MB).

archr.resources.time

integer

default:"240"

Time limit (minutes).

cicero

Purpose: Chromatin co-accessibility analysis using Cicero: identifies co-accessible regulatory elements and calls Cis-Co-Accessibility Networks (CCANs).

Inputs:

Filtered Arrow files: results/scatac/archr/filtered_arrow/
Cell clusters: results/scatac/archr/clusters/

Outputs:

Co-accessibility connections: results/scatac/cicero/connections/
CCANs: results/scatac/cicero/ccans/
Plots: results/scatac/cicero/plots/

cicero.params.window_size

integer

default:"500"

Window size in kb used to build KNN graphs for co-accessibility analysis.

cicero.params.distance_cutoff

integer

default:"250000"

Maximum distance (bp) between co-accessible peaks to be included in a connection.

cicero.threads

integer

default:"8"

CPU threads.

cicero.resources.mem_mb

integer

default:"32000"

Memory limit (MB).

cicero.resources.time

integer

default:"240"

Time limit (minutes).

Reporting

benchmark_summary

Purpose: Aggregates per-rule Snakemake benchmark files into a single TSV summary of wall-clock time, CPU time, and peak memory usage across all pipeline stages.

Output: results/reporting/benchmark_summary.tsv

multiqc

Purpose: Aggregates QC metrics from fastp, FastQC, Picard, samtools, preseq, Qualimap, and the QC gate into a single interactive HTML report.

Output: results/reporting/multiqc/

multiqc.params.config

string

default:"rules/config/multiqc_config.yaml"

Path to the MultiQC configuration YAML file for custom module settings and report branding.

multiqc.threads

integer

default:"2"

CPU threads.

multiqc.resources.mem_mb

integer

default:"4000"

Memory limit (MB).

multiqc.resources.time

integer

default:"30"

Time limit (minutes).
