The config.yaml file is the single source of truth for the BDB-Genomics ATAC-seq framework. Every tool, path, resource limit, and QC threshold is declared here; Snakemake rules are stateless wrappers that read from this config at runtime. YAML anchors (e.g., &GENOME_FA, *GENOME_FA) centralise reference file paths so that changing one entry propagates everywhere automatically. Every tool block follows a uniform schema - input → output → params → threads → resources - making it straightforward to add new stages without touching existing ones.
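Parameters on this page are written as dotted paths (for example fastp.params.trim_front1). A minimal sketch of how such a path maps onto the nested config follows. The dictionary is a hand-written stand-in for the parsed config.yaml, and get_setting is a hypothetical helper, not part of the pipeline.

```python
from functools import reduce

# Hand-written stand-in for the parsed config.yaml, using defaults listed on this page.
config = {
    "fastp": {
        "output": "results/preprocessing/fastp",
        "params": {"trim_front1": 5, "trim_front2": 5, "length_required": 30},
        "threads": 4,
        "resources": {"mem_mb": 8000, "time": 120},
    }
}


def get_setting(cfg, dotted_key):
    """Walk a nested dict using a dotted path such as 'fastp.params.trim_front1'."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), cfg)


print(get_setting(config, "fastp.params.trim_front1"))  # 5
print(get_setting(config, "fastp.resources.mem_mb"))    # 8000
```

A missing key raises KeyError, which surfaces typos in a parameter path early rather than silently falling back to a default.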
Raw FASTQ paths are not declared directly in config.yaml. They are resolved dynamically from the sample sheet defined at global.samples (a TSV file with columns sample, fastq_r1, fastq_r2, replicate, condition).
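As an illustration of the sample sheet layout, the sketch below parses a TSV with the five required columns and indexes rows by sample name. The rows and file names are made up, and read_sample_sheet is a hypothetical helper; the pipeline's own loader may differ.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "fastq_r1", "fastq_r2", "replicate", "condition"]


def read_sample_sheet(handle):
    """Parse a TSV sample sheet and check that the required columns are present."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing columns: {missing}")
    return {row["sample"]: row for row in reader}


# Made-up example rows; a real sheet lives at the path given by global.samples.
example = io.StringIO(
    "sample\tfastq_r1\tfastq_r2\treplicate\tcondition\n"
    "ctrl_1\tdata/ctrl_1_R1.fastq.gz\tdata/ctrl_1_R2.fastq.gz\t1\tcontrol\n"
    "treat_1\tdata/treat_1_R1.fastq.gz\tdata/treat_1_R2.fastq.gz\t1\ttreated\n"
)
samples = read_sample_sheet(example)
print(samples["ctrl_1"]["fastq_r1"])  # data/ctrl_1_R1.fastq.gz
```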

Global / Project Metadata

Controls pipeline-wide behaviour, modality selection, and all shared reference file paths.
Purpose: Declares the analysis mode, sample sheet location, and YAML anchors for genome references used by every downstream tool.
global.mode
string
default:"bulk"
Pipeline modality. Use "bulk" for standard bulk ATAC-seq or "scatac" for single-cell ATAC-seq. Can also be overridden at runtime with the ATAC_MODE environment variable.
global.samples
string
default:"data/fastp/samples.tsv"
Path to the TSV sample sheet. Required columns: sample, fastq_r1, fastq_r2, replicate, condition.
global.references.genome_fa
string
default:"data/reference/genome.fa"
Path to the reference genome FASTA file. Anchored as &GENOME_FA; referenced by bowtie2, picard, TOBIAS, chromVAR, footprinting, and peak annotation rules.
global.references.genome_sizes
string
default:"data/reference/genome.chrom.sizes"
Chromosome sizes file (two-column TSV: chrom, length). Anchored as &GENOME_SIZES; used by bedGraphToBigWig, TOBIAS, and chromVAR.
global.references.bowtie2_index
string
default:"data/reference/index/genome"
Bowtie2 index prefix (without .bt2 extension). Anchored as &BOWTIE2_INDEX.
global.references.chromap_index
string
default:"data/reference/chromap/genome.index"
Chromap binary index file for scATAC-seq alignment. Anchored as &CHROMAP_INDEX.
global.references.blacklist
string
default:"data/reference/ENCODE_blacklist.bed"
ENCODE blacklist BED file for filtering artefact regions. Anchored as &BLACKLIST.
global.references.annotation_gtf
string
default:"data/reference/annotation.gtf"
Gene annotation GTF file used for TSS enrichment calculation and peak annotation. Anchored as &ANNOTATION_GTF.
global.references.motif_db
string
default:"data/motifs/jaspar_vertebrates.meme"
MEME-format motif database (e.g., JASPAR vertebrates). Anchored as &MOTIF_DB; used by HOMER, TOBIAS, and chromVAR.
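The ATAC_MODE override described for global.mode could be resolved along these lines. This is a sketch only: resolve_mode is a hypothetical helper, and the assumption that an empty ATAC_MODE falls back to the config value is mine.

```python
import os

VALID_MODES = ("bulk", "scatac")


def resolve_mode(config, environ=os.environ):
    """Return the pipeline mode, letting ATAC_MODE take precedence over global.mode."""
    mode = environ.get("ATAC_MODE") or config.get("global", {}).get("mode", "bulk")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode


config = {"global": {"mode": "bulk"}}
print(resolve_mode(config, environ={}))                       # bulk
print(resolve_mode(config, environ={"ATAC_MODE": "scatac"}))  # scatac
```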

Stage 1 - Preprocessing

Purpose: Adapter trimming and quality filtering of raw paired-end FASTQ files.
Output: results/preprocessing/fastp/
fastp.output
string
default:"results/preprocessing/fastp"
Directory for trimmed FASTQ files and fastp JSON reports.
fastp.params.trim_front1
integer
default:"5"
Number of bases to trim from the 5′ end of Read 1.
fastp.params.trim_front2
integer
default:"5"
Number of bases to trim from the 5′ end of Read 2.
fastp.params.length_required
integer
default:"30"
Minimum read length after trimming; shorter reads are discarded.
fastp.threads
integer
default:"4"
CPU threads allocated to fastp.
fastp.resources.mem_mb
integer
default:"8000"
Memory limit in megabytes.
fastp.resources.time
integer
default:"120"
Wall-clock time limit in minutes.
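To show how the fastp.params values could translate into a command line, here is a sketch that assembles a fastp argument list. The flags are standard fastp options. The file names and the fastp_command helper are hypothetical, and the pipeline's actual rule may pass additional options.

```python
import shlex


def fastp_command(r1, r2, out_r1, out_r2, json_report, params, threads):
    """Assemble a fastp argument list from fastp.params and fastp.threads."""
    return [
        "fastp",
        "--in1", r1, "--in2", r2,
        "--out1", out_r1, "--out2", out_r2,
        "--trim_front1", str(params["trim_front1"]),
        "--trim_front2", str(params["trim_front2"]),
        "--length_required", str(params["length_required"]),
        "--thread", str(threads),
        "--json", json_report,
    ]


params = {"trim_front1": 5, "trim_front2": 5, "length_required": 30}
cmd = fastp_command(
    "s1_R1.fastq.gz", "s1_R2.fastq.gz",          # hypothetical input names
    "out/s1_R1.fastq.gz", "out/s1_R2.fastq.gz",  # hypothetical output names
    "out/s1.json", params, threads=4,
)
print(shlex.join(cmd))
```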
Purpose: Post-trimming quality control report generation.
Input: results/preprocessing/fastp/ (R1 and R2 trimmed reads)
Output: results/preprocessing/fastqc/
fastqc.threads
integer
default:"4"
CPU threads allocated to FastQC.
fastqc.resources.mem_mb
integer
default:"2000"
Memory limit in megabytes.
fastqc.resources.time
integer
default:"30"
Wall-clock time limit in minutes.

Stage 2 - Alignment

Purpose: Paired-end alignment of trimmed reads to the reference genome (bulk ATAC-seq mode).
Input: results/preprocessing/fastp/
Output: results/alignment/bowtie2/
bowtie2.params.index
string
default:"*BOWTIE2_INDEX"
Bowtie2 index prefix, resolved from the global.references.bowtie2_index anchor.
bowtie2.params.sensitive
string
default:"--very-sensitive"
Alignment sensitivity flag passed directly to bowtie2.
bowtie2.threads
integer
default:"8"
CPU threads allocated to bowtie2.
bowtie2.resources.mem_mb
integer
default:"16000"
Memory limit in megabytes.
bowtie2.resources.time
integer
default:"240"
Wall-clock time limit in minutes.
Purpose: Fast single-cell ATAC-seq read alignment using the Chromap aligner (--preset atac). Only active when global.mode = "scatac".
Input: results/preprocessing/fastp/
Output: results/alignment/chromap/
chromap.params.index
string
default:"*CHROMAP_INDEX"
Chromap binary index, resolved from the global.references.chromap_index anchor.
chromap.params.preset
string
default:"atac"
Chromap preset. Always "atac" for ATAC-seq data.
chromap.threads
integer
default:"16"
CPU threads allocated to Chromap.
chromap.resources.mem_mb
integer
default:"32000"
Memory limit in megabytes.
chromap.resources.time
integer
default:"120"
Wall-clock time limit in minutes.

Stage 3 - Post-Alignment Processing

Purpose: Coordinate-sorts the raw BAM produced by Bowtie2.
Input: results/alignment/bowtie2/
Output: results/post_alignment/samtools_sort/
samtools_sort.threads
integer
default:"4"
CPU threads.
samtools_sort.resources.mem_mb
integer
default:"8000"
Memory limit (MB).
samtools_sort.resources.time
integer
default:"120"
Time limit (minutes).
Purpose: Calculates mitochondrial read fractions before deduplication for QC reporting.
Input: results/post_alignment/samtools_sort/
Output: results/post_alignment/mito-ATAC/
mitoATAC_calculate.params.mito_chr
string
default:"chrMT"
Chromosome name for the mitochondrial contig. Common alternatives: chrM, MT.
mitoATAC_calculate.threads
integer
default:"2"
CPU threads.
mitoATAC_calculate.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
mitoATAC_calculate.resources.time
integer
default:"30"
Time limit (minutes).
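One way to obtain a mitochondrial read fraction is from the per-contig mapped counts that `samtools idxstats` reports (columns: contig, length, mapped, unmapped). The sketch below assumes that approach, which may not be what the pipeline's rule does. The counts are made up, and mito_fraction is a hypothetical helper.

```python
def mito_fraction(idxstats_text, mito_chr="chrMT"):
    """Fraction of mapped reads on the mitochondrial contig, from idxstats-style text."""
    mapped = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        mapped[name] = int(n_mapped)
    if mito_chr not in mapped:
        raise KeyError(f"{mito_chr!r} not found; check mito_chr against the BAM header")
    total = sum(mapped.values())
    return mapped[mito_chr] / total if total else 0.0


# Made-up counts: 900 reads on chr1, 100 on chrMT, 50 unplaced and unmapped.
example = "chr1\t248956422\t900\t0\nchrMT\t16569\t100\t0\n*\t0\t0\t50\n"
print(mito_fraction(example))  # 0.1
```

The explicit KeyError matters in practice: if mito_chr is left at chrMT for a genome whose contig is named chrM or MT, a silent zero would look like a clean library.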
Purpose: Fills in mate-score tags required by samtools markdup.
Input: results/post_alignment/samtools_sort/
Output: results/post_alignment/samtools_fixmate/
samtools_fixmate.threads
integer
default:"2"
CPU threads.
samtools_fixmate.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
samtools_fixmate.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: Marks (and optionally removes) PCR duplicates.
Input: results/post_alignment/samtools_fixmate/
Output: results/post_alignment/samtools_markdup/
samtools_markdup.params.remove_duplicates
boolean
default:"false"
When false, duplicates are marked but retained. Set to true to remove them entirely.
samtools_markdup.threads
integer
default:"4"
CPU threads.
samtools_markdup.resources.mem_mb
integer
default:"8000"
Memory limit (MB).
samtools_markdup.resources.time
integer
default:"120"
Time limit (minutes).
Purpose: Indexes the deduplicated BAM so Picard and downstream rules can random-access it.
Input: results/post_alignment/samtools_markdup/
Output: results/post_alignment/samtools_index/post_markdup/
samtools_index_post_markdup.threads
integer
default:"2"
CPU threads.
samtools_index_post_markdup.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
samtools_index_post_markdup.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: Removes mitochondrial reads from the deduplicated BAM using exact chromosome matching.
Input: results/post_alignment/samtools_markdup/
Output: results/post_alignment/remove_mito_reads/
remove_mito_reads.params.mito_chr
string
default:"chrMT"
Mitochondrial contig name to exclude. Must match the contig name in the BAM header.
remove_mito_reads.threads
integer
default:"2"
CPU threads.
remove_mito_reads.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
remove_mito_reads.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: Indexes the mitochondrial-free sorted BAM.
Input: results/post_alignment/remove_mito_reads/
Output: results/post_alignment/samtools_index/
samtools_index.threads
integer
default:"2"
CPU threads.
samtools_index.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
samtools_index.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: Filters reads by MAPQ and SAM flags, retaining only high-quality properly paired alignments.
Input: results/post_alignment/remove_mito_reads/
Output: results/post_alignment/samtools_view/
samtools_view.params.MAPQ
integer
default:"30"
Minimum mapping quality score. Reads below this threshold are discarded.
samtools_view.params.flags
integer
default:"3844"
Bitwise SAM flag filter (-F). The default 3844 (4 + 256 + 512 + 1024 + 2048) excludes unmapped, not-primary, QC-failing, duplicate, and supplementary alignments. It does not include the mate-unmapped bit (8).
samtools_view.threads
integer
default:"2"
CPU threads.
samtools_view.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
samtools_view.resources.time
integer
default:"30"
Time limit (minutes).
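To check which bits a given -F value sets, the mask can be decoded against the flag bits defined in the SAM specification. The decode_flags helper below is illustrative and not part of the pipeline.

```python
# Flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flags(value):
    """Return the names of all SAM flag bits set in value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


print(decode_flags(3844))
# ['unmapped', 'secondary', 'qc_fail', 'duplicate', 'supplementary']
```

Adding 8 to the mask (3852) would additionally drop reads whose mate is unmapped.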
Purpose: Removes reads overlapping ENCODE blacklist regions using bedtools intersect.
Input: results/post_alignment/samtools_view/ + global.references.blacklist
Output: results/post_alignment/samtools_view/ (clean BAM replaces the filtered BAM in-place by convention)
remove_blacklist_reads.threads
integer
default:"2"
CPU threads.
remove_blacklist_reads.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
remove_blacklist_reads.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: Indexes the blacklist-filtered BAM. The .bai index file is placed alongside the BAM in the same directory.
Input / Output: results/post_alignment/samtools_view/
samtools_index_post_filter.threads
integer
default:"2"
CPU threads.
samtools_index_post_filter.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
samtools_index_post_filter.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: Applies the Tn5 transposase insertion bias shift (+4 bp on the forward strand, −5 bp on the reverse strand) using alignmentSieve.
Input: results/post_alignment/samtools_view/
Output: results/post_alignment/tn5_shift/ (shifted BAM + index)
tn5_shift.threads
integer
default:"4"
CPU threads.
tn5_shift.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
tn5_shift.resources.time
integer
default:"60"
Time limit (minutes).
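The coordinate arithmetic behind the Tn5 shift can be illustrated on a single read. This sketch only moves the 5′ end of the read by the stated offsets, using 0-based half-open coordinates as an assumption. alignmentSieve's handling of read pairs is more involved, and shift_read is a hypothetical helper.

```python
def shift_read(start, end, strand):
    """Move a read's 5' end by +4 bp on '+' or -5 bp on '-' (0-based, half-open)."""
    if strand == "+":
        return start + 4, end  # 5' end of a forward read is its start
    if strand == "-":
        return start, end - 5  # 5' end of a reverse read is its end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


print(shift_read(100, 150, "+"))  # (104, 150)
print(shift_read(100, 150, "-"))  # (100, 145)
```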
Purpose: Generates comprehensive alignment statistics used by the QC gate (mapping rate, duplicate rate).
Input: results/post_alignment/remove_mito_reads/
Output: results/post_alignment/samtools_stats/
samtools_stats.threads
integer
default:"2"
CPU threads.
samtools_stats.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
samtools_stats.resources.time
integer
default:"30"
Time limit (minutes).

Stage 4 - Metrics & QC

Purpose: Analyses nucleosomal banding patterns from Picard insert-size metrics (sub-nucleosomal <200 bp, mono-nucleosomal 200-400 bp, di-nucleosomal >400 bp).
Input: results/metrics_qc/picard/CollectInsertSizeMetrics/
Output: results/metrics_qc/fragment_size_analysis/
fragment_size_analysis.params.min_length
integer
default:"30"
Minimum fragment length to include in the analysis.
fragment_size_analysis.params.max_length
integer
default:"1000"
Maximum fragment length to include in the analysis.
fragment_size_analysis.params.max_fragment
integer
default:"1000"
Upper x-axis limit for fragment size histogram plots.
fragment_size_analysis.threads
integer
default:"4"
CPU threads.
fragment_size_analysis.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
fragment_size_analysis.resources.time
integer
default:"60"
Time limit (minutes).
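The size classes named in the purpose statement, combined with min_length and max_length, can be sketched as a simple classifier. Treating 200 and 400 bp as part of the mono-nucleosomal class is an assumption, and classify_fragment is a hypothetical helper rather than the pipeline's script.

```python
from collections import Counter


def classify_fragment(length, min_length=30, max_length=1000):
    """Assign a fragment length (bp) to a nucleosomal class, or None if out of range."""
    if length < min_length or length > max_length:
        return None
    if length < 200:
        return "sub-nucleosomal"
    if length <= 400:
        return "mono-nucleosomal"
    return "di-nucleosomal"


lengths = [25, 80, 150, 200, 310, 400, 520, 1200]  # made-up fragment lengths
counts = Counter(c for c in map(classify_fragment, lengths) if c)
print(dict(counts))
# {'sub-nucleosomal': 2, 'mono-nucleosomal': 3, 'di-nucleosomal': 1}
```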
Purpose: Calculates TSS enrichment score by computing normalised signal across ±2 kb windows around annotated transcription start sites.
Input: Tn5-shifted BAM + index (results/post_alignment/tn5_shift/)
Output: results/metrics_qc/tss_enrichment/
Script: rules/scripts/tss_enrichment.R
tss_enrichment.params.annotation
string
default:"*ANNOTATION_GTF"
Gene annotation GTF, resolved from global.references.annotation_gtf.
tss_enrichment.params.upstream
integer
default:"2000"
Base pairs upstream of the TSS to include in the enrichment window.
tss_enrichment.params.downstream
integer
default:"2000"
Base pairs downstream of the TSS to include in the enrichment window.
tss_enrichment.threads
integer
default:"4"
CPU threads.
tss_enrichment.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
tss_enrichment.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Collects alignment summary metrics (total reads, mapped rate, strand balance) from the deduplicated BAM.
Input: results/post_alignment/samtools_markdup/
Output: results/metrics_qc/picard/CollectAlignmentSummaryMetrics/
picard.alignment_metrics.params.reference_genome
string
default:"*GENOME_FA"
Reference FASTA for validation, resolved from global.references.genome_fa.
picard.alignment_metrics.params.validation_stringency
string
default:"LENIENT"
Picard validation stringency. LENIENT suppresses non-critical warnings from legacy BAM files.
picard.alignment_metrics.threads
integer
default:"4"
CPU threads.
picard.alignment_metrics.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
picard.alignment_metrics.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Collects insert-size distribution metrics used for fragment size analysis and nucleosome banding QC.
Input: results/post_alignment/samtools_markdup/
Output (metrics + histogram): results/metrics_qc/picard/CollectInsertSizeMetrics/
picard.insert_metrics.params.M
float
default:"0.05"
Picard MINIMUM_PCT (M). Read-orientation categories (FR, TANDEM, RF) that account for less than this fraction of all reads are dropped from the histogram.
picard.insert_metrics.params.validation_stringency
string
default:"LENIENT"
Picard validation stringency.
picard.insert_metrics.threads
integer
default:"4"
CPU threads.
picard.insert_metrics.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
picard.insert_metrics.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Computes NSC (Normalized Strand Cross-correlation) and RSC (Relative Strand Cross-correlation) coefficients via phantompeakqualtools. ENCODE-compliant strand-shift QC.
Input: results/post_alignment/samtools_view/
Output: results/metrics_qc/cross_correlation/
cross_correlation.params.num_threads
integer
default:"4"
Threads passed internally to the R-based cross-correlation tool.
cross_correlation.params.max_range
integer
default:"150"
Maximum strand-shift range (bp) to evaluate for the cross-correlation curve.
cross_correlation.threads
integer
default:"4"
CPU threads.
cross_correlation.resources.mem_mb
integer
default:"8000"
Memory limit (MB).
cross_correlation.resources.time
integer
default:"120"
Time limit (minutes).
Purpose: Estimates library complexity and predicts yield at higher sequencing depths using a curve extrapolation model.
Input: results/post_alignment/remove_mito_reads/
Output: results/reporting_qc/preseq/
preseq.threads
integer
default:"1"
CPU threads (single-threaded tool).
preseq.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
preseq.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Generates comprehensive BAM QC metrics including coverage uniformity, GC content, and read distribution.
Input: results/post_alignment/samtools_markdup/
Output: results/reporting_qc/qualimap/
qualimap_bamqc.threads
integer
default:"4"
CPU threads.
qualimap_bamqc.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
qualimap_bamqc.resources.time
integer
default:"60"
Time limit (minutes).

QC Gate

Purpose: Automated biological checkpoint. Evaluates four gated metrics (FRiP, TSS enrichment, mapping rate, and duplicate rate) and writes per-sample pass/fail files that downstream rules depend on.
Inputs:
  • FRiP file: results/peak_calling/frip_calculation/
  • TSS enrichment file: results/metrics_qc/tss_enrichment/
  • Samtools stats: results/post_alignment/samtools_stats/
Output: results/qc_gate/
qc_gate.params.min_frip
float
default:"0.2"
Minimum FRiP (Fraction of Reads in Peaks). ENCODE minimum recommendation.
qc_gate.params.min_tss_enr
float
default:"7.0"
Minimum TSS Enrichment score. Signal at TSSes relative to background.
qc_gate.params.min_mapping_rate
float
default:"80.0"
Minimum percentage of properly paired reads (from samtools stats).
qc_gate.params.max_duplicate_rate
float
default:"20.0"
Maximum duplicate rate percentage - (reads_duplicated / total_reads) × 100.
qc_gate.threads
integer
default:"1"
CPU threads.
qc_gate.resources.mem_mb
integer
default:"1000"
Memory limit (MB).
qc_gate.resources.time
integer
default:"10"
Time limit (minutes).
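The four gated comparisons can be sketched as follows. The metric key names and the choice of inclusive comparisons (>= and <=) are assumptions made for illustration. evaluate_qc is a hypothetical helper, not the pipeline's gate script.

```python
def evaluate_qc(metrics, thresholds):
    """Compare one sample's metrics with the qc_gate.params thresholds."""
    checks = {
        "frip": metrics["frip"] >= thresholds["min_frip"],
        "tss_enrichment": metrics["tss_enrichment"] >= thresholds["min_tss_enr"],
        "mapping_rate": metrics["mapping_rate"] >= thresholds["min_mapping_rate"],
        "duplicate_rate": metrics["duplicate_rate"] <= thresholds["max_duplicate_rate"],
    }
    return all(checks.values()), checks


# Defaults listed on this page.
thresholds = {
    "min_frip": 0.2,
    "min_tss_enr": 7.0,
    "min_mapping_rate": 80.0,
    "max_duplicate_rate": 20.0,
}
# Made-up sample metrics: everything passes except the duplicate rate.
sample = {"frip": 0.31, "tss_enrichment": 9.4, "mapping_rate": 95.2, "duplicate_rate": 27.5}
passed, checks = evaluate_qc(sample, thresholds)
print(passed)                    # False
print(checks["duplicate_rate"])  # False
```

Returning the per-metric results alongside the overall verdict makes it easy to report which threshold a failing sample missed.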

Stage 5 - Visualization

Purpose: Converts the Tn5-shifted BAM to a raw bedGraph coverage track.
Input: results/post_alignment/tn5_shift/
Output: results/visualization/bedtools_genomecov/
bedtools_genomecov.params.extra
string
default:"-bg"
Extra flags passed to bedtools genomecov. -bg produces bedGraph format.
bedtools_genomecov.threads
integer
default:"4"
CPU threads.
bedtools_genomecov.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
bedtools_genomecov.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Sorts the bedGraph file by chromosome and coordinate (required by bedGraphToBigWig).
Input: results/visualization/bedtools_genomecov/
Output: results/visualization/sorted_bedgraph_file/
sorted_bedgraph.threads
integer
default:"4"
CPU threads.
sorted_bedgraph.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
sorted_bedgraph.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Converts the sorted bedGraph to a binary BigWig file for genome browser visualisation.
Input: results/visualization/sorted_bedgraph_file/
Output: results/visualization/bigwig/
bigwig.params.genome
string
default:"*GENOME_SIZES"
Chromosome sizes file, resolved from global.references.genome_sizes.
bigwig.threads
integer
default:"4"
CPU threads.
bigwig.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
bigwig.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Generates CPM-normalised BigWig tracks for cross-sample comparability using bamCoverage.
Input: results/post_alignment/tn5_shift/
Output: results/visualization/normalized_coverage/
normalized_coverage.params.method
string
default:"CPM"
Normalisation method passed to bamCoverage. Options: CPM, RPKM, BPM, RPGC, None.
normalized_coverage.threads
integer
default:"4"
CPU threads.
normalized_coverage.resources.mem_mb
integer
default:"8000"
Memory limit (MB).
normalized_coverage.resources.time
integer
default:"120"
Time limit (minutes).
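CPM normalisation scales each bin's read count by the library size in millions, which is what makes tracks from libraries of different depth comparable. A worked sketch with made-up counts follows. The cpm helper is illustrative and is not bamCoverage itself.

```python
def cpm(bin_counts, total_mapped_reads):
    """Scale per-bin read counts to counts per million mapped reads."""
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


# Three bins from a library with 2 million mapped reads (made-up numbers).
print(cpm([5, 0, 20], 2_000_000))  # [2.5, 0.0, 10.0]
```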
Purpose: Computes inter-sample Pearson/Spearman correlations across BigWig tracks using deepTools multiBigwigSummary + plotCorrelation.
Input: results/visualization/bigwig/
Output: results/visualization/correlation_analysis/
correlation_analysis.params.bin_size
integer
default:"1000"
Genomic bin size in base pairs used for signal summarisation before correlation.
correlation_analysis.threads
integer
default:"4"
CPU threads.
correlation_analysis.resources.mem_mb
integer
default:"8000"
Memory limit (MB).
correlation_analysis.resources.time
integer
default:"120"
Time limit (minutes).
Purpose: Generates read-density heatmaps centred on filtered peak regions using deepTools computeMatrix + plotHeatmap.
Inputs:
  • results/peak_calling/filtered_peaks/
  • results/visualization/bigwig/
Outputs:
  • Plot: results/visualization/heatmap/plot/
  • Matrix: results/visualization/heatmap/matrix/
  • Regions: results/visualization/heatmap/
heatmap.params.color
string
default:"coolwarm"
Colormap name for the heatmap. Any matplotlib colormap is accepted.
heatmap.params.upstream
integer
default:"3000"
Base pairs upstream of peak centre to include.
heatmap.params.downstream
integer
default:"3000"
Base pairs downstream of peak centre to include.
heatmap.threads
integer
default:"8"
CPU threads.
heatmap.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
heatmap.resources.time
integer
default:"240"
Time limit (minutes).

Stage 6 - Peak Calling

Purpose: Calls chromatin accessibility peaks from the Tn5-shifted BAM in paired-end BAM mode.
Input: results/post_alignment/tn5_shift/
Output: results/peak_calling/macs2_peakcall/
macs2.params.genome_size
string
default:"hs"
Effective genome size. Use "hs" (human), "mm" (mouse), "ce" (C. elegans), "dm" (Drosophila), or a numeric value.
macs2.params.qvalue
float
default:"0.01"
Minimum q-value (FDR) threshold for peak calling.
macs2.params.nomodel
string
default:"--nomodel"
Disables MACS2’s shift model estimation. Required for ATAC-seq data.
macs2.params.format
string
default:"BAMPE"
Input format flag. BAMPE instructs MACS2 to use both mates of a paired-end BAM.
macs2.threads
integer
default:"8"
CPU threads.
macs2.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
macs2.resources.time
integer
default:"240"
Time limit (minutes).
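For orientation, the macs2.params values map onto a `macs2 callpeak` invocation roughly as sketched below. The flags are standard MACS2 options. The BAM path, sample name, and macs2_command helper are hypothetical, and the pipeline's rule may add further arguments.

```python
import shlex


def macs2_command(bam, sample, outdir, params):
    """Assemble a 'macs2 callpeak' argument list from the macs2.params values."""
    return [
        "macs2", "callpeak",
        "-t", bam,
        "-f", params["format"],
        "-g", str(params["genome_size"]),
        "-q", str(params["qvalue"]),
        params["nomodel"],
        "-n", sample,
        "--outdir", outdir,
    ]


params = {"genome_size": "hs", "qvalue": 0.01, "nomodel": "--nomodel", "format": "BAMPE"}
cmd = macs2_command(
    "results/post_alignment/tn5_shift/s1.bam",  # hypothetical file name
    "s1",
    "results/peak_calling/macs2_peakcall",
    params,
)
print(shlex.join(cmd))
```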
Purpose: Removes peaks overlapping ENCODE blacklist regions from the MACS2 narrowPeak output.
Input: results/peak_calling/macs2_peakcall/
Output: results/peak_calling/filtered_peaks/
blacklist_filter.params.blacklist
string
default:"*BLACKLIST"
Blacklist BED file, resolved from global.references.blacklist.
blacklist_filter.threads
integer
default:"4"
CPU threads.
blacklist_filter.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
blacklist_filter.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Computes the FRiP score, the fraction of total reads overlapping filtered peak regions.
Inputs:
  • Filtered peaks: results/peak_calling/filtered_peaks/
  • Shifted BAM: results/post_alignment/tn5_shift/
Output: results/peak_calling/frip_calculation/
frip_calculation.threads
integer
default:"4"
CPU threads.
frip_calculation.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
frip_calculation.resources.time
integer
default:"15"
Time limit (minutes).
Purpose: Annotates filtered peaks with genomic features (promoter, intron, exon, intergenic) using HOMER or ChIPseeker.
Input: results/peak_calling/filtered_peaks/
Output: results/peak_calling/peak_annotation/
peak_annotation.params.gff
string
default:"*ANNOTATION_GTF"
Annotation GTF resolved from global.references.annotation_gtf.
peak_annotation.params.genome
string
default:"*GENOME_FA"
Reference FASTA resolved from global.references.genome_fa.
peak_annotation.threads
integer
default:"8"
CPU threads.
peak_annotation.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
peak_annotation.resources.time
integer
default:"240"
Time limit (minutes).
Purpose: Performs de novo and known motif enrichment analysis in filtered peaks using HOMER.
Input: results/peak_calling/filtered_peaks/
Output: results/peak_calling/motif_analysis/
motif_analysis.params.motif_db
string
default:"*MOTIF_DB"
MEME-format motif database, resolved from global.references.motif_db.
motif_analysis.params.genome_assembly
string
default:"*GENOME_FA"
Reference genome FASTA or assembly name.
motif_analysis.threads
integer
default:"8"
CPU threads.
motif_analysis.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
motif_analysis.resources.time
integer
default:"240"
Time limit (minutes).
Purpose: Computes Irreproducible Discovery Rate between biological replicates to assess peak reproducibility.
Outputs:
  • IDR peaks: results/peak_calling/idr/idr_peaks/
  • Optimal peaks: results/peak_calling/idr/optimal_peaks/
  • Plots: results/peak_calling/idr/plots/
idr.params.idr_threshold
float
default:"0.05"
IDR significance threshold. Peaks below this value are considered reproducible.
idr.params.rank_column
string
default:"score"
Column used to rank peaks before IDR analysis.
idr.threads
integer
default:"4"
CPU threads.
idr.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
idr.resources.time
integer
default:"60"
Time limit (minutes).
Purpose: Merges peaks across all samples into a single non-redundant consensus peak set.
Outputs:
  • Consensus BED: results/peak_calling/consensus_peaks/
  • Sample count matrix: results/peak_calling/consensus_peaks/
consensus_peaks.params.min_samples
integer
default:"2"
Minimum number of samples in which a peak must be called to be retained in the consensus set.
consensus_peaks.params.merge_distance
integer
default:"100"
Maximum distance in base pairs between peaks to be merged into one consensus region.
consensus_peaks.threads
integer
default:"4"
CPU threads.
consensus_peaks.resources.mem_mb
integer
default:"8000"
Memory limit (MB).
consensus_peaks.resources.time
integer
default:"120"
Time limit (minutes).
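How min_samples and merge_distance interact can be seen in a small interval-merging sketch: peaks from all samples are pooled, neighbours within merge_distance bp are merged, and only merged regions supported by at least min_samples distinct samples are kept. This is one plausible reading of the two parameters, not the pipeline's actual implementation. The peaks are made up and consensus_peaks is a hypothetical helper.

```python
def consensus_peaks(peaks_by_sample, min_samples=2, merge_distance=100):
    """Merge peaks across samples and keep regions seen in >= min_samples samples."""
    pooled = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set of supporting samples]
    for chrom, start, end, sample in pooled:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= merge_distance:
            last[2] = max(last[2], end)
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [(c, s, e, len(support)) for c, s, e, support in merged if len(support) >= min_samples]


peaks = {
    "s1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "s2": [("chr1", 250, 350)],
    "s3": [("chr1", 5000, 5100)],
}
print(consensus_peaks(peaks))  # [('chr1', 100, 350, 2)]
```

Here the s1 and s2 peaks sit 50 bp apart and merge into one region with two supporting samples. The remaining peaks each have a single supporter and are dropped.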
Purpose: Counts reads in consensus peak regions across all samples to build a count matrix for DESeq2 differential accessibility analysis.
Input: results/post_alignment/tn5_shift/
Output: results/peak_calling/count_peaks/
count_peaks.threads
integer
default:"2"
CPU threads.
count_peaks.resources.mem_mb
integer
default:"2000"
Memory limit (MB).
count_peaks.resources.time
integer
default:"30"
Time limit (minutes).
Purpose: DESeq2-based differential chromatin accessibility analysis between conditions, with volcano, MA, and PCA plots.
Outputs:
  • Results table: results/peak_calling/differential_accessibility/
  • Plots: results/peak_calling/differential_accessibility/plots/
differential_accessibility.params.fdr_threshold
float
default:"0.05"
Adjusted p-value (FDR) threshold for significance.
differential_accessibility.params.log2fc_threshold
float
default:"1.0"
Minimum absolute log₂ fold-change for a peak to be considered differentially accessible.
differential_accessibility.threads
integer
default:"8"
CPU threads.
differential_accessibility.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
differential_accessibility.resources.time
integer
default:"240"
Time limit (minutes).
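The two thresholds act together: a peak must pass both the FDR cutoff and the fold-change cutoff. The sketch below applies them to rows shaped like a DESeq2 results table (padj and log2FoldChange are DESeq2's column names). The peak IDs and values are made up, the strict and inclusive comparison choices are assumptions, and significant_peaks is a hypothetical helper.

```python
def significant_peaks(results, fdr_threshold=0.05, log2fc_threshold=1.0):
    """Keep rows with padj below the FDR cutoff and |log2FC| at or above the cutoff."""
    return [
        row for row in results
        if row["padj"] is not None  # DESeq2 reports NA for peaks it filters out
        and row["padj"] < fdr_threshold
        and abs(row["log2FoldChange"]) >= log2fc_threshold
    ]


results = [
    {"peak": "peak_1", "log2FoldChange": 2.3, "padj": 0.001},
    {"peak": "peak_2", "log2FoldChange": -1.4, "padj": 0.03},
    {"peak": "peak_3", "log2FoldChange": 0.4, "padj": 0.0001},  # fold change too small
    {"peak": "peak_4", "log2FoldChange": 3.0, "padj": 0.2},     # not significant
    {"peak": "peak_5", "log2FoldChange": 1.8, "padj": None},    # NA adjusted p-value
]
print([row["peak"] for row in significant_peaks(results)])  # ['peak_1', 'peak_2']
```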
Purpose: Full TOBIAS pipeline (ATACorrect → ScoreBigwig → BINDetect) for bias-corrected transcription factor footprinting and differential TF binding analysis.
Input: results/post_alignment/samtools_view/
Outputs:
  • Bias-corrected BigWig: results/peak_calling/tobias/corrected_bw/
  • Footprint scores: results/peak_calling/tobias/footprint_bw/
  • BINDetect results: results/peak_calling/tobias/bindetect/
tobias.params.genome_fa
string
default:"*GENOME_FA"
Reference FASTA.
tobias.params.genome_sizes
string
default:"*GENOME_SIZES"
Chromosome sizes file.
tobias.params.blacklist
string
default:"*BLACKLIST"
Blacklist BED file.
tobias.params.motif_db
string
default:"*MOTIF_DB"
MEME motif database.
tobias.params.conditions
string
default:"condition"
Column name from the sample sheet used to group samples for comparative BINDetect analysis.
tobias.threads
integer
default:"8"
CPU threads.
tobias.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
tobias.resources.time
integer
default:"240"
Time limit (minutes).
Purpose: HINT-ATAC footprinting via the RGT toolkit for detection of TF-bound regulatory elements.
Input: results/post_alignment/samtools_view/
Outputs:
  • Footprint BED: results/peak_calling/footprinting/footprints/
  • Plots: results/peak_calling/footprinting/plots/
footprinting.params.organism
string
default:"hg38"
Organism genome assembly identifier (e.g., hg38, mm10).
footprinting.threads
integer
default:"8"
CPU threads.
footprinting.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
footprinting.resources.time
integer
default:"240"
Time limit (minutes).
Purpose: chromVAR motif accessibility deviation analysis. Computes per-cell or per-sample TF activity scores from the shifted BAM.
Input: results/post_alignment/tn5_shift/
Outputs:
  • Deviation scores: results/peak_calling/chromvar/deviations/
  • Bias-corrected scores: results/peak_calling/chromvar/bias_corrected/
  • Plots: results/peak_calling/chromvar/plots/
chromvar_analysis.params.motif_db
string
default:"*MOTIF_DB"
MEME motif database.
chromvar_analysis.params.genome_fa
string
default:"*GENOME_FA"
Reference FASTA.
chromvar_analysis.params.genome_sizes
string
default:"*GENOME_SIZES"
Chromosome sizes file.
chromvar_analysis.threads
integer
default:"8"
CPU threads.
chromvar_analysis.resources.mem_mb
integer
default:"16000"
Memory limit (MB).
chromvar_analysis.resources.time
integer
default:"240"
Time limit (minutes).

scATAC-seq - ArchR & Cicero

Purpose: Single-cell ATAC-seq analysis using ArchR: Arrow file creation, doublet detection and filtering, iterative LSI dimensionality reduction, UMAP clustering, and marker gene identification.
Input: results/alignment/chromap/
Outputs:
  • Arrow files: results/scatac/archr/arrow/
  • Filtered Arrow files: results/scatac/archr/filtered_arrow/
  • Fragments: results/scatac/archr/fragments/
  • Clusters: results/scatac/archr/clusters/
  • Markers: results/scatac/archr/markers/
  • Doublets report: results/scatac/archr/doublets/
  • Plots: results/scatac/archr/plots/
  • QC report: results/scatac/archr/qc_report/
archr.params.min_tss
float
default:"4.0"
Minimum TSS enrichment score for a cell barcode to be retained.
archr.params.min_frags
integer
default:"1000"
Minimum fragment count per cell barcode.
archr.params.max_frags
integer
default:"100000"
Maximum fragment count per cell barcode (doublet ceiling).
archr.params.tsse_method
string
default:"ArchR"
TSS enrichment calculation method.
archr.params.doublet_threshold
float
default:"0.2"
Doublet enrichment score cutoff for doublet removal.
archr.params.clustering_resolution
float
default:"0.8"
Seurat-style Leiden clustering resolution. Higher values produce more clusters.
archr.params.dims_to_use
string
default:"1:30"
LSI dimensions to use for UMAP embedding and clustering.
archr.params.force_dim_reduction
boolean
default:"true"
Forces recalculation of dimensionality reduction even if cached results exist.
archr.threads
integer
default:"16"
CPU threads.
archr.resources.mem_mb
integer
default:"64000"
Memory limit (MB).
archr.resources.time
integer
default:"240"
Time limit (minutes).
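The three per-barcode QC cutoffs (min_tss, min_frags, max_frags) combine into a single keep/drop decision. The sketch below shows that logic in isolation, with inclusive bounds as an assumption. ArchR applies its own filtering internally, and keep_cell is a hypothetical helper.

```python
def keep_cell(tss_enrichment, n_frags, min_tss=4.0, min_frags=1000, max_frags=100000):
    """Return True when a cell barcode passes the TSS and fragment-count cutoffs."""
    return tss_enrichment >= min_tss and min_frags <= n_frags <= max_frags


print(keep_cell(6.2, 5000))    # True
print(keep_cell(3.0, 5000))    # False: TSS enrichment too low
print(keep_cell(8.0, 250000))  # False: above the fragment ceiling
```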
Purpose: Chromatin co-accessibility analysis using Cicero: identifies co-accessible regulatory elements and calls Cis-Co-Accessibility Networks (CCANs).
Inputs:
  • Filtered Arrow files: results/scatac/archr/filtered_arrow/
  • Cell clusters: results/scatac/archr/clusters/
Outputs:
  • Co-accessibility connections: results/scatac/cicero/connections/
  • CCANs: results/scatac/cicero/ccans/
  • Plots: results/scatac/cicero/plots/
cicero.params.window_size
integer
default:"500"
Window size in kb used to build KNN graphs for co-accessibility analysis.
cicero.params.distance_cutoff
integer
default:"250000"
Maximum distance (bp) between co-accessible peaks to be included in a connection.
cicero.threads
integer
default:"8"
CPU threads.
cicero.resources.mem_mb
integer
default:"32000"
Memory limit (MB).
cicero.resources.time
integer
default:"240"
Time limit (minutes).

Reporting

Purpose: Aggregates per-rule Snakemake benchmark files into a single TSV summary of wall-clock time, CPU time, and peak memory usage across all pipeline stages.
Output: results/reporting/benchmark_summary.tsv
Purpose: Aggregates QC metrics from fastp, FastQC, Picard, samtools, preseq, Qualimap, and the QC gate into a single interactive HTML report.
Output: results/reporting/multiqc/
multiqc.params.config
string
default:"rules/config/multiqc_config.yaml"
Path to the MultiQC configuration YAML file for custom module settings and report branding.
multiqc.threads
integer
default:"2"
CPU threads.
multiqc.resources.mem_mb
integer
default:"4000"
Memory limit (MB).
multiqc.resources.time
integer
default:"30"
Time limit (minutes).
