The BDB-Genomics ATAC-seq Framework is a production-grade, config-driven Snakemake workflow for end-to-end chromatin accessibility analysis. Starting from raw paired-end FASTQ files, it runs six strictly ordered stages (Preprocessing, Alignment, Post-Alignment Filtering, Metrics & QC, Peak Calling, and Visualization/Reporting). It produces a fully auditable set of BAMs, peak files, differential-accessibility tables, TF-footprint BigWigs, and an aggregated MultiQC HTML report. A single config.yaml file drives the entire pipeline, so no Snakemake rule needs to be modified for routine use. Bulk and single-cell (scATAC) modalities run from the same codebase. The ATAC_MODE environment variable selects between them, with a fallback setting in config.yaml.
Pipeline Stage Map
| Stage | What it Does | Key Tools | Output Directory |
|---|---|---|---|
| Preprocessing | Adapter trimming, 5′-end trimming, QC reports | fastp, FastQC | results/preprocessing/ |
| Alignment | Map trimmed reads to reference genome | Bowtie2 (bulk) / Chromap (scATAC) | results/alignment/ |
| Post-Alignment | Dedup, MAPQ filter, blacklist removal, Tn5 shift | samtools, bedtools, deepTools | results/post_alignment/ |
| Metrics & QC | TSS enrichment, fragment sizes, QC gating | ATACseqQC (R), Picard, Preseq, Qualimap | results/metrics_qc/, results/qc_gate/ |
| Peak Calling | Peak calling, IDR, consensus peaks, DA, footprinting | MACS2, IDR, DESeq2, TOBIAS, HINT-ATAC, HOMER | results/peak_calling/ |
| Visualization | BigWigs, heatmaps, correlation, MultiQC report | deepTools, UCSC tools, MultiQC | results/visualization/, results/reporting/ |
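The Tn5 shift in the Post-Alignment stage moves each read's 5′ cut site to the center of the Tn5 insertion event. The sketch below assumes the widely used +4/−5 convention, which is also what deepTools alignmentSieve --ATACshift applies. The function name tn5_shift is hypothetical, and this document does not state which offsets or tool invocation the pipeline actually uses.

```python
def tn5_shift(cut_site: int, strand: str) -> int:
    """Shift a read's 5' cut site to the Tn5 insertion center.

    Assumes the common +4 / -5 convention: forward-strand cut sites move
    4 bp downstream, reverse-strand cut sites move 5 bp upstream.
    """
    if strand == "+":
        return cut_site + 4
    if strand == "-":
        return cut_site - 5
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


print(tn5_shift(1000, "+"))  # 1004
print(tn5_shift(1000, "-"))  # 995
```

The shift matters mainly for base-pair-resolution analyses such as TF footprinting, where a few bases of offset blur the protected region.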
Six-Stage DAG
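Snakemake builds its real DAG rule by rule from input/output file dependencies. The stage-level view can still be sketched as a dependency graph. The strictly linear chain below is an assumption based on the six-stage ordering described above. It uses Python's standard graphlib module (3.9+) to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# Assumption: every stage depends only on the one before it.
STAGE_DEPS = {
    "Preprocessing": set(),
    "Alignment": {"Preprocessing"},
    "Post-Alignment": {"Alignment"},
    "Metrics & QC": {"Post-Alignment"},
    "Peak Calling": {"Metrics & QC"},
    "Visualization": {"Peak Calling"},
}


def stage_order(deps: dict) -> list:
    """Return stages in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


for position, stage in enumerate(stage_order(STAGE_DEPS), start=1):
    print(position, stage)
```

Because the chain is linear, only one valid order exists. In the real rule-level DAG, independent samples can run the same stage in parallel.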
Modality Switching (ATAC_MODE)
The pipeline determines which set of Snakemake rules to activate by reading the ATAC_MODE environment variable at startup. If the variable is not set, it falls back to config.yaml → global.mode (default: "bulk").
An unrecognized mode value raises a ValueError before any job is submitted.
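The resolution order described above can be sketched as follows. ATAC_MODE wins, then config.yaml → global.mode, then "bulk". Invalid values fail fast. The function name resolve_mode is hypothetical. The single-cell spelling "scatac" is also an assumption, since this document does not give the exact accepted value.

```python
import os

# Assumption: "scatac" is a hypothetical spelling for the single-cell mode;
# check config.yaml for the value the pipeline actually accepts.
VALID_MODES = {"bulk", "scatac"}


def resolve_mode(config: dict, environ=os.environ) -> str:
    """Return the active modality: ATAC_MODE, then global.mode, then 'bulk'."""
    # An unset (or empty) ATAC_MODE falls through to the config value.
    mode = environ.get("ATAC_MODE") or config.get("global", {}).get("mode", "bulk")
    if mode not in VALID_MODES:
        # Fail during planning, before Snakemake submits any job.
        raise ValueError(
            f"Unsupported mode {mode!r}; expected one of {sorted(VALID_MODES)}"
        )
    return mode


print(resolve_mode({"global": {"mode": "bulk"}}, {"ATAC_MODE": "scatac"}))  # scatac
```

Passing the environment mapping explicitly keeps the function easy to test without mutating os.environ.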
Bulk Mode
| Stage | Tool |
|---|---|
| Alignment | Bowtie2 (--very-sensitive) |
| Deduplication | samtools markdup (mark only, configurable) |
| Peak Calling | MACS2 (BAMPE, --nomodel) |
| Differential accessibility | DESeq2 (FDR 0.05, log2FC 1.0) |
| Footprinting | HINT-ATAC (RGT) + TOBIAS BINDetect |
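The DESeq2 thresholds in the table (FDR 0.05, log2FC 1.0) amount to a two-part filter on each region's adjusted p-value and fold change. The sketch below applies them to hypothetical regions. It assumes a strict padj < 0.05 and an absolute |log2FC| ≥ 1.0 cut-off applied after testing. The pipeline may instead pass the fold-change threshold to DESeq2 itself, which changes the test rather than merely filtering its output.

```python
from dataclasses import dataclass


@dataclass
class DARegion:
    region: str
    log2_fold_change: float
    padj: float  # adjusted p-value as reported by DESeq2 (NaN if not tested)


def significant(regions, fdr=0.05, min_abs_lfc=1.0):
    """Keep regions passing both the FDR and absolute log2FC cut-offs."""
    # NaN < fdr evaluates to False, so untested regions are dropped.
    return [
        r for r in regions
        if r.padj < fdr and abs(r.log2_fold_change) >= min_abs_lfc
    ]


# Hypothetical regions for illustration only.
example = [
    DARegion("chr1:100-600", 2.3, 0.001),      # opens, significant
    DARegion("chr2:500-900", -1.5, 0.02),      # closes, significant
    DARegion("chr3:10-400", 0.4, 0.0001),      # significant but small effect
    DARegion("chr4:1-300", 3.0, 0.2),          # large effect, not significant
    DARegion("chr5:1-200", 2.0, float("nan")), # not tested
]
print([r.region for r in significant(example)])
```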
Lifecycle Hooks
The Snakefile registers three Snakemake lifecycle hooks that run automatically, without any user intervention:
onstart
Prints the active mode and detected sample count to stdout the moment Snakemake begins planning the DAG. No files are written.
onsuccess
Prints the path to the final MultiQC HTML report, then calls aggregate_logs.py success to write a machine-readable execution summary to results/reporting/pipeline_execution_summary.json.
Configuration & Profiles
All parameters live in config.yaml, so you never need to touch a .smk rule file for routine analysis. Individual keys can be overridden on the command line (for example, with Snakemake's --config flag) without modifying the main config.
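A command-line override is meant to replace one nested key while leaving its siblings and the file on disk untouched. The sketch below shows those semantics for a single key. The dotted "section.key" notation, the apply_override helper, and the threads key are all hypothetical illustrations, not the pipeline's or Snakemake's own syntax.

```python
import copy


def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of config with one nested key replaced.

    dotted_key uses a hypothetical 'section.key' notation, e.g. 'global.mode'.
    The input dict is never mutated.
    """
    updated = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = updated
    for key in parents:
        # Create intermediate sections that do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


base = {"global": {"mode": "bulk", "threads": 4}}  # hypothetical config
print(apply_override(base, "global.mode", "scatac"))
# {'global': {'mode': 'scatac', 'threads': 4}}
```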
Execution profiles:
| Profile | Description |
|---|---|
| local | Up to 8 concurrent local jobs; ideal for workstations. |
| slurm | Submits each rule as a SLURM job with per-rule memory/time limits. |
| low_resource | Caps all rules at 4 GB RAM; designed for laptops. |
| test | Relaxed QC thresholds for synthetic CI datasets. |
| aws | AWS Batch + S3 + Tibanna executor. |
| kubernetes | Container-native Kubernetes scaling for large cohorts. |
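The low_resource profile's 4 GB ceiling works like a per-rule clamp. Each rule keeps its own memory request unless that request exceeds the cap. The sketch below assumes memory is expressed in megabytes (as Snakemake's conventional mem_mb resource is) and that 4 GB means 4096 MB. The rule names and requests are hypothetical.

```python
LOW_RESOURCE_CAP_MB = 4 * 1024  # assumption: "4 GB" interpreted as 4096 MB


def cap_memory(rule_mem_mb: dict, cap_mb: int = LOW_RESOURCE_CAP_MB) -> dict:
    """Clamp every rule's memory request to at most cap_mb."""
    return {rule: min(mem, cap_mb) for rule, mem in rule_mem_mb.items()}


# Hypothetical per-rule requests from a default profile.
requests = {"bowtie2_align": 16000, "fastp_trim": 2000, "macs2_callpeak": 8000}
print(cap_memory(requests))
# {'bowtie2_align': 4096, 'fastp_trim': 2000, 'macs2_callpeak': 4096}
```

Capping trades speed for fitting on small machines. Memory-hungry steps such as alignment may need fewer threads or smaller batches to stay under the limit.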
Output Manifest
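The output directories listed in the stage map give a quick way to see how far a run has progressed. The sketch below reports which expected directories are missing under a results root. It assumes the paths are relative to the working directory and treats a directory's presence as a coarse signal only, not proof that the stage completed. The missing_outputs helper is hypothetical.

```python
from pathlib import Path

# Output directories per stage, taken from the stage map table.
STAGE_OUTPUTS = {
    "Preprocessing": ["results/preprocessing"],
    "Alignment": ["results/alignment"],
    "Post-Alignment": ["results/post_alignment"],
    "Metrics & QC": ["results/metrics_qc", "results/qc_gate"],
    "Peak Calling": ["results/peak_calling"],
    "Visualization": ["results/visualization", "results/reporting"],
}


def missing_outputs(root: Path = Path(".")) -> dict:
    """Map each stage to the expected output directories not yet present."""
    report = {}
    for stage, dirs in STAGE_OUTPUTS.items():
        absent = [d for d in dirs if not (root / d).is_dir()]
        if absent:
            report[stage] = absent
    return report


for stage, dirs in missing_outputs().items():
    print(f"{stage}: missing {', '.join(dirs)}")
```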
Stage Pages
| Page | Covers |
|---|---|
| Preprocessing | fastp trimming parameters and FastQC quality reporting. |
| Alignment | Bowtie2 bulk alignment and coordinate sorting. |
| Post-Alignment | Eight-step filtering chain from fixmate to Tn5 shift. |
| QC Gating | Four-metric gate with PASS/WARN/FAIL tiers. |
| Peak Calling | MACS2, IDR, DESeq2, TOBIAS, HOMER, and more. |
| Visualization | BigWigs, heatmaps, MultiQC, and benchmark summaries. |
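A tiered QC gate like the one on the QC Gating page can be sketched as follows. Each metric is classified as PASS, WARN, or FAIL, and the sample's overall verdict is the worst tier. The four metric names and every threshold below are hypothetical placeholders, and all metrics are treated as higher-is-better. The pipeline's actual metrics and cut-offs are on the QC Gating page.

```python
from enum import IntEnum


class Tier(IntEnum):
    PASS = 0
    WARN = 1
    FAIL = 2


def classify(value: float, warn_below: float, fail_below: float) -> Tier:
    """Tier a higher-is-better metric against warn and fail cut-offs."""
    if value < fail_below:
        return Tier.FAIL
    if value < warn_below:
        return Tier.WARN
    return Tier.PASS


def gate(metrics: dict, thresholds: dict) -> Tier:
    """Overall verdict is the worst tier across all gated metrics."""
    return max(classify(metrics[name], *thresholds[name]) for name in thresholds)


# Hypothetical metrics and (warn_below, fail_below) thresholds.
THRESHOLDS = {
    "tss_enrichment": (7.0, 5.0),
    "mapping_rate": (0.80, 0.50),
    "nrf": (0.80, 0.60),
    "frip": (0.20, 0.10),
}
sample = {"tss_enrichment": 8.1, "mapping_rate": 0.92, "nrf": 0.70, "frip": 0.25}
print(gate(sample, THRESHOLDS).name)  # WARN
```

Taking the worst tier means one failing metric is enough to block a sample, while WARN lets it proceed but flags it for review.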