The BDB-Genomics ATAC-seq Pipeline is a config-driven Snakemake framework for end-to-end chromatin accessibility analysis. It handles both bulk and single-cell ATAC-seq from a single config.yaml, enforces rigorous QC before downstream processing, and scales from a 4 GB laptop to Kubernetes clusters without modifying a single rule file.
Quickstart
Run your first bulk ATAC-seq analysis in minutes with synthetic or real ENCODE data.
Installation
Set up Snakemake, Conda environments, and Docker for any platform.
Configuration
Understand the single-source-of-truth config.yaml and how to override parameters at runtime.
Pipeline Stages
Explore all six pipeline stages from preprocessing through differential accessibility.
Modalities
Switch between bulk ATAC-seq and scATAC-seq with one environment variable.
Deployment
Deploy on Docker, SLURM, AWS, GCP, Azure, or Kubernetes with pre-built profiles.
Key Features
Dual-Modality Support
Run bulk ATAC-seq or single-cell ATAC-seq from the same pipeline by setting ATAC_MODE=bulk or ATAC_MODE=scatac.
Pre-flight Validation
validate_config.py catches missing keys, bad paths, and schema errors before Snakemake builds the DAG.
Four-Metric QC Gate
Samples are checked against FRiP, TSS enrichment, mapping rate, and duplicate rate thresholds before peak calling begins.
Eight Execution Profiles
Pre-built profiles for local, SLURM, low-resource, AWS, GCP, Azure, Kubernetes, and CI test environments.
TF Footprinting
Full TOBIAS BINDetect and HINT-ATAC pipelines for Tn5 bias-corrected transcription factor footprinting.
GEOAgent Integration
Convert GEO metadata CSVs directly into ready-to-run pipeline configs, with optional SRA download.
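To make the four-metric QC gate listed above more concrete, here is a minimal Python sketch of how such a gate might classify samples. It is not the pipeline's actual code: the threshold values, the `SampleMetrics` class, and the function names are all hypothetical and chosen only for illustration. The real cutoffs would come from config.yaml.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the pipeline's real
# cutoffs are defined in its config.yaml and are not stated in this page.
THRESHOLDS = {
    "frip_min": 0.2,
    "tss_enrichment_min": 5.0,
    "mapping_rate_min": 0.8,
    "duplicate_rate_max": 0.3,
}


@dataclass
class SampleMetrics:
    """Per-sample values for the four gated metrics (hypothetical container)."""
    sample: str
    frip: float
    tss_enrichment: float
    mapping_rate: float
    duplicate_rate: float


def qc_failures(m, t=THRESHOLDS):
    """Return the names of the metrics that fail; an empty list means pass."""
    failures = []
    if m.frip < t["frip_min"]:
        failures.append("frip")
    if m.tss_enrichment < t["tss_enrichment_min"]:
        failures.append("tss_enrichment")
    if m.mapping_rate < t["mapping_rate_min"]:
        failures.append("mapping_rate")
    if m.duplicate_rate > t["duplicate_rate_max"]:
        failures.append("duplicate_rate")
    return failures


def split_by_gate(samples):
    """Partition samples into those that pass and those to bypass.

    Failing samples are recorded with their reasons instead of raising,
    so one bad sample does not stop the others.
    """
    passed, bypassed = [], {}
    for m in samples:
        failures = qc_failures(m)
        if failures:
            bypassed[m.sample] = failures
        else:
            passed.append(m.sample)
    return passed, bypassed
```

Returning the failure reasons rather than raising an exception mirrors the stated behavior that failing samples are bypassed instead of crashing the run.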
Pipeline at a Glance
The pipeline is organized as a six-stage directed acyclic graph (DAG):
Preprocessing
Quality trimming with fastp and per-sample FastQC reports. Reads failing length or quality thresholds are discarded before alignment.
Alignment
Bulk mode: Bowtie2 (--very-sensitive). scATAC mode: Chromap (--preset atac). Both produce coordinate-sorted BAM files.
Post-Alignment Filtering
Mitochondrial read removal, duplicate marking, MAPQ ≥ 30 filtering, ENCODE blacklist removal, and Tn5 shift correction.
Metrics & QC Gate
TSS enrichment, FRiP, fragment size analysis, Picard, Preseq, and Qualimap metrics feed a hard QC gate. Samples that fail the gate are bypassed in downstream steps rather than crashing the run.
Peak Calling & Analysis
MACS2 peak calling, blacklist filtering, IDR replicate concordance, consensus peaks, DESeq2 differential accessibility, motif analysis, and TF footprinting.
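The Tn5 shift correction mentioned in the post-alignment filtering stage can be illustrated with a short sketch. This page does not state which offsets the pipeline applies, so the sketch assumes the widely used ATAC-seq convention of +4 bp for forward-strand reads and -5 bp for reverse-strand reads. The function name `tn5_shift` and its parameters are hypothetical, and only the 5' end of each read is moved.

```python
def tn5_shift(start, end, strand, plus_offset=4, minus_offset=5):
    """Shift the 5' end of a read toward the Tn5 insertion centre.

    Coordinates are 0-based, half-open (BED-style). The default offsets
    follow the common +4/-5 convention and are an assumption here, not
    a value taken from the pipeline's configuration.
    """
    if strand == "+":
        # The 5' end of a forward-strand read is its start coordinate.
        start += plus_offset
    elif strand == "-":
        # The 5' end of a reverse-strand read is its end coordinate.
        end -= minus_offset
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if start >= end:
        raise ValueError("read is too short to shift")
    return start, end


print(tn5_shift(100, 150, "+"))  # (104, 150)
print(tn5_shift(100, 150, "-"))  # (100, 145)
```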
All tool dependencies are resolved automatically via per-rule Conda environments or Singularity containers. You never need to install Bowtie2, MACS2, or R packages manually.
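As a rough illustration of the modality switch, the sketch below maps ATAC_MODE to the aligner and flags named in the Alignment stage. It is a hypothetical helper, not code from the pipeline: the function name `aligner_for_mode`, the `ALIGNERS` table, and the fallback to bulk mode when the variable is unset are all assumptions.

```python
import os

# Aligner and flags per mode, as named in the Alignment stage above.
ALIGNERS = {
    "bulk": ["bowtie2", "--very-sensitive"],
    "scatac": ["chromap", "--preset", "atac"],
}


def aligner_for_mode(env=None):
    """Return (mode, command prefix) for the current ATAC_MODE.

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    # Defaulting to "bulk" is an assumption made for this sketch.
    mode = env.get("ATAC_MODE", "bulk")
    if mode not in ALIGNERS:
        valid = ", ".join(sorted(ALIGNERS))
        raise ValueError(f"ATAC_MODE must be one of: {valid}; got {mode!r}")
    return mode, list(ALIGNERS[mode])
```

Rejecting unknown values early, rather than silently falling back, is in the same spirit as the pre-flight validation described under Key Features.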