BDB-Genomics ATAC-seq Pipeline

The BDB-Genomics ATAC-seq Pipeline is a config-driven Snakemake framework for end-to-end chromatin accessibility analysis. It handles both bulk and single-cell ATAC-seq from a single config.yaml, enforces a hard QC gate before downstream processing, and scales from a 4 GB laptop to Kubernetes clusters without changes to any rule file.

Quickstart

Run your first bulk ATAC-seq analysis in minutes with synthetic or real ENCODE data.

Installation

Set up Snakemake, Conda environments, and Docker for any platform.

Configuration

Understand the single-source-of-truth config.yaml and how to override parameters at runtime.

Pipeline Stages

Explore all six pipeline stages from preprocessing through differential accessibility.

Modalities

Switch between bulk ATAC-seq and scATAC-seq with one environment variable.

Deployment

Deploy on Docker, SLURM, AWS, GCP, Azure, or Kubernetes with pre-built profiles.

Key Features

Dual-Modality Support

Run bulk ATAC-seq or single-cell ATAC-seq from the same pipeline by setting ATAC_MODE=bulk or ATAC_MODE=scatac.
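A minimal sketch of how a Snakefile might read and validate ATAC_MODE before selecting rules. The function name resolve_atac_mode and the fallback to bulk when the variable is unset are assumptions for illustration, not the pipeline's documented behavior.

```python
import os

# The two modes named in the docs.
VALID_MODES = {"bulk", "scatac"}


def resolve_atac_mode(env=None):
    """Return ATAC_MODE normalised to lower case, or raise on an unknown value.

    Hypothetical helper: falling back to "bulk" when ATAC_MODE is unset
    is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    mode = env.get("ATAC_MODE", "bulk").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"ATAC_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}"
        )
    return mode
```

Validating the variable once, up front, means a typo such as ATAC_MODE=scATAC-seq fails immediately instead of producing a DAG for the wrong modality.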

Pre-flight Validation

validate_config.py catches missing keys, bad paths, and schema errors before Snakemake builds the DAG.
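The kind of check validate_config.py performs can be sketched as follows. The required keys (samples, genome, blacklist, output_dir) and the function signature are hypothetical; the pipeline's real schema may differ.

```python
from pathlib import Path

# Hypothetical schema: required top-level keys, and which of them must
# point at files that already exist.
REQUIRED_KEYS = {"samples", "genome", "blacklist", "output_dir"}
PATH_KEYS = {"genome", "blacklist"}


def validate_config(config: dict) -> list:
    """Return human-readable errors; an empty list means the config passed."""
    errors = []
    for key in sorted(REQUIRED_KEYS - set(config)):
        errors.append(f"missing required key: {key}")
    for key in sorted(PATH_KEYS & set(config)):
        if not Path(config[key]).exists():
            errors.append(f"{key}: path does not exist: {config[key]}")
    samples = config.get("samples")
    if samples is not None and not isinstance(samples, dict):
        errors.append("samples: expected a mapping of sample name -> FASTQ paths")
    return errors
```

Collecting every error before exiting, rather than stopping at the first, lets a user fix a broken config in one pass.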

Four-Metric QC Gate

Samples are checked against FRiP, TSS enrichment, mapping rate, and duplicate rate thresholds before peak calling begins.
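A sketch of a four-metric gate that splits samples into passing and bypassed sets. The threshold values and metric key names are hypothetical placeholders, not the pipeline's defaults. Duplicate rate is treated as an upper bound and the other three metrics as lower bounds.

```python
# Hypothetical thresholds for illustration only; real values come from config.yaml.
THRESHOLDS = {
    "frip": ("min", 0.20),
    "tss_enrichment": ("min", 5.0),
    "mapping_rate": ("min", 0.80),
    "duplicate_rate": ("max", 0.50),
}


def failed_metrics(metrics: dict, thresholds=THRESHOLDS) -> list:
    """Names of metrics that fail the gate; a missing metric also counts as a failure."""
    failures = []
    for name, (direction, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(name)
        elif direction == "min" and value < limit:
            failures.append(name)
        elif direction == "max" and value > limit:
            failures.append(name)
    return failures


def partition_samples(all_metrics: dict):
    """Return (passed sample names, {bypassed sample: failing metric names})."""
    passed, bypassed = [], {}
    for sample, metrics in all_metrics.items():
        failures = failed_metrics(metrics)
        if failures:
            bypassed[sample] = failures
        else:
            passed.append(sample)
    return passed, bypassed
```

Only the passed list would feed peak calling; recording the failing metrics per bypassed sample keeps the exclusion explainable in the final report.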

Eight Execution Profiles

Pre-built profiles for local, SLURM, low-resource, AWS, GCP, Azure, Kubernetes, and CI test environments.

TF Footprinting

Full TOBIAS BINDetect and HINT-ATAC pipelines for Tn5 bias-corrected transcription factor footprinting.

GEOAgent Integration

Convert GEO metadata CSVs directly into ready-to-run pipeline configs, with optional SRA download.
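A rough sketch of the CSV-to-config step that groups several sequencing runs under one sample. The column names sample_name, run_accession, and condition, and the output layout, are assumptions; GEOAgent's actual input format and config keys may differ.

```python
import csv
import io


def geo_csv_to_samples(csv_text: str) -> dict:
    """Build {sample: {"sra": [run, ...], "condition": str}} from a metadata CSV.

    Hypothetical column names: sample_name, run_accession, condition.
    """
    samples = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["sample_name"].strip()
        # The first row seen for a sample fixes its condition.
        entry = samples.setdefault(
            name, {"sra": [], "condition": row["condition"].strip()}
        )
        entry["sra"].append(row["run_accession"].strip())
    return samples
```

The resulting mapping could be dumped into the samples section of config.yaml, with the run accessions passed to an optional SRA download step.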

Pipeline at a Glance

The pipeline is organized as a six-stage directed acyclic graph (DAG):

Preprocessing

Quality trimming with fastp and per-sample FastQC reports. Reads failing length or quality thresholds are discarded before alignment.
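The length and quality rule can be illustrated with a simplified per-read filter modelled on fastp's qualified-quality, unqualified-percent, and minimum-length options (-q, -u, -l). The parameter values here are illustrative, and fastp applies further checks (such as an N-base limit) that this sketch omits.

```python
def passes_read_filter(seq: str, qual: str, min_len: int = 15,
                       qualified_phred: int = 15,
                       max_unqualified_pct: float = 40.0) -> bool:
    """Simplified fastp-style filter for one read with a Phred+33 quality string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < min_len:
        return False  # too short after trimming
    # A base is "unqualified" if its Phred score is below qualified_phred.
    unqualified = sum(1 for ch in qual if ord(ch) - 33 < qualified_phred)
    return 100.0 * unqualified / len(qual) <= max_unqualified_pct
```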

Alignment

Bulk mode: Bowtie2 (--very-sensitive). scATAC mode: Chromap (--preset atac). Both produce coordinate-sorted BAM files.
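A sketch of how the two aligner invocations might be assembled as argument lists. Only the documented presets plus each tool's basic index, read, thread, and output options appear. The function name, file names, and thread default are hypothetical, and the conversion of aligner output to a coordinate-sorted BAM happens in steps not shown here.

```python
from typing import Optional


def aligner_command(mode: str, index: str, r1: str, r2: str, output: str,
                    reference: Optional[str] = None,
                    barcodes: Optional[str] = None,
                    threads: int = 4) -> list:
    """Return the argv for the aligner used in the given mode (hypothetical helper)."""
    if mode == "bulk":
        # Bowtie2 paired-end alignment with the documented preset; -S names the SAM output.
        return ["bowtie2", "--very-sensitive", "-p", str(threads),
                "-x", index, "-1", r1, "-2", r2, "-S", output]
    if mode == "scatac":
        # Chromap also needs the reference FASTA (-r) and a cell-barcode FASTQ (-b).
        if reference is None or barcodes is None:
            raise ValueError("scatac mode needs a reference FASTA and a barcode FASTQ")
        return ["chromap", "--preset", "atac", "-t", str(threads),
                "-x", index, "-r", reference, "-1", r1, "-2", r2,
                "-b", barcodes, "-o", output]
    raise ValueError(f"unknown mode: {mode!r}")
```

Returning a list rather than a shell string avoids quoting problems when paths contain spaces.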

Post-Alignment Filtering

Mitochondrial read removal, duplicate marking, MAPQ ≥ 30 filtering, ENCODE blacklist removal, and Tn5 shift correction.
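The per-read logic of this stage can be sketched as a filter plus a Tn5 shift. The sketch assumes duplicates are dropped rather than only flagged, uses the widely cited +4/-5 shift applied to the read's 5' end, and treats chrM and MT as mitochondrial contig names; the pipeline's exact conventions may differ.

```python
from dataclasses import dataclass

MITO_CONTIGS = {"chrM", "MT"}  # contig naming depends on the reference build
MIN_MAPQ = 30


@dataclass
class Read:
    chrom: str
    start: int          # 0-based leftmost aligned position
    end: int            # 0-based, exclusive
    strand: str         # "+" or "-"
    mapq: int
    is_duplicate: bool = False


def overlaps_blacklist(read: Read, blacklist) -> bool:
    """True if the read overlaps any half-open (chrom, start, end) blacklist region."""
    return any(c == read.chrom and read.start < e and s < read.end
               for c, s, e in blacklist)


def keep_read(read: Read, blacklist) -> bool:
    """Apply mitochondrial, duplicate, MAPQ >= 30, and blacklist filters."""
    return (read.chrom not in MITO_CONTIGS
            and not read.is_duplicate
            and read.mapq >= MIN_MAPQ
            and not overlaps_blacklist(read, blacklist))


def tn5_cut_site(read: Read) -> int:
    """0-based Tn5 insertion position: 5' end shifted +4 (plus) or -5 (minus)."""
    if read.strand == "+":
        return read.start + 4
    return (read.end - 1) - 5  # the 5' end of a minus-strand read is end - 1
```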

Metrics & QC Gate

TSS enrichment, FRiP, fragment size analysis, Picard, Preseq, and Qualimap metrics feed a hard QC gate. Samples that fail the gate are excluded from downstream steps instead of causing the whole run to fail.
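FRiP, one of the gate metrics, is the fraction of reads falling inside called peaks. A minimal sketch that counts Tn5 cut sites rather than whole reads, assuming cut sites arrive as (chrom, pos) pairs and peaks are non-overlapping half-open intervals:

```python
from bisect import bisect_right
from collections import defaultdict


def frip(cut_sites, peaks) -> float:
    """Fraction of cut sites inside peaks.

    cut_sites: iterable of (chrom, pos) with 0-based positions.
    peaks: iterable of (chrom, start, end), half-open and non-overlapping.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]

    total = in_peaks = 0
    for chrom, pos in cut_sites:
        total += 1
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Last peak starting at or before pos; non-overlap means it is the only candidate.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

Binary search keeps each lookup at O(log n) per site, which matters when a sample has tens of millions of fragments.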

Peak Calling & Analysis

MACS2 peak calling, blacklist filtering, IDR replicate concordance, consensus peaks, DESeq2 differential accessibility, motif analysis, and TF footprinting.
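One common way to build consensus peaks is to merge overlapping intervals across samples and keep regions supported by a minimum number of samples. This is a generic merge-and-count sketch, not necessarily the pipeline's method, and min_samples is a hypothetical parameter.

```python
from collections import defaultdict


def consensus_peaks(peaks_by_sample: dict, min_samples: int = 2) -> list:
    """Merge overlapping half-open peaks across samples; keep regions seen in >= min_samples samples."""
    by_chrom = defaultdict(list)
    for sample, peaks in peaks_by_sample.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, sample))

    consensus = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        support = set()
        for start, end, sample in sorted(by_chrom[chrom]):
            if cur_end is not None and start < cur_end:  # overlaps the open region
                cur_end = max(cur_end, end)
                support.add(sample)
                continue
            # Close the previous region before starting a new one.
            if cur_end is not None and len(support) >= min_samples:
                consensus.append((chrom, cur_start, cur_end))
            cur_start, cur_end, support = start, end, {sample}
        if cur_end is not None and len(support) >= min_samples:
            consensus.append((chrom, cur_start, cur_end))
    return consensus
```

Counting distinct samples (a set) rather than peaks prevents one sample with several overlapping peaks from satisfying the support threshold on its own.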

Reporting

MultiQC aggregates all QC reports. A machine-readable pipeline_execution_summary.json records per-rule CPU time and peak memory.
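Reading the summary file could look like the sketch below. The JSON layout (a top-level rules object holding cpu_time_s and max_rss_mb per rule) is an assumption for illustration; check the actual pipeline_execution_summary.json for its field names.

```python
import json


def top_memory_rules(summary_json: str, n: int = 3) -> list:
    """Return up to n (rule, peak_memory_mb, cpu_time_s) tuples, highest memory first.

    Assumes a hypothetical schema: {"rules": {name: {"cpu_time_s": ..., "max_rss_mb": ...}}}.
    """
    summary = json.loads(summary_json)
    rows = [(rule, stats["max_rss_mb"], stats["cpu_time_s"])
            for rule, stats in summary["rules"].items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```

A ranking like this is a quick way to decide which rules need larger memory requests before moving from the low-resource profile to a cluster profile.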

All tool dependencies are resolved automatically via per-rule Conda environments or Singularity containers. You never need to install Bowtie2, MACS2, or R packages manually.
