Peak calling is the most analytically dense stage of the pipeline. Starting from the Tn5-shifted, QC-gated BAM files, the pipeline calls peaks with MACS2, filters them against the ENCODE blacklist, computes the Fraction of Reads in Peaks, identifies reproducible peaks across replicates with IDR, merges all samples into a consensus peak set, quantifies accessibility with a count matrix, runs DESeq2 for differential accessibility, annotates peaks with ChIPseeker, scans for enriched motifs with HOMER, and performs transcription-factor footprinting with both TOBIAS and HINT-ATAC. Each tool is an independent Snakemake rule and produces outputs in well-defined directories underDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/BDB-Genomics/atacseq-pipeline/llms.txt
Use this file to discover all available pages before exploring further.
results/peak_calling/.
MACS2 Peak Calling
MACS2 operates in paired-end BAMPE mode, which uses actual fragment size information rather than estimating it from a shift model — hence--nomodel. The genome size is computed at parse time by summing the chromosome lengths from genome.chrom.sizes, so no manual genome-size string is needed:
results/peak_calling/macs2_peakcall/{sample}_peaks.narrowPeak
BAMPE format instructs MACS2 to use the full paired-end fragment rather than shifting single-end reads. This is the correct mode for paired-end ATAC-seq and produces more accurate peak summit positions.
Blacklist Filtering
Raw MACS2 peaks may overlap ENCODE blacklist regions.bedtools intersect -v removes any peak that overlaps the blacklist:
results/peak_calling/filtered_peaks/{sample}_filtered_peaks.bed
FRiP Calculation
The Fraction of Reads in Peaks measures library enrichment: a well-prepared ATAC-seq library should have FRiP ≥ 0.2. The rule usessamtools view -c -f 64 (count R1 reads) for total fragments and bedtools coverage -counts for fragments overlapping peaks:
results/peak_calling/frip_calculation/{sample}_frip.txt
The FRiP value is later consumed by parse_qc_metrics.py at the QC gate.
IDR — Irreproducible Discovery Rate
IDR quantifies replicate concordance by comparing peak rank lists between pairs of replicates within each condition. Replicate pairs are auto-computed from the sample sheet at pipeline startup usingitertools.combinations:
{condition}_rep{N}_rep{M}_idr_peaks.bed— peaks passing IDR threshold{condition}_rep{N}_rep{M}_optimal_peaks.bed— optimal peak set{condition}_rep{N}_rep{M}_idr_plot.png— IDR score scatter plot
Consensus Peaks
The consensus peaks rule merges filtered peaks from all samples, requiring a peak to be present in at leastmin_samples samples and merging overlapping peaks within merge_distance bp:
bedtools merge -d 100, intersect each sample’s peaks against the merged set to count per-sample support, then retain only peaks with support in ≥ min_samples samples.
Outputs:
results/peak_calling/consensus_peaks/consensus_peaks.bed— final consensus setresults/peak_calling/consensus_peaks/peak_sample_counts.txt— per-peak sample counts
Count Matrix (DESeq2 Input)
bedtools coverage is run per sample against the consensus peak set to produce per-peak read counts. These TSVs form the count matrix fed into DESeq2:
results/peak_calling/count_peaks/{sample}_peak_counts.tsv
Differential Accessibility (DESeq2)
Differential accessibility analysis uses DESeq2 (via an R script) on the count matrix, with the sample sheet providing condition labels. Peaks with FDR ≤ 0.05 and |log2FC| ≥ 1.0 are reported as significantly differentially accessible:diff_accessibility_results.tsv— full DESeq2 results tablevolcano_plot.pdf— log2FC vs −log10(FDR)ma_plot.pdf— log2FC vs mean normalized countspca_plot.pdf— sample-level PCA of normalized accessibilityheatmap.pdf— top differentially accessible peaks across samples
Peak Annotation (ChIPseeker)
ChIPseeker annotates each blacklist-filtered peak with its nearest genomic feature using a TxDb object built from the GTF annotation. Peaks within ±3 kb of a TSS are labeled “Promoter”:{sample}_peak_annotation.txt— per-peak annotation withdistanceToTSS{sample}_peak_annotation_summary.txt— genomic feature breakdown table
Motif Analysis (HOMER)
HOMERfindMotifsGenome.pl scans the filtered peak set for enriched known and de novo motifs. The pipeline searches motif lengths of 8, 10, and 12 bp in 200 bp windows:
results/peak_calling/motif_analysis/{sample}/ (directory with homerResults.html, knownResults.html)
TF Footprinting — TOBIAS
TOBIAS identifies transcription factor binding sites within open chromatin by correcting for Tn5 insertion bias and scoring nucleotide-level footprint depth. The pipeline runs three sequential TOBIAS rules:ATACorrect
Corrects Tn5 sequence bias across the genome. Inputs: filtered BAM, blacklist-filtered peaks, reference FASTA, and blacklist BED. Outputs a bias-corrected BigWig and a raw bias-track BigWig:Output:
results/peak_calling/tobias/corrected_bw/{sample}_corrected.bw
Output: results/peak_calling/tobias/bias_track/{sample}_bias.bwScoreBigwig
Computes per-nucleotide footprint scores from the bias-corrected BigWig and peak regions:Output:
results/peak_calling/tobias/footprint_bw/{sample}_footprints.bwTF Footprinting — HINT-ATAC (RGT)
HINT-ATAC provides an independent footprinting implementation using the Regulatory Genomics Toolbox (rgt-hint). It runs per-sample on the filtered BAM and blacklist-filtered peaks with the hg38 organism parameter:
results/peak_calling/footprinting/footprints/{sample}_footprints.bed
If the peak file is empty (zero peaks), both TOBIAS BINDetect and HINT-ATAC detect this condition and create empty placeholder outputs rather than failing the DAG. This mirrors the behavior described in the QC Gating page.
chromVAR Motif Accessibility
chromVAR computes per-sample motif deviation scores that capture cell-type-specific transcription factor accessibility without requiring peak-level differential testing:results/peak_calling/chromvar/deviations/{sample}_deviations.tsv
Output Summary
MACS2 Peaks
results/peak_calling/macs2_peakcall/{sample}_peaks.narrowPeakFiltered Peaks
results/peak_calling/filtered_peaks/{sample}_filtered_peaks.bedIDR Peaks
results/peak_calling/idr/idr_peaks/{cond}_rep{N}_rep{M}_idr_peaks.bedConsensus Peaks
results/peak_calling/consensus_peaks/consensus_peaks.bedDESeq2 Results
results/peak_calling/differential_accessibility/diff_accessibility_results.tsvTOBIAS BINDetect
results/peak_calling/tobias/bindetect/HINT-ATAC Footprints
results/peak_calling/footprinting/footprints/{sample}_footprints.bedHOMER Motifs
results/peak_calling/motif_analysis/{sample}/