High-quality ATAC-seq data is non-negotiable before peak calling and differential-accessibility analysis: a sample with poor TSS enrichment or a low fraction of reads in peaks will produce noisy, unreproducible results that corrupt every downstream conclusion. The BDB-Genomics pipeline therefore implements a hard QC gate that runs after the post-alignment filtering chain and before any peak-calling jobs are scheduled. The gate evaluates four numeric thresholds per sample, assigns each metric a three-tier status (PASS / WARN / FAIL), writes a human-readable text report and a machine-readable JSON, and signals to Snakemake via a sentinel file whether the sample may proceed. Critically, failing samples are not crashed: the pipeline documents the failure and gracefully bypasses expensive downstream rules to save compute time.
The Four Thresholds
All thresholds are declared in config.yaml under qc_gate.params and can be overridden on the command line without modifying any source file:
| Metric | Config Key | Threshold | Direction |
|---|---|---|---|
| FRiP Score | min_frip | ≥ 0.2 | Higher is better |
| TSS Enrichment | min_tss_enr | ≥ 7.0 | Higher is better |
| Mapping Rate | min_mapping_rate | ≥ 80.0 % | Higher is better |
| Duplicate Rate | max_duplicate_rate | ≤ 20.0 % | Lower is better |
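The direction column matters as much as the number: three metrics must meet a minimum and one must stay under a maximum. A minimal Python sketch of that direction-aware check follows. The `Threshold` class, the `THRESHOLDS` mapping, and `hard_check` are hypothetical names used only for illustration; the default values are copied from the table, and nothing here is taken from the pipeline's own script.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Threshold:
    """One QC threshold plus the comparison that counts as meeting it."""
    config_key: str                          # key under qc_gate.params
    limit: float                             # default value from the table
    passes: Callable[[float, float], bool]   # operator.ge or operator.le

    def is_met(self, value: float) -> bool:
        return self.passes(value, self.limit)


# Hypothetical container; the metric names on the left are illustrative.
THRESHOLDS = {
    "frip": Threshold("min_frip", 0.2, operator.ge),
    "tss_enrichment": Threshold("min_tss_enr", 7.0, operator.ge),
    "mapping_rate": Threshold("min_mapping_rate", 80.0, operator.ge),
    "duplicate_rate": Threshold("max_duplicate_rate", 20.0, operator.le),
}


def hard_check(metrics: dict[str, float]) -> dict[str, bool]:
    """Return True per metric when its hard threshold is met."""
    return {name: THRESHOLDS[name].is_met(value) for name, value in metrics.items()}
```

Storing the comparison next to the limit keeps the "higher is better" and "lower is better" cases in one code path, so a value exactly on the threshold passes in both directions.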
What parse_qc_metrics.py Does
The gate is implemented entirely in rules/scripts/parse_qc_metrics.py. The Snakemake rule calls it with all four input files and all four threshold values.
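As one illustration of the parsing this script performs, the samtools stats step described later in this section could look roughly like the sketch below. It is an assumption-laden outline, not the script's actual code: the function name `parse_samtools_stats` and the returned field names are hypothetical. The `SN` keys it looks for are the ones named in this section and match the summary-number lines that `samtools stats` emits.

```python
def parse_samtools_stats(text: str) -> dict[str, float]:
    """Pull total reads, mapping rate (%), and duplicate rate (%) from samtools stats text."""
    # SN keys named in this section, mapped to illustrative field names.
    wanted = {
        "sequences": "total_reads",
        "percentage of properly paired reads (%)": "mapping_rate",
        "reads duplicated": "duplicates",
    }
    found: dict[str, float] = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue  # only summary-number lines are relevant
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        key = fields[1].rstrip(":")  # "sequences:" -> "sequences"
        if key in wanted:
            found[wanted[key]] = float(fields[2])

    total = found.get("total_reads", 0.0)
    duplicates = found.get("duplicates", 0.0)
    # Duplicate rate = (duplicates / total_reads) x 100, guarding against empty input.
    found["duplicate_rate"] = (duplicates / total) * 100 if total else 0.0
    return found
```

Matching on the exact key rather than a substring avoids confusing `sequences:` with `raw total sequences:`, which appears in the same block of the stats file.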
Parse FRiP
Reads {sample}_frip.txt (produced by frip_calculation). Handles both single-column and TSV formats, and skips header lines that contain the word “sample”.
Parse TSS Enrichment
Reads {sample}_tss_enrichment.txt, produced by the tss_enrichment.R script. Handles both headered and headerless TSV outputs from the ATACseqQC R package.
Parse samtools stats
Reads {sample}_postFiltering.stats.txt. Uses a key-based mapping over SN-prefixed lines to extract sequences (total reads), percentage of properly paired reads (mapping rate), and reads duplicated. Duplicate rate is derived as (duplicates / total_reads) × 100.
PASS / WARN / FAIL Tiers
The script defines a 10% advisory buffer around each threshold. The intent is that metrics that pass the hard threshold but fall within 10% of it receive a WARN status instead of PASS, alerting operators to borderline samples before they become problems in a larger cohort.
| Status | Condition (FRiP example, threshold = 0.2) |
|---|---|
| PASS | FRiP ≥ 0.2 (meets or exceeds threshold) |
| WARN | FRiP ≥ 0.2 and FRiP < 0.18 — unreachable in practice (see note below) |
| FAIL | FRiP < 0.2 (below hard threshold) |
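The branch order behind this table can be sketched for a "higher is better" metric as follows. The function name `classify_ge_metric` is hypothetical, and the "lower is better" case (duplicate rate) is left out because its warn threshold is not specified here. The sketch is only meant to make the unreachable WARN row concrete.

```python
def classify_ge_metric(val: float, target: float) -> str:
    """Classify a 'higher is better' metric with the fail-then-warn branch order."""
    warn_threshold = target * 0.9  # 10% advisory buffer below the target
    if val < target:
        return "FAIL"
    elif val < warn_threshold:
        # Reached only when val >= target, and warn_threshold < target for a
        # positive target, so this condition can never hold.
        return "WARN"
    return "PASS"


if __name__ == "__main__":
    for frip in (0.25, 0.20, 0.19, 0.17):
        print(frip, classify_ge_metric(frip, 0.2))
```

Because the hard-fail test runs first, a FRiP of 0.19 is reported as FAIL rather than WARN, and no positive value of `target` lets the middle branch fire.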
The WARN branch uses an elif that is evaluated only when the hard-fail condition is false (i.e., val >= target). For >= metrics, warn_threshold = target × 0.9, so the WARN condition (val < 0.18 in the FRiP example) can never be true when val >= 0.2. In the current implementation, each metric therefore resolves to either PASS or FAIL; the WARN path exists in the code as a guard for future threshold adjustments but is not reachable with the default logic. The overall field is PASSED only when all four metrics are individually PASS.
Output Files
Two files are written per sample to results/qc_gate/:
- {sample}_qc_pass.txt: a tab-separated sentinel file consumed by downstream Snakemake rules to decide whether to run expensive jobs.
- {sample}_qc_pass.json: the machine-readable report for the same sample.
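A rough sketch of how the two files could be produced from per-metric statuses is shown below. Only the file names and the `overall` field with its PASSED/FAILED values come from this page. The function name `write_gate_outputs`, the sentinel layout (sample, tab, overall), and the other JSON keys are assumptions made for illustration.

```python
import json
from pathlib import Path


def write_gate_outputs(sample: str, statuses: dict[str, str], out_dir) -> str:
    """Write the sentinel and JSON report; return the overall result."""
    # The overall result is PASSED only when every metric is individually PASS.
    overall = "PASSED" if all(s == "PASS" for s in statuses.values()) else "FAILED"

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Assumed sentinel layout: one tab-separated line, "<sample>\t<overall>".
    (out / f"{sample}_qc_pass.txt").write_text(f"{sample}\t{overall}\n")

    # Assumed JSON layout; only the "overall" field is named in this page.
    report = {"sample": sample, "overall": overall, "metrics": statuses}
    (out / f"{sample}_qc_pass.json").write_text(json.dumps(report, indent=2))
    return overall
```

Both files are written whether the sample passes or fails, which is what allows downstream rules to depend on them unconditionally.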
Graceful Fallback: Failing Samples Are Not Crashed
The pipeline is designed so that a failing QC gate never halts the entire run. The script exits with code 0 even when the overall result is FAILED. Downstream rules list the {sample}_qc_pass.txt sentinel in their input block. If the file records FAILED, the rule is still triggered (Snakemake requires the output of qc_gate), but the rule’s script reads the file and returns immediately with a dummy output rather than running the full analysis.
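The bypass described above can be sketched as a small wrapper inside a downstream script. This is a hypothetical outline: the helper names `sample_failed_qc` and `run_or_bypass`, the placeholder text, and the assumption that FAILED appears as one tab-separated field of the sentinel are not taken from the pipeline.

```python
from pathlib import Path
from typing import Callable


def sample_failed_qc(sentinel_path) -> bool:
    """True if any tab-separated field in the sentinel file equals FAILED."""
    text = Path(sentinel_path).read_text()
    return any(
        field.strip() == "FAILED"
        for line in text.splitlines()
        for field in line.split("\t")
    )


def run_or_bypass(sentinel_path, output_path, analysis: Callable) -> str:
    """Run the analysis for passing samples; write a dummy output otherwise."""
    if sample_failed_qc(sentinel_path):
        # The rule's declared output must still exist so Snakemake treats the job as done.
        Path(output_path).write_text("# skipped: sample failed the QC gate\n")
        return "bypassed"
    analysis(output_path)
    return "ran"
```

Writing the placeholder keeps the DAG intact: every declared output exists, so the workflow finishes even when some samples are skipped.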
Graceful Fallback: Zero-Peak Samples
A sample may pass the QC gate but yield zero peaks after blacklist filtering, for example when the library covers a very restricted region or when a test dataset is used. Rather than crashing rules that expect non-empty BED files, every downstream rule (ChIPseeker annotation, heatmap, TOBIAS BINDetect, HOMER motif analysis) detects the empty input and generates valid dummy outputs.
Example: HOMER motif analysis fallback
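A minimal sketch of such an empty-input guard, written in Python, might look like the following. It assumes the motif step is passed in as a callable (`run_homer`) and that a placeholder named knownResults.txt is an acceptable dummy output. HOMER does produce a file with that name, but whether the pipeline uses it as its placeholder is an assumption, as are the function names.

```python
from pathlib import Path
from typing import Callable


def count_peaks(bed_path) -> int:
    """Count BED records, ignoring blank lines and track/browser/comment lines."""
    n = 0
    with open(bed_path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith(("#", "track", "browser")):
                continue
            n += 1
    return n


def motif_analysis_or_placeholder(bed_path, out_dir, run_homer: Callable) -> bool:
    """Run motif analysis when peaks exist; otherwise write a dummy result file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if count_peaks(bed_path) == 0:
        # Hypothetical placeholder so the rule's declared output still exists.
        (out / "knownResults.txt").write_text("# no peaks available; motif analysis skipped\n")
        return False
    run_homer(bed_path, out)
    return True
```

Counting records instead of checking file size means a BED file that contains only a header or comment lines is still treated as empty.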
Example: ChIPseeker peak annotation fallback
Tuning Thresholds
Adjusting for Your Data
Tissue-specific ATAC-seq libraries (e.g., FFPE material, frozen biopsies) may have legitimately lower TSS enrichment. To relax a threshold, edit the corresponding key under qc_gate.params in config.yaml, for example by lowering min_tss_enr.
Disabling the Gate Temporarily
For CI testing with synthetic data or when rapid iteration is needed, disable every threshold with a command-line override file.
MultiQC Integration
The {sample}_qc_pass.json files are passed directly to multiqc as inputs. MultiQC’s custom-content module parses them and renders a QC-gate summary table inside the final multiqc_report.html. No custom MultiQC plugin is needed: the pipeline’s rules/config/multiqc_config.yaml configures the JSON schema parsing.