BDB ATAC-seq Pipeline: Version History and Changelog

This changelog records all notable changes to the BDB-Genomics ATAC-seq framework. Versions follow a MAJOR.MINOR.PATCH scheme: major versions introduce breaking changes or large new feature sets, minor versions add backward-compatible functionality, and patch versions fix bugs without altering the public interface. Each release section below lists its Added, Changed, and Fixed entries where applicable.

V3.0.1 — 2026-05-22

Added

Global Wildcard Constraints: Added regex constraints for {sample} ([^/]+), {condition} ([^/]+), and {replicate} ([0-9]+) at the global level in the Snakefile to prevent ambiguous path matching and ensure robust DAG resolution.
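The effect of these constraints can be illustrated outside Snakemake with plain regular expressions. The sketch below reuses the three patterns quoted above; the helper name matches_constraint and the example values are hypothetical, and Snakemake's own matching logic is more involved than a single full-match call.

```python
import re

# Patterns quoted in the changelog entry above.
WILDCARD_CONSTRAINTS = {
    "sample": r"[^/]+",
    "condition": r"[^/]+",
    "replicate": r"[0-9]+",
}


def matches_constraint(wildcard: str, value: str) -> bool:
    """Return True if `value` satisfies the constraint for `wildcard`.

    Hypothetical helper for illustration only; not part of the pipeline.
    """
    return re.fullmatch(WILDCARD_CONSTRAINTS[wildcard], value) is not None
```

Under these patterns a sample name such as ctrl_rep1 is accepted, while ctrl/rep1 is rejected because a wildcard may no longer span a directory separator. Likewise, a replicate value must consist of digits only.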

Changed

calculate_mito_reads.smk: Refactored to include .bam.bai as an official Snakemake input dependency instead of executing an inline indexing command in the shell directive, preventing potential job race conditions.
bedtools_genomecov.smk: Modified to dynamically ignore bulk QC (qc_pass file) when the pipeline is switched to scatac mode, allowing scATAC-seq target coverage rules to execute seamlessly.
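Both changes amount to deciding which files a rule declares as inputs. A minimal sketch of that idea follows, written as a plain Python function of the kind that could back a Snakemake input function. The function name rule_inputs and the results/ path prefix are assumptions made for illustration; only the .bam.bai suffix, the qc_pass file, and the scatac mode name come from the entries above.

```python
from typing import Dict


def rule_inputs(sample: str, mode: str) -> Dict[str, str]:
    """Build the input mapping for a post-alignment rule (illustrative only).

    The BAM index is listed as an input so the scheduler waits for it,
    rather than the rule indexing the BAM inside its shell command.
    """
    inputs = {
        "bam": f"results/{sample}.bam",        # hypothetical path layout
        "bai": f"results/{sample}.bam.bai",    # index declared as a dependency
    }
    # The bulk QC gate is skipped in single-cell mode.
    if mode != "scatac":
        inputs["qc_pass"] = f"results/{sample}_qc_pass.txt"
    return inputs
```

Declaring the index as an input means two jobs never try to create the same .bai file at once, which is the race condition the refactor is meant to avoid.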

Fixed

tobias.smk: Corrected TOBIAS BINDetect argument from --bam to --signals to prevent runtime crashes during bias-corrected TF footprinting.
bedtools.yaml: Added missing samtools dependency to the 03_post_alignment/bedtools conda environment to resolve runtime failures in FRiP calculation.
archr.smk: Restored the missing Galaxy Project Singularity container directive for the archr_doublet_detection rule.
fastp.smk: Restored missing whitespace in the threads: config["fastp"]["threads"] directive.
samtools_fixmate.smk: Normalized log and benchmark extensions (.err and .txt) to maintain consistency with the global framework logging convention.
Snakefile: Standardized CI block to define SAMPLES_TSV = None under IS_CI mode, avoiding a potential NameError on empty sample checks.

V3.0.0 — 2026-05-17

Added

CLI Mode Switching: ATAC_MODE=bulk or ATAC_MODE=scatac environment variable switches the entire pipeline between bulk and single-cell modes — no manual file editing required.
Chromap Alignment: Fast single-cell ATAC-seq aligner with --preset atac for barcode-aware alignment of scATAC-seq reads.
ArchR Pipeline: Arrow file creation, doublet detection and filtering, iterative LSI dimensionality reduction, UMAP clustering, and marker gene identification.
Cicero Co-accessibility: Chromatin co-accessibility network analysis, CCAN identification, and connection scoring.
scATAC-seq Conda Environments: Dedicated chromap, archr, and cicero conda environments with all required dependencies.
scATAC-seq Config Blocks: New chromap, archr, and cicero configuration sections added to config.yaml.
global.mode: New top-level config key ("bulk" or "scatac") for declarative modality selection without an environment variable.
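The two ways of selecting a mode can be combined in a single lookup. The sketch below is one possible resolution order, not the pipeline's actual logic: it assumes that ATAC_MODE takes precedence over global.mode and that bulk is the fallback, neither of which is stated in this changelog. The function name resolve_mode is hypothetical.

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("bulk", "scatac")


def resolve_mode(config: Mapping, environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the pipeline mode.

    Assumed precedence (not confirmed by the changelog):
    ATAC_MODE environment variable, then config["global"]["mode"], then "bulk".
    """
    environ = os.environ if environ is None else environ
    mode = environ.get("ATAC_MODE") or config.get("global", {}).get("mode") or "bulk"
    mode = mode.strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode
```

Rejecting unknown values early keeps a typo in the environment variable from silently selecting the wrong rule set.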

Changed

Snakefile: Conditional rule includes based on the MODE variable — bulk and scATAC-seq rule sets are now mutually exclusive, preventing DAG conflicts.
README: Complete rewrite of the scATAC-seq section — now a single command switch instead of requiring manual file edits.
Comparison Table: Added scATAC-seq, Cicero, and mode switching rows to the feature comparison table.

V2.1.0 — 2026-05-17

Added

TOBIAS Footprinting Suite: Full TOBIAS pipeline (ATACorrect → ScoreBigwig → BINDetect) for bias-corrected transcription factor footprinting and differential TF binding analysis across conditions.
Low-Resource Profile: profile/low_resource/ Snakemake profile for machines with ≤ 8 GB RAM and ≤ 4 CPU cores.
Sequential Sample Batching: run_batched.py script for ultra-low-resource machines (≤ 4 GB RAM) that processes samples one or a few at a time to avoid out-of-memory failures.
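Sequential batching comes down to splitting the sample list into small groups and running the workflow once per group. The following is a minimal sketch of that splitting step only; it is not taken from run_batched.py, and the function name batch_samples and its batch_size parameter are hypothetical.

```python
from typing import Iterator, List, Sequence


def batch_samples(samples: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])
```

With three samples and a batch size of two, this yields one group of two followed by one group of one, so peak memory is bounded by the largest single batch rather than by the whole cohort.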

V2.0.0 — 2026-05-17

Added

IDR Replicate Concordance: Irreproducible Discovery Rate analysis for validating peak reproducibility between biological replicates.
NSC/RSC Cross-Correlation: ENCODE-compliant strand cross-correlation metrics via phantompeakqualtools.
Consensus Peak Calling: Multi-sample peak merging with a configurable minimum sample threshold (min_samples: 2) and merge distance (merge_distance: 100 bp).
Differential Accessibility: DESeq2-based differential chromatin accessibility analysis with volcano plots, MA plots, PCA plots, and heatmaps.
Peak Count Matrix: bedtools-based read counting in consensus peaks for all samples, producing the count matrix required by DESeq2.
Benchmark Aggregation: Multi-rule performance summary aggregating wall-clock time, CPU time, and peak memory across all pipeline stages.
Test Profile: profile/test/ Snakemake profile for CI validation with auto-generated minimal test data.
Test Data Generator: generate_test_data.py creates minimal FASTQ, reference genome, and annotation files for integration testing.
CI/CD Pipeline: Two-stage GitHub Actions workflow (lint + test) with micromamba environment setup and artifact upload.
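The consensus peak entry above names two parameters, min_samples and merge_distance. One way they could interact is sketched below in pure Python: peaks from all samples are pooled, intervals closer than merge_distance are merged, and a merged region is kept only if enough distinct samples contributed to it. This is an illustrative assumption about the approach, not the pipeline's implementation, and the function name consensus_peaks is hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end)


def consensus_peaks(
    peaks_by_sample: Dict[str, List[Peak]],
    min_samples: int = 2,
    merge_distance: int = 100,
) -> List[Peak]:
    """Merge nearby peaks across samples and keep well-supported regions."""
    # Pool every peak, remembering which sample it came from, and sort
    # by chromosome and start so neighbours are adjacent.
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged: list = []  # entries: [chrom, start, end, set_of_supporting_samples]
    for chrom, start, end, sample in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= merge_distance:
            last[2] = max(last[2], end)   # extend the current region
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [(c, s, e) for c, s, e, support in merged if len(support) >= min_samples]
```

A region supported by a single sample is dropped under the default threshold of two, which filters out sample-specific noise before read counting.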

Changed

QC Gate Enforcement: Downstream rules (macs2, bedtools_genomecov, heatmap, peak_annotation, normalize_coverage) now declare {sample}_qc_pass.txt as a required input, enforcing the QC gate dependency throughout the DAG.
motif_analysis: Refactored to per-sample execution using the HOMER assembly name instead of a FASTA path.
cross_correlation: Promoted from optional to standard ENCODE-compliant QC metric included in every bulk run.
README: Complete rewrite with a feature comparison table covering all pipeline stages and analysis modes.
Version: Bumped to V2.0.0 for the production-grade feature set.

Fixed

fastp.yaml: Invalid version 1.3.3 corrected to 0.24.0.
tss_enrichment.yaml: Added 7 missing Bioconductor packages required by the TSS enrichment R script.
fragment_size_analysis.smk: Corrected the conda environment reference from samtools to fragment_analysis, which includes R.
frip_calculation.smk: Removed chromosome prefix normalisation (sed 's/^chr//g') that caused mismatches with MACS2 peak chromosome names.
preseq.smk: Removed || true failure silencing that was masking real preseq errors.
samtools_sort.smk: Moved log redirection to a separate line for correct shell behaviour.
samtools_fixmate.smk: Added set -o pipefail to catch errors in piped commands.
bowtie2.smk: Replaced hardcoded --very-sensitive flag with the config["bowtie2"]["params"]["sensitive"] config reference.
blacklist_filter.smk: Removed fragile awk chromosome-prefix normalisation logic.
remove_mito_reads.smk: Switched from regex matching to exact chromosome name matching to prevent accidental exclusion of non-mitochondrial contigs.
tss_enrichment.R: Removed leftover DEBUG print statement.
validate_config.py: Corrected "ChIP-seq" to "ATAC-seq" in the module docstring.
.gitignore: Removed invalid .../ directory glob syntax.
bedtools.yaml / samtools.yaml: Added missing bc dependency for shell arithmetic in FRiP calculation.
profile/slurm/config.yaml: Replaced placeholder SLURM account name, added latency-wait for NFS-mounted shared filesystems.
.github/workflows/lint.yml: Fixed YAML indentation errors, added micromamba setup step, pinned pulp version to resolve dependency conflict.
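The remove_mito_reads.smk fix above is easiest to see with a small comparison of the two matching strategies. The sketch below is illustrative only: the loose pattern, the set of mitochondrial names, and the contig names are all hypothetical and are not taken from the pipeline.

```python
import re
from typing import Iterable, List

MITO_NAMES = {"chrM", "MT"}              # assumed mitochondrial contig names
LOOSE_PATTERN = re.compile(r"chrM|MT")   # hypothetical substring-style regex


def keep_by_regex(contigs: Iterable[str]) -> List[str]:
    """Drop any contig whose name contains the loose pattern."""
    return [c for c in contigs if not LOOSE_PATTERN.search(c)]


def keep_by_exact_name(contigs: Iterable[str]) -> List[str]:
    """Drop only contigs whose full name is a known mitochondrial name."""
    return [c for c in contigs if c not in MITO_NAMES]
```

Given the hypothetical contigs chr1, chrM, and scaffold_MT12, the regex version removes both chrM and scaffold_MT12, while the exact-name version removes only chrM.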

V1.1.0 — 2026-05-15

Added

Production-Ready Architecture: Implemented a fully reactive and modular Snakemake framework with deterministic DAG resolution.
Dynamic Configuration: Migrated all target paths in the Snakefile to dynamic config.yaml references, ensuring complete portability across compute environments without hard-coded paths.
QC Gating: Integrated a biological checkpoint system to validate TSS enrichment and FRiP scores before expensive downstream analysis stages are triggered.
Lifecycle Hooks: Added onstart, onsuccess, and onerror Snakemake handlers for automated status reporting and JSON summary generation.
Proactive Validation: Integrated validate_config.py invocation at DAG-build time to surface schema errors before any jobs are submitted.
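The QC gate described above depends on two numbers per sample, a TSS enrichment score and a FRiP score (the fraction of reads falling in peaks). A minimal sketch of such a check follows. The function names are hypothetical, and the thresholds are deliberately left as required arguments because this changelog does not state the values the pipeline uses.

```python
def frip_score(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks: reads_in_peaks / total_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


def passes_qc(tss_enrichment: float, frip: float, *, min_tss: float, min_frip: float) -> bool:
    """Return True only if both metrics meet their caller-supplied thresholds."""
    return tss_enrichment >= min_tss and frip >= min_frip
```

For example, a sample with 250 of 1,000 reads in peaks has a FRiP of 0.25. Whether that passes depends entirely on the thresholds supplied by the caller.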

Core processes included in the first release:

Preprocessing: fastp, FastQC
Alignment: Bowtie2, samtools sorting, indexing, and deduplication
Post-Alignment QC: Mitochondrial read quantification, fragment size analysis, TSS enrichment, phantompeakqualtools, Preseq, Qualimap
Coverage and normalisation: Genome coverage, BigWig conversion, CPM normalisation
Peak calling and filtering: MACS2, ENCODE blacklist filtering
Visualisation: Heatmaps, motif analysis, correlation plots

Changed

Standardized Directives: Enforced a uniform 10-directive layout across all 34 .smk rule files (the rule declaration followed by input, output, params, log, benchmark, conda, container, threads, resources, and shell).
Global Containerization: Switched all rules to use stable Singularity containers via Biocontainers for reproducible execution across compute environments.
Environment Hierarchy: Refactored rules/envs/ into a stage-based hierarchical directory structure matching the six pipeline stages.
Cleaned Root Directory: Removed legacy scripts, runtime artefacts, and unused directories (benchmarks/, scratch/, scripts/) from the repository root.

Fixed

Resolved redundant and missing include: statements in the main Snakefile.
Corrected motif_analysis output directory resolution issues causing downstream rules to fail.
Standardized log and benchmark file paths across the entire framework for consistent --report generation.
