This changelog records all notable changes to the BDB-Genomics ATAC-seq framework. Versions follow a MAJOR.MINOR.PATCH scheme: major versions introduce breaking changes or large new feature sets, minor versions add backward-compatible functionality, and patch versions fix bugs without altering the public interface. Each release section is expanded below with full Added, Changed, and Fixed detail.
V3.0.1 — 2026-05-22
Added
- Global Wildcard Constraints: Added regex constraints for `{sample}` (`[^/]+`), `{condition}` (`[^/]+`), and `{replicate}` (`[0-9]+`) at the global level in the `Snakefile` to prevent ambiguous path matching and ensure robust DAG resolution.
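To illustrate why these constraints remove ambiguity, the sketch below emulates wildcard matching with plain Python regular expressions. It is not the framework's code: the path pattern `results/{sample}/{condition}_rep{replicate}.bam` and the helper `pattern_to_regex` are hypothetical, and only the three constraint expressions come from the entry above. Unconstrained wildcards fall back to `.+`, Snakemake's default.

```python
import re

# Constraint expressions quoted in the changelog entry.
CONSTRAINTS = {
    "sample": r"[^/]+",
    "condition": r"[^/]+",
    "replicate": r"[0-9]+",
}

# Hypothetical output path pattern, used only for this illustration.
PATTERN = "results/{sample}/{condition}_rep{replicate}.bam"


def pattern_to_regex(path_pattern, constraints):
    """Compile a '{name}' path pattern into a regex with named groups.

    Wildcards without a constraint use '.+', mirroring Snakemake's default.
    """
    pieces = []
    for part in re.split(r"(\{\w+\})", path_pattern):
        if re.fullmatch(r"\{\w+\}", part):
            name = part[1:-1]
            pieces.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


constrained = pattern_to_regex(PATTERN, CONSTRAINTS)
unconstrained = pattern_to_regex(PATTERN, {})

# A nested path is silently absorbed into {sample} without constraints...
loose = unconstrained.fullmatch("results/a/b/c_rep1.bam")
# ...but is rejected once '/' is excluded from the wildcard.
strict = constrained.fullmatch("results/a/b/c_rep1.bam")
```

Here `loose.group("sample")` is `"a/b"`, while `strict` is `None`, so the constrained pattern can only ever match one directory level per wildcard.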
Changed
- calculate_mito_reads.smk: Refactored to include `.bam.bai` as an official Snakemake input dependency instead of executing an inline indexing command in the shell directive, preventing potential job race conditions.
- bedtools_genomecov.smk: Modified to dynamically ignore the bulk QC gate (`qc_pass` file) when the pipeline is switched to `scatac` mode, allowing scATAC-seq target coverage rules to execute seamlessly.
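Both changes amount to declaring inputs explicitly and choosing them by mode. The following is a minimal sketch of that idea as a plain Python function of the kind a Snakemake input function could wrap. The function name `genomecov_inputs` and the `results/...` directory layout are assumptions for illustration. Only the `.bam.bai` suffix, the `{sample}_qc_pass.txt` file name, and the `bulk`/`scatac` mode names appear in this changelog.

```python
def genomecov_inputs(sample, mode):
    """Return the input files a coverage rule would declare for one sample.

    The paths are hypothetical; the point is that the index is a declared
    input and the QC gate file is only required in bulk mode.
    """
    bam = f"results/{sample}/{sample}.bam"
    inputs = {
        "bam": bam,
        # Declared as an input so the scheduler waits for the indexing job
        # instead of indexing inline inside the shell command.
        "bai": bam + ".bai",
    }
    if mode == "bulk":
        # The bulk QC gate is skipped entirely in scatac mode.
        inputs["qc_pass"] = f"results/{sample}/{sample}_qc_pass.txt"
    return inputs


bulk_inputs = genomecov_inputs("S1", "bulk")
scatac_inputs = genomecov_inputs("S1", "scatac")
```

Because the index is an input rather than a side effect, two jobs reading the same BAM can no longer race to create it.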
Fixed
- tobias.smk: Corrected TOBIAS BINDetect argument from `--bam` to `--signals` to prevent runtime crashes during bias-corrected TF footprinting.
- bedtools.yaml: Added missing `samtools` dependency to the `03_post_alignment/bedtools` conda environment to resolve runtime failures in FRiP calculation.
- archr.smk: Restored the missing Galaxy Project Singularity `container` directive for the `archr_doublet_detection` rule.
- fastp.smk: Restored missing whitespace in the `threads: config["fastp"]["threads"]` directive.
- samtools_fixmate.smk: Normalized log and benchmark extensions (`.err` and `.txt`) to maintain consistency with the global framework logging convention.
- Snakefile: Standardized CI block to define `SAMPLES_TSV = None` under `IS_CI` mode, avoiding a potential `NameError` on empty sample checks.
V3.0.0 — 2026-05-17
Added
- CLI Mode Switching: Setting the `ATAC_MODE=bulk` or `ATAC_MODE=scatac` environment variable switches the entire pipeline between bulk and single-cell modes, with no manual file editing required.
- Chromap Alignment: Fast single-cell ATAC-seq aligner with `--preset atac` for barcode-aware alignment of scATAC-seq reads.
- ArchR Pipeline: Arrow file creation, doublet detection and filtering, iterative LSI dimensionality reduction, UMAP clustering, and marker gene identification.
- Cicero Co-accessibility: Chromatin co-accessibility network analysis, CCAN identification, and connection scoring.
- scATAC-seq Conda Environments: Dedicated `chromap`, `archr`, and `cicero` conda environments with all required dependencies.
- scATAC-seq Config Blocks: New `chromap`, `archr`, and `cicero` configuration sections added to `config.yaml`.
- global.mode: New top-level config key (`"bulk"` or `"scatac"`) for declarative modality selection without an environment variable.
Changed
- Snakefile: Conditional rule includes based on the `MODE` variable. Bulk and scATAC-seq rule sets are now mutually exclusive, preventing DAG conflicts.
- README: Complete rewrite of the scATAC-seq section, which now describes a single command switch instead of manual file edits.
- Comparison Table: Added scATAC-seq, Cicero, and mode switching rows to the feature comparison table.
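The mode switch and the mutually exclusive rule sets in this release can be pictured with a short sketch. It rests on several assumptions that the changelog does not state: that `ATAC_MODE` takes precedence over `global.mode`, that the default is `"bulk"`, and that the rule files are grouped as shown. The names `resolve_mode`, `rules_to_include`, and `RULE_SETS` are hypothetical.

```python
import os

VALID_MODES = ("bulk", "scatac")

# Illustrative groupings only; the real include lists are not given here.
RULE_SETS = {
    "bulk": ["bowtie2.smk", "macs2.smk"],
    "scatac": ["chromap.smk", "archr.smk", "cicero.smk"],
}


def resolve_mode(config, environ=None):
    """Pick the pipeline mode from the environment or the config.

    Assumed precedence: ATAC_MODE, then config["global"]["mode"], then "bulk".
    """
    environ = os.environ if environ is None else environ
    mode = environ.get("ATAC_MODE") or config.get("global", {}).get("mode") or "bulk"
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode


def rules_to_include(mode):
    """Return only the rule files belonging to the selected mode."""
    return list(RULE_SETS[mode])


# Example: the config asks for bulk, but the environment overrides it.
mode = resolve_mode({"global": {"mode": "bulk"}}, environ={"ATAC_MODE": "scatac"})
selected = rules_to_include(mode)
```

Selecting exactly one rule set means no two rules from different modalities can claim the same output path, which is one way DAG conflicts of the kind mentioned above are avoided.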
V2.1.0 — 2026-05-17
Added
- TOBIAS Footprinting Suite: Full TOBIAS pipeline (ATACorrect → ScoreBigwig → BINDetect) for bias-corrected transcription factor footprinting and differential TF binding analysis across conditions.
- Low-Resource Profile: `profile/low_resource/` Snakemake profile for machines with ≤ 8 GB RAM and ≤ 4 CPU cores.
- Sequential Sample Batching: `run_batched.py` script for ultra-low-resource machines (≤ 4 GB RAM) that processes samples one or a few at a time to avoid out-of-memory failures.
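The batching idea can be sketched in a few lines. This is not the contents of `run_batched.py`, which the changelog does not show. The helper `batch_samples` and the sample names are hypothetical, and how each batch is handed to Snakemake is left out.

```python
def batch_samples(samples, batch_size=1):
    """Split a sample list into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Hypothetical sample names, processed two at a time.
batches = batch_samples(["S1", "S2", "S3", "S4", "S5"], batch_size=2)
for number, batch in enumerate(batches, start=1):
    # A real driver would launch one pipeline run per batch here and wait
    # for it to finish before starting the next, capping peak memory use.
    print(f"batch {number}: {', '.join(batch)}")
```

Running batches strictly one after another trades wall-clock time for a lower peak memory footprint.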
V2.0.0 — 2026-05-17
Added
- IDR Replicate Concordance: Irreproducible Discovery Rate analysis for validating peak reproducibility between biological replicates.
- NSC/RSC Cross-Correlation: ENCODE-compliant strand cross-correlation metrics via phantompeakqualtools.
- Consensus Peak Calling: Multi-sample peak merging with a configurable minimum sample threshold (`min_samples: 2`) and merge distance (`merge_distance: 100` bp).
- Differential Accessibility: DESeq2-based differential chromatin accessibility analysis with volcano plots, MA plots, PCA plots, and heatmaps.
- Peak Count Matrix: `bedtools`-based read counting in consensus peaks for all samples, producing the count matrix required by DESeq2.
- Benchmark Aggregation: Multi-rule performance summary aggregating wall-clock time, CPU time, and peak memory across all pipeline stages.
- Test Profile: `profile/test/` Snakemake profile for CI validation with auto-generated minimal test data.
- Test Data Generator: `generate_test_data.py` creates minimal FASTQ, reference genome, and annotation files for integration testing.
- CI/CD Pipeline: Two-stage GitHub Actions workflow (lint + test) with micromamba environment setup and artifact upload.
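As a rough picture of what the `min_samples` and `merge_distance` settings control in consensus peak calling, the sketch below merges peaks from several samples and keeps only regions supported by enough of them. It is an assumption about the general technique, not the pipeline's implementation. The function `consensus_peaks` and the example coordinates are made up, and the real workflow may merge or count support differently.

```python
def consensus_peaks(peaks_by_sample, min_samples=2, merge_distance=100):
    """Merge nearby peaks across samples and keep well-supported regions.

    peaks_by_sample maps a sample name to a list of (chrom, start, end).
    Peaks whose gap is at most merge_distance bp are merged; a merged
    region is kept when at least min_samples distinct samples contribute.
    """
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set of contributing samples]
    for chrom, start, end, sample in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= merge_distance:
            last[2] = max(last[2], end)
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [
        (chrom, start, end, len(samples))
        for chrom, start, end, samples in merged
        if len(samples) >= min_samples
    ]


# Made-up peaks for three samples.
example = {
    "S1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "S2": [("chr1", 250, 350), ("chr1", 5000, 5100)],
    "S3": [("chr2", 100, 200)],
}
consensus = consensus_peaks(example, min_samples=2, merge_distance=100)
```

In this example only the `chr1` region spanning 100 to 350 survives, because it is the only merged region with peaks from two samples.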
Changed
- QC Gate Enforcement: Downstream rules (`macs2`, `bedtools_genomecov`, `heatmap`, `peak_annotation`, `normalize_coverage`) now declare `{sample}_qc_pass.txt` as a required input, enforcing the QC gate dependency throughout the DAG.
- motif_analysis: Refactored to per-sample execution using the HOMER assembly name instead of a FASTA path.
- cross_correlation: Promoted from optional to standard ENCODE-compliant QC metric included in every bulk run.
- README: Complete rewrite with a feature comparison table covering all pipeline stages and analysis modes.
- Version: Bumped to V2.0.0 for the production-grade feature set.
Fixed
- `fastp.yaml`: Invalid version `1.3.3` corrected to `0.24.0`.
- `tss_enrichment.yaml`: Added 7 missing Bioconductor packages required by the TSS enrichment R script.
- `fragment_size_analysis.smk`: Referenced the wrong conda environment (`samtools` → `fragment_analysis` with R).
- `frip_calculation.smk`: Removed chromosome prefix normalisation (`sed 's/^chr//g'`) that caused mismatches with MACS2 peak chromosome names.
- `preseq.smk`: Removed `|| true` failure silencing that was masking real preseq errors.
- `samtools_sort.smk`: Moved log redirection to a separate line for correct shell behaviour.
- `samtools_fixmate.smk`: Added `set -o pipefail` to catch errors in piped commands.
- `bowtie2.smk`: Replaced hardcoded `--very-sensitive` flag with the `config["bowtie2"]["params"]["sensitive"]` config reference.
- `blacklist_filter.smk`: Removed fragile awk chromosome-prefix normalisation logic.
- `remove_mito_reads.smk`: Switched from regex matching to exact chromosome name matching to prevent accidental exclusion of non-mitochondrial contigs.
- `tss_enrichment.R`: Removed leftover `DEBUG` print statement.
- `validate_config.py`: Corrected `"ChIP-seq"` to `"ATAC-seq"` in the module docstring.
- `.gitignore`: Removed invalid `.../` directory glob syntax.
- `bedtools.yaml` / `samtools.yaml`: Added missing `bc` dependency for shell arithmetic in FRiP calculation.
- `profile/slurm/config.yaml`: Replaced placeholder SLURM account name, added `latency-wait` for NFS-mounted shared filesystems.
- `.github/workflows/lint.yml`: Fixed YAML indentation errors, added micromamba setup step, pinned `pulp` version to resolve dependency conflict.
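The `remove_mito_reads.smk` fix is easiest to see side by side. The sketch below contrasts a loose regex with exact name matching on a made-up contig list. The regex `chrM|MT`, the contig names, and the set `MITO_NAMES` are all hypothetical, since the changelog quotes neither the original pattern nor the names the rule now matches.

```python
import re

# Made-up contig names; two merely contain a mitochondrial-looking substring.
contigs = ["chr1", "chrM", "MT", "chrUn_MT0001", "chrM_decoy"]

# Hypothetical loose pattern: matches anywhere inside the contig name.
loose_pattern = re.compile(r"chrM|MT")
dropped_by_regex = [c for c in contigs if loose_pattern.search(c)]

# Exact matching: only names that are literally in the set are dropped.
MITO_NAMES = {"chrM", "MT"}
dropped_by_exact = [c for c in contigs if c in MITO_NAMES]

# Contigs that survive each filter.
kept_by_regex = [c for c in contigs if c not in dropped_by_regex]
kept_by_exact = [c for c in contigs if c not in dropped_by_exact]
```

With the loose pattern, `chrUn_MT0001` and `chrM_decoy` are discarded along with the true mitochondrial contigs, whereas exact matching keeps them.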
V1.1.0 — 2026-05-15
Added
- Production-Ready Architecture: Implemented a fully reactive and modular Snakemake framework with deterministic DAG resolution.
- Dynamic Configuration: Migrated all target paths in the `Snakefile` to dynamic `config.yaml` references, ensuring complete portability across compute environments without hard-coded paths.
- QC Gating: Integrated a biological checkpoint system to validate TSS enrichment and FRiP scores before expensive downstream analysis stages are triggered.
- Lifecycle Hooks: Added `onstart`, `onsuccess`, and `onerror` Snakemake handlers for automated status reporting and JSON summary generation.
- Proactive Validation: Integrated `validate_config.py` invocation at DAG-build time to surface schema errors before any jobs are submitted.
- Preprocessing: fastp, FastQC
- Alignment: Bowtie2, samtools sorting, indexing, and deduplication
- Post-Alignment QC: Mitochondrial read quantification, fragment size analysis, TSS enrichment, phantompeakqualtools, Preseq, Qualimap
- Coverage and normalisation: Genome coverage, BigWig conversion, CPM normalisation
- Peak calling and filtering: MACS2, ENCODE blacklist filtering
- Visualisation: Heatmaps, motif analysis, correlation plots
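The QC gating checkpoint introduced in this release can be pictured as a simple threshold test on the two metrics it names. The sketch below is illustrative only: the function names and the default thresholds (`min_tss=5.0`, `min_frip=0.2`) are placeholders and not the framework's configured values, which the changelog does not list.

```python
def passes_qc_gate(tss_enrichment, frip, min_tss=5.0, min_frip=0.2):
    """Return True when both metrics meet their (placeholder) thresholds."""
    return tss_enrichment >= min_tss and frip >= min_frip


def gate_samples(metrics, min_tss=5.0, min_frip=0.2):
    """Evaluate the gate for every sample in a metrics dictionary.

    metrics maps a sample name to {"tss_enrichment": float, "frip": float}.
    """
    return {
        sample: passes_qc_gate(m["tss_enrichment"], m["frip"], min_tss, min_frip)
        for sample, m in metrics.items()
    }


# Made-up metric values for two samples.
decisions = gate_samples({
    "S1": {"tss_enrichment": 8.1, "frip": 0.35},
    "S2": {"tss_enrichment": 3.0, "frip": 0.40},
})
passing = [sample for sample, ok in decisions.items() if ok]
```

Only samples in `passing` would go on to the expensive downstream stages. A sample that fails either metric is held back, even if the other metric looks good.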
Changed
- Standardized Directives: Enforced a uniform 10-directive layout across all 34 `.smk` rule files (the `rule` declaration followed by `input`, `output`, `params`, `log`, `benchmark`, `conda`, `container`, `threads`, `resources`, `shell`).
- Global Containerization: Switched all rules to use stable Singularity containers via Biocontainers for 100 % reproducibility across compute environments.
- Environment Hierarchy: Refactored `rules/envs/` into a stage-based hierarchical directory structure matching the six pipeline stages.
- Cleaned Root Directory: Removed legacy scripts, runtime artefacts, and unused directories (`benchmarks/`, `scratch/`, `scripts/`) from the repository root.
Fixed
- Resolved redundant and missing `include:` statements in the main `Snakefile`.
- Corrected `motif_analysis` output directory resolution issues causing downstream rules to fail.
- Standardized log and benchmark file paths across the entire framework for consistent `--report` generation.