The BDB-Genomics ATAC-seq pipeline is an open-source project and contributions of all kinds are welcome, whether you are fixing a bug in an existing rule, adding support for a new tool, improving documentation, or helping triage issues. The goal of the project is to build the most rigorous, modular standard for epigenomic data processing, and every contribution moves that goal forward. This guide covers how to report problems, submit code changes, follow the architectural conventions that keep the pipeline maintainable, and publish new releases to Zenodo.
Reporting bugs and feature requests
If you encounter unexpected behaviour, an error message you cannot resolve, or a tool integration you would like to see added, please open a GitHub Issue at github.com/BDB-Genomics/atacseq-pipeline. A good bug report includes:
- The exact error message or unexpected output
- The relevant section of your config.yaml
- The Snakemake log from logs/<stage>/<tool>/<sample>.log
- A minimal example that reproduces the problem (synthetic data from generate_test_data.py is ideal)
- Your operating system, Snakemake version (snakemake --version), and Conda/Mamba version
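The environment details in the last item can be gathered in one step. The following is a minimal sketch, not part of the pipeline: the function names tool_version and environment_report are hypothetical, and the only assumption about each tool is that it accepts a --version flag.

```python
import platform
import shutil
import subprocess


def tool_version(command: str, flag: str = "--version") -> str:
    """Return the first line a tool prints for its version flag, or 'not found'."""
    if shutil.which(command) is None:
        return "not found"
    try:
        result = subprocess.run(
            [command, flag], capture_output=True, text=True, check=False, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    # Some tools print their version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown"


def environment_report() -> dict:
    """Collect the environment details requested in the bug-report checklist."""
    return {
        "os": platform.platform(),
        "snakemake": tool_version("snakemake"),
        "conda": tool_version("conda"),
        "mamba": tool_version("mamba"),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

The printed lines can be pasted directly into the issue alongside the error message and log.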
Submitting pull requests
Use descriptive branch names (feat/, fix/, docs/) so reviewers immediately understand the scope of the change.
Every new tool must live in its own .smk rule file and its own .yaml Conda environment. Do not add new logic to an existing rule file, and do not share Conda environments between rules. Follow the full steps in Adding a Tool to wire up the rule, environment, and config block correctly.
# New rule file
rules/my_tool.smk
# New Conda environment (choose the correct stage directory)
rules/envs/05_peak_calling/my_tool.yaml
Validate the change with the following commands before submitting it:
# Validate config structure
pytest rules/scripts/test_validate_config.py
# Snakemake linter
snakemake --lint
# Dry-run with test profile
snakemake -n --use-conda --profile profile/test
# Full synthetic run
python3 rules/scripts/generate_test_data.py
snakemake --use-conda --cores 4 --profile profile/test
Architecture guidelines
These conventions are enforced in code review. Authors of PRs that violate them will be asked to revise before merging.
Modularity: one tool, one .smk, one .yaml
Every tool must have its own Snakemake rule file (rules/<toolname>.smk) and its own isolated Conda environment descriptor (rules/envs/<stage>/<toolname>.yaml). No rule may embed another tool’s logic inline. This policy makes it possible to update, disable, or replace any single step without side effects on the rest of the DAG.
Fail fast: checks before expensive steps
Rules must validate their inputs before launching long-running shell commands. If a required input file is missing, a minimum read count is not met, or a preceding QC gate has failed, the rule must exit immediately with a clear error message and a non-zero exit code. This prevents wasted cluster hours on jobs that are guaranteed to fail.
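The checks described above can be sketched as a small Python helper that a rule's script might call before its expensive command. This is an illustration under stated assumptions, not the pipeline's own code: the function name fail_fast and its parameters (read_count, min_reads, qc_passed) are hypothetical.

```python
import sys
from pathlib import Path


def fail_fast(input_path: str, read_count: int, min_reads: int, qc_passed: bool) -> None:
    """Exit with a clear message and a non-zero code if any precondition fails."""
    problems = []
    if not Path(input_path).is_file():
        problems.append(f"required input not found: {input_path}")
    if read_count < min_reads:
        problems.append(f"read count {read_count} is below the minimum {min_reads}")
    if not qc_passed:
        problems.append("preceding QC gate did not pass")
    if problems:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("ERROR: " + "; ".join(problems))
```

Collecting every failed precondition before exiting means a single log entry lists all problems, so the user does not have to fix and resubmit once per error.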
All paths in config.yaml, never hardcoded
Every path to an input directory, output directory, reference file, or intermediate file must be declared in config.yaml and read from config[...] inside the rule. The only string literals permitted inside a .smk file are log/benchmark path patterns (which contain {wildcards}) and the conda: / container: directives. Changing a reference path must require editing config.yaml only, never touching a rule file.
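One way to keep path lookups both centralized and loud on failure is a small accessor over the loaded config. The sketch below is a hedged illustration: the helper name config_path and the keys reference.genome_fasta and paths.results_dir are hypothetical and are not taken from the pipeline's actual config.yaml.

```python
def config_path(config: dict, *keys: str) -> str:
    """Walk nested config keys and return the path stored there, or fail clearly."""
    node = config
    for depth, key in enumerate(keys, start=1):
        if not isinstance(node, dict) or key not in node:
            missing = ".".join(keys[:depth])
            raise KeyError(f"config.yaml is missing required entry '{missing}'")
        node = node[key]
    if not isinstance(node, str) or not node:
        dotted = ".".join(keys)
        raise ValueError(f"config.yaml entry '{dotted}' must be a non-empty path")
    return node


# Hypothetical config, standing in for what Snakemake loads from config.yaml.
config = {
    "reference": {"genome_fasta": "/refs/genome.fa"},
    "paths": {"results_dir": "results"},
}

genome_fasta = config_path(config, "reference", "genome_fasta")
results_dir = config_path(config, "paths", "results_dir")
print(genome_fasta, results_dir)
```

With an accessor like this, a rule never contains a literal path, and a missing entry is reported by its dotted name instead of surfacing later as a confusing file-not-found error.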
Resources always from config, scaled by attempt
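In plain Python terms, the convention in this section amounts to a callable that multiplies a base value from the config by the retry attempt. The sketch below assumes a hypothetical my_tool config block and a hypothetical helper named scaled; only the (wildcards, attempt) callable signature reflects how Snakemake invokes resource functions.

```python
# Hypothetical per-tool config block, standing in for the loaded config.yaml.
config = {"my_tool": {"resources": {"mem_mb": 8000, "time": 60}}}


def scaled(tool: str, field: str):
    """Build a callable that multiplies the configured base value by the attempt number."""
    base = config[tool]["resources"][field]
    # Snakemake passes `wildcards` and, when the callable asks for it, `attempt`.
    return lambda wildcards, attempt: base * attempt


mem_mb = scaled("my_tool", "mem_mb")
time_limit = scaled("my_tool", "time")

# Simulate the first three attempts of a retried job (wildcards are unused here).
for attempt in (1, 2, 3):
    print(attempt, mem_mb(None, attempt), time_limit(None, attempt))
```

Because the base values come from the config, raising a tool's memory for a large dataset is a one-line config change and the scaling logic stays untouched.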
The mem_mb and time fields in the resources: block must be read from config[...]['resources'] and multiplied by attempt (via a lambda) so that Snakemake can automatically retry failed jobs with more resources on HPC clusters: double the base value on the second attempt, triple on the third. Hard-coding mem_mb: 8000 in a rule is not permitted.
Maintainer guidelines: publishing to Zenodo
When cutting a new release, maintainers should deposit the updated pipeline to Zenodo to mint a persistent DOI. There are two supported methods.
Option A: zenodo_deposit.py (recommended)
The script rules/scripts/zenodo_deposit.py handles packaging and upload in a single command. It reads all publication metadata directly from CITATION.cff (title, version, authors, abstract, keywords, and license), so there is no manual data entry.
Step 1: Generate a Zenodo access token
For sandbox (safe testing):
sandbox.zenodo.org/account/settings/applications
For production:
zenodo.org/account/settings/applications
Create a new Personal Access Token with the deposit:write and deposit:actions scopes.
Step 2: Run the deposit script
- Sandbox (test)
- Production (draft)
- Production (publish)
Sandbox mode targets sandbox.zenodo.org. No DOI is minted and the record is not publicly visible. Use this to verify metadata before going to production.
The script performs the following steps:
- Packages the repository with git archive --format=zip HEAD
- Parses CITATION.cff for title, version, creators, abstract, keywords, and license
- Creates a new draft deposition via the Zenodo REST API
- Uploads the zip archive to the deposition’s S3 bucket
- Updates the deposition metadata
- Prints the draft review URL
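The metadata step in the list above can be illustrated with a short mapping function. This is a sketch under assumptions, not the contents of zenodo_deposit.py: the function name cff_to_zenodo_metadata is hypothetical, the input is an already-parsed CITATION.cff dictionary, and the license identifier is passed through unchanged even though the real script may normalise it. The abstract and keywords in the sample are placeholders.

```python
def cff_to_zenodo_metadata(cff: dict) -> dict:
    """Map parsed CITATION.cff fields onto Zenodo deposition metadata keys."""
    creators = []
    for author in cff.get("authors", []):
        # Zenodo expects creator names in the form "Family name, Given names".
        name = ", ".join(
            part
            for part in (author.get("family-names"), author.get("given-names"))
            if part
        )
        creator = {"name": name}
        if author.get("affiliation"):
            creator["affiliation"] = author["affiliation"]
        creators.append(creator)
    return {
        "upload_type": "software",
        "title": cff["title"],
        "version": str(cff["version"]),
        "creators": creators,
        "description": cff.get("abstract", ""),
        "keywords": list(cff.get("keywords", [])),
        "license": cff.get("license", ""),  # assumption: passed through as-is
    }


# Illustrative input; abstract and keywords are placeholders, not real values.
sample_cff = {
    "title": "BDB-Genomics ATAC-seq Framework",
    "version": "3.0.0",
    "authors": [{"family-names": "Bhandary", "given-names": "H."}],
    "abstract": "placeholder abstract",
    "keywords": ["placeholder"],
    "license": "MIT",
}

metadata = cff_to_zenodo_metadata(sample_cff)
print(metadata["title"], metadata["version"], metadata["creators"])
```

Keeping the mapping in one pure function makes it easy to inspect what will be sent to Zenodo before any network request is made, which is the same purpose the sandbox serves.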
Option B: Native GitHub–Zenodo integration
For fully automated DOI minting on every GitHub Release:
- Log in to zenodo.org using your GitHub credentials
- Navigate to your Zenodo profile → GitHub settings
- Toggle the switch for BDB-Genomics/atacseq-pipeline to On
- Create a new GitHub Release on the repository (any tag format works)
- Zenodo automatically captures the release archive and mints a new DOI
The GitHub–Zenodo integration mints a DOI immediately on release creation. Use Option A with the sandbox first to verify the metadata you want to appear on the Zenodo record before creating the GitHub Release.
Citation
If you use this pipeline in your research, please cite it using the following reference drawn from CITATION.cff:
Bhandary, H. (2026). BDB-Genomics ATAC-seq Framework (Version 3.0.0). GitHub. https://github.com/BDB-Genomics/atacseq-pipeline
License
The BDB-Genomics ATAC-seq Framework is released under the MIT License. You are free to use, modify, and distribute the software for any purpose, provided the original copyright notice and license text are retained. See the LICENSE file in the repository root for the full license text.
Thank you for helping build better open science.