The BDB-Genomics ATAC-seq pipeline ships a ready-made SLURM profile at profile/slurm/config.yaml that maps every Snakemake rule to a SLURM batch job automatically. You do not need to write any sbatch scripts. Snakemake reads the per-rule resources blocks declared in config.yaml and translates them directly into SLURM --mem, --time, and --partition flags. The profile supports up to 100 concurrent jobs, automatic retry on transient failures, and structured failure logging.

Profile Configuration

# profile/slurm/config.yaml

executor: slurm
use-conda: true
jobs: 100
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1

latency-wait: 60

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
  slurm_partition: "standard"
  slurm_account: "bdb_genomics"

| Setting | Value | Effect |
|---|---|---|
| executor | slurm | Use the Snakemake SLURM executor plugin |
| jobs | 100 | Maximum concurrent SLURM jobs |
| latency-wait | 60 | Seconds to wait for output files to appear on shared filesystems |
| restart-times | 1 | Automatically resubmit a failed job once before marking it failed |
| keep-going | true | Continue running other samples when one sample’s job fails |
| rerun-incomplete | true | Re-run jobs whose output files are incomplete from a previous run |
| slurm_partition | standard | Default SLURM partition (queue) |
| slurm_account | bdb_genomics | SLURM accounting group |

Running the Pipeline

snakemake --profile profile/slurm
For scATAC mode, prepend the environment variable:
ATAC_MODE=scatac snakemake --profile profile/slurm
Snakemake submits each rule as an independent SLURM job. Rules with per-rule resource blocks in config.yaml use those values; all other rules fall back to the default-resources declared in the profile.
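
The fallback behaviour described above can be pictured as a dictionary merge followed by flag rendering. The sketch below is illustrative only: it is not the SLURM executor plugin's code, and the function names `effective_resources` and `sbatch_flags` are hypothetical. It uses the default values from the profile and the bowtie2 values from config.yaml.

```python
# Illustrative sketch only: not the Snakemake SLURM executor's real code.
# Defaults copied from the default-resources block of the profile.
DEFAULT_RESOURCES = {
    "mem_mb": 4000,
    "time": 60,
    "slurm_partition": "standard",
    "slurm_account": "bdb_genomics",
}


def effective_resources(rule_resources=None):
    """Overlay a rule's own resources on top of the profile defaults."""
    merged = dict(DEFAULT_RESOURCES)
    merged.update(rule_resources or {})
    return merged


def sbatch_flags(resources):
    """Render a resource dict as sbatch command-line options."""
    return [
        f"--mem={resources['mem_mb']}",    # sbatch reads a bare number as megabytes
        f"--time={resources['time']}",     # sbatch reads a bare number as minutes
        f"--partition={resources['slurm_partition']}",
        f"--account={resources['slurm_account']}",
    ]


# A rule with its own resources block (values from the bowtie2 entry).
print(sbatch_flags(effective_resources({"mem_mb": 16000, "time": 240})))
# A rule without one falls back to the defaults entirely.
print(sbatch_flags(effective_resources()))
```

A rule only needs to declare the keys it wants to change; anything it omits, such as the partition and account here, is inherited from the defaults.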

Per-Rule Resource Overrides

Resource limits in config.yaml are automatically respected by the SLURM executor. Memory-intensive rules request more RAM; quick bookkeeping rules stay well within the defaults:
# config.yaml — high-memory rules request dedicated resources
bowtie2:
  threads: 8
  resources:
    mem_mb: 16000   # 16 GB — overrides the 4 GB default
    time: 240       # 4 hours

macs2:
  threads: 8
  resources:
    mem_mb: 16000
    time: 240

differential_accessibility:
  threads: 8
  resources:
    mem_mb: 16000
    time: 240

# Low-overhead rules stay within default limits
qc_gate:
  threads: 1
  resources:
    mem_mb: 1000
    time: 10

frip_calculation:
  threads: 4
  resources:
    mem_mb: 2000
    time: 15
You never need to edit profile/slurm/config.yaml to change per-rule resources. Set them in config.yaml and the SLURM executor picks them up automatically.

Customising Partition and Account

Most HPC sites use site-specific partition names and accounting groups. Edit the default-resources block in profile/slurm/config.yaml to match your cluster:
# profile/slurm/config.yaml
default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
  slurm_partition: "high_mem"        # ← your partition name
  slurm_account: "lab_allocation"    # ← your allocation account
To override the partition for a specific rule only (without modifying the profile), add the resource directly to that rule’s block in config.yaml:
# config.yaml
archr:
  resources:
    mem_mb: 64000
    time: 240
    slurm_partition: "bigmem"        # ← rule-specific partition override

Handling Transient Failures

SLURM jobs can fail for reasons unrelated to the pipeline, such as node preemption, filesystem timeouts, or memory allocation races. The profile sets restart-times: 1, so Snakemake resubmits a failed job once. If the job fails a second time, Snakemake marks it as failed and, because keep-going: true is set, continues running every job that does not depend on it. In practice this means the other samples keep processing.
# profile/slurm/config.yaml
restart-times: 1   # resubmit once on transient failure
keep-going: true   # do not halt the whole pipeline for one failed sample
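
How the two settings interact can be shown with a toy scheduler. This is a minimal sketch of the retry and keep-going semantics, not Snakemake's scheduler: `run_all` and the job callables are hypothetical, and dependency tracking between jobs is left out.

```python
def run_all(jobs, restart_times=1, keep_going=True):
    """Run each job up to 1 + restart_times times; return the names that failed.

    `jobs` maps a job name to a zero-argument callable that raises on failure.
    """
    failed = []
    for name, job in jobs.items():
        for _attempt in range(1 + restart_times):
            try:
                job()
                break              # success: stop retrying this job
            except Exception:
                continue           # failure: try again if attempts remain
        else:
            # Every attempt failed.
            failed.append(name)
            if not keep_going:
                break              # without keep-going, stop at the first failure
    return failed


calls = {"flaky": 0}


def flaky():
    """Fails on the first call only, like a transient node problem."""
    calls["flaky"] += 1
    if calls["flaky"] == 1:
        raise RuntimeError("simulated node preemption")


def broken():
    """Fails on every call, like a genuine pipeline error."""
    raise RuntimeError("simulated persistent error")


def healthy():
    """Always succeeds."""


print(run_all({"sample_a": flaky, "sample_b": broken, "sample_c": healthy}))
# prints ['sample_b']
```

The flaky job recovers on its single retry, the persistently broken job is reported as failed, and the healthy job still runs because keep_going is true.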

Monitoring Jobs

Check the status of all running and pending jobs submitted by your account:
squeue -u $USER
Filter to jobs named exactly “snakemake” (squeue --name matches whole job names, not substrings):
squeue -u $USER --name snakemake
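
For a quick overview of a large run, the queue listing can be reduced to a count per job state. The sketch below assumes the text comes from `squeue -u $USER -h -o "%T"`, which prints one state per line with no header; the helper name `count_states` is hypothetical.

```python
from collections import Counter


def count_states(squeue_output):
    """Count job states from text containing one state per line."""
    return Counter(
        line.strip() for line in squeue_output.splitlines() if line.strip()
    )


# Sample text standing in for real squeue output.
sample = "RUNNING\nRUNNING\nPENDING\n"
print(dict(count_states(sample)))
# prints {'RUNNING': 2, 'PENDING': 1}

# On a cluster the live text could be captured with, for example:
#   subprocess.run(["squeue", "-u", user, "-h", "-o", "%T"],
#                  capture_output=True, text=True).stdout
```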

Resuming a Partial Run

If the pipeline is interrupted, restart with the same command. The rerun-incomplete: true setting ensures Snakemake identifies and re-runs any rules whose output files are partial or missing, without re-running rules that completed successfully:
snakemake --profile profile/slurm

Dry Run Before Submission

Preview the full DAG and resource requirements before submitting to the queue:
snakemake --profile profile/slurm --dry-run
This prints every job that would be submitted, its requested memory and time, and the dependency order, without calling sbatch.

Structured Execution Summary

On completion or failure the pipeline writes a JSON execution summary:
results/reporting/pipeline_execution_summary.json
This file captures per-rule CPU time and peak memory and, on failure, the last five error lines from the corresponding log file in logs/. The benchmarks/ directory contains full per-job resource consumption records written by Snakemake’s built-in benchmarking.
The structured JSON summary is generated by rules/scripts/aggregate_logs.py, which is called automatically in the Snakefile onsuccess and onerror lifecycle hooks. No manual invocation is required.
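
Collecting the tail of a log file is simple to reproduce when inspecting a failure by hand. The sketch below is not the code in rules/scripts/aggregate_logs.py, whose implementation is not shown here; `tail_lines` is a hypothetical helper, and it assumes "last five error lines" means the final five lines of the log.

```python
from collections import deque


def tail_lines(path, n=5):
    """Return the last n lines of a text file, reading it line by line."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

Calling `tail_lines` on a file under logs/ returns a list of up to five strings, which is enough to show the final error message without opening a multi-megabyte log.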
