All pipeline outputs are written beneath the results/ directory at the project root. The directory hierarchy mirrors the six-stage DAG — Preprocessing → Alignment → Post-alignment → Metrics & QC → Peak Calling → Visualization — so individual stages can be inspected or restarted independently. Per-rule benchmark timings land in benchmarks/ and execution logs in logs/, both at the project root.
Paths containing {sample} expand to one file per entry in your data/fastp/samples.tsv sample sheet. Paths containing {condition} or {condition}_rep{N}_rep{M} expand according to the condition and replicate columns in that sheet.
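
The placeholder expansion described above can be sketched in a few lines of Python. This is only an illustration: the column name `sample` and the sheet contents are assumptions, and the helper `expand_sample_paths` is hypothetical, not part of the pipeline.

```python
import csv
import io


def expand_sample_paths(template, sheet_text, column="sample"):
    """Return one concrete path per row of a tab-separated sample sheet.

    `column` is an assumed header name; adjust it to match the real sheet.
    """
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    return [template.format(sample=row[column]) for row in reader]


# Hypothetical sheet contents, for illustration only.
sheet = "sample\tcondition\treplicate\nctrl_1\tctrl\t1\nctrl_2\tctrl\t2\n"

paths = expand_sample_paths("logs/fastp/{sample}.log", sheet)
print(paths)  # ['logs/fastp/ctrl_1.log', 'logs/fastp/ctrl_2.log']
```

In practice you would read the text from data/fastp/samples.tsv instead of the inline string.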

Preprocessing

results/preprocessing/fastp/
directory
Adapter-trimmed and quality-filtered FASTQ files plus fastp QC reports.
results/preprocessing/fastqc/
directory
Post-trimming per-read quality reports from FastQC.

Alignment

results/alignment/bowtie2/
directory
Raw aligned BAM files from Bowtie2 (bulk ATAC-seq mode only).
results/alignment/chromap/
directory
Raw aligned BAM files from Chromap (scATAC-seq mode only; requires global.mode: "scatac").

Post-Alignment Processing

results/post_alignment/samtools_sort/
directory
Coordinate-sorted BAMs.
results/post_alignment/samtools_fixmate/
directory
BAMs with filled mate-score tags, required before duplicate marking.
results/post_alignment/samtools_markdup/
directory
PCR-duplicate-marked BAMs (duplicates retained by default).
results/post_alignment/remove_mito_reads/
directory
Sorted BAMs with mitochondrial reads removed.
results/post_alignment/samtools_view/
directory
Quality-filtered, blacklist-cleaned BAMs.
results/post_alignment/tn5_shift/
directory
Tn5-transposase-corrected BAMs, the primary analysis BAMs for peak calling and visualization.
results/post_alignment/mito-ATAC/
directory
Mitochondrial read fraction statistics produced before deduplication.
results/post_alignment/samtools_stats/
directory
Raw samtools stats output files consumed by the QC gate.

Metrics & QC

results/metrics_qc/tss_enrichment/
directory
TSS enrichment scores computed by tss_enrichment.R.
results/metrics_qc/picard/
directory
Picard tool outputs for alignment and insert-size QC.
results/metrics_qc/cross_correlation/
directory
NSC/RSC strand cross-correlation outputs from phantompeakqualtools.
results/metrics_qc/fragment_size_analysis/
directory
Fragment size distribution plots and summary statistics.

QC Gate

results/qc_gate/
directory
Per-sample pass/fail trigger files and structured QC data. Downstream rules require {sample}_qc_pass.txt as an explicit Snakemake input.
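
To see at a glance which samples have cleared the gate, one option is to test for the {sample}_qc_pass.txt trigger file directly. The sketch below assumes only the file name pattern given above; the helper `qc_status` is hypothetical, and the demonstration runs against a temporary directory rather than a real results/qc_gate/.

```python
import tempfile
from pathlib import Path


def qc_status(samples, qc_dir="results/qc_gate"):
    """Map each sample name to True if its pass trigger file exists."""
    base = Path(qc_dir)
    return {s: (base / f"{s}_qc_pass.txt").is_file() for s in samples}


# Demonstration in a throwaway directory with made-up sample names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ctrl_1_qc_pass.txt").touch()
    demo = qc_status(["ctrl_1", "ctrl_2"], qc_dir=tmp)

print(demo)  # {'ctrl_1': True, 'ctrl_2': False}
```

A missing trigger file may mean either that the sample failed QC or that the gate has not run yet, so treat False as "not passed so far" rather than as a definite failure.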

Peak Calling

results/peak_calling/macs2_peakcall/
directory
Raw peak calls from MACS2.
results/peak_calling/filtered_peaks/
directory
Blacklist-filtered peaks — the primary peak set used by all downstream analyses.
results/peak_calling/frip_calculation/
directory
FRiP score output files consumed by the QC gate.
results/peak_calling/idr/
directory
Irreproducible Discovery Rate outputs for replicate concordance analysis.
results/peak_calling/consensus_peaks/
directory
Multi-sample merged consensus peak set.
results/peak_calling/count_peaks/
directory
Read count matrix over consensus peaks for DESeq2 input.
results/peak_calling/differential_accessibility/
directory
DESeq2-based differential chromatin accessibility results.
results/peak_calling/tobias/
directory
TOBIAS bias-corrected TF footprinting results.
results/peak_calling/footprinting/
directory
HINT-ATAC footprint calls via the RGT toolkit.
results/peak_calling/chromvar/
directory
chromVAR TF motif accessibility deviation scores.
results/peak_calling/peak_annotation/
directory
Genomic feature annotations for filtered peaks.
results/peak_calling/motif_analysis/
directory
HOMER de novo and known motif enrichment results per sample.

scATAC-seq Outputs

results/scatac/archr/
directory
ArchR single-cell ATAC-seq analysis outputs. Generated only when global.mode is set to "scatac".
results/scatac/cicero/
directory
Cicero chromatin co-accessibility outputs.

Visualization

results/visualization/bigwig/
directory
Raw signal BigWig files converted from sorted bedGraph.
results/visualization/normalized_coverage/
directory
CPM-normalized BigWig tracks for cross-sample comparability.
results/visualization/heatmap/
directory
deepTools heatmap plots and data matrices centered on filtered peaks.
results/visualization/correlation_analysis/
directory
Inter-sample BigWig correlation analysis outputs.

Reporting

results/reporting/multiqc/
directory
Consolidated MultiQC report aggregating all tool QC outputs.
results/reporting/pipeline_execution_summary.json
file
Structured JSON summary written by the Snakemake onsuccess lifecycle hook. Contains run metadata, sample list, mode, and completion timestamp. Consumed by atacseq_tool.py to return a structured status string to AI agents.
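
Because the summary is written by the onsuccess hook, its absence indicates that no run has completed successfully yet. A minimal sketch of reading it follows. The key names in the demonstration (`mode`, `samples`) are assumptions, since the exact schema is not listed here, and `load_summary` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_summary(path="results/reporting/pipeline_execution_summary.json"):
    """Return the parsed summary, or None if the file does not exist."""
    p = Path(path)
    if not p.is_file():
        return None  # the onsuccess hook has not written a summary
    with p.open() as fh:
        return json.load(fh)


# Demonstration with a made-up summary; real key names may differ.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "pipeline_execution_summary.json"
    demo_path.write_text(json.dumps({"mode": "bulk", "samples": ["ctrl_1"]}))
    summary = load_summary(demo_path)
    missing = load_summary(Path(tmp) / "absent.json")

print(sorted(summary))  # ['mode', 'samples']
print(missing)          # None
```

Printing the top-level keys first is a cautious way to discover the actual schema before relying on any particular field.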
results/reporting/benchmark_summary.tsv
file
Aggregated benchmark table produced by the benchmark_summary rule. Columns include rule name, sample, wall-clock time (seconds), CPU time (seconds), and peak memory (MB) drawn from individual benchmarks/*.txt files.

Benchmarks

benchmarks/
directory
Per-rule, per-sample Snakemake benchmark files in tab-separated format. Each file records: s (wall-clock seconds), h:m:s (human-readable time), max_rss (peak RSS memory in MB), max_vms, max_uss, max_pss, io_in, io_out, mean_load, cpu_time.
benchmarks/
├── fastp/{sample}.txt
├── bowtie2/{sample}.txt
├── samtools_markdup/{sample}.txt
├── macs2/{sample}.txt
├── idr/{condition}_rep{N}_rep{M}.txt
└── multiqc/multiqc.txt
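
The benchmark files can be read with the standard csv module. The sketch below parses the columns listed above from an inline string. The numbers are made up for illustration, and `read_benchmark` is a hypothetical helper. Non-numeric cells are mapped to None, on the assumption that a value may be unavailable on some systems.

```python
import csv
import io


def to_float(value):
    """Convert a benchmark cell to float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def read_benchmark(text):
    """Parse tab-separated benchmark text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "s": to_float(row["s"]),
            "max_rss": to_float(row["max_rss"]),
            "cpu_time": to_float(row["cpu_time"]),
        }
        for row in reader
    ]


# Illustrative content only; these values are not real measurements.
example = (
    "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\tcpu_time\n"
    "12.5\t0:00:12\t512.3\t1024.0\t500.1\t505.2\t10.0\t20.0\t95.0\t11.9\n"
)

rows = read_benchmark(example)
print(rows)  # [{'s': 12.5, 'max_rss': 512.3, 'cpu_time': 11.9}]
```

To read a real file, pass the contents of, for example, benchmarks/fastp/{sample}.txt for one of your samples.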

Logs

logs/
directory
Per-rule, per-sample stderr and stdout log files. Log file extensions follow the global convention: .err for stderr, .log for combined output.
logs/
├── fastp/{sample}.log
├── bowtie2/{sample}.log
├── samtools_markdup/{sample}.log
├── qc_gate/{sample}.log
├── macs2/{sample}.log
└── multiqc/multiqc.log
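
When triaging a failed run, a quick first pass is to list the .err files that are not empty. The sketch below relies only on the extension convention stated above; `nonempty_err_logs` is a hypothetical helper, and the demonstration builds a temporary logs tree with made-up file names.

```python
import tempfile
from pathlib import Path


def nonempty_err_logs(log_root="logs"):
    """Return sorted relative paths of non-empty .err files under log_root."""
    root = Path(log_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.err")
        if p.is_file() and p.stat().st_size > 0
    )


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    rule_dir = Path(tmp) / "fastp"
    rule_dir.mkdir()
    (rule_dir / "ctrl_1.err").write_text("warning: example message\n")
    (rule_dir / "ctrl_2.err").touch()  # empty, so it is skipped
    found = nonempty_err_logs(tmp)

print(found)  # ['fastp/ctrl_1.err']
```

Many tools write progress messages to stderr, so a non-empty .err file is a place to look, not proof of an error.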
