BDB ATAC-seq Pipeline: Complete Output Directory Manifest

All pipeline outputs are written beneath the results/ directory at the project root. The directory hierarchy mirrors the six-stage DAG — Preprocessing → Alignment → Post-alignment → Metrics & QC → Peak Calling → Visualization — so individual stages can be inspected or restarted independently. Per-rule benchmark timings land in benchmarks/ and execution logs in logs/, both at the project root.

Paths containing {sample} expand to one file per entry in your data/fastp/samples.tsv sample sheet. Paths containing {condition} or {condition}_rep{N}_rep{M} expand according to the condition and replicate columns in that sheet.
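
As a rough illustration of how a {sample} placeholder expands, the sketch below fills a path template once per row of a tab-separated sheet. The column name sample and the two example rows are assumptions made for this example; check your own sheet's header before reusing it.

```python
import csv
import io


def expand_paths(template: str, sheet_text: str, column: str = "sample") -> list[str]:
    """Return one path per sample-sheet row by filling the {sample} placeholder."""
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    return [template.replace("{sample}", row[column]) for row in reader]


# Hypothetical two-row sheet; the real sheet may carry different columns.
sheet = "sample\tcondition\treplicate\nctrl_1\tctrl\t1\ntreat_1\ttreat\t1\n"
paths = expand_paths("results/preprocessing/fastp/{sample}_R1_trimmed.fastq.gz", sheet)
for path in paths:
    print(path)
```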

Preprocessing

results/preprocessing/fastp/

directory

Adapter-trimmed and quality-filtered FASTQ files plus fastp QC reports.

Show Key files

{sample}_R1_trimmed.fastq.gz

file

Trimmed Read 1 FASTQ (gzip-compressed). Produced by fastp with trim_front1: 5 and length_required: 30.

{sample}_R2_trimmed.fastq.gz

file

Trimmed Read 2 FASTQ (gzip-compressed).

{sample}_fastp.json

file

Structured JSON QC report from fastp, ingested by MultiQC.

{sample}_fastp.html

file

Interactive HTML QC report from fastp.

results/preprocessing/fastqc/

directory

Post-trimming per-read quality reports from FastQC.

Show Key files

{sample}_R1_trimmed_fastqc.html

file

Interactive FastQC HTML report for trimmed Read 1.

{sample}_R2_trimmed_fastqc.html

file

Interactive FastQC HTML report for trimmed Read 2.

{sample}_R1_trimmed_fastqc.zip

file

Raw FastQC data archive for programmatic parsing.

Alignment

results/alignment/bowtie2/

directory

Raw aligned BAM files from Bowtie2 (bulk ATAC-seq mode only).

Show Key files

{sample}.bam

file

Unsorted BAM produced by bowtie2 --very-sensitive paired-end alignment to the reference genome.

results/alignment/chromap/

directory

Raw aligned BAM files from Chromap (scATAC-seq mode only; requires global.mode: "scatac").

Show Key files

{sample}.bam

file

BAM produced by chromap --preset atac. Used as input to the ArchR and Cicero scATAC-seq analysis rules.

Post-Alignment Processing

results/post_alignment/samtools_sort/

directory

Coordinate-sorted BAMs.

Show Key files

{sample}.sorted.bam

file

Coordinate-sorted BAM. Input to samtools_fixmate.

results/post_alignment/samtools_fixmate/

directory

BAMs with filled mate-score tags, required before duplicate marking.

Show Key files

{sample}.sorted.fixmate.bam

file

Sorted BAM with mate-score tags filled by samtools fixmate -m.

results/post_alignment/samtools_markdup/

directory

PCR-duplicate-marked BAMs (duplicates retained by default).

Show Key files

{sample}.sorted.dedup.bam

file

Duplicate-marked BAM. Despite the .dedup suffix, duplicate reads are flagged but not removed (remove_duplicates: false). Input to Picard, TOBIAS, and Qualimap.

{sample}.sorted.dedup.bam.bai

file

BAM index produced by the samtools_index_post_markdup rule.

results/post_alignment/remove_mito_reads/

directory

Mitochondrial-read-free sorted BAMs.

Show Key files

{sample}_noMT.sorted.bam

file

BAM with all reads mapping to chrMT (configurable via remove_mito_reads.params.mito_chr) excluded. Input to samtools_view and samtools_stats.

results/post_alignment/samtools_view/

directory

Quality-filtered, blacklist-cleaned BAMs.

Show Key files

{sample}.filtered.bam

file

BAM after MAPQ ≥ 30 filtering (-q 30), flag-based exclusion (-F 3844), and ENCODE blacklist removal. Input to Tn5 shifting, TOBIAS, and cross-correlation analysis.

{sample}.filtered.bam.bai

file

BAM index placed alongside the filtered BAM by samtools_index_post_filter.
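
The -F 3844 mask applied to {sample}.filtered.bam is a sum of standard SAM flag bits. The sketch below decodes the mask and applies the same two tests (flag exclusion and MAPQ ≥ 30) to a single read's flag and mapping quality. The helper names are illustrative and are not part of the pipeline; blacklist removal is not modelled here.

```python
# Standard SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag_mask(mask: int) -> list[str]:
    """List the names of all flag bits set in the mask."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def passes_filter(flag: int, mapq: int, exclude_mask: int = 3844, min_mapq: int = 30) -> bool:
    """Mimic `-F exclude_mask -q min_mapq`: no excluded bit set and MAPQ high enough."""
    return (flag & exclude_mask) == 0 and mapq >= min_mapq


excluded = decode_flag_mask(3844)
print(excluded)
```

Because the duplicate bit (1024) is part of the mask, reads that were only marked at the samtools_markdup step are dropped at this stage.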

results/post_alignment/tn5_shift/

directory

Tn5-transposase-corrected BAMs — the primary analysis BAM for peak calling and visualisation.

Show Key files

{sample}.filtered.shifted.bam

file

BAM with reads shifted +4 bp (forward strand) and −5 bp (reverse strand) to correct for Tn5 insertion bias. Input to MACS2, FRiP calculation, TSS enrichment, and coverage tracks.

{sample}.filtered.shifted.bam.bai

file

BAM index.
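
The +4/−5 bp correction can be pictured as a strand-dependent offset. The following is a minimal sketch that treats the shift as a plain offset applied to both ends of a 0-based, half-open interval; this is an assumption, and the pipeline's own shifting tool may instead adjust only the 5' end or the fragment ends.

```python
def tn5_shift(start: int, end: int, strand: str) -> tuple[int, int]:
    """Offset an alignment interval by +4 bp (forward) or -5 bp (reverse)."""
    if strand == "+":
        offset = 4
    elif strand == "-":
        offset = -5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return start + offset, end + offset


print(tn5_shift(100, 150, "+"))
print(tn5_shift(100, 150, "-"))
```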

results/post_alignment/mito-ATAC/

directory

Mitochondrial read fraction statistics produced before deduplication.

Show Key files

{sample}_mito_stats.txt

file

Tab-separated file containing total read count and mitochondrial read fraction for each sample.

results/post_alignment/samtools_stats/

directory

Raw samtools stats output files consumed by the QC gate.

Show Key files

{sample}_postFiltering.stats.txt

file

Full samtools stats output. The QC gate script extracts sequences, reads duplicated, and percentage of properly paired reads from the SN section.
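
For readers who want to pull the same fields out by hand, here is a small sketch of reading the SN (summary numbers) section. It is not the QC gate script itself; the function name and the example text are made up for illustration, and only lines beginning with SN followed by a tab are considered.

```python
def parse_sn_section(stats_text: str) -> dict[str, float]:
    """Collect numeric 'SN' rows from samtools stats output, keyed by metric name."""
    metrics: dict[str, float] = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        try:
            # fields[1] is the metric name with a trailing colon, fields[2] the value.
            metrics[fields[1].rstrip(":")] = float(fields[2])
        except (IndexError, ValueError):
            continue  # skip malformed or non-numeric rows
    return metrics


example = (
    "# This file was produced by samtools stats\n"
    "SN\tsequences:\t2000\n"
    "SN\treads duplicated:\t150\t# PCR or optical duplicate bit set\n"
    "SN\tpercentage of properly paired reads (%):\t97.5\n"
)
metrics = parse_sn_section(example)
print(metrics)
```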

Metrics & QC

results/metrics_qc/tss_enrichment/

directory

TSS enrichment scores computed by tss_enrichment.R.

Show Key files

{sample}_tss_enrichment.txt

file

Tab-separated file with two columns: sample name and TSS enrichment score. Consumed by parse_qc_metrics.py.

{sample}_tss_enrichment.pdf

file

TSS enrichment profile plot (signal ± 2 kb around TSSs).

results/metrics_qc/picard/

directory

Picard tool outputs for alignment and insert-size QC.

Show Key files

CollectAlignmentSummaryMetrics/{sample}.alignment_metrics.txt

file

Picard alignment summary: total reads, mapped rate, strand balance, and chimeric read fraction.

CollectInsertSizeMetrics/{sample}.insert_metrics.txt

file

Picard insert-size summary statistics, including median insert size and mean insert size.

CollectInsertSizeMetrics/{sample}.insert_size_histogram.pdf

file

Insert-size frequency histogram PDF showing nucleosome banding pattern.

results/metrics_qc/cross_correlation/

directory

NSC/RSC strand cross-correlation outputs from phantompeakqualtools.

Show Key files

{sample}_crosscorr.txt

file

Tab-separated file containing NSC, RSC, estimated fragment length, and phantom peak shift.

{sample}_crosscorr.pdf

file

Cross-correlation profile plot.

results/metrics_qc/fragment_size_analysis/

directory

Fragment size distribution plots and summary statistics.

Show Key files

{sample}_fragment_sizes.pdf

file

Histogram of fragment size distribution with nucleosomal banding annotations.

{sample}_fragment_stats.txt

file

Summary statistics (NFR fraction, mono-nucleosomal fraction, di-nucleosomal fraction).

QC Gate

results/qc_gate/

directory

Per-sample pass/fail trigger files and structured QC data. Downstream rules require {sample}_qc_pass.txt as an explicit Snakemake input.

Show Key files

{sample}_qc_pass.txt

file

Single-line trigger file: {sample}\tPASSED or {sample}\tFAILED. Snakemake uses this file as a dependency checkpoint for all downstream rules.

{sample}_qc_pass.json

file

Structured JSON QC report containing per-metric values, targets, and statuses. See the QC Thresholds reference for the full schema.
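
The single-line trigger format described above is easy to check from a script. The sketch below parses one line into a sample name and a boolean; the function name is hypothetical, and any status other than PASSED or FAILED is treated as an error.

```python
def read_qc_status(line: str) -> tuple[str, bool]:
    """Parse '{sample}\\tPASSED' or '{sample}\\tFAILED' into (sample, passed)."""
    sample, status = line.rstrip("\n").split("\t")
    if status not in {"PASSED", "FAILED"}:
        raise ValueError(f"unexpected QC status: {status!r}")
    return sample, status == "PASSED"


print(read_qc_status("ctrl_1\tPASSED\n"))
print(read_qc_status("treat_1\tFAILED\n"))
```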

Peak Calling

results/peak_calling/macs2_peakcall/

directory

Raw peak calls from MACS2.

Show Key files

{sample}_peaks.narrowPeak

file

ENCODE narrowPeak format: chromosome, start, end, name, score, strand, fold-change, −log₁₀(p-value), −log₁₀(q-value), summit offset.

{sample}_summits.bed

file

Single-base-pair peak summits BED file.

{sample}_peaks.xls

file

MACS2 peak spreadsheet with extended statistics.
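
To make the ten narrowPeak columns concrete, the sketch below parses one line into named fields and derives the absolute summit coordinate from the summit offset. The field names and the example line are chosen for this illustration only.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based start
    end: int
    name: str
    score: int
    strand: str
    signal_value: float  # fold-change
    p_value: float       # -log10(p-value)
    q_value: float       # -log10(q-value)
    summit_offset: int   # summit position relative to start


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Convert one tab-separated narrowPeak line into typed fields."""
    f = line.rstrip("\n").split("\t")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


def summit_position(peak: NarrowPeak) -> int:
    """Absolute summit coordinate: peak start plus summit offset."""
    return peak.start + peak.summit_offset


peak = parse_narrowpeak_line("chr1\t1000\t1500\tS1_peak_1\t250\t.\t8.5\t30.2\t27.1\t220\n")
print(peak.chrom, summit_position(peak))
```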

results/peak_calling/filtered_peaks/

directory

Blacklist-filtered peaks — the primary peak set used by all downstream analyses.

Show Key files

{sample}_filtered_peaks.bed

file

narrowPeak file with ENCODE blacklist regions removed. Input to FRiP calculation, heatmap, peak annotation, motif analysis, and TOBIAS.

results/peak_calling/frip_calculation/

directory

FRiP score output files consumed by the QC gate.

Show Key files

{sample}_frip.txt

file

Tab-separated file: sample name and FRiP score (e.g., SAMPLE\t0.342).

results/peak_calling/idr/

directory

Irreproducible Discovery Rate outputs for replicate concordance analysis.

Show Subdirectories

idr_peaks/{condition}_rep{N}_rep{M}_idr_peaks.bed

file

Peaks passing the IDR threshold (default 0.05) between replicate pairs.

optimal_peaks/{condition}_optimal_peaks.bed

file

Final optimal peak set selected by the IDR analysis.

plots/{condition}_rep{N}_rep{M}_idr_plot.png

file

IDR diagnostic scatter plot.

results/peak_calling/consensus_peaks/

directory

Multi-sample merged consensus peak set.

Show Key files

consensus_peaks.bed

file

Non-redundant consensus peak set merging peaks present in at least min_samples (default: 2) samples, with peaks within merge_distance (default: 100 bp) collapsed.

peak_sample_counts.txt

file

Tab-separated matrix showing how many samples each consensus peak was called in.
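
One plausible reading of the min_samples and merge_distance parameters is sketched below: peaks from all samples are pooled, intervals separated by at most merge_distance are collapsed, and a merged region is kept only if enough distinct samples contributed to it. This is an assumption about the behaviour, not the pipeline's actual implementation, and the function and variable names are invented for the example.

```python
from collections import defaultdict

Peak = tuple[str, int, int]  # (chrom, start, end)


def consensus_peaks(peaks_by_sample: dict[str, list[Peak]],
                    min_samples: int = 2,
                    merge_distance: int = 100) -> list[tuple[str, int, int, int]]:
    """Return (chrom, start, end, n_samples) for merged regions with enough support."""
    pooled: dict[str, list[tuple[int, int, str]]] = defaultdict(list)
    for sample, peaks in peaks_by_sample.items():
        for chrom, start, end in peaks:
            pooled[chrom].append((start, end, sample))

    result = []
    for chrom in sorted(pooled):
        intervals = sorted(pooled[chrom])
        cur_start, cur_end, samples = intervals[0][0], intervals[0][1], {intervals[0][2]}
        for start, end, sample in intervals[1:]:
            if start - cur_end <= merge_distance:
                # Close enough: extend the current region and record the sample.
                cur_end = max(cur_end, end)
                samples.add(sample)
            else:
                if len(samples) >= min_samples:
                    result.append((chrom, cur_start, cur_end, len(samples)))
                cur_start, cur_end, samples = start, end, {sample}
        if len(samples) >= min_samples:
            result.append((chrom, cur_start, cur_end, len(samples)))
    return result


example = {
    "s1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "s2": [("chr1", 250, 400)],
    "s3": [("chr1", 5000, 5100)],
}
merged = consensus_peaks(example)
print(merged)
```

The fourth element of each tuple corresponds to the kind of per-peak sample count recorded in peak_sample_counts.txt.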

results/peak_calling/count_peaks/

directory

Read count matrix over consensus peaks for DESeq2 input.

Show Key files

peak_counts.tsv

file

Tab-separated count matrix: rows are consensus peaks, columns are samples.

results/peak_calling/differential_accessibility/

directory

DESeq2-based differential chromatin accessibility results.

Show Key files

diff_accessibility_results.tsv

file

Full DESeq2 results table: peak coordinates, base mean, log₂FC, standard error, Wald statistic, p-value, and adjusted p-value (FDR).

plots/volcano_plot.pdf

file

Volcano plot: −log₁₀(FDR) vs log₂ fold-change, with significant peaks highlighted.

plots/ma_plot.pdf

file

MA plot: log₂FC vs mean accessibility, coloured by significance.

plots/pca_plot.pdf

file

PCA of variance-stabilised count data across all samples.

results/peak_calling/tobias/

directory

TOBIAS bias-corrected TF footprinting results.

Show Subdirectories

corrected_bw/{sample}_corrected.bw

file

Tn5 bias-corrected ATAC-seq signal BigWig (ATACorrect output).

footprint_bw/{sample}_footprints.bw

file

TOBIAS footprint score BigWig (ScoreBigwig output).

bindetect/

directory

BINDetect output directory: per-TF binding scores, differential binding plots, and a summary table across conditions.

results/peak_calling/footprinting/

directory

HINT-ATAC footprint calls via the RGT toolkit.

Show Key files

{sample}_footprints.bed

file

BED file of predicted TF-bound footprint regions.

results/peak_calling/chromvar/

directory

chromVAR TF motif accessibility deviation scores.

Show Subdirectories

deviations/

directory

Raw chromVAR deviation score matrices per TF motif (RDS and TSV formats).

bias_corrected/

directory

GC-bias-corrected deviation scores.

plots/

directory

Heatmaps and variability plots of TF deviation scores across samples.

results/peak_calling/peak_annotation/

directory

Genomic feature annotations for filtered peaks.

Show Key files

{sample}_peak_annotation.txt

file

HOMER or ChIPseeker annotation table: peak coordinates + nearest gene, genomic feature category (promoter, intron, exon, intergenic), distance to TSS.

results/peak_calling/motif_analysis/

directory

HOMER de novo and known motif enrichment results per sample.

Show Key files

{sample}/homerResults.html

file

HOMER de novo motif enrichment results page.

{sample}/knownResults.html

file

HOMER known motif enrichment results page.

scATAC-seq Outputs

results/scatac/archr/

directory

ArchR single-cell ATAC-seq analysis outputs. Only generated when global.mode: "scatac".

Show Subdirectories

arrow/

directory

Raw ArchR Arrow files (one per sample), containing per-cell fragment matrices and metadata.

filtered_arrow/

directory

Arrow files after doublet removal and cell QC filtering (min_tss: 4.0, min_frags: 1000, max_frags: 100000).

clusters/cell_clusters.tsv

file

Tab-separated file mapping each cell barcode to its Leiden cluster assignment.

plots/umap_clusters.pdf

file

UMAP embedding coloured by cluster identity.

markers/marker_genes.tsv

file

Differentially accessible peaks and marker genes per cluster.

doublets/doublet_enrichment.pdf

file

Doublet enrichment score distribution used to set the doublet_threshold: 0.2 cutoff.
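
The cell QC thresholds listed for filtered_arrow/ amount to a simple per-cell predicate. The sketch below assumes inclusive bounds and ignores doublet filtering; both are simplifications, and the function name is hypothetical.

```python
def passes_cell_qc(tss_enrichment: float, n_frags: int,
                   min_tss: float = 4.0,
                   min_frags: int = 1000,
                   max_frags: int = 100000) -> bool:
    """Keep a cell when its TSS enrichment and fragment count fall within the thresholds."""
    return tss_enrichment >= min_tss and min_frags <= n_frags <= max_frags


# Hypothetical cells: (barcode, TSS enrichment, fragment count)
cells = [("AAAC-1", 7.2, 15000), ("AAAG-1", 3.1, 20000), ("AACT-1", 9.0, 500)]
kept = [barcode for barcode, tss, frags in cells if passes_cell_qc(tss, frags)]
print(kept)
```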

results/scatac/cicero/

directory

Cicero chromatin co-accessibility outputs.

Show Subdirectories

connections/coaccessibility_connections.rds

file

R RDS file containing the full co-accessibility connection object from Cicero.

connections/coaccessibility_table.tsv

file

Tab-separated connection table: Peak1, Peak2, co-accessibility score (0–1).

ccans/ccans.bed

file

BED file of identified Cis-Co-Accessibility Networks (CCANs).

Visualization

results/visualization/bigwig/

directory

Raw signal BigWig files converted from sorted bedGraph.

Show Key files

{sample}.bw

file

BigWig coverage track from the Tn5-shifted BAM, for genome browser loading and deepTools analysis.

results/visualization/normalized_coverage/

directory

CPM-normalised BigWig tracks for cross-sample comparability.

Show Key files

{sample}_CPM.bw

file

Counts Per Million normalised BigWig, produced by bamCoverage --normalizeUsing CPM.
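
CPM scaling divides each bin's read count by the total number of mapped reads expressed in millions. The sketch below shows only that arithmetic on a list of bin counts; it is not a substitute for bamCoverage, which also handles binning and read filtering.

```python
def cpm(bin_counts: list[int], total_mapped_reads: int) -> list[float]:
    """Scale raw per-bin read counts to counts per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


normalised = cpm([5, 0, 20], total_mapped_reads=2_000_000)
print(normalised)
```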

results/visualization/heatmap/

directory

deepTools heatmap plots and data matrices centred on filtered peaks.

Show Key files

plot/{sample}_tss_heatmap.pdf

file

Read-density heatmap PDF, ±3 kb around peak centres, coloured with the coolwarm palette.

matrix/{sample}_heatmap_matrix.gz

file

Compressed deepTools matrix file for replotting or downstream analysis.

results/visualization/correlation_analysis/

directory

Inter-sample BigWig correlation analysis outputs.

Show Key files

correlation_heatmap.pdf

file

Pearson/Spearman correlation heatmap across all samples.

correlation_matrix.tab

file

Raw correlation coefficient matrix in tab-separated format.

Reporting

results/reporting/multiqc/

directory

Consolidated MultiQC report aggregating all tool QC outputs.

Show Key files

multiqc_report.html

file

Interactive HTML report combining fastp, FastQC, Picard, samtools, Qualimap, preseq, and QC gate metrics across all samples.

multiqc_data/

directory

Raw JSON and TSV data files extracted by MultiQC for programmatic use.

results/reporting/pipeline_execution_summary.json

file

Structured JSON summary written by the Snakemake onsuccess lifecycle hook. Contains run metadata, sample list, mode, and completion timestamp. Consumed by atacseq_tool.py to return a structured status string to AI agents.

results/reporting/benchmark_summary.tsv

file

Aggregated benchmark table produced by the benchmark_summary rule. Columns include rule name, sample, wall-clock time (seconds), CPU time (seconds), and peak memory (MB) drawn from individual benchmarks/*.txt files.

Benchmarks

benchmarks/

directory

Per-rule, per-sample Snakemake benchmark files in tab-separated format. Each file records: s (wall-clock seconds), h:m:s (human-readable time), max_rss (peak RSS memory in MB), max_vms, max_uss, max_pss, io_in, io_out, mean_load, cpu_time.

benchmarks/
├── fastp/{sample}.txt
├── bowtie2/{sample}.txt
├── samtools_markdup/{sample}.txt
├── macs2/{sample}.txt
├── idr/{condition}_rep{N}_rep{M}.txt
└── multiqc/multiqc.txt
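
Because each benchmark file is a small tab-separated table with a header row, it can be read with the standard csv module. The sketch below parses the text of one such file and reports peak resident memory; the helper names and the example values are invented for illustration.

```python
import csv
import io


def read_benchmark(text: str) -> list[dict[str, str]]:
    """Parse a Snakemake benchmark table into one dict per recorded run."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def peak_memory_mb(rows: list[dict[str, str]]) -> float:
    """Largest max_rss value (MB) across all recorded runs."""
    return max(float(row["max_rss"]) for row in rows)


# Made-up benchmark content with the column layout described above.
example = (
    "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\tcpu_time\n"
    "12.5\t0:00:12\t512.30\t1024.00\t500.10\t505.20\t10.00\t20.00\t95.00\t11.90\n"
)
rows = read_benchmark(example)
print(rows[0]["s"], peak_memory_mb(rows))
```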

Logs

logs/

directory

Per-rule, per-sample stderr and stdout log files. Log file extensions follow the global convention: .err for stderr, .log for combined output.

logs/
├── fastp/{sample}.log
├── bowtie2/{sample}.log
├── samtools_markdup/{sample}.log
├── qc_gate/{sample}.log
├── macs2/{sample}.log
└── multiqc/multiqc.log
