The final stage of the BDB-Genomics ATAC-seq pipeline converts the processed BAM and peak files into genome-browser-ready tracks, publication-quality plots, and a comprehensive aggregated quality report. Every visualization rule is fully decoupled from the upstream analysis — it reads from clearly defined output directories and writes toDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/BDB-Genomics/atacseq-pipeline/llms.txt
Use this file to discover all available pages before exploring further.
results/visualization/ and results/reporting/. On pipeline completion, the onsuccess hook automatically calls aggregate_logs.py to produce a machine-readable execution summary that records per-rule CPU time, peak memory, and any error lines encountered during the run.
Bedgraph Generation
The visualization track begins withbedtools_genomecov, which converts the Tn5-shifted BAM into a bedgraph coverage file using the -bg flag (output only positions with non-zero coverage):
results/visualization/bedtools_genomecov/{sample}.bedGraph
Coordinate-Sorted Bedgraph
bedGraphToBigWig requires its input to be sorted by chromosome and start position. A dedicated sort rule uses the Unix sort -k1,1 -k2,2n convention to produce a coordinate-ordered bedgraph:
results/visualization/sorted_bedgraph_file/{sample}.sorted.bedGraph
BigWig Conversion
The sorted bedgraph is converted to the indexed BigWig binary format using UCSC’sbedGraphToBigWig. BigWig files are orders of magnitude faster to load in genome browsers (IGV, UCSC) because they store pre-computed summary statistics at multiple zoom levels:
results/visualization/bigwig/{sample}.bw
CPM-Normalized Coverage
Raw read-depth BigWigs are not comparable across samples with different sequencing depths. Thenormalize_coverage rule uses deepTools bamCoverage to produce a CPM (counts per million) normalized BigWig directly from the Tn5-shifted BAM:
results/visualization/normalized_coverage/{sample}_CPM.bw
The method parameter maps directly to deepTools’ --normalizeUsing flag, so switching to RPKM or BPM requires only a config edit:
Sample Correlation Analysis
correlation_analysis uses deepTools multiBigwigSummary (binning the genome into 1 kb windows) and plotCorrelation to compute pairwise Pearson or Spearman correlation between all sample BigWigs and produce a heatmap:
results/visualization/correlation_analysis/correlation_heatmap.png
This heatmap is one of the most informative QC outputs for multi-sample experiments — replicates should cluster together with correlation > 0.9 for high-quality ATAC-seq libraries.
TSS Enrichment Heatmap
The heatmap rule uses deepToolscomputeMatrix reference-point centered on TSS positions (± 3 kb) and plotHeatmap with the coolwarm colormap to visualize Tn5-cut-site enrichment around transcription start sites across all samples:
rules/scripts/run_heatmap.py) that calls computeMatrix and plotHeatmap in sequence:
Output:
results/visualization/heatmap/plot/{sample}_tss_heatmap.pdf
The heatmap rule takes
{sample}_qc_pass.txt as an explicit input. If a sample fails the QC gate, the rule still triggers (Snakemake requires the output), but the script reads the sentinel file and produces a blank placeholder PDF so the DAG can complete without error.MultiQC Aggregated Report
MultiQC aggregates outputs from every QC-producing tool in the pipeline into a single HTML report. Themultiqc rule collects:
| Source | Files Collected |
|---|---|
| FastQC | *_fastqc.zip (R1 and R2 per sample) |
| fastp | {sample}.json |
| samtools stats | {sample}_postFiltering.stats.txt |
| Picard CollectAlignmentSummaryMetrics | {sample}.alignment_metrics.txt |
| Picard CollectInsertSizeMetrics | {sample}.insert_metrics.txt, {sample}.insert_histogram.pdf |
| Preseq | {sample}.ccurve.txt |
| Qualimap | {sample}_qualimap_report/ |
| QC Gate | {sample}_qc_pass.json |
results/reporting/multiqc/multiqc_report.html
Benchmark Summary
Thebenchmark_summary rule collects Snakemake benchmark files from 16 core rules (plus IDR and mito-read rules) and compiles them into a single TSV with columns for rule name, sample, runtime in seconds, peak memory in MB, and CPU percentage:
results/reporting/benchmark_summary.tsv
Pipeline Execution Summary
On every run — success or failure — theonsuccess/onerror lifecycle hook calls aggregate_logs.py to write a structured JSON execution summary:
errors array contains the last five error-matching lines from each affected log file (filtered to exclude false positives such as “0 errors”), enabling rapid post-mortem diagnosis without manually grep-ing through the logs/ directory.
Output: results/reporting/pipeline_execution_summary.json
Complete Visualization Output Summary
Raw BigWig
results/visualization/bigwig/{sample}.bwCPM-Normalized BigWig
results/visualization/normalized_coverage/{sample}_CPM.bwCorrelation Heatmap
results/visualization/correlation_analysis/correlation_heatmap.pngTSS Heatmap
results/visualization/heatmap/plot/{sample}_tss_heatmap.pdfMultiQC Report
results/reporting/multiqc/multiqc_report.htmlBenchmark Summary
results/reporting/benchmark_summary.tsvExecution Summary
results/reporting/pipeline_execution_summary.jsonBedgraph
results/visualization/sorted_bedgraph_file/{sample}.sorted.bedGraph