Visualization & Reporting: BigWigs, Heatmaps & MultiQC

The final stage of the BDB-Genomics ATAC-seq pipeline converts the processed BAM and peak files into genome-browser-ready tracks, publication-quality plots, and a comprehensive aggregated quality report. Every visualization rule is fully decoupled from the upstream analysis — it reads from clearly defined output directories and writes to results/visualization/ and results/reporting/. On pipeline completion, the onsuccess hook automatically calls aggregate_logs.py to produce a machine-readable execution summary that records per-rule CPU time, peak memory, and any error lines encountered during the run.

Bedgraph Generation

The visualization track begins with bedtools_genomecov, which converts the Tn5-shifted BAM into a bedgraph coverage file using the -bg flag (output only positions with non-zero coverage):

# config.yaml
bedtools_genomecov:
  input:
    shifted_bam: "results/post_alignment/tn5_shift"
  output:
    bedgraph: "results/visualization/bedtools_genomecov"
  params:
    extra: "-bg"
  threads: 4
  resources:
    mem_mb: 4000
    time: 60

Output: results/visualization/bedtools_genomecov/{sample}.bedGraph

Coordinate-Sorted Bedgraph

bedGraphToBigWig requires its input to be sorted by chromosome and start position. A dedicated sort rule uses the Unix sort -k1,1 -k2,2n convention to produce a coordinate-ordered bedgraph:

# config.yaml
sorted_bedgraph:
  input:
    bedgraph: "results/visualization/bedtools_genomecov"
  output:
    sorted_bedgraph: "results/visualization/sorted_bedgraph_file"
  threads: 4
  resources:
    mem_mb: 4000
    time: 60

Output: results/visualization/sorted_bedgraph_file/{sample}.sorted.bedGraph

BigWig Conversion

The sorted bedgraph is converted to the indexed BigWig binary format using UCSC’s bedGraphToBigWig. BigWig files are orders of magnitude faster to load in genome browsers (IGV, UCSC) because they store pre-computed summary statistics at multiple zoom levels:

# config.yaml
bigwig:
  input:
    sorted_bedgraph: "results/visualization/sorted_bedgraph_file"
  output:
    bigwig: "results/visualization/bigwig"
  params:
    genome: "data/reference/genome.chrom.sizes"   # *GENOME_SIZES anchor
  threads: 4
  resources:
    mem_mb: 4000
    time: 60

# rules/bigwig.smk
bedGraphToBigWig \
    {input.sorted_bedgraph} \
    {params.genome} \
    {output.bigwig} \
    2> {log}

Output: results/visualization/bigwig/{sample}.bw

CPM-Normalized Coverage

Raw read-depth BigWigs are not comparable across samples with different sequencing depths. The normalize_coverage rule uses deepTools bamCoverage to produce a CPM (counts per million) normalized BigWig directly from the Tn5-shifted BAM:

# config.yaml
normalized_coverage:
  input:
    shifted_bam: "results/post_alignment/tn5_shift"
  output:
    normalized_coverage: "results/visualization/normalized_coverage"
  params:
    method: "CPM"
  threads: 4
  resources:
    mem_mb: 8000
    time: 120

Output: results/visualization/normalized_coverage/{sample}_CPM.bw The method parameter maps directly to deepTools’ --normalizeUsing flag, so switching to RPKM or BPM requires only a config edit:

normalized_coverage:
  params:
    method: "RPKM"   # alternatives: BPM, RPKM, CPM, None

Sample Correlation Analysis

correlation_analysis uses deepTools multiBigwigSummary (binning the genome into 1 kb windows) and plotCorrelation to compute pairwise Pearson or Spearman correlation between all sample BigWigs and produce a heatmap:

# config.yaml
correlation_analysis:
  input:
    bigwig: "results/visualization/bigwig"
  output: "results/visualization/correlation_analysis"
  params:
    bin_size: 1000
  threads: 4
  resources:
    mem_mb: 8000
    time: 120

Output: results/visualization/correlation_analysis/correlation_heatmap.png This heatmap is one of the most informative QC outputs for multi-sample experiments — replicates should cluster together with correlation > 0.9 for high-quality ATAC-seq libraries.

TSS Enrichment Heatmap

The heatmap rule uses deepTools computeMatrix reference-point centered on TSS positions (± 3 kb) and plotHeatmap with the coolwarm colormap to visualize Tn5-cut-site enrichment around transcription start sites across all samples:

# config.yaml
heatmap:
  input:
    filtered_peaks: "results/peak_calling/filtered_peaks"
    bigwig:         "results/visualization/bigwig"
  output:
    plot:    "results/visualization/heatmap/plot"
    matrix:  "results/visualization/heatmap/matrix"
    regions: "results/visualization/heatmap"
  threads: 8
  resources:
    mem_mb: 16000
    time: 240
  params:
    color:      "coolwarm"
    upstream:   3000
    downstream: 3000

The rule is implemented as a Python script (rules/scripts/run_heatmap.py) that calls computeMatrix and plotHeatmap in sequence:

computeMatrix

Bins signal from the BigWig in ±3 kb windows centered on each peak:

computeMatrix reference-point \
    --referencePoint center \
    --upstream   3000 \
    --downstream 3000 \
    --scoreFileName {input.bigwig} \
    --regionsFileName {input.filtered_peaks} \
    --outFileName {output.matrix}

plotHeatmap

Renders the matrix as a PDF heatmap with the coolwarm colormap:

plotHeatmap \
    -m {output.matrix} \
    -o {output.plot} \
    --colorMap coolwarm

Output: results/visualization/heatmap/plot/{sample}_tss_heatmap.pdf

The heatmap rule takes {sample}_qc_pass.txt as an explicit input. If a sample fails the QC gate, the rule still triggers (Snakemake requires the output), but the script reads the sentinel file and produces a blank placeholder PDF so the DAG can complete without error.

MultiQC Aggregated Report

MultiQC aggregates outputs from every QC-producing tool in the pipeline into a single HTML report. The multiqc rule collects:

Source	Files Collected
FastQC	`*_fastqc.zip` (R1 and R2 per sample)
fastp	`{sample}.json`
samtools stats	`{sample}_postFiltering.stats.txt`
Picard CollectAlignmentSummaryMetrics	`{sample}.alignment_metrics.txt`
Picard CollectInsertSizeMetrics	`{sample}.insert_metrics.txt`, `{sample}.insert_histogram.pdf`
Preseq	`{sample}.ccurve.txt`
Qualimap	`{sample}_qualimap_report/`
QC Gate	`{sample}_qc_pass.json`

# config.yaml
multiqc:
  output: "results/reporting/multiqc"
  threads: 2
  resources:
    mem_mb: 4000
    time: 30
  params:
    config: "rules/config/multiqc_config.yaml"

multiqc {input} \
    -f \
    -o {params.out_dir} \
    -c {params.config} \
    --title "ATAC-seq Pipeline QC Report" \
    --comment "Comprehensive quality control metrics for ATAC-seq analysis" \
    2> {log}

Output: results/reporting/multiqc/multiqc_report.html

The MultiQC config at rules/config/multiqc_config.yaml is pre-configured to parse the _qc_pass.json files from the QC gate and display them as a custom-content table in the report. No MultiQC plugin installation is required.

Benchmark Summary

The benchmark_summary rule collects Snakemake benchmark files from 16 core rules (plus IDR and mito-read rules) and compiles them into a single TSV with columns for rule name, sample, runtime in seconds, peak memory in MB, and CPU percentage:

# rules/benchmark_summary.smk — inputs
benchmarks_per_sample = expand(
    "benchmarks/{rule}/{sample}.txt",
    rule=[
        "fastp", "fastqc", "bowtie2", "samtools_sort",
        "samtools_markdup", "samtools_view", "tn5_shift",
        "macs2", "frip", "tss_enrichment", "cross_correlation",
        "motif_analysis", "bedtools_genomecov", "bigwig",
        "heatmap", "footprinting"
    ],
    sample=SAMPLES
)

echo -e "Rule\tSample\tRuntime_s\tMemory_MB\tCPU_Percent" > {output.summary}

for f in {input...}; do
    rule=$(echo "$f" | sed 's|benchmarks/||' | cut -d'/' -f1)
    sample=$(echo "$f" | sed 's|.txt||' | rev | cut -d'/' -f1 | rev)
    [ -f "$f" ] && {
        runtime=$(head -n 2 "$f" | tail -n 1 | cut -f1)
        mem=$(head -n 2 "$f" | tail -n 1 | cut -f2)
        cpu=$(head -n 2 "$f" | tail -n 1 | cut -f3)
        echo -e "${rule}\t${sample}\t${runtime}\t${mem}\t${cpu}" >> {output.summary}
    }
done

Output: results/reporting/benchmark_summary.tsv

Pipeline Execution Summary

On every run — success or failure — the onsuccess/onerror lifecycle hook calls aggregate_logs.py to write a structured JSON execution summary:

{
    "timestamp": "2026-01-15T14:32:07",
    "status": "success",
    "performance_metrics": [
        { "s": "342.1", "h:m:s": "0:05:42", "max_rss": "14823", "rule": "bowtie2" },
        { "s": "287.4", "h:m:s": "0:04:47", "max_rss": "13411", "rule": "macs2" }
    ],
    "errors": []
}

On failure, the errors array contains the last five error-matching lines from each affected log file (filtered to exclude false positives such as “0 errors”), enabling rapid post-mortem diagnosis without manually grep-ing through the logs/ directory. Output: results/reporting/pipeline_execution_summary.json

Complete Visualization Output Summary

Raw BigWig

results/visualization/bigwig/{sample}.bw

CPM-Normalized BigWig

results/visualization/normalized_coverage/{sample}_CPM.bw

Correlation Heatmap

results/visualization/correlation_analysis/correlation_heatmap.png

TSS Heatmap

results/visualization/heatmap/plot/{sample}_tss_heatmap.pdf

MultiQC Report

results/reporting/multiqc/multiqc_report.html

Benchmark Summary

results/reporting/benchmark_summary.tsv

Execution Summary

results/reporting/pipeline_execution_summary.json

Bedgraph

results/visualization/sorted_bedgraph_file/{sample}.sorted.bedGraph

Get Started

Configuration

Pipeline Stages

Modalities

Deployment

Guides

Visualization & Reporting: BigWigs, Heatmaps & MultiQC

Bedgraph Generation

Coordinate-Sorted Bedgraph

BigWig Conversion

CPM-Normalized Coverage

Sample Correlation Analysis

TSS Enrichment Heatmap

MultiQC Aggregated Report

Benchmark Summary

Pipeline Execution Summary

Complete Visualization Output Summary

Raw BigWig

CPM-Normalized BigWig

Correlation Heatmap

TSS Heatmap

MultiQC Report

Benchmark Summary

Execution Summary

Bedgraph

Build docs developers (and LLMs) love

Get Started

Configuration

Pipeline Stages

Modalities

Deployment

Guides

Documentation Index

​Bedgraph Generation

​Coordinate-Sorted Bedgraph

​BigWig Conversion

​CPM-Normalized Coverage

​Sample Correlation Analysis

​TSS Enrichment Heatmap

​MultiQC Aggregated Report

​Benchmark Summary

​Pipeline Execution Summary

​Complete Visualization Output Summary

Raw BigWig

CPM-Normalized BigWig

Correlation Heatmap

TSS Heatmap

MultiQC Report

Benchmark Summary

Execution Summary

Bedgraph

Build docs developers (and LLMs) love

Bedgraph Generation

Coordinate-Sorted Bedgraph

BigWig Conversion

CPM-Normalized Coverage

Sample Correlation Analysis

TSS Enrichment Heatmap

MultiQC Aggregated Report

Benchmark Summary

Pipeline Execution Summary

Complete Visualization Output Summary