The visualize_results.py script generates publication-ready plots at 300 DPI with a clean, modern style. All plots use consistent colors: blue for audio, orange for visual.
Quick Reference
- Error vs Offset: Grouped bar chart — MAE by offset magnitude
- Confidence vs Error: Scatter plot — Confidence score reliability
- Audio-Video Diff Histogram: Histogram — Cross-method agreement distribution
- Runtime Comparison: Bar chart — Mean runtime by method
- Error Distribution Boxplot: Boxplot + scatter — Error distribution by method and offset
- Resource Usage: Dual bar chart — Peak CPU and memory
- Motion Before/After: Per-case overlay — Motion signals pre/post alignment
- Sync Timelines: Per-case diagram — Timeline bars with offset arrows
Plot Style
All plots use a custom matplotlib style for consistency (visualize_results.py:54-71).
Colors:
- Audio: #2196F3 (Material Blue)
- Visual: #FF9800 (Material Orange)
- Neutral: #7E57C2 (Material Purple, used for cross-method plots)
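The shared style can be captured in a small helper. This is a minimal sketch, not the script's exact rcParams; the constant names and `apply_plot_style` function are illustrative, but the hex colors and 300 DPI match the values documented above.

```python
import matplotlib as mpl

# Palette described above (hex values from the docs; names are illustrative)
AUDIO_COLOR = "#2196F3"    # Material Blue  -> audio-based estimates
VISUAL_COLOR = "#FF9800"   # Material Orange -> visual-based estimates
NEUTRAL_COLOR = "#7E57C2"  # Material Purple -> cross-method plots

def apply_plot_style():
    """Apply a clean, consistent matplotlib style (a sketch of the idea)."""
    mpl.rcParams.update({
        "figure.dpi": 300,          # print-quality output
        "axes.spines.top": False,   # minimal, modern axes
        "axes.spines.right": False,
        "axes.grid": True,
        "grid.alpha": 0.3,
        "font.size": 10,
    })
```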
1. Error vs Offset
File: error_vs_offset.png
Description: Grouped bar chart showing MAE for each true offset magnitude, split by method.
Use Cases
- Identify if accuracy degrades at extreme offsets (e.g., ±1000 ms)
- Compare audio vs visual performance across offset ranges
- Detect asymmetry (e.g., better performance on positive vs negative offsets)
Implementation
visualize_results.py:78-114
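A minimal sketch of how such a grouped bar chart can be produced; the function name, signature, and input format are assumptions, not the script's actual API.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def plot_error_vs_offset(offsets, audio_mae, visual_mae, out_path):
    """Grouped bar chart: MAE per true offset, one bar per method."""
    x = np.arange(len(offsets))
    width = 0.35  # bar width; two bars per group
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(x - width / 2, audio_mae, width, label="Audio", color="#2196F3")
    ax.bar(x + width / 2, visual_mae, width, label="Visual", color="#FF9800")
    ax.set_xticks(x)
    ax.set_xticklabels([f"{o:+d} ms" for o in offsets])
    ax.set_xlabel("True offset")
    ax.set_ylabel("MAE (ms)")
    ax.legend()
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```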
Interpretation
- Ideal Pattern: MAE is roughly constant across all offsets → method is robust to offset magnitude.
- Edge Degradation: MAE grows at extreme offsets (e.g., ±1000 ms) → accuracy degrades with offset magnitude.
- Asymmetry: MAE differs between positive and negative offsets → direction-dependent behavior worth investigating.
2. Confidence vs Error
File: confidence_vs_error.png
Description: Scatter plot of confidence score vs absolute error with linear regression lines.
Use Cases
- Validate that confidence scores reliably predict error magnitude
- Identify outliers (high confidence but high error, or vice versa)
- Compare confidence calibration between audio and visual methods
Implementation
visualize_results.py:121-167
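A sketch of the scatter-plus-regression idea using a least-squares fit; function name and return value are illustrative, not the script's actual interface. A negative fitted slope is what "confidence predicts error" looks like here.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def plot_confidence_vs_error(conf, abs_err, out_path, color="#2196F3"):
    """Scatter of confidence vs |error| with a linear regression line."""
    slope, intercept = np.polyfit(conf, abs_err, 1)  # degree-1 least squares
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(conf, abs_err, color=color, alpha=0.7)
    xs = np.linspace(min(conf), max(conf), 50)
    ax.plot(xs, slope * xs + intercept, color=color, linestyle="--",
            label=f"fit: slope={slope:.1f}")
    ax.set_xlabel("Confidence score")
    ax.set_ylabel("Absolute error (ms)")
    ax.legend()
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
    return slope  # negative slope => higher confidence predicts lower error
```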
Interpretation
3. Audio-Video Diff Histogram
File: audio_video_diff_histogram.png
Description: Histogram of |audio_estimate - visual_estimate| across all test cases.
Use Cases
- Visualize cross-method agreement distribution
- Identify if disagreement is centered around a systematic bias or scattered
- Assess feasibility of hybrid strategies (e.g., average both estimates if diff < threshold)
Implementation
visualize_results.py:174-212
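The hybrid-strategy use case can be sketched as a small summary over the same |audio − visual| differences the histogram plots. The function name and the 30 ms default threshold are illustrative assumptions.

```python
import numpy as np

def cross_method_agreement(audio_est_ms, visual_est_ms, threshold_ms=30.0):
    """Summarize |audio - visual| disagreement across test cases and report
    the fraction of cases where averaging the two estimates looks safe."""
    diffs = np.abs(np.asarray(audio_est_ms, dtype=float)
                   - np.asarray(visual_est_ms, dtype=float))
    return {
        "mean_diff": float(diffs.mean()),            # center of the histogram
        "sigma": float(diffs.std()),                 # spread (tight vs wide)
        "agree_fraction": float((diffs < threshold_ms).mean()),
    }
```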
Interpretation
- Tight Distribution (σ < 30 ms): Strong agreement → both methods likely correct → safe to average estimates or use either method.
- Wide Distribution (σ > 50 ms): Frequent disagreement → at least one method is unreliable on some cases.
- Bimodal Distribution: Two clusters of disagreement → a subset of cases systematically fails one method.
4. Runtime Comparison
File: runtime_comparison.png
Description: Bar chart of mean runtime per method with standard deviation error bars.
Use Cases
- Compare efficiency between audio and visual methods
- Estimate total pipeline execution time
- Identify if runtime variance is high (may indicate video-dependent bottlenecks)
Implementation
visualize_results.py:219-260
Interpretation
Typical Results:
- Audio: 2-5 seconds (FFmpeg extraction + GCC-PHAT)
- Visual: 3-10 seconds (frame extraction + motion correlation)
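A sketch of the bar-with-error-bars pattern via matplotlib's `yerr`/`capsize` options; the function signature is an assumption, not the script's actual code.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def plot_runtime(methods, mean_s, std_s, out_path):
    """Bar chart of mean runtime per method with std-dev error bars."""
    colors = {"Audio": "#2196F3", "Visual": "#FF9800"}
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(methods, mean_s, yerr=std_s, capsize=6,
           color=[colors.get(m, "#7E57C2") for m in methods])
    ax.set_ylabel("Runtime (s)")
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```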
5. Error Distribution Boxplot
File: error_distribution_boxplot.png
Description: Side-by-side boxplots of absolute error grouped by true offset and method, with overlaid scatter points.
Use Cases
- Visualize error distribution shape (median, quartiles, outliers)
- Compare variability between methods
- Identify offset-specific failure modes
Implementation
visualize_results.py:267-348
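The boxplot-plus-scatter combination can be sketched as follows; jittering the x positions keeps the raw points from stacking on top of each other. Function name and grouping format are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def boxplot_with_points(groups, labels, out_path):
    """Boxplots of per-group absolute errors with jittered raw points overlaid."""
    rng = np.random.default_rng(0)  # fixed seed -> reproducible jitter
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.boxplot(groups, showfliers=False)  # outliers shown via scatter instead
    for i, vals in enumerate(groups, start=1):
        x = i + rng.uniform(-0.08, 0.08, size=len(vals))  # small x jitter
        ax.scatter(x, vals, color="#7E57C2", alpha=0.6, s=15)
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel("Absolute error (ms)")
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```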
Interpretation
- Narrow Box (IQR < 10 ms): Low variance → consistent performance → method is stable.
- Wide Box (IQR > 30 ms): High variance → performance depends strongly on the individual test case.
- Median != Mean (skewed distribution): A few large outliers pull the mean away from the median.
6. Resource Usage
File: resource_usage.png
Description: Dual bar chart (side-by-side) showing peak CPU% and peak memory (MB) by method.
Use Cases
- Ensure pipeline fits within system constraints
- Identify resource bottlenecks (CPU-bound vs memory-bound)
- Compare resource efficiency between methods
Implementation
visualize_results.py:355-407
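A sketch of the side-by-side dual-panel layout; the function name and inputs are assumptions.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def plot_resource_usage(methods, peak_cpu_pct, peak_mem_mb, out_path):
    """Two panels: peak CPU% (left) and peak memory in MB (right) per method."""
    colors = ["#2196F3", "#FF9800"]
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    ax1.bar(methods, peak_cpu_pct, color=colors[:len(methods)])
    ax1.set_ylabel("Peak CPU (%)")
    ax2.bar(methods, peak_mem_mb, color=colors[:len(methods)])
    ax2.set_ylabel("Peak memory (MB)")
    fig.tight_layout()
    fig.savefig(out_path, dpi=300)
    plt.close(fig)
```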
Interpretation
7. Motion Before/After Overlay
Files: before_after/*.png (one per test case)
Description: Two-panel plot showing original vs synthetic motion signals before alignment (top) and after applying the estimated offset (bottom).
Use Cases
- Visually validate that alignment improves signal overlap
- Debug cases where visual sync fails (e.g., periodic motion, low signal-to-noise ratio)
- Generate figures for publications or presentations
Implementation
visualize_results.py:414-484
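The two-panel overlay can be sketched as below. The circular `np.roll` shift stands in for whatever alignment the real script performs, and the function signature is an assumption.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def plot_before_after(orig, synth, offset_frames, fps, out_path):
    """Top panel: motion signals before alignment. Bottom panel: after
    shifting the synthetic signal by the estimated offset (in frames)."""
    t = np.arange(len(orig)) / fps  # frame index -> seconds
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 5), sharex=True)
    ax1.plot(t, orig, color="#2196F3", label="Original")
    ax1.plot(t, synth, color="#FF9800", label="Synthetic")
    ax1.set_title("Before alignment")
    ax1.legend()
    aligned = np.roll(synth, -offset_frames)  # circular shift as a simple stand-in
    ax2.plot(t, orig, color="#2196F3")
    ax2.plot(t, aligned, color="#FF9800")
    ax2.set_title("After alignment")
    ax2.set_xlabel("Time (s)")
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```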
Interpretation
- Good Alignment: After plot shows strong peak overlap → visual sync correctly identified the offset.
- Partial Alignment: Peaks overlap in some regions only → the offset estimate is close but imperfect.
- No Alignment: Signals remain shifted after correction → visual sync failed on this case.
Requires Diagnostics: This plot requires .npz files generated by run_batch.py. If missing, re-run batch synchronization with the latest version.
8. Sync Timelines
Files: timelines/*.png (one per test case)
Description: Timeline diagram showing original and synthetic video bars, with arrows indicating true offset and per-method estimated offsets.
Use Cases
- Visualize the temporal relationship between original and synthetic videos
- Compare how audio and visual methods estimated the offset
- Annotate pad vs trim operations for clarity
Implementation
visualize_results.py:491-597
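The bars-plus-arrows layout described below can be sketched with `barh` and `annotate`; the function name, the estimates format, and the arrow placement are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def plot_timeline(duration_s, true_offset_s, estimates, out_path):
    """Timeline bars for original (y=1.0) and synthetic (y=0.0) videos,
    a solid red arrow for the true offset, and dashed arrows per method.
    `estimates` is a list of (name, offset_s, color) tuples."""
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.barh([1.0, 0.0], [duration_s, duration_s],
            left=[0, true_offset_s], height=0.25, color="0.7")  # gray bars
    # Solid red arrow: ground-truth shift
    ax.annotate("", xy=(true_offset_s, 0.5), xytext=(0, 0.5),
                arrowprops=dict(color="red", arrowstyle="->"))
    # Dashed arrows: per-method estimates
    for name, est, color in estimates:
        ax.annotate("", xy=(est, 0.4), xytext=(0, 0.4),
                    arrowprops=dict(color=color, arrowstyle="->",
                                    linestyle="--"))
    ax.set_yticks([0.0, 1.0])
    ax.set_yticklabels(["Synthetic", "Original"])
    ax.set_xlabel("Time (s)")
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```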
Diagram Components
Gray Bars: Horizontal bars represent video duration. Original is always at y=1.0, synthetic at y=0.0.
Red Arrow (Solid): True offset — shows the ground-truth shift applied during offset generation.
- Rightward arrow: Positive offset (padding)
- Leftward arrow: Negative offset (trimming)
Interpretation
- Perfect Estimate: Dashed arrows overlap the solid red arrow → method correctly estimated the offset.
- Systematic Bias: Dashed arrows are consistently shifted from the red arrow by a similar amount across cases.
- Method Disagreement: Audio and visual arrows point to different offsets → inspect the case manually.
Output Directory Structure
All plots are saved at 300 DPI for print-quality output. Total disk usage: ~10-20 MB for 24 test cases.
Customization
Changing Colors
visualize_results.py:44
Changing DPI
visualize_results.py:45
Changing Plot Size
visualize_results.py:46-47
Disabling Specific Plots
Comment out the corresponding function call in generate_plots():
visualize_results.py:618-625
Next Steps
Metrics Reference
Understand how metrics are computed from results.csv
Workflow Guide
Return to the step-by-step pipeline instructions