The audio_sync module provides robust audio-based synchronization using FFT and GCC-PHAT (Generalized Cross-Correlation with Phase Transform) algorithms. It supports both pairwise alignment and global optimization for improved accuracy.
compute_gcc_phat
Compute time offset between two audio signals using GCC-PHAT cross-correlation.
Parameters
Reference audio signal
Audio signal to align with the reference
Sample rate in Hz
Maximum expected offset between signals in seconds. Constrains the search range for better performance.
If provided, use only the first N seconds of audio for speed. None means use the entire signal.
Returns
Time offset in seconds to add to sig_b timestamps to align with sig_a. Negative values mean sig_b leads sig_a.
Confidence score between 0 and 1, where higher values indicate more reliable synchronization. Values below 0.3 are considered low confidence.
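The GCC-PHAT step described above can be sketched with NumPy as follows. This is a minimal illustration, not the module's implementation: the function name `gcc_phat_sketch`, the parameter names `max_offset_sec` and `max_duration_sec`, the use of the whitened peak height as a confidence proxy, and the sign convention of the returned lag are all assumptions made for the example.

```python
import numpy as np


def gcc_phat_sketch(sig_a, sig_b, sample_rate, max_offset_sec=None, max_duration_sec=None):
    """Illustrative GCC-PHAT estimate; not the module's actual implementation.

    Returns (lag_seconds, peak). In this sketch a positive lag means sig_a is
    a delayed copy of sig_b; the module's own sign convention may differ.
    """
    sig_a = np.asarray(sig_a, dtype=float)
    sig_b = np.asarray(sig_b, dtype=float)
    if max_duration_sec is not None:
        # Analyse only the first N seconds for speed.
        limit = int(round(max_duration_sec * sample_rate))
        sig_a, sig_b = sig_a[:limit], sig_b[:limit]

    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig_a) + len(sig_b)
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)

    # Whiten the cross-power spectrum so only phase remains (the PHAT step).
    cross = spec_a * np.conj(spec_b)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.irfft(cross, n=n)

    # Keep lags in [-max_shift, +max_shift]; index max_shift is zero lag.
    max_shift = n // 2
    if max_offset_sec is not None:
        max_shift = min(int(round(max_offset_sec * sample_rate)), max_shift)
    window = np.roll(corr, max_shift)[: 2 * max_shift + 1]

    peak_index = int(np.argmax(np.abs(window)))
    lag_seconds = (peak_index - max_shift) / sample_rate
    # Assumed confidence proxy: height of the whitened peak, clipped to [0, 1].
    peak = float(np.clip(np.abs(window[peak_index]), 0.0, 1.0))
    return lag_seconds, peak
```

Limiting the lag window is what makes a maximum expected offset useful: peaks outside the plausible range are never considered, so spurious matches far from zero lag cannot win.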
compute_pairwise_offsets
Compute time offsets between all pairs of WAV files in a directory.
Parameters
Directory containing WAV files to synchronize
Maximum expected offset between any pair of files
Use only first N seconds for speed. Set to None to use entire files.
Skip pairs with confidence scores below this threshold
Returns
Dictionary mapping (fileA, fileB) tuples to (offset_seconds, confidence) tuples. The offset indicates how much to add to fileB to align with fileA.
Raises
FileNotFoundError - No WAV files found in the specified directory
ValueError - Fewer than 2 WAV files found (need at least 2 for pairwise sync)
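To make the return shape concrete, the pairwise loop can be sketched as below. This is an illustration under assumptions, not the module's code: `pairwise_offsets_sketch`, `signals`, `estimate`, and `min_confidence` are hypothetical names, and the estimator is passed in as a callable so the example stays independent of how WAV files are read.

```python
from itertools import combinations


def pairwise_offsets_sketch(signals, estimate, min_confidence):
    """Build {(file_a, file_b): (offset_seconds, confidence)} for every pair.

    signals:  dict mapping filename to whatever `estimate` accepts
    estimate: callable(sig_a, sig_b) -> (offset_seconds, confidence)
    """
    if len(signals) < 2:
        raise ValueError("Need at least 2 files for pairwise sync")

    results = {}
    # Sorting makes the pair order, and therefore the dict keys, deterministic.
    for file_a, file_b in combinations(sorted(signals), 2):
        offset, confidence = estimate(signals[file_a], signals[file_b])
        if confidence < min_confidence:
            continue  # Skip unreliable pairs.
        results[(file_a, file_b)] = (offset, confidence)
    return results
```

With N files this evaluates N(N-1)/2 pairs, which is why restricting the analysis to the first few seconds of audio can matter for larger sets.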
optimize_offsets
Find globally consistent offsets using weighted least-squares optimization.
Parameters
Dictionary of pairwise offsets and confidences from compute_pairwise_offsets
List of all WAV filenames to optimize
Returns
Dictionary mapping filename to optimized offset in seconds. The first file is anchored at 0.0 as the reference.
Raises
ValueError - No pairwise offsets provided
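One way to read the weighted least-squares step: if x[f] is the offset added to file f, and a pairwise offset is the amount added to fileB to align it with fileA, then each measurement says offset ≈ x[fileB] - x[fileA]. The sketch below solves that system with NumPy. It is an illustration only; using the confidence directly as the weight and the name `optimize_offsets_sketch` are assumptions, since the source does not state how the weights are chosen.

```python
import numpy as np


def optimize_offsets_sketch(pairwise, filenames):
    """Weighted least-squares sketch; not the module's actual implementation."""
    if not pairwise:
        raise ValueError("No pairwise offsets provided")

    index = {name: i for i, name in enumerate(filenames)}
    design = np.zeros((len(pairwise), len(filenames)))
    targets = np.zeros(len(pairwise))
    weights = np.zeros(len(pairwise))

    for row, ((file_a, file_b), (offset, confidence)) in enumerate(pairwise.items()):
        # Each measurement constrains x[file_b] - x[file_a] to equal offset.
        design[row, index[file_b]] = 1.0
        design[row, index[file_a]] = -1.0
        targets[row] = offset
        weights[row] = confidence  # Assumed weighting: confidence as-is.

    # Scaling rows by sqrt(weight) turns ordinary least squares into weighted.
    sqrt_w = np.sqrt(weights)
    # Dropping the first column anchors the first file at 0.0.
    reduced = design[:, 1:] * sqrt_w[:, None]
    solution, *_ = np.linalg.lstsq(reduced, targets * sqrt_w, rcond=None)

    offsets = np.concatenate(([0.0], solution))
    return {name: float(offsets[i]) for name, i in index.items()}
```

Anchoring one file is needed because the measurements only constrain differences between offsets; without it, adding any constant to every offset would fit equally well.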
estimate_offsets_robust
Main entry point for robust audio-based synchronization using pairwise alignment and global optimization.
Parameters
Directory containing WAV files extracted from videos
Maximum expected offset between any pair of files
Use only first N seconds for speed. None = use entire audio files.
Skip pairs with confidence below this threshold during pairwise computation
Flag inconsistent pairs with errors above this value (in seconds) after optimization
Returns
Dictionary mapping filename to offset in seconds. The first file is anchored at 0.0. Add these offsets to each file’s timestamps to align them.
Raises
ValueError - No WAV files found in directory, or no valid pairwise offsets found
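As a small illustration of how the returned dictionary is meant to be used, the snippet below adds each file's offset to that file's timestamps. The helper `to_common_timeline`, the filenames, and the numbers are hypothetical and chosen only to show the arithmetic.

```python
def to_common_timeline(timestamps_by_file, offsets):
    """Shift each file's timestamps (in seconds) by that file's offset."""
    return {
        name: [t + offsets[name] for t in times]
        for name, times in timestamps_by_file.items()
    }


# Hypothetical offsets in the documented return shape: the first file is 0.0.
offsets = {"cam1.wav": 0.0, "cam2.wav": 1.5}
# The same clap, observed at different local times in each recording.
claps = {"cam1.wav": [10.0], "cam2.wav": [8.5]}
aligned = to_common_timeline(claps, offsets)
# Both recordings now place the clap at 10.0 s on the shared timeline.
```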
Algorithm Details
This function implements a three-step process:
- Pairwise Alignment: Computes offsets between all pairs of files using GCC-PHAT
- Global Optimization: Finds globally consistent offsets using weighted least-squares
- Outlier Detection: Identifies and flags inconsistent pairwise measurements
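The outlier-detection step can be sketched as a residual check: compare each pairwise measurement with the difference implied by the optimized offsets and flag pairs that disagree by more than a threshold. The function name `flag_inconsistent_pairs` and the parameter `max_error_sec` are hypothetical, and the sign convention (pairwise offset ≈ offset of fileB minus offset of fileA) is an assumption derived from the descriptions above.

```python
def flag_inconsistent_pairs(pairwise, offsets, max_error_sec):
    """Return {(file_a, file_b): residual_seconds} for pairs above the threshold.

    pairwise: {(file_a, file_b): (offset_seconds, confidence)}
    offsets:  {filename: optimized_offset_seconds}
    """
    flagged = {}
    for (file_a, file_b), (measured, _confidence) in pairwise.items():
        # What the global solution predicts for this pair.
        predicted = offsets[file_b] - offsets[file_a]
        residual = abs(measured - predicted)
        if residual > max_error_sec:
            flagged[(file_a, file_b)] = residual
    return flagged
```

A pair flagged this way usually means its pairwise measurement locked onto the wrong correlation peak, so it disagrees with what the other pairs jointly imply.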