The audio_sync module synchronizes audio recordings using FFT-based GCC-PHAT (Generalized Cross-Correlation with Phase Transform). It supports pairwise alignment and, for improved accuracy, global optimization across all files.

compute_gcc_phat

Compute time offset between two audio signals using GCC-PHAT cross-correlation.
from src.audio_sync import compute_gcc_phat
import numpy as np

# Load your audio signals
sig_a = np.array([...])  # Reference signal
sig_b = np.array([...])  # Signal to align
fs = 48000  # Sample rate

offset, confidence = compute_gcc_phat(sig_a, sig_b, fs, max_offset_sec=10.0)
print(f"Offset: {offset:.3f}s, Confidence: {confidence:.3f}")

Parameters

sig_a
np.ndarray
required
Reference audio signal
sig_b
np.ndarray
required
Audio signal to align with the reference
fs
int
required
Sample rate in Hz
max_offset_sec
float
default:"10.0"
Maximum expected offset between signals in seconds. Constrains the search range for better performance.
window_sec
float | None
default:"None"
If provided, only the first window_sec seconds of audio are used, for speed. None uses the entire signal.

Returns

offset_seconds
float
Time offset in seconds to add to sig_b timestamps to align with sig_a. Negative values mean sig_b leads sig_a.
confidence_score
float
Confidence score between 0 and 1, where higher values indicate more reliable synchronization. Values below 0.3 are considered low confidence.
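
For readers unfamiliar with GCC-PHAT, the core computation can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the module's implementation: the name gcc_phat_sketch is hypothetical, it returns only the offset (how the confidence score is derived is not documented here), and it assumes the convention described above, where the result is the value to add to sig_b timestamps.

```python
import numpy as np

def gcc_phat_sketch(sig_a, sig_b, fs, max_offset_sec=10.0):
    """Hypothetical helper: estimate the offset (seconds) to add to sig_b timestamps."""
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig_a) + len(sig_b)
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)

    # Cross-power spectrum, normalized to unit magnitude (the PHAT weighting).
    cross = spec_a * np.conj(spec_b)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    # Back to the lag domain; the peak position is the estimated lag.
    cc = np.fft.irfft(cross, n=n)

    # Keep only lags within +/- max_shift samples, ordered from negative to positive.
    max_shift = min(int(fs * max_offset_sec), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

Normalizing the cross-power spectrum discards magnitude and keeps only phase, which tends to sharpen the correlation peak. Restricting the search to max_offset_sec mirrors the role that parameter plays in compute_gcc_phat.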

compute_pairwise_offsets

Compute time offsets between all pairs of WAV files in a directory.
from src.audio_sync import compute_pairwise_offsets

pairwise = compute_pairwise_offsets(
    audio_dir="./audio_files",
    max_offset_sec=10.0,
    window_sec=30.0,
    min_confidence=0.2
)

# Results: {('file1.wav', 'file2.wav'): (offset, confidence), ...}
for (file_a, file_b), (offset, conf) in pairwise.items():
    print(f"{file_a} <-> {file_b}: {offset:.3f}s (confidence: {conf:.3f})")

Parameters

audio_dir
str
required
Directory containing WAV files to synchronize
max_offset_sec
float
default:"10.0"
Maximum expected offset between any pair of files
window_sec
float | None
default:"30.0"
Use only the first window_sec seconds of each file, for speed. Set to None to use entire files.
min_confidence
float
default:"0.0"
Skip pairs with confidence scores below this threshold

Returns

pairwise_offsets
Dict[Tuple[str, str], Tuple[float, float]]
Dictionary mapping (fileA, fileB) tuples to (offset_seconds, confidence) tuples. The offset indicates how much to add to fileB to align with fileA.

Raises

  • FileNotFoundError - No WAV files found in the specified directory
  • ValueError - Fewer than 2 WAV files found (at least 2 are needed for pairwise sync)
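
To make "all pairs" concrete: N files yield N(N-1)/2 unordered pairs. The sketch below enumerates them and mirrors the error conditions listed above. The function name list_wav_pairs is hypothetical, and matching files with a case-sensitive "*.wav" pattern is an assumption rather than the module's documented behavior.

```python
from itertools import combinations
from pathlib import Path

def list_wav_pairs(audio_dir):
    """Hypothetical helper: return every unordered pair of WAV filenames."""
    # Sort so the pair order is deterministic across platforms.
    wavs = sorted(p.name for p in Path(audio_dir).glob("*.wav"))
    if not wavs:
        raise FileNotFoundError(f"No WAV files found in {audio_dir}")
    if len(wavs) < 2:
        raise ValueError("Need at least 2 WAV files for pairwise sync")
    return list(combinations(wavs, 2))
```

Because the pair count grows quadratically with the number of files, limiting window_sec keeps the total runtime manageable for larger directories.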

optimize_offsets

Find globally consistent offsets using weighted least-squares optimization.
from src.audio_sync import optimize_offsets

# After computing pairwise offsets
pairwise = {('a.wav', 'b.wav'): (1.5, 0.9), ('b.wav', 'c.wav'): (2.1, 0.85)}
wavs = ['a.wav', 'b.wav', 'c.wav']

optimized = optimize_offsets(pairwise, wavs)
print(optimized)  # {'a.wav': 0.0, 'b.wav': 1.5, 'c.wav': 3.6}

Parameters

pairwise
Dict[Tuple[str, str], Tuple[float, float]]
required
Dictionary of pairwise offsets and confidences from compute_pairwise_offsets
wavs
List[str]
required
List of all WAV filenames to optimize

Returns

optimized_offsets
Dict[str, float]
Dictionary mapping filename to optimized offset in seconds. The first file is anchored at 0.0 as the reference.

Raises

  • ValueError - No pairwise offsets provided
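
One way to picture the weighted least-squares step is as a small linear system: each pair (A, B) contributes an equation x_B - x_A = offset, weighted by its confidence, and the first file is pinned at 0.0. The sketch below assumes exactly that formulation. The name solve_offsets_sketch and the use of confidence directly as the weight are assumptions, not taken from the module's source.

```python
import numpy as np

def solve_offsets_sketch(pairwise, wavs):
    """Hypothetical helper: weighted least-squares offsets, first file anchored at 0."""
    index = {name: i for i, name in enumerate(wavs)}
    rows, targets = [], []

    for (file_a, file_b), (offset, conf) in pairwise.items():
        row = np.zeros(len(wavs))
        row[index[file_a]] = -1.0
        row[index[file_b]] = 1.0
        # Scaling a row by sqrt(weight) weights its squared residual by the weight.
        w = np.sqrt(conf)
        rows.append(w * row)
        targets.append(w * offset)

    # Anchor equation: the first file's offset is 0.
    anchor = np.zeros(len(wavs))
    anchor[0] = 1.0
    rows.append(anchor)
    targets.append(0.0)

    solution, *_ = np.linalg.lstsq(np.vstack(rows), np.array(targets), rcond=None)
    return {name: float(solution[index[name]]) for name in wavs}
```

With redundant measurements (for example, an additional ('a.wav', 'c.wav') pair), the solver spreads disagreement across the pairs in proportion to their weights instead of trusting any single measurement.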

estimate_offsets_robust

Main entry point for robust audio-based synchronization using pairwise alignment and global optimization.
from src.audio_sync import estimate_offsets_robust

# Synchronize all WAV files in a directory
offsets = estimate_offsets_robust(
    audio_dir="./extracted_audio",
    max_offset_sec=10.0,
    window_sec=30.0,
    min_confidence=0.2,
    outlier_threshold=0.5
)

# Use these offsets with apply_video_offsets
for filename, offset in offsets.items():
    print(f"{filename}: {offset:+.3f}s")

Parameters

audio_dir
str
required
Directory containing WAV files extracted from videos
max_offset_sec
float
default:"10.0"
Maximum expected offset between any pair of files
window_sec
float | None
default:"30.0"
Use only the first window_sec seconds of each file, for speed. Set to None to use entire files.
min_confidence
float
default:"0.2"
Skip pairs with confidence below this threshold during pairwise computation
outlier_threshold
float
default:"0.5"
After optimization, flag pairs whose measured offset disagrees with the optimized offsets by more than this value (in seconds)

Returns

offsets
Dict[str, float]
Dictionary mapping filename to offset in seconds. The first file is anchored at 0.0. Add these offsets to each file’s timestamps to align them.

Raises

  • ValueError - No WAV files found in directory, or no valid pairwise offsets found

Algorithm Details

estimate_offsets_robust implements a three-step process:
  1. Pairwise Alignment: Computes offsets between all pairs of files using GCC-PHAT
  2. Global Optimization: Finds globally consistent offsets using weighted least-squares
  3. Outlier Detection: Identifies and flags inconsistent pairwise measurements
This approach is more robust than single-reference alignment when dealing with degraded audio (clipping, noise, etc.).
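
The outlier-detection step can be sketched as a residual check: compare each measured pairwise offset with the value implied by the optimized offsets, and flag pairs whose disagreement exceeds outlier_threshold. The function name flag_outliers_sketch and the exact residual definition are assumptions for illustration. The module may compute or report outliers differently.

```python
def flag_outliers_sketch(pairwise, offsets, outlier_threshold=0.5):
    """Hypothetical helper: list pairs whose measurement disagrees with the global fit."""
    flagged = []
    for (file_a, file_b), (measured, _conf) in pairwise.items():
        # Offset between the two files implied by the optimized solution.
        predicted = offsets[file_b] - offsets[file_a]
        error = abs(measured - predicted)
        if error > outlier_threshold:
            flagged.append(((file_a, file_b), error))
    return flagged

# Example: the (a, c) measurement disagrees with the chain a -> b -> c.
pairwise = {
    ("a.wav", "b.wav"): (1.5, 0.9),
    ("b.wav", "c.wav"): (2.1, 0.85),
    ("a.wav", "c.wav"): (5.0, 0.3),
}
offsets = {"a.wav": 0.0, "b.wav": 1.5, "c.wav": 3.6}
outliers = flag_outliers_sketch(pairwise, offsets)
```

Here only the ('a.wav', 'c.wav') pair is flagged, since its measurement differs from the implied 3.6 s by about 1.4 s.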
