Audio Synchronization

The audio_sync module provides robust audio-based synchronization using FFT and GCC-PHAT (Generalized Cross-Correlation with Phase Transform) algorithms. It supports both pairwise alignment and global optimization for improved accuracy.

compute_gcc_phat

Compute time offset between two audio signals using GCC-PHAT cross-correlation.

from src.audio_sync import compute_gcc_phat
import numpy as np

# Load your audio signals
sig_a = np.array([...])  # Reference signal
sig_b = np.array([...])  # Signal to align
fs = 48000  # Sample rate

offset, confidence = compute_gcc_phat(sig_a, sig_b, fs, max_offset_sec=10.0)
print(f"Offset: {offset:.3f}s, Confidence: {confidence:.3f}")

Parameters

sig_a

np.ndarray

required

Reference audio signal

sig_b

np.ndarray

required

Audio signal to align with the reference

fs

int

required

Sample rate in Hz

max_offset_sec

float

default:"10.0"

Maximum expected offset between signals in seconds. Constrains the search range for better performance.

window_sec

float | None

default:"None"

If provided, use only the first window_sec seconds of audio for speed. None means use the entire signal.

Returns

offset_seconds

float

Time offset in seconds to add to sig_b timestamps to align with sig_a. Negative values mean sig_b leads sig_a.

confidence_score

float

Confidence score between 0 and 1, where higher values indicate more reliable synchronization. Values below 0.3 are considered low confidence.
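For readers unfamiliar with GCC-PHAT, the following is a minimal NumPy sketch of the technique, not the module's actual implementation. The function name gcc_phat_sketch is hypothetical, it returns only a delay (no confidence score), and its sign convention (positive when content in sig_b appears later than in sig_a) is an assumption that may differ from compute_gcc_phat.

```python
import numpy as np


def gcc_phat_sketch(sig_a, sig_b, fs, max_offset_sec=10.0):
    """Estimate the delay of sig_b relative to sig_a, in seconds.

    Hypothetical helper for illustration. A positive result means the
    same content appears later in sig_b than in sig_a.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig_a) + len(sig_b)
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = spec_b * np.conj(spec_a)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to lags in [-max_shift, +max_shift].
    max_shift = max(1, min(int(max_offset_sec * fs), n // 2))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


# Usage: sig_b is sig_a delayed by 100 samples at 8 kHz.
rng = np.random.default_rng(0)
sig_a = rng.standard_normal(4800)
sig_b = np.concatenate((np.zeros(100), sig_a))[:4800]
print(gcc_phat_sketch(sig_a, sig_b, fs=8000, max_offset_sec=0.1))
```

Limiting the lag range, as max_offset_sec does here, shrinks the region searched for the correlation peak, which reduces the chance of locking onto a spurious peak far from the true offset.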

compute_pairwise_offsets

Compute time offsets between all pairs of WAV files in a directory.

from src.audio_sync import compute_pairwise_offsets

pairwise = compute_pairwise_offsets(
    audio_dir="./audio_files",
    max_offset_sec=10.0,
    window_sec=30.0,
    min_confidence=0.2
)

# Results: {('file1.wav', 'file2.wav'): (offset, confidence), ...}
for (file_a, file_b), (offset, conf) in pairwise.items():
    print(f"{file_a} <-> {file_b}: {offset:.3f}s (confidence: {conf:.3f})")

Parameters

audio_dir

str

required

Directory containing WAV files to synchronize

max_offset_sec

float

default:"10.0"

Maximum expected offset between any pair of files

window_sec

float | None

default:"30.0"

Use only the first window_sec seconds of each file for speed. Set to None to use entire files.

min_confidence

float

default:"0.0"

Skip pairs with confidence scores below this threshold

Returns

pairwise_offsets

Dict[Tuple[str, str], Tuple[float, float]]

Dictionary mapping (fileA, fileB) tuples to (offset_seconds, confidence) tuples. The offset indicates how much to add to fileB to align with fileA.

Raises

FileNotFoundError - No WAV files found in the specified directory
ValueError - Fewer than 2 WAV files found (at least 2 are needed for pairwise sync)
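The pair enumeration and confidence filtering described above can be sketched as follows. This is an illustration under assumptions, not the module's code: pairwise_sketch and its estimate callback are hypothetical names, and the inputs are already-loaded signals rather than a directory of WAV files.

```python
from itertools import combinations


def pairwise_sketch(signals, estimate, min_confidence=0.0):
    """Run an offset estimator on every unordered pair of named signals.

    signals:  dict mapping a name to its audio data (hypothetical input).
    estimate: callable (sig_a, sig_b) -> (offset_seconds, confidence).
    """
    results = {}
    for name_a, name_b in combinations(sorted(signals), 2):
        offset, conf = estimate(signals[name_a], signals[name_b])
        # Drop pairs whose confidence falls below the threshold.
        if conf >= min_confidence:
            results[(name_a, name_b)] = (offset, conf)
    return results
```

With n files this evaluates n * (n - 1) / 2 pairs, so the cost grows quadratically with the number of recordings; that is one reason a shorter window_sec can matter for larger directories.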

optimize_offsets

Find globally consistent offsets using weighted least-squares optimization.

from src.audio_sync import optimize_offsets

# After computing pairwise offsets
pairwise = {('a.wav', 'b.wav'): (1.5, 0.9), ('b.wav', 'c.wav'): (2.1, 0.85)}
wavs = ['a.wav', 'b.wav', 'c.wav']

optimized = optimize_offsets(pairwise, wavs)
print(optimized)  # {'a.wav': 0.0, 'b.wav': 1.5, 'c.wav': 3.6}

Parameters

pairwise

Dict[Tuple[str, str], Tuple[float, float]]

required

Dictionary of pairwise offsets and confidences from compute_pairwise_offsets

wavs

List[str]

required

List of all WAV filenames to optimize

Returns

optimized_offsets

Dict[str, float]

Dictionary mapping filename to optimized offset in seconds. The first file is anchored at 0.0 as the reference.

Raises

ValueError - No pairwise offsets provided
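A weighted least-squares solve of this kind can be sketched with numpy.linalg.lstsq. The sketch below is hypothetical (optimize_offsets_sketch is not part of the module) and assumes two things the documentation does not state: each pair (a, b) contributes the equation x_b - x_a = offset, and each squared residual is weighted by the pair's confidence.

```python
import numpy as np


def optimize_offsets_sketch(pairwise, wavs):
    """Solve for per-file offsets that best agree with pairwise measurements."""
    if not pairwise:
        raise ValueError("No pairwise offsets provided")

    index = {name: i for i, name in enumerate(wavs)}
    rows, targets = [], []
    for (a, b), (offset, conf) in pairwise.items():
        w = np.sqrt(conf)  # assumption: squared residuals weighted by confidence
        row = np.zeros(len(wavs))
        row[index[b]] = w
        row[index[a]] = -w
        rows.append(row)
        targets.append(w * offset)

    # Anchor the first file at 0.0 by removing its unknown from the system.
    design = np.array(rows)[:, 1:]
    solution, *_ = np.linalg.lstsq(design, np.array(targets), rcond=None)

    offsets = {wavs[0]: 0.0}
    offsets.update({name: float(x) for name, x in zip(wavs[1:], solution)})
    return offsets
```

When the measurements form a chain with no redundancy, as in the two-pair example above, the solution simply accumulates the pairwise offsets. The weighting only matters once extra pairs introduce disagreement that has to be reconciled.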

estimate_offsets_robust

Main entry point for robust audio-based synchronization using pairwise alignment and global optimization.

from src.audio_sync import estimate_offsets_robust

# Synchronize all WAV files in a directory
offsets = estimate_offsets_robust(
    audio_dir="./extracted_audio",
    max_offset_sec=10.0,
    window_sec=30.0,
    min_confidence=0.2,
    outlier_threshold=0.5
)

# Use these offsets with apply_video_offsets
for filename, offset in offsets.items():
    print(f"{filename}: {offset:+.3f}s")

Parameters

audio_dir

str

required

Directory containing WAV files extracted from videos

max_offset_sec

float

default:"10.0"

Maximum expected offset between any pair of files

window_sec

float | None

default:"30.0"

Use only the first window_sec seconds of each file for speed. Set to None to use entire audio files.

min_confidence

float

default:"0.2"

Skip pairs with confidence below this threshold during pairwise computation

outlier_threshold

float

default:"0.5"

Flag inconsistent pairs with errors above this value (in seconds) after optimization

Returns

offsets

Dict[str, float]

Dictionary mapping filename to offset in seconds. The first file is anchored at 0.0. Add these offsets to each file’s timestamps to align them.

Raises

ValueError - No WAV files found in directory, or no valid pairwise offsets found
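As a small illustration of how the returned offsets are meant to be used, the sketch below adds each file's offset to its local timestamps to place events on a shared timeline. The helper name to_shared_timeline and the sample filenames and values are hypothetical.

```python
def to_shared_timeline(events_by_file, offsets):
    """Shift each file's local timestamps (seconds) by that file's offset."""
    return {
        name: [t + offsets[name] for t in times]
        for name, times in events_by_file.items()
    }


# Usage with made-up values: the same clap is heard at 3.0 s in one file
# and at 4.25 s in the other.
offsets = {"cam_a.wav": 0.0, "cam_b.wav": -1.25}
events = {"cam_a.wav": [3.0], "cam_b.wav": [4.25]}
print(to_shared_timeline(events, offsets))
```

After the shift, both recordings report the clap at the same time on the reference file's timeline.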

Algorithm Details

estimate_offsets_robust implements a three-step process:

1. Pairwise Alignment: Computes offsets between all pairs of files using GCC-PHAT
2. Global Optimization: Finds globally consistent offsets using weighted least-squares
3. Outlier Detection: Identifies and flags inconsistent pairwise measurements

This approach is more robust than single-reference alignment when dealing with degraded audio (clipping, noise, etc.).
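The outlier detection step can be sketched as a residual check: compare each pairwise measurement with the difference implied by the optimized offsets, and flag pairs whose error exceeds outlier_threshold. The function name flag_outliers_sketch is hypothetical, and the convention that a pair (a, b) predicts offsets[b] - offsets[a] is an assumption.

```python
def flag_outliers_sketch(pairwise, offsets, outlier_threshold=0.5):
    """Return ((file_a, file_b), error_seconds) for inconsistent pairs."""
    flagged = []
    for (a, b), (measured, _conf) in pairwise.items():
        # Offset between the two files implied by the global solution.
        predicted = offsets[b] - offsets[a]
        error = abs(measured - predicted)
        if error > outlier_threshold:
            flagged.append(((a, b), error))
    return flagged
```

A pair flagged this way disagrees with the consensus of the other measurements, which typically points to a spurious correlation peak in degraded audio rather than a real timing difference.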
