Multi-Camera Video Synchronization

What is Multi-Camera Video Synchronization?

A Flask-based tool for aligning multiple video tracks using visual motion detection or audio cross-correlation. Designed for synchronizing multi-view recordings where start times aren’t perfectly aligned. Whether you’re working with silent videos, noisy environments, or recordings from different camera angles, this tool provides robust synchronization with sub-frame accuracy.

Key Features

Dual Sync Methods

Choose between Visual (Motion) synchronization for silent videos and Audio (GCC-PHAT) synchronization for high-precision alignment using sound
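As a sketch of the audio path: GCC-PHAT whitens the cross-power spectrum so that only phase (delay) information drives the correlation peak, which makes it robust to loudness differences between microphones. The following minimal implementation is illustrative only and is not the repo's actual audio code:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (seconds) via GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting)
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-center so negative lags precede positive ones
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Example: a noise burst delayed by 25 ms at 8 kHz
fs = 8000
rng = np.random.default_rng(0)
ref = rng.standard_normal(fs)                      # 1 s of noise
delay = int(0.025 * fs)                            # 200 samples
sig = np.concatenate((np.zeros(delay), ref))[:fs]  # same signal, delayed
tau = gcc_phat(sig, ref, fs)                       # expected ~0.025 s
```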

Flask Web UI

Intuitive multi-step wizard covering upload, sync, review, and export; no command line required

Sub-Frame Accuracy

Uses FFmpeg re-encoding with tpad and adelay filters to ensure precise alignment across all video players
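For illustration, a positive offset for one track might translate into FFmpeg filter arguments like the following. The helper function and file names are hypothetical; the tool's actual command construction may differ:

```python
def build_align_command(src, dst, offset_sec):
    """Pad the start of a video by offset_sec using tpad (video) and adelay (audio)."""
    delay_ms = int(round(offset_sec * 1000))
    return [
        "ffmpeg", "-i", src,
        # tpad adds frames at the start of the video for the pad duration
        "-vf", f"tpad=start_duration={offset_sec}",
        # adelay shifts every audio channel by the same amount, in milliseconds
        "-af", f"adelay={delay_ms}:all=1",
        dst,
    ]

cmd = build_align_command("cam2.mp4", "cam2_aligned.mp4", 1.25)
```

Because both streams are re-encoded with the delay baked in, the alignment survives players that ignore container-level start-time metadata.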

Global Optimization

Computes pairwise offsets between all videos and uses weighted least-squares optimization for globally consistent alignment

Evaluation Suite

Fully script-driven pipeline for assessing accuracy, confidence reliability, and efficiency with publication-ready plots

Silent Video Support

Visual motion synchronization works even when videos have no audio or were recorded in noisy environments

How It Works

Motion-Based Synchronization
Step 1: Extract Motion Energy

Process each video frame to build a per-frame motion-energy signal via frame differencing
# From src/visual_sync.py:39-94 (abridged; gap-filling details are illustrative)
from typing import Tuple

import cv2
import numpy as np

def extract_motion_energy(video_path: str,
                          downsample: int = 4,
                          blur_size: int = 5,
                          center_crop: bool = True,
                          step: int = 3) -> Tuple[np.ndarray, float]:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    motion_energy = []
    prev_gray = None
    frame_idx = -1

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_idx += 1
        if frame_idx % step != 0:  # sample every `step`-th frame for speed
            continue

        # Center crop to focus on the main subject
        if center_crop:
            h, w = frame.shape[:2]
            frame = frame[int(h*0.25):int(h*0.75), int(w*0.25):int(w*0.75)]

        # Downsample and blur for speed and noise robustness
        h, w = frame.shape[:2]
        small_frame = cv2.resize(frame, (w // downsample, h // downsample))
        gray = cv2.cvtColor(small_frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (blur_size, blur_size), 0)

        if prev_gray is not None:
            # Fraction of pixels that changed between sampled frames
            diff = cv2.absdiff(gray, prev_gray)
            _, thresh = cv2.threshold(diff, 15, 255, cv2.THRESH_BINARY)
            energy = np.sum(thresh) / (thresh.shape[0] * thresh.shape[1] * 255)
            motion_energy.append(energy)
        prev_gray = gray

    cap.release()
    # Effective sampling rate of the motion signal
    return np.array(motion_energy), fps / step
Step 2: Cross-Correlate Motion Signals

Find temporal offsets by correlating the motion time series of every video pair
# From src/visual_sync.py:105-133 (abridged; gap-filling details are illustrative)
from typing import Tuple

import numpy as np
from scipy.signal import correlate

def correlate_motion_signals(sig1: np.ndarray, sig2: np.ndarray,
                             fps: float,
                             max_offset_sec: float = 20.0) -> Tuple[float, float]:
    # Normalize signals to zero mean and unit variance
    sig1_norm = (sig1 - np.mean(sig1)) / (np.std(sig1) + 1e-10)
    sig2_norm = (sig2 - np.mean(sig2)) / (np.std(sig2) + 1e-10)

    cc = correlate(sig1_norm, sig2_norm, mode='full')

    # Find peak within the +/- max_offset_sec search window
    center = len(sig2_norm) - 1
    max_lag = int(max_offset_sec * fps)
    lo, hi = max(0, center - max_lag), min(len(cc), center + max_lag + 1)
    window = cc[lo:hi]
    lag_idx = lo + int(np.argmax(window))
    lag_frames = lag_idx - center
    offset_seconds = lag_frames / fps

    # Confidence: peak height in standard deviations above the window mean
    peak, mean_cc, std_cc = cc[lag_idx], np.mean(window), np.std(window)
    confidence = (peak - mean_cc) / (std_cc + 1e-10)
    return offset_seconds, confidence
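To see the cross-correlation step in isolation, here is a self-contained toy run, independent of the repo code, that recovers a known 15-frame delay between two synthetic motion-energy signals:

```python
import numpy as np
from scipy.signal import correlate

fps = 30.0
rng = np.random.default_rng(1)
base = rng.random(300)                       # ~10 s of synthetic motion energy
sig2 = base
sig1 = np.concatenate([np.zeros(15), base])  # same signal, delayed 15 frames

# Normalize, correlate, and convert the peak lag back to seconds
s1 = (sig1 - sig1.mean()) / (sig1.std() + 1e-10)
s2 = (sig2 - sig2.mean()) / (sig2.std() + 1e-10)
cc = correlate(s1, s2, mode='full')
lag = int(np.argmax(cc)) - (len(s2) - 1)
offset_seconds = lag / fps  # expected +0.5 s: sig1 lags sig2 by 15 frames
```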
Step 3: Global Optimization

Use weighted least-squares optimization to find globally consistent offsets across all videos
Visual sync works even when cameras have different angles - the timing of motion events (walking, gestures) remains consistent across views.
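The global step can be sketched as a weighted least-squares problem: each pairwise measurement offset_ij ~ t[j] - t[i] contributes one equation, and the first video is pinned at t = 0 to fix the otherwise-free global shift. This is a minimal illustration under those assumptions; the repo's solver may differ in weighting details:

```python
import numpy as np

def solve_global_offsets(n, pairs):
    """Solve for per-video start offsets t (t[0] pinned to 0) from
    pairwise measurements: pairs = [(i, j, offset_ij, weight)],
    where offset_ij ~ t[j] - t[i]."""
    rows, rhs = [], []
    for i, j, off, w in pairs:
        row = np.zeros(n)
        row[i], row[j] = -w, w   # weighted equation: t[j] - t[i] = off
        rows.append(row)
        rhs.append(w * off)
    # Gauge constraint: anchor the first video at t = 0
    anchor = np.zeros(n)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    A, b = np.vstack(rows), np.array(rhs)
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t

# Three cameras; a slightly inconsistent triangle of pairwise offsets,
# with the noisier direct 0->2 measurement down-weighted
pairs = [(0, 1, 0.50, 1.0), (1, 2, 0.30, 1.0), (0, 2, 0.82, 0.5)]
t = solve_global_offsets(3, pairs)
```

Because every pair contributes an equation, a low-confidence (down-weighted) measurement is outvoted by consistent ones instead of corrupting the whole timeline.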

Technical Architecture

Installation

Set up Python environment and install dependencies

Quick Start

Get synchronized videos in under 5 minutes

Configuration

Customize sync method and processing parameters

Use Cases

Synchronize footage from multiple cameras capturing a sporting event from different angles. Visual sync works even when crowd noise varies between camera positions.
Align recordings from multiple cameras and microphones using audio cross-correlation for frame-perfect editing.
Synchronize silent or music-backed performances across multiple camera angles using motion-based alignment.
Academic research requiring precise temporal alignment of multi-camera recordings with sub-frame accuracy.

System Requirements

Python

Python 3.11+ (tested on 3.12). Required by scipy and pandas.

FFmpeg

FFmpeg in system PATH. Required for audio extraction and video manipulation.
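A quick way to verify this prerequisite before running the tool (a convenience snippet, not part of the project; it assumes ffprobe ships alongside ffmpeg, as it does in standard FFmpeg builds):

```python
import shutil

def ffmpeg_available():
    """Return True if ffmpeg and ffprobe are discoverable on PATH."""
    return all(shutil.which(tool) is not None for tool in ("ffmpeg", "ffprobe"))

ok = ffmpeg_available()
```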
Ready to get started? Head to the Installation Guide to set up your environment.
