Overview

After computing synchronization offsets, the system applies them to video files using FFmpeg. Understanding the sign convention is critical for interpreting results and debugging issues.
Key Principle: Offsets represent how much to shift each video’s timeline to align all videos to a common reference time.

Sign Convention

The system uses Option C semantics from the source code:

Positive Offset (+)

Delay the entire file: video and audio start later in the synchronized timeline. Black frames and silence are added at the beginning.

Negative Offset (-)

Trim from the start: the first |offset| seconds are removed. The file starts later in its own content.

Visual Example

Original Timeline:
[═══════════════════════════════] video.mp4
 0s                          10s

After +2.5s offset (DELAY):
[BLACK][═════════════════════════════════] video_synced.mp4
 0s   2.5s                              12.5s

↑ 2.5s of black frames/silence added
Interpretation: This video started 2.5 seconds late relative to the reference. We pad the beginning to align it.
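The convention above can be summarized as a small dispatch on the sign of the offset. This is a minimal sketch, not code from the project: the name plan_offset and the "none" action for a zero offset are assumptions made for illustration.

```python
from typing import Tuple


def plan_offset(offset: float) -> Tuple[str, float]:
    """Map a signed offset in seconds to an action and its magnitude.

    Hypothetical helper: the action names are illustrative only.
    """
    if offset > 0:
        # Positive: pad the start with black frames and silence.
        return ("delay", offset)
    if offset < 0:
        # Negative: remove the first |offset| seconds.
        return ("trim", -offset)
    # Zero: assumed here to need no shift at all.
    return ("none", 0.0)


if __name__ == "__main__":
    for name, off in [("camera_a.mp4", 0.0), ("camera_b.mp4", 2.5), ("camera_c.mp4", -1.8)]:
        action, amount = plan_offset(off)
        print(f"{name}: {action} {amount:.3f}s")
```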

Implementation Details

The offset application logic is in src/video_sync.py:

Positive Offset: Delay

# video_sync.py:110-173
def _reencode_delay(input_path, output_path, offset, end_pad=0, has_audio=True):
    """
    Re-encode and add black frames/silence at the start.
    Uses tpad (video) and adelay (audio) filters.
    """
    # Round instead of truncating so 0.29s becomes 290 ms, not 289 ms
    ms = int(round(offset * 1000))

    # Video: tpad adds black frames at start
    filters = [f"[0:v]tpad=start_duration={offset}:color=black[v]"]
    maps = ["-map", "[v]"]
    audio_args = []

    # Audio: adelay shifts audio by offset milliseconds.
    # Skipped for silent files so the filter graph never references [0:a].
    if has_audio:
        filters.append(f"[0:a]adelay={ms}|{ms}[a]")  # Stereo: delay both channels
        maps += ["-map", "[a]"]
        audio_args = ["-c:a", "aac", "-b:a", "128k"]

    cmd = [
        "ffmpeg", "-y",
        "-i", input_path,
        "-filter_complex", ";".join(filters),
        *maps,
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        *audio_args,
        output_path,
    ]
    return cmd
FFmpeg’s -itsoffset flag can delay streams without re-encoding, but it has compatibility issues:
  • Some players ignore PTS (presentation timestamp) shifts
  • OpenCV’s cv2.VideoCapture doesn’t respect -itsoffset
  • Frame-accurate seeking may fail
Solution: Use tpad to insert actual black frames at the beginning. This ensures all players and tools see the delay.
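For contrast, a stream-copy delay with -itsoffset might be built as follows. This is only a sketch of the general FFmpeg usage (-itsoffset is an input option, so it must come before the -i it applies to); the function name delay_stream_copy_cmd is hypothetical and the project's own _delay_stream_copy() may differ.

```python
def delay_stream_copy_cmd(input_path, output_path, offset):
    """Build a stream-copy delay command (hypothetical helper).

    This only shifts timestamps; no black frames are written, which is
    why tools that ignore timestamps will not see the delay.
    """
    return [
        "ffmpeg", "-y",
        "-itsoffset", str(offset),  # must precede the input it applies to
        "-i", input_path,
        "-c", "copy",               # no re-encode
        output_path,
    ]


if __name__ == "__main__":
    print(" ".join(delay_stream_copy_cmd("camera_b.mp4", "camera_b_shifted.mp4", 2.5)))
```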

Negative Offset: Trim

# video_sync.py:99-107
def _trim_stream_copy(input_path, output_path, trim):
    """
    Fast trim using -ss before -i (stream copy, no re-encode).
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(trim),  # Seek to trim point
        "-i", input_path,
        "-c", "copy",      # Stream copy (fast)
        output_path
    ]
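Fast Trim: Placing -ss before -i enables fast input seeking. With stream copy, FFmpeg can only start at a keyframe, so it begins at the nearest keyframe before the seek point and copies from there. No re-encoding is needed, but the cut may land slightly earlier than the requested time.
If stream copy fails (rare codec/container issues), the system falls back to re-encoding: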
# video_sync.py:176-189
def _reencode_trim(input_path, output_path, trim, has_audio=True):
    cmd = [
        "ffmpeg", "-y",
        "-ss", str(trim),
        "-i", input_path,
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23"
    ]
    if has_audio:
        cmd.extend(["-c:a", "aac", "-b:a", "128k"])
    cmd.append(output_path)
    return cmd
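The copy-then-re-encode fallback described above could be wired together roughly like this. It is a sketch under assumptions: run_with_fallback is a hypothetical name, and the real apply_video_offsets() may detect failure differently (for example by inspecting the output file rather than the exit code).

```python
import subprocess


def run_with_fallback(primary_cmd, fallback_cmd, run=subprocess.run):
    """Try the fast command first; re-encode only if it fails.

    `run` is injectable so the logic can be exercised without FFmpeg.
    Returns which path succeeded.
    """
    result = run(primary_cmd, capture_output=True)
    if result.returncode == 0:
        return "stream_copy"

    # Stream copy failed: retry with the slower re-encode command.
    result = run(fallback_cmd, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError("both stream copy and re-encode failed")
    return "reencode"
```

A caller would pass the lists returned by _trim_stream_copy() and _reencode_trim() as primary_cmd and fallback_cmd.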

Duration Equalization

After applying offsets, videos may have different final durations:
Video A: 10s + 2.5s delay = 12.5s
Video B: 10s + 0.0s delay = 10.0s
Video C: 10s - 1.8s trim  = 8.2s
The system pads all videos to the maximum duration for consistent playback:
# video_sync.py:251-254
max_duration = max(final_durations.values())
end_pad = max(0, max_duration - final_durations[fname])

End Padding Example

Max duration = 12.5s (from Video A)

Video A: 12.5s → no padding needed
Video B: 10.0s → add 2.5s black at end
Video C:  8.2s → add 4.3s black at end

Final (all 12.5s):
A: [BLACK 2.5s][═══ content 10s ═══]
B: [═══ content 10s ═══][BLACK 2.5s]
C: [═══ content 8.2s ═══][BLACK 4.3s]
Why equalize? Most video players and editors expect all clips to have the same duration when previewing side-by-side. This prevents playback controls from becoming unsynchronized.
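The arithmetic in the example can be reproduced with a few lines. This sketch assumes each final duration is simply the source duration plus the signed offset; compute_end_pads is a hypothetical name, not a function from video_sync.py.

```python
def compute_end_pads(source_durations, offsets):
    """Return the end padding (seconds) each file needs to reach the max duration."""
    # A positive offset lengthens the file, a negative one shortens it.
    final = {name: source_durations[name] + offsets[name] for name in source_durations}
    max_duration = max(final.values())
    return {name: max(0.0, max_duration - duration) for name, duration in final.items()}


if __name__ == "__main__":
    pads = compute_end_pads(
        {"A": 10.0, "B": 10.0, "C": 10.0},
        {"A": 2.5, "B": 0.0, "C": -1.8},
    )
    for name, pad in pads.items():
        print(f"Video {name}: add {pad:.1f}s black at end")
```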

End Padding Implementation

# video_sync.py:118-122 (in _reencode_delay)
if end_pad > 0:
    tpad_params.append(f"stop_duration={end_pad}")

# Example: tpad=start_duration=2.5:stop_duration=1.3:color=black
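A filter string like the one in the example could be assembled as below. This is an illustrative sketch: build_tpad is a hypothetical helper, and the parameter order simply mirrors the example string rather than the project's actual code.

```python
def build_tpad(start_pad=0.0, end_pad=0.0):
    """Assemble a tpad filter string with optional start and end padding."""
    params = []
    if start_pad > 0:
        params.append(f"start_duration={start_pad}")
    if end_pad > 0:
        params.append(f"stop_duration={end_pad}")
    # The same color is used for padding at both ends.
    params.append("color=black")
    return "tpad=" + ":".join(params)


if __name__ == "__main__":
    print(build_tpad(2.5, 1.3))
```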

Sign Derivation from Sync Methods

Both sync methods return offsets with specific sign conventions:

Audio Sync (GCC-PHAT)

# audio_sync.py:118
def compute_gcc_phat(sig_a, sig_b, fs, ...) -> Tuple[float, float]:
    # ... correlation computation ...
    lag = lags[lag_idx]  # Positive if sig_b lags sig_a
    offset_seconds = lag / float(fs)
    
    return -offset_seconds, confidence  # ← Note the negative sign
The raw correlation lag represents:
lag > 0: sig_b is delayed relative to sig_a (a shared event appears later in sig_b)
lag < 0: sig_b is ahead of sig_a (a shared event appears earlier in sig_b)
The returned offset, however, describes the shift to apply to sig_b’s timeline:
  • If a shared event appears later in sig_b (lag > 0), sig_b started recording early, so we need to trim → negative offset
  • If a shared event appears earlier in sig_b (lag < 0), sig_b started recording late, so we need to add delay → positive offset
The measured lag and the corrective shift therefore have opposite signs:
offset = -lag
This ensures:
  • offset > 0 → sig_b started late → add delay
  • offset < 0 → sig_b started early → trim
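The sign flip can be checked on a toy signal. The sketch below uses a plain brute-force time-domain cross-correlation, not GCC-PHAT, and the names best_lag and offset_from_lag are hypothetical. A clap appears two samples later in sig_b than in sig_a, which means sig_b started recording earlier and should be trimmed.

```python
def best_lag(sig_a, sig_b, max_lag):
    """Return the lag k (in samples) maximizing sum(sig_a[n] * sig_b[n + k]).

    k > 0 means the content of sig_b is delayed relative to sig_a.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, a in enumerate(sig_a):
            m = n + k
            if 0 <= m < len(sig_b):
                score += a * sig_b[m]
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def offset_from_lag(lag_samples, fs):
    """Convert a lag in samples to the offset to apply to sig_b (note the sign flip)."""
    return -lag_samples / float(fs)


fs = 10  # toy sample rate in Hz
sig_a = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # clap at sample 3
sig_b = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # same clap at sample 5
lag = best_lag(sig_a, sig_b, max_lag=5)
offset = offset_from_lag(lag, fs)

if __name__ == "__main__":
    print(f"lag = {lag} samples, offset = {offset:+.3f}s")
```

Here the lag is positive and the resulting offset is negative, so sig_b would be trimmed.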

Visual Sync (Motion)

# visual_sync.py:123-125
def correlate_motion_signals(sig1, sig2, fps, ...) -> Tuple[float, float]:
    # ... correlation ...
    lag_frames = lag_idx - center
    offset_seconds = lag_frames / fps
    
    return offset_seconds, confidence  # Direct offset (no sign flip)
Visual sync directly returns the offset without sign inversion.
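The index-to-seconds step can be illustrated in isolation. This sketch assumes a full-mode correlation of two equal-length signals, where the zero-lag peak sits at index signal_len - 1; how the project actually computes center is not shown in the excerpt, so treat that part as an assumption.

```python
def lag_index_to_seconds(lag_idx, signal_len, fps):
    """Convert a correlation peak index to seconds.

    Assumes the correlation has 2 * signal_len - 1 entries, so zero lag
    is at index signal_len - 1 (an assumption, see the note above).
    """
    center = signal_len - 1
    lag_frames = lag_idx - center
    return lag_frames / float(fps)


if __name__ == "__main__":
    # A peak 60 frames past the center at 30 fps corresponds to 2 seconds.
    print(lag_index_to_seconds(159, signal_len=100, fps=30))
```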

Verification

After synchronization, verify offsets were applied correctly:
1. Check Filename Suffix

Synchronized videos have _synced suffix:
video1.mp4 → video1_synced.mp4
video2.mp4 → video2_synced.mp4
2. Verify Duration

All synchronized videos should have identical duration:
ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 video1_synced.mp4
# Output: 12.500000

ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 video2_synced.mp4
# Output: 12.500000  ✓ Same duration
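The same check can be scripted across many files. This is a minimal sketch that wraps the ffprobe command shown above; the function names and the 0.05 s tolerance are assumptions, chosen to absorb rounding to whole frames.

```python
import subprocess


def probe_duration(path):
    """Return the container duration in seconds using ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def durations_match(durations, tolerance=0.05):
    """True if all durations agree within `tolerance` seconds."""
    values = list(durations.values())
    return max(values) - min(values) <= tolerance


if __name__ == "__main__":
    files = ["video1_synced.mp4", "video2_synced.mp4"]
    durations = {f: probe_duration(f) for f in files}
    print(durations, "OK" if durations_match(durations) else "MISMATCH")
```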
3. Visual Inspection

Use the built-in multi-video player:
  • All videos should start simultaneously
  • Shared events (clap, speech) should align across views
  • No drift over time (constant offset)
4. Check for Black Frames

Videos with positive offsets should have black frames at the start:
ffmpeg -i video_synced.mp4 -vf "select=eq(n\,0)" -vframes 1 frame0.png
# If offset was positive, frame0.png should be solid black
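Checking a single frame confirms that padding exists but not how long it lasts. One option is FFmpeg's blackdetect filter, which logs black intervals to stderr. The sketch below parses that log; the helper names and tolerance are assumptions, and the log format should be confirmed against your FFmpeg version.

```python
import re

# Example command whose stderr this parses (run separately):
#   ffmpeg -i video_synced.mp4 -vf blackdetect=d=0.1 -an -f null -
BLACK_RE = re.compile(r"black_start:(?P<start>[\d.]+)\s+black_end:(?P<end>[\d.]+)")


def parse_blackdetect(stderr_text):
    """Extract (start, end) pairs in seconds from blackdetect log lines."""
    return [(float(m["start"]), float(m["end"])) for m in BLACK_RE.finditer(stderr_text)]


def starts_with_black(intervals, expected, tolerance=0.1):
    """True if a black interval begins at ~0s and lasts about `expected` seconds."""
    return any(
        start <= tolerance and abs(end - expected) <= tolerance
        for start, end in intervals
    )
```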

Common Misunderstandings

Misconception: “Positive offset means the video is ahead.”
Reality: Positive offset means the video started late and needs delay added to align it with the reference.
Misconception: “The offset is added to the video’s timestamp.”
Reality: The offset modifies the video content:
  • Positive: Inserts black frames/silence at start
  • Negative: Removes content from start

Example Scenario

Three cameras recording a presentation:
Actual Recording Start Times:
- Camera A: 10:00:00.000 (reference)
- Camera B: 10:00:02.500 (started 2.5s late)
- Camera C: 09:59:58.200 (started 1.8s early)

Computed Offsets:
- Camera A: 0.0s    (reference)
- Camera B: +2.5s   (needs delay)
- Camera C: -1.8s   (needs trim)

Synchronized Result:
- All videos start at the same effective time
- A: [═══ content ═══]
- B: [BLACK 2.5s][═══ content ═══]
- C: [trimmed 1.8s][═══ content ═══]
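The computed offsets in this scenario are just each camera's start time minus the reference start time. The sketch below reproduces that arithmetic for illustration only; in practice the start times are unknown and the offsets come from audio or visual correlation. The function name is hypothetical.

```python
from datetime import datetime


def offsets_from_start_times(start_times, reference):
    """Offset in seconds = camera start time minus reference start time."""
    ref = start_times[reference]
    return {name: (t - ref).total_seconds() for name, t in start_times.items()}


FMT = "%H:%M:%S.%f"
starts = {
    "A": datetime.strptime("10:00:00.000", FMT),
    "B": datetime.strptime("10:00:02.500", FMT),
    "C": datetime.strptime("09:59:58.200", FMT),
}
offsets = offsets_from_start_times(starts, reference="A")

if __name__ == "__main__":
    for name, off in offsets.items():
        print(f"Camera {name}: {off:+.1f}s")
```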

Log Output

INFO: Processing camera_a.mp4 with offset +0.000s
INFO: Processing camera_b.mp4 with offset +2.500s
INFO: Processing camera_c.mp4 with offset -1.800s

INFO: Max final duration: 12.500s
INFO: Synchronizing videos...

FFmpeg Command Examples

# Add 2.5s black frames and silent audio at start
ffmpeg -y -i camera_b.mp4 \
  -filter_complex \
    "[0:v]tpad=start_duration=2.5:color=black[v]; \
     [0:a]adelay=2500|2500[a]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -preset ultrafast -crf 23 \
  -c:a aac -b:a 128k \
  camera_b_synced.mp4

Silent Videos

Videos without audio tracks are handled differently:
# video_sync.py:238-241
has_audio = _has_audio_stream(in_path)
if not has_audio:
    logger.debug("%s detected as silent (no audio stream)", fname)
For silent videos:
  • Positive offset: Only tpad applied (no adelay)
  • Negative offset: Trim works normally
  • No audio mapping in FFmpeg commands
# Silent video with +2.0s offset
ffmpeg -y -i silent.mp4 \
  -filter_complex "[0:v]tpad=start_duration=2.0:color=black[v]" \
  -map "[v]" \
  -c:v libx264 -preset ultrafast -crf 23 \
  silent_synced.mp4
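Audio detection can be done by asking ffprobe to list only audio streams and checking whether anything comes back. This is a sketch of one common approach, not the project's _has_audio_stream() implementation, which is not shown here; all function names below are hypothetical.

```python
import subprocess


def audio_probe_cmd(path):
    """ffprobe command that prints one line per audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index",
        "-of", "csv=p=0",
        path,
    ]


def has_audio_from_output(ffprobe_stdout):
    """Any non-empty output means at least one audio stream was listed."""
    return bool(ffprobe_stdout.strip())


def has_audio_stream(path):
    """Return True if the file contains at least one audio stream."""
    result = subprocess.run(audio_probe_cmd(path), capture_output=True, text=True)
    return result.returncode == 0 and has_audio_from_output(result.stdout)
```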

Troubleshooting

Check:
  1. Verify offsets in logs: INFO: Processing X with offset Y
  2. Ensure all _synced.mp4 files have same duration
  3. Check first frame of positive-offset videos (should be black)
  4. Use ffprobe to verify PTS start times
Common cause: Player not respecting PTS (use VLC or mpv)
Symptom: Black frames in middle or end instead of start
Cause: FFmpeg command error or codec issue
Solution: Check logs for FFmpeg errors, try re-running sync
Symptom: Video and audio drift apart in synced file
Cause:
  • Original file had A/V sync issues
  • Variable framerate (VFR) video
Solution:
  • Fix original video first: ffmpeg -i input.mp4 -c:v libx264 -r 30 -c:a aac fixed.mp4
  • Use constant framerate (CFR) recordings
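A rough way to spot VFR input before syncing is to compare the r_frame_rate and avg_frame_rate fields that ffprobe reports for the video stream. The sketch below is a heuristic only, with hypothetical names and an assumed 1% threshold. Matching rates do not guarantee constant frame rate.

```python
from fractions import Fraction

# The two rate strings can be read with, for example:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=r_frame_rate,avg_frame_rate \
#     -of default=noprint_wrappers=1 input.mp4


def parse_rate(rate):
    """Parse an ffprobe rate such as '30000/1001'; return None if undefined."""
    num, _, den = rate.partition("/")
    den_value = int(den or 1)
    if den_value == 0:
        return None  # ffprobe reports '0/0' when the rate is unknown
    return Fraction(int(num), den_value)


def looks_variable_frame_rate(r_frame_rate, avg_frame_rate, rel_tol=0.01):
    """Heuristic: flag the file when the two rates differ by more than rel_tol."""
    r, avg = parse_rate(r_frame_rate), parse_rate(avg_frame_rate)
    if r is None or avg is None or r == 0:
        return False  # cannot tell from these fields
    return abs(r - avg) / r > rel_tol
```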

Source Code Reference

Offset application logic in src/video_sync.py:
  • Lines 1-12: Offset semantics documentation
  • Lines 77-96: _delay_stream_copy() - Delay using -itsoffset (stream copy)
  • Lines 99-107: _trim_stream_copy() - Trim using -ss (stream copy)
  • Lines 110-173: _reencode_delay() - Delay using tpad/adelay (re-encode)
  • Lines 176-189: _reencode_trim() - Trim with re-encode (fallback)
  • Lines 192-363: apply_video_offsets() - Main offset application function

Next Steps

Audio Sync

How offsets are computed from audio

Visual Sync

How offsets are computed from motion