Offset Semantics

Overview

After computing synchronization offsets, the system applies them to video files using FFmpeg. Understanding the sign convention is critical for interpreting results and debugging issues.

Key Principle: Offsets represent how much to shift each video’s timeline to align all videos to a common reference time.

Sign Convention

The system follows the convention labeled “Option C” in the source code (the offset semantics documented in lines 1-12 of src/video_sync.py):

Positive Offset (+)

Delay the entire file. Video and audio start later in the synchronized timeline; black frames and silence are added at the beginning.

Negative Offset (-)

Trim from the start. The first |offset| seconds are removed, so playback begins |offset| seconds into the file’s own content.

Visual Example


Original Timeline:
[═══════════════════════════════] video.mp4
 0s                          10s

After +2.5s offset (DELAY):
[BLACK][═════════════════════════════════] video_synced.mp4
 0s   2.5s                              12.5s

↑ 2.5s of black frames/silence added

Interpretation: This video started 2.5 seconds late relative to the reference. We pad the beginning to align it.

Original Timeline:
[═══════════════════════════════] video.mp4
 0s                          10s

After -1.8s offset (TRIM):
      [═══════════════════════] video_synced.mp4
     1.8s                   10s  (original timestamps; the output runs 0s to 8.2s)

↑ First 1.8s removed

Interpretation: This video started 1.8 seconds early relative to the reference. We trim the beginning to align it.

Original Timeline:
[═══════════════════════════════] video.mp4 (reference)
 0s                          10s

After 0.0s offset (NO CHANGE):
[═══════════════════════════════] video_synced.mp4
 0s                          10s

↑ This is the reference video (first file)

The first video in alphabetical order is always anchored to 0.0s. All other offsets are relative to it.
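The anchoring rule can be made concrete with a small sketch that turns known recording start times into offsets. It assumes wall-clock start times are available, which is only for illustration: in practice the offsets come from audio or motion correlation. The helper name `offsets_from_start_times` is hypothetical and not part of the source.

```python
from typing import Dict


def offsets_from_start_times(start_times: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: offset = start time - reference start time (seconds).

    The reference is the alphabetically first filename, anchored to 0.0s.
    Positive result = started late (delay); negative = started early (trim).
    """
    reference = sorted(start_times)[0]
    ref_start = start_times[reference]
    return {name: start - ref_start for name, start in start_times.items()}


# Start times in seconds relative to 10:00:00 (camera_c started before 10:00:00).
starts = {"camera_a.mp4": 0.0, "camera_b.mp4": 2.5, "camera_c.mp4": -1.8}
print(offsets_from_start_times(starts))
# {'camera_a.mp4': 0.0, 'camera_b.mp4': 2.5, 'camera_c.mp4': -1.8}
```

The values match the three-camera scenario later on this page: a late start yields a positive offset, an early start a negative one.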

Implementation Details

The offset application logic is in src/video_sync.py:

Positive Offset: Delay

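# video_sync.py:110-173
def _reencode_delay(input_path, output_path, offset, end_pad=0, has_audio=True):
    """
    Re-encode and add black frames/silence at the start.
    Uses tpad (video) and adelay (audio) filters.
    Returns the FFmpeg argument list.
    """
    ms = int(round(offset * 1000))  # adelay expects integer milliseconds

    # Video: tpad adds black frames at the start (and at the end if end_pad > 0)
    tpad_params = [f"start_duration={offset}"]
    if end_pad > 0:
        tpad_params.append(f"stop_duration={end_pad}")
    tpad_params.append("color=black")
    filters = [f"[0:v]tpad={':'.join(tpad_params)}[v]"]
    maps = ["-map", "[v]"]

    # Audio: adelay shifts audio by offset milliseconds (only if a track exists)
    if has_audio:
        # One delay per channel: this delays channels 1 and 2 (stereo);
        # any further channels are left undelayed.
        filters.append(f"[0:a]adelay={ms}|{ms}[a]")
        maps += ["-map", "[a]"]

    cmd = [
        "ffmpeg", "-y",
        "-i", input_path,
        "-filter_complex", ";".join(filters),
        *maps,
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
    ]
    if has_audio:
        cmd += ["-c:a", "aac", "-b:a", "128k"]
    cmd.append(output_path)
    return cmd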

Why re-encode for delays?

FFmpeg’s -itsoffset flag can delay streams without re-encoding, but it only shifts timestamps, which causes compatibility issues:

Some players ignore PTS (presentation timestamp) shifts
OpenCV’s cv2.VideoCapture doesn’t respect -itsoffset
Frame-accurate seeking may fail

Solution: Use tpad (video) and adelay (audio) to insert real black frames and silence at the beginning. Because the delay is part of the content rather than a timestamp shift, every player and tool sees it.

Negative Offset: Trim

# video_sync.py:99-107
def _trim_stream_copy(input_path, output_path, trim):
    """
    Fast trim using -ss before -i (stream copy, no re-encode).
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(trim),  # Seek to trim point
        "-i", input_path,
        "-c", "copy",      # Stream copy (fast)
        output_path
    ]

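Fast Trim: Placing -ss before -i makes FFmpeg seek in the input instead of decoding everything up to the trim point. With -c copy, the cut can only start on a keyframe, so FFmpeg begins at a keyframe at or before the trim point; the trim is keyframe-accurate rather than frame-accurate. No re-encoding is needed.

If stream copy fails (rare codec/container issues), the system falls back to re-encoding, which also makes the trim frame-accurate: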

# video_sync.py:176-189
def _reencode_trim(input_path, output_path, trim, has_audio=True):
    cmd = [
        "ffmpeg", "-y",
        "-ss", str(trim),
        "-i", input_path,
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23"
    ]
    if has_audio:
        cmd.extend(["-c:a", "aac", "-b:a", "128k"])
    cmd.append(output_path)
    return cmd
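The sign of the offset alone decides which path a file takes. The sketch below is a hypothetical dispatcher (`build_sync_command` does not appear in the source) that mirrors the logic described above: positive offsets go through tpad/adelay, negative offsets through an `-ss` input seek with stream copy. It assumes a zero offset is a plain stream copy and ignores end padding to keep the sign logic visible. It only builds the argument list; you would run it with `subprocess.run(cmd, check=True)`.

```python
from typing import List


def build_sync_command(input_path: str, output_path: str, offset: float,
                       has_audio: bool = True) -> List[str]:
    """Hypothetical dispatcher: choose an FFmpeg command from the offset sign."""
    cmd = ["ffmpeg", "-y"]
    if offset > 0:
        # Positive: delay by inserting black frames (and silence) at the start.
        ms = round(offset * 1000)
        filters = [f"[0:v]tpad=start_duration={offset}:color=black[v]"]
        maps = ["-map", "[v]"]
        if has_audio:
            filters.append(f"[0:a]adelay={ms}|{ms}[a]")
            maps += ["-map", "[a]"]
        cmd += ["-i", input_path, "-filter_complex", ";".join(filters), *maps,
                "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23"]
        if has_audio:
            cmd += ["-c:a", "aac", "-b:a", "128k"]
    elif offset < 0:
        # Negative: seek past the first |offset| seconds, then stream-copy.
        cmd += ["-ss", str(-offset), "-i", input_path, "-c", "copy"]
    else:
        # Zero: the reference file, copied unchanged (assumption).
        cmd += ["-i", input_path, "-c", "copy"]
    cmd.append(output_path)
    return cmd


print(build_sync_command("camera_c.mp4", "camera_c_synced.mp4", -1.8))
```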

Duration Equalization

After applying offsets, videos may have different final durations:

Video A: 10s + 2.5s delay = 12.5s
Video B: 10s + 0.0s delay = 10.0s
Video C: 10s - 1.8s trim  = 8.2s

The system pads all videos to the maximum duration for consistent playback:

# video_sync.py:251-254
max_duration = max(final_durations.values())
end_pad = max(0, max_duration - final_durations[fname])
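Because a delay adds offset seconds and a trim removes |offset| seconds, both cases reduce to final duration = original duration + offset. A minimal sketch of the equalization arithmetic for the three videos above; the helper name `end_pads` is hypothetical, and it assumes the original durations are already known (for example from ffprobe).

```python
from typing import Dict


def end_pads(durations: Dict[str, float], offsets: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: seconds of black to append to each file."""
    # A delay adds `offset` seconds and a trim removes |offset| seconds,
    # so in both cases the final duration is original + offset.
    final = {name: durations[name] + offsets[name] for name in durations}
    max_duration = max(final.values())
    # Round to milliseconds to hide floating-point noise from the subtraction.
    return {name: round(max(0.0, max_duration - d), 3) for name, d in final.items()}


pads = end_pads({"A": 10.0, "B": 10.0, "C": 10.0}, {"A": 2.5, "B": 0.0, "C": -1.8})
print(pads)  # {'A': 0.0, 'B': 2.5, 'C': 4.3}
```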

End Padding Example

Max duration = 12.5s (from Video A)

Video A: 12.5s → no padding needed
Video B: 10.0s → add 2.5s black at end
Video C:  8.2s → add 4.3s black at end

Final (all 12.5s):
A: [BLACK 2.5s][═══ content 10s ═══]
B: [═══ content 10s ═══][BLACK 2.5s]
C: [═══ content 8.2s ═══][BLACK 4.3s]

Why equalize? When clips are previewed side by side, equal durations mean they all end together, so shared playback controls such as the seek bar stay in step across views.

End Padding Implementation

# video_sync.py:118-122 (in _reencode_delay)
if end_pad > 0:
    tpad_params.append(f"stop_duration={end_pad}")

# Example: tpad=start_duration=2.5:stop_duration=1.3:color=black

Sign Derivation from Sync Methods

Both sync methods return offsets in this convention, but they arrive at the sign differently:

Audio Sync (GCC-PHAT)

# audio_sync.py:118
def compute_gcc_phat(sig_a, sig_b, fs, ...) -> Tuple[float, float]:
    # ... correlation computation ...
    lag = lags[lag_idx]  # Positive if sig_b lags sig_a
    offset_seconds = lag / float(fs)
    
    return -offset_seconds, confidence  # ← Note the negative sign

Why the negative sign in GCC-PHAT?

The raw correlation lag describes where shared content appears:

lag > 0: an event appears later in sig_b than in sig_a, so sig_b started recording earlier
lag < 0: an event appears earlier in sig_b than in sig_a, so sig_b started recording later

Offsets, however, describe when each recording started relative to the reference, which has the opposite sign:

offset = -lag / fs

This ensures:

offset > 0 → sig_b started late → add delay
offset < 0 → sig_b started early → trim
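A compact NumPy sketch of GCC-PHAT that follows the same convention: the lag is positive when a shared event appears later in sig_b, and the returned offset is its negation. This is an illustration, not the project’s compute_gcc_phat; the name `gcc_phat_offset`, the zero-padding length, and the omission of a confidence score are assumptions.

```python
import numpy as np


def gcc_phat_offset(sig_a: np.ndarray, sig_b: np.ndarray, fs: float) -> float:
    """Illustrative GCC-PHAT: offset in seconds to apply to sig_b's timeline."""
    n = len(sig_a) + len(sig_b)  # zero-pad so the circular correlation acts linear
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)
    cross = spec_b * np.conj(spec_a)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the array runs from the most negative lag to the most positive.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1)
    lag = lags[np.argmax(np.abs(cc))]  # > 0 when events appear later in sig_b
    return float(-lag / fs)  # sig_b started early (lag > 0) -> negative -> trim


rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = np.concatenate((np.zeros(200), a[:800]))  # same events, 200 samples later in b
print(gcc_phat_offset(a, b, fs=1000))  # -0.2: b started 0.2s early, so trim it
```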

Visual Sync (Motion)

# visual_sync.py:123-125
def correlate_motion_signals(sig1, sig2, fps, ...) -> Tuple[float, float]:
    # ... correlation ...
    lag_frames = lag_idx - center
    offset_seconds = lag_frames / fps
    
    return offset_seconds, confidence  # Direct offset (no sign flip)

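Visual sync returns the lag directly, without sign inversion. That is only correct if its correlation is ordered so that the lag already follows the offset convention (negative when events appear later in sig2, meaning sig2 started early); if you change the argument order of either function, re-check the sign.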
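One way a correlation can produce the offset convention without a flip is through argument order. The sketch below assumes `np.correlate(sig1, sig2, mode="full")` with `center = len(sig2) - 1`; the excerpt does not show which call correlate_motion_signals actually uses, so treat this as an illustration. The name `motion_offset` and the normalized-peak confidence are hypothetical.

```python
import numpy as np


def motion_offset(sig1: np.ndarray, sig2: np.ndarray, fps: float) -> tuple:
    """Illustrative motion correlation returning (offset_seconds, confidence)."""
    # Remove the mean so always-positive motion energy does not bias the peak.
    s1 = sig1 - sig1.mean()
    s2 = sig2 - sig2.mean()
    corr = np.correlate(s1, s2, mode="full")  # corr[k + center] = sum_n s1[n + k] * s2[n]
    center = len(s2) - 1                      # index of zero lag (k = 0)
    lag_idx = int(np.argmax(corr))
    lag_frames = lag_idx - center             # < 0 when events appear later in sig2
    norm = np.linalg.norm(s1) * np.linalg.norm(s2) + 1e-12
    confidence = float(corr[lag_idx] / norm)
    return lag_frames / fps, confidence       # no sign flip needed with this order


rng = np.random.default_rng(1)
m1 = rng.random(300)
m2 = np.concatenate((np.zeros(15), m1[:285]))  # same events, 15 frames later in m2
offset, conf = motion_offset(m1, m2, fps=30)
print(offset)  # -0.5: m2 started 0.5s early, so trim it
```

With the arguments swapped (correlating sig2 against sig1), the same peak would land at the opposite lag, which is exactly the case where a sign flip like GCC-PHAT’s becomes necessary.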

Verification

After synchronization, verify offsets were applied correctly:

Check Filename Suffix

Synchronized videos have a _synced suffix:

video1.mp4 → video1_synced.mp4
video2.mp4 → video2_synced.mp4

Verify Duration

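All synchronized videos should have the same duration, typically to within about one frame (container durations can differ by a few milliseconds because audio and video packet boundaries rarely line up):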

ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 video1_synced.mp4
# Output: 12.500000

ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 video2_synced.mp4
# Output: 12.500000  ✓ Same duration
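To check every output at once, a small script can run the same ffprobe query on each *_synced.mp4 file and compare the results. This is a sketch: the function names are hypothetical, and the default tolerance of 1/30 s (one frame at 30 fps, about 0.033s) is an assumption.

```python
import subprocess
from pathlib import Path
from typing import Dict


def probe_duration(path: Path) -> float:
    """Container duration in seconds, using the ffprobe query shown above."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def durations_match(durations: Dict[str, float], tolerance: float = 1 / 30) -> bool:
    """True if all durations lie within `tolerance` seconds of each other."""
    values = list(durations.values())
    return max(values) - min(values) <= tolerance


if __name__ == "__main__":
    files = sorted(Path(".").glob("*_synced.mp4"))
    durations = {f.name: probe_duration(f) for f in files}
    if not durations:
        print("No *_synced.mp4 files found")
    else:
        for name, d in durations.items():
            print(f"{name}: {d:.3f}s")
        print("OK" if durations_match(durations) else "MISMATCH")
```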

Visual Inspection

Use the built-in multi-video player:

All videos should start simultaneously
Shared events (clap, speech) should align across views
No drift over time (constant offset)

Check for Black Frames

Videos with positive offsets should have black frames at the start:

ffmpeg -i video_synced.mp4 -vf "select=eq(n\,0)" -vframes 1 frame0.png
# If offset was positive, frame0.png should be solid black
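To automate this check, FFmpeg can write the first frame as raw 8-bit grayscale to stdout, and a script can test its brightness. A sketch, assuming a mean-luma threshold of 32 (out of 255) counts as black; a black frame may decode to a value near 16 rather than 0 because most H.264 video uses limited-range luma. `first_frame_is_black` and `mean_luma` are hypothetical helpers.

```python
import subprocess


def mean_luma(gray_bytes: bytes) -> float:
    """Average value of a raw 8-bit grayscale frame (0 = black, 255 = white)."""
    if not gray_bytes:
        raise ValueError("no frame data: the file may have no video stream")
    return sum(gray_bytes) / len(gray_bytes)


def first_frame_is_black(path: str, threshold: float = 32.0) -> bool:
    """Hypothetical helper: decode frame 0 to raw gray8 on stdout and test its mean."""
    raw = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", path, "-frames:v", "1",
         "-f", "rawvideo", "-pix_fmt", "gray", "-"],
        capture_output=True, check=True,
    ).stdout
    return mean_luma(raw) < threshold
```

For example, `first_frame_is_black("camera_b_synced.mp4")` should return True for a file that received a positive offset.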

Common Misunderstandings

Misconception: “Positive offset means the video is ahead.”
Reality: Positive offset means the video started late and needs delay added to align it with the reference.

Misconception: “The offset is added to the video’s timestamp.”
Reality: The offset modifies the video content itself:

Positive: Inserts black frames/silence at start
Negative: Removes content from start

Example Scenario

Three cameras recording a presentation:

Actual Recording Start Times:
- Camera A: 10:00:00.000 (reference)
- Camera B: 10:00:02.500 (started 2.5s late)
- Camera C: 09:59:58.200 (started 1.8s early)

Computed Offsets:
- Camera A: 0.0s    (reference)
- Camera B: +2.5s   (needs delay)
- Camera C: -1.8s   (needs trim)

Synchronized Result:
- All videos start at the same effective time
- A: [═══ content ═══]
- B: [BLACK 2.5s][═══ content ═══]
- C: [trimmed 1.8s][═══ content ═══]

Log Output

INFO: Processing camera_a.mp4 with offset +0.000s
INFO: Processing camera_b.mp4 with offset +2.500s
INFO: Processing camera_c.mp4 with offset -1.800s

INFO: Max final duration: 12.500s
INFO: Synchronizing videos...

FFmpeg Command Examples

# Add 2.5s black frames and silent audio at start
ffmpeg -y -i camera_b.mp4 \
  -filter_complex \
    "[0:v]tpad=start_duration=2.5:color=black[v]; \
     [0:a]adelay=2500|2500[a]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -preset ultrafast -crf 23 \
  -c:a aac -b:a 128k \
  camera_b_synced.mp4

Silent Videos

Videos without audio tracks are handled differently:

# video_sync.py:238-241
has_audio = _has_audio_stream(in_path)
if not has_audio:
    logger.debug("%s detected as silent (no audio stream)", fname)

For silent videos:

Positive offset: Only tpad applied (no adelay)
Negative offset: Trim works normally
No audio mapping in FFmpeg commands

# Silent video with +2.0s offset
ffmpeg -y -i silent.mp4 \
  -filter_complex "[0:v]tpad=start_duration=2.0:color=black[v]" \
  -map "[v]" \
  -c:v libx264 -preset ultrafast -crf 23 \
  silent_synced.mp4
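The excerpt calls _has_audio_stream() without showing its body. One plausible way to implement such a check (an assumption, not the project’s code) is to ask ffprobe to list only audio streams and treat empty output as silent. The names `has_audio_stream` and `parse_stream_indices` are hypothetical.

```python
import subprocess
from typing import List


def parse_stream_indices(ffprobe_output: str) -> List[int]:
    """Turn `-of csv=p=0` output (one stream index per line) into ints."""
    return [int(token.rstrip(",")) for token in ffprobe_output.split()]


def has_audio_stream(path: str) -> bool:
    """Hypothetical check: True if ffprobe lists at least one audio stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(parse_stream_indices(out)) > 0
```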

Troubleshooting

Videos not aligned after sync

Check:

Verify offsets in logs: INFO: Processing X with offset Y
Ensure all _synced.mp4 files have same duration
Check first frame of positive-offset videos (should be black)
Use ffprobe to verify PTS start times

Common cause: the player does not respect PTS; try VLC or mpv.

Black frames at wrong position

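Symptom: Black frames in the middle or at the end instead of at the start.
Cause: FFmpeg command error or codec issue. Black frames at the end are expected, however, for any video shorter than the longest one, because of duration equalization.
Solution: Check the logs for FFmpeg errors and try re-running the sync.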

Audio/video desync within a file

Symptom: Video and audio drift apart within a synced file.
Cause:

Original file had A/V sync issues
Variable frame rate (VFR) video

Solution:

Fix the original video first: ffmpeg -i input.mp4 -c:v libx264 -r 30 -c:a aac fixed.mp4 (replace 30 with the intended frame rate; -r as an output option forces a constant rate)
Use constant frame rate (CFR) recordings
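Before re-encoding, it can help to check whether a recording is VFR at all. A common heuristic (not a guarantee) is to compare ffprobe’s r_frame_rate and avg_frame_rate for the video stream; when they differ noticeably, the file is likely variable frame rate. The helper names and the 1% tolerance below are assumptions.

```python
import subprocess
from fractions import Fraction
from typing import Tuple


def parse_rate(rate: str) -> Fraction:
    """Parse an ffprobe rate such as "30000/1001"; "0/0" (unknown) becomes 0."""
    num, _, den = rate.partition("/")
    den_value = int(den) if den else 1
    return Fraction(int(num), den_value) if den_value != 0 else Fraction(0)


def looks_vfr(r_frame_rate: str, avg_frame_rate: str, rel_tol: float = 0.01) -> bool:
    """Heuristic only: flag VFR when the two rates differ by more than rel_tol."""
    r, avg = parse_rate(r_frame_rate), parse_rate(avg_frame_rate)
    if r == 0 or avg == 0:
        return False  # a rate is unknown, so the heuristic cannot decide
    return abs(r - avg) / r > rel_tol


def probe_rates(path: str) -> Tuple[str, str]:
    """Read r_frame_rate and avg_frame_rate of the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate,avg_frame_rate",
         "-of", "default=noprint_wrappers=1", path],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = dict(line.split("=", 1) for line in out.split())
    return fields["r_frame_rate"], fields["avg_frame_rate"]
```

Usage: `looks_vfr(*probe_rates("input.mp4"))` returns True for files worth converting to CFR first.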

Source Code Reference

Offset application logic in src/video_sync.py:

Lines 1-12: Offset semantics documentation
Lines 77-96: _delay_stream_copy() - Delay using -itsoffset (stream copy)
Lines 99-107: _trim_stream_copy() - Trim using -ss (stream copy)
Lines 110-173: _reencode_delay() - Delay using tpad/adelay (re-encode)
Lines 176-189: _reencode_trim() - Trim with re-encode (fallback)
Lines 192-363: apply_video_offsets() - Main offset application function

Next Steps

Audio Sync

Visual Sync
