
Overview

The offset_generation module creates synthetic test cases by applying time offsets to original videos. It supports both positive offsets (prepending black frames and silent audio) and negative offsets (trimming from the start).

Functions

generate_synthetic_dataset

generate_synthetic_dataset(
    originals_dir: str = ORIGINALS_DIR,
    synthetic_dir: str = SYNTHETIC_DIR,
    metadata_dir: str = METADATA_DIR,
    offsets_ms: list = None,
) -> str
Generate synthetic offset-shifted videos and write metadata CSV.
originals_dir
str
default:"evaluation/originals"
Directory containing original source videos.
synthetic_dir
str
default:"evaluation/synthetic"
Output directory for shifted videos.
metadata_dir
str
default:"evaluation/metadata"
Output directory for the metadata CSV.
offsets_ms
list[int]
default:"[-1000, -500, -100, 100, 500, 1000]"
List of offsets in milliseconds. Positive values prepend black frames and silent audio; negative values trim from the start. The signature default is None, in which case DEFAULT_OFFSETS_MS is used.
Returns: Path to the generated metadata CSV file.
Output Files:
  • Shifted video files in synthetic_dir/
  • Metadata CSV in metadata_dir/synthetic_metadata.csv with columns:
    • video_id: Base name of the original video
    • synthetic_file_path: Full path to the generated shifted video
    • true_offset_ms: Applied offset in milliseconds
    • video_length_sec: Duration of the original video, in seconds
    • motion_level: Estimated motion energy (0-1)
    • audio_energy_level: Estimated audio RMS energy (0-1)
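
As a rough illustration of how the metadata CSV could be consumed downstream, the sketch below parses rows with the columns listed above. The load_metadata helper and the sample row (including its file name) are hypothetical and are not part of the module; it also assumes true_offset_ms is written as a plain number.

```python
import csv
import io

# Column names as listed in the documentation above.
COLUMNS = [
    "video_id", "synthetic_file_path", "true_offset_ms",
    "video_length_sec", "motion_level", "audio_energy_level",
]

def load_metadata(fileobj):
    """Parse metadata rows, converting numeric columns from strings.

    Hypothetical helper for illustration; not part of offset_generation.
    """
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "video_id": row["video_id"],
            "synthetic_file_path": row["synthetic_file_path"],
            # int(float(...)) tolerates both "500" and "500.0".
            "true_offset_ms": int(float(row["true_offset_ms"])),
            "video_length_sec": float(row["video_length_sec"]),
            "motion_level": float(row["motion_level"]),
            "audio_energy_level": float(row["audio_energy_level"]),
        })
    return rows

# Made-up sample row; real rows come from generate_synthetic_dataset.
sample = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "clip01,evaluation/synthetic/clip01_p500.mp4,500,12.5,0.08,0.11\n"
)
rows = load_metadata(sample)
```

With a real file, pass an open file handle (opened with newline="") instead of the in-memory sample.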

Helper Functions

_get_video_duration

_get_video_duration(path: str) -> float
Return video duration in seconds using ffprobe.
path
str
Path to the video file.
Returns: Duration in seconds.
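
One plausible way to query a duration with ffprobe is sketched below. The function names and the exact ffprobe arguments are assumptions for illustration; the module's actual invocation may differ.

```python
import subprocess

def build_ffprobe_cmd(path):
    """Build an ffprobe command that prints only the container duration."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]

def parse_duration(stdout):
    """Convert ffprobe's text output (e.g. '12.345000\\n') to float seconds."""
    return float(stdout.strip())

def get_video_duration(path):
    """Run ffprobe and return the duration in seconds (requires ffprobe on PATH)."""
    result = subprocess.run(
        build_ffprobe_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_duration(result.stdout)
```

Splitting command construction from parsing keeps both pieces testable without ffprobe installed.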

_compute_motion_level

_compute_motion_level(path: str, max_frames: int = 300) -> float
Estimate motion level as the mean frame-to-frame pixel difference over the first max_frames frames.
path
str
Path to the video file.
max_frames
int
default:"300"
Maximum number of frames to analyze.
Returns: Motion level normalized to [0, 1].
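
The metric described above can be sketched in pure Python as follows. This is a minimal illustration, assuming frames are already decoded into flat sequences of 8-bit grayscale values and that normalization divides by 255; the real function decodes frames from the video file, and its exact normalization is not documented here.

```python
def motion_level(frames, max_frames=300):
    """Mean absolute frame-to-frame pixel difference, scaled to [0, 1].

    frames: iterable of equal-length sequences of 8-bit grayscale values.
    Illustrative stand-in for _compute_motion_level; not the module's code.
    """
    prev = None
    total = 0.0
    pairs = 0
    for i, frame in enumerate(frames):
        if i >= max_frames:
            break  # only the first max_frames frames are considered
        if prev is not None:
            # Average absolute difference between consecutive frames.
            total += sum(abs(a - b) for a, b in zip(frame, prev)) / len(frame)
            pairs += 1
        prev = frame
    if pairs == 0:
        return 0.0  # fewer than two frames: no motion can be measured
    return total / pairs / 255.0  # assumed normalization by max pixel value

static = [[10, 10, 10]] * 5                  # identical frames
flicker = [[0, 0], [255, 255], [0, 0]]       # alternating black and white
```

Under these assumptions a static clip scores 0 and a clip alternating between full black and full white scores 1.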

_compute_audio_energy

_compute_audio_energy(path: str) -> float
Extract a short WAV snippet with ffmpeg and compute RMS energy.
path
str
Path to the video file.
Returns: Normalized energy value in [0, 1]. Typical speech RMS is ~0.05-0.15.
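
To make the RMS computation concrete, here is a small sketch using only the standard library. It assumes the extracted snippet is 16-bit PCM WAV and that normalization divides by 32768; rms_energy and make_wav are hypothetical names, and the ffmpeg extraction step is omitted.

```python
import array
import io
import math
import sys
import wave

def rms_energy(wav_file):
    """Return RMS of 16-bit PCM samples, normalized to [0, 1]."""
    with wave.open(wav_file, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return 0.0
    mean_sq = sum(s * s for s in samples) / len(samples)
    return min(1.0, math.sqrt(mean_sq) / 32768.0)

def make_wav(values, rate=16000):
    """Build an in-memory mono 16-bit WAV from integer samples (demo only)."""
    data = array.array("h", values)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())
    buf.seek(0)
    return buf

half_scale = rms_energy(make_wav([16384] * 100))  # constant half-scale signal
silence = rms_energy(make_wav([0] * 100))
```

A constant half-scale signal gives 0.5 under this normalization, which shows why speech at roughly 0.05 to 0.15 sits near the low end of the range.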

_apply_positive_offset

_apply_positive_offset(input_path: str, output_path: str, offset_sec: float)
Prepend offset_sec seconds of black video and silent audio, then concatenate with the original.
input_path
str
Path to the input video.
output_path
str
Path where the shifted video will be saved.
offset_sec
float
Duration of black video and silent audio to prepend, in seconds.
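
The prepend step could be expressed as a single ffmpeg command along the lines below. This is a sketch, not the module's actual command: the width, height, fps and sample_rate parameters, the AAC audio codec, and the build_prepend_cmd name are all assumptions. The concat filter needs matching frame sizes, so a real implementation would probably probe these values from the original, and this version assumes the input has an audio stream.

```python
def build_prepend_cmd(input_path, output_path, offset_sec,
                      width=1280, height=720, fps=30, sample_rate=44100):
    """Build an ffmpeg command that prepends black video and silent audio.

    Hypothetical sketch; geometry and audio settings are assumed defaults.
    """
    dur = f"{offset_sec:.3f}"
    return [
        "ffmpeg", "-y",
        # Input 0: black video lasting offset_sec seconds.
        "-f", "lavfi", "-t", dur,
        "-i", f"color=c=black:s={width}x{height}:r={fps}",
        # Input 1: silent stereo audio of the same length.
        "-f", "lavfi", "-t", dur,
        "-i", f"anullsrc=r={sample_rate}:cl=stereo",
        # Input 2: the original video.
        "-i", input_path,
        # Concatenate (black, silence) followed by (original video, audio).
        "-filter_complex", "[0:v][1:a][2:v][2:a]concat=n=2:v=1:a=1[v][a]",
        "-map", "[v]", "-map", "[a]",
        # Encoder settings taken from the Notes section.
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        "-c:a", "aac",
        output_path,
    ]

cmd = build_prepend_cmd("in.mp4", "out.mp4", 0.5)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine with ffmpeg installed.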

_apply_negative_offset

_apply_negative_offset(input_path: str, output_path: str, trim_sec: float)
Trim the first trim_sec seconds from the video.
input_path
str
Path to the input video.
output_path
str
Path where the trimmed video will be saved.
trim_sec
float
Duration to trim from the start, in seconds.
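
Trimming can be sketched as an ffmpeg seek followed by a re-encode. The build_trim_cmd name and the AAC audio codec are assumptions; the video encoder settings follow the Notes section.

```python
def build_trim_cmd(input_path, output_path, trim_sec):
    """Build an ffmpeg command that drops the first trim_sec seconds.

    Hypothetical sketch; the module's real command may differ.
    """
    if trim_sec < 0:
        raise ValueError("trim_sec must be non-negative")
    return [
        "ffmpeg", "-y",
        # -ss before -i seeks in the input; re-encoding keeps the cut accurate.
        "-ss", f"{trim_sec:.3f}",
        "-i", input_path,
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        "-c:a", "aac",
        output_path,
    ]

trim_cmd = build_trim_cmd("in.mp4", "out.mp4", 0.1)
```

Re-encoding rather than stream-copying avoids cuts that snap to the nearest keyframe, which matters for offsets as small as 100 ms.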

Usage Example

import logging
from evaluation.offset_generation import generate_synthetic_dataset

logging.basicConfig(level=logging.INFO)

# Generate synthetic dataset with custom offsets
metadata_csv = generate_synthetic_dataset(
    originals_dir="/path/to/originals",
    synthetic_dir="/path/to/synthetic",
    metadata_dir="/path/to/metadata",
    offsets_ms=[-2000, -1000, 0, 1000, 2000],
)

print(f"Metadata written to: {metadata_csv}")

CLI Usage

python -m evaluation.offset_generation
Runs with default settings:
  • Reads from evaluation/originals/
  • Writes shifted videos to evaluation/synthetic/
  • Writes metadata to evaluation/metadata/synthetic_metadata.csv
  • Uses default offsets: [-1000, -500, -100, 100, 500, 1000] ms

Configuration

Default Directories

import os

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ORIGINALS_DIR = os.path.join(BASE_DIR, "originals")
SYNTHETIC_DIR = os.path.join(BASE_DIR, "synthetic")
METADATA_DIR = os.path.join(BASE_DIR, "metadata")

Default Offsets

DEFAULT_OFFSETS_MS = [-1000, -500, -100, 100, 500, 1000]

Supported Video Extensions

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}
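
A set like this is typically used to filter directory listings. The sketch below shows one way to do that; the is_video helper and the case-insensitive comparison are assumptions, not confirmed module behavior.

```python
import os

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}

def is_video(filename):
    """Return True if the file extension is a supported video type.

    Lower-casing the extension is an assumption made for this sketch.
    """
    return os.path.splitext(filename)[1].lower() in VIDEO_EXTENSIONS

candidates = ["clip01.mp4", "clip02.MOV", "notes.txt", "README"]
videos = [name for name in candidates if is_video(name)]
```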

Notes

  • Video Quality: Output videos are encoded with the libx264 codec at CRF 23, using the ultrafast preset for speed.
  • Audio Handling: The module detects whether each video contains an audio stream and processes the video accordingly.
  • Sensitivity Tags: Motion level and audio energy are computed to enable sensitivity analysis in later evaluation stages.
  • Zero Offset: When offset is 0, the video is re-encoded for consistency with other test cases.
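
The sign-based behavior described on this page (prepend for positive offsets, trim for negative ones, re-encode for zero) can be summarized with a small dispatch sketch. The plan_offset function and its action labels are hypothetical; they only illustrate how a millisecond offset might map to an operation and a duration in seconds.

```python
def plan_offset(offset_ms):
    """Map an offset in milliseconds to (action, seconds).

    Hypothetical helper: the labels 'prepend', 'trim' and 'reencode'
    are illustrative, not names used by the module.
    """
    seconds = abs(offset_ms) / 1000.0
    if offset_ms > 0:
        return ("prepend", seconds)   # black frames + silent audio
    if offset_ms < 0:
        return ("trim", seconds)      # cut from the start
    return ("reencode", 0.0)          # zero offset: re-encode only

plans = [plan_offset(ms) for ms in (-1000, 0, 500)]
```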
