The preprocess module handles audio extraction from video files using FFmpeg. It prepares audio tracks for synchronization analysis by converting them to a standard format.

extract_audio_from_videos

Extracts audio tracks from all video files in a directory using FFmpeg. Produces mono WAV files at a specified sample rate.
from src.preprocess import extract_audio_from_videos

extract_audio_from_videos(
    video_dir="./videos",
    audio_dir="./audio_output",
    target_sr=16000
)

Parameters

video_dir
str
required
Directory containing input video files (.mp4, .mov)
audio_dir
str
required
Output directory for extracted WAV files. Created if it doesn’t exist.
target_sr
int
default:"16000"
Target sample rate in Hz for the output audio files. Common values:
  • 16000 - Default, good balance of quality and processing speed
  • 44100 - CD quality
  • 48000 - Professional video standard

Returns

No return value. Audio files are written to disk with the same base filename as the video:
  • video1.mp4 → video1.wav
  • camera_A.mov → camera_A.wav
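The naming rule above can be illustrated with a minimal sketch. The helper name `wav_name_for` is hypothetical and not part of the module; it only shows one way the base filename could be carried over.

```python
from pathlib import Path


def wav_name_for(video_filename: str) -> str:
    """Return a WAV filename that shares the video's base name.

    Hypothetical helper for illustration; not part of src.preprocess.
    """
    # with_suffix swaps only the final extension and keeps the stem.
    return Path(video_filename).with_suffix(".wav").name


print(wav_name_for("video1.mp4"))
print(wav_name_for("camera_A.mov"))
```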

Raises

RuntimeError
exception
Raised if FFmpeg is not found in system PATH
CalledProcessError
exception
Raised if FFmpeg fails to extract audio from a video file
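
A caller may want to handle the two documented exceptions separately. The sketch below assumes that CalledProcessError refers to `subprocess.CalledProcessError` (the page does not name the module), and the wrapper `run_extraction` is a hypothetical helper, not part of the module.

```python
import subprocess


def run_extraction(extract_fn, **kwargs) -> str:
    """Call an extraction function and map the documented errors to a status string.

    Hypothetical wrapper; extract_fn would typically be extract_audio_from_videos.
    """
    try:
        extract_fn(**kwargs)
    except RuntimeError as exc:
        # Documented case: FFmpeg is not on the system PATH.
        return f"ffmpeg unavailable: {exc}"
    except subprocess.CalledProcessError as exc:
        # Assumed case: FFmpeg exited with a non-zero status for some video.
        return f"ffmpeg failed with exit code {exc.returncode}"
    return "ok"
```

It could be used as `run_extraction(extract_audio_from_videos, video_dir="./videos", audio_dir="./audio_output")`.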

Output Format

All extracted audio files are:
  • Format: WAV (uncompressed)
  • Channels: Mono (1 channel) - stereo tracks are mixed to mono
  • Sample rate: As specified by target_sr parameter
  • Bit depth: 16-bit PCM
  • Naming: Same basename as input video file with .wav extension
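
To confirm that an extracted file matches the properties listed above, something like the following check with Python's standard `wave` module should work. The function name `matches_expected_format` is hypothetical and not part of the module.

```python
import wave


def matches_expected_format(wav_path: str, target_sr: int = 16000) -> bool:
    """Check that a WAV file is mono, 16-bit, and at the expected sample rate.

    Hypothetical helper for illustration; not part of src.preprocess.
    """
    with wave.open(wav_path, "rb") as wav_file:
        return (
            wav_file.getnchannels() == 1          # mono
            and wav_file.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wav_file.getframerate() == target_sr
        )
```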

Implementation Details

The function:
  1. Scans video_dir for video files with extensions .mp4 or .mov
  2. For each video, runs FFmpeg with the following options:
    • -ac 1 - Convert to mono
    • -ar {target_sr} - Resample to target sample rate
    • -vn - Discard video stream (audio only)
  3. Writes mono WAV files to audio_dir
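
The steps above can be sketched roughly as follows. This is not the module's actual source: the helper names `find_videos` and `build_ffmpeg_command`, the case-insensitive extension match, and the `-y` overwrite flag are all assumptions made for illustration. Only `-vn`, `-ac 1`, and `-ar` come from the description above.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mov"}


def find_videos(video_dir: str) -> list[Path]:
    """List video files in video_dir (assumption: extensions match case-insensitively)."""
    return sorted(
        p for p in Path(video_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def build_ffmpeg_command(video_path: Path, audio_dir: str, target_sr: int = 16000) -> list[str]:
    """Build an FFmpeg argument list that writes a mono WAV into audio_dir."""
    output_path = Path(audio_dir) / video_path.with_suffix(".wav").name
    return [
        "ffmpeg",
        "-y",                     # assumption: overwrite existing output files
        "-i", str(video_path),    # input video
        "-vn",                    # discard the video stream
        "-ac", "1",               # convert to mono
        "-ar", str(target_sr),    # resample to the target rate
        str(output_path),         # FFmpeg writes 16-bit PCM for .wav by default
    ]
```

Each command could then be passed to `subprocess.run(cmd, check=True)`, which raises `subprocess.CalledProcessError` when FFmpeg exits with a non-zero status.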

Usage Example

import os
from src.preprocess import extract_audio_from_videos
from src import config

# Extract audio for synchronization
video_dir = config.VIDEO_DIR
audio_dir = config.AUDIO_DIR

extract_audio_from_videos(
    video_dir=video_dir,
    audio_dir=audio_dir,
    target_sr=16000
)

# Extracted files are now ready for sync analysis
print(f"Audio files extracted to: {audio_dir}")
print(f"Files: {os.listdir(audio_dir)}")

Integration with Sync Workflow

This function is typically called before audio-based synchronization:
# Complete audio sync workflow
from src.preprocess import extract_audio_from_videos
from src.audio_sync import estimate_offsets_robust

# Step 1: Extract audio
extract_audio_from_videos(
    video_dir="./raw_videos",
    audio_dir="./extracted_audio"
)

# Step 2: Compute offsets using GCC-PHAT
offsets = estimate_offsets_robust(
    audio_dir="./extracted_audio",
    max_offset_sec=10.0
)

print("Computed offsets:", offsets)
The Flask UI handles audio extraction automatically when the audio sync method is selected. This function is primarily useful for programmatic or batch processing workflows.

Requirements

FFmpeg must be installed and available in your system PATH:
  • macOS: brew install ffmpeg
  • Windows: Download from ffmpeg.org and add to PATH
  • Linux: sudo apt install ffmpeg
Verify installation: ffmpeg -version
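
The same check can be done from Python before calling the module. A minimal sketch using the standard library's `shutil.which`; the helper name `require_executable` is hypothetical, and the module's own check may be implemented differently.

```python
import shutil


def require_executable(name: str = "ffmpeg") -> str:
    """Return the full path to an executable, or raise RuntimeError if it is not on PATH.

    Hypothetical helper for illustration; not part of src.preprocess.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} not found in system PATH")
    return path
```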

Performance

Audio extraction performance depends on:
  • Video duration and codec
  • Target sample rate (lower rates process faster)
  • Disk I/O speed
Typical performance for 1080p videos:
  • 5-minute video: ~5-10 seconds extraction time
  • 30-minute video: ~20-30 seconds extraction time
Processing runs sequentially (one video at a time), but each video is independent.
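
Because each video is independent, a batch workflow could in principle run several extractions at once. The sketch below is not something the module does; `run_in_parallel` and the per-file callable `extract_one` are hypothetical names. Threads are a reasonable fit here on the assumption that the heavy work happens in external FFmpeg processes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(extract_one, video_paths, max_workers: int = 4) -> list:
    """Apply a per-video extraction callable to many paths concurrently.

    Hypothetical helper; extract_one is any function that processes one video path.
    Results come back in the same order as video_paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_one, video_paths))
```

Whether this helps in practice depends on disk I/O and CPU headroom, so it is worth measuring before adopting it.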

See Also

audio_sync

Use extracted audio for GCC-PHAT synchronization

Configuration

Configure audio and video directory paths
