
Overview

The editor.py module uses Google Gemini 3.0 Flash (multimodal) to analyze video content and generate contextual FFmpeg filter strings. This enables dynamic visual effects like zooms, color adjustments, and pacing enhancements tailored to the video’s narrative.

VideoEditor Class

class VideoEditor:
    def __init__(self, api_key: str)
Main class for AI-driven video editing.
Parameters:
  • api_key (str): Google Gemini API key
Attributes:
  • client: Gemini API client instance
  • model_name: "gemini-3-flash-preview" (supports video uploads)

Key Methods

upload_video

def upload_video(self, video_path: str) -> File
Uploads a video to the Gemini File API for analysis.
Parameters:
  • video_path (str): Path to video file
Returns:
  • File: Gemini file object (ready for inference)
Process:
  1. Validates file exists
  2. Uploads via client.files.upload()
  3. Polls for processing status (checks every 2 seconds)
  4. Returns when state is "ACTIVE"
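
The upload-and-poll process above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: upload_and_wait, its timeout, and the injectable sleep parameter are hypothetical, and it assumes the client exposes files.upload(file=...) and files.get(name=...) as in the google-genai SDK, with a state that reads "PROCESSING" until the file becomes "ACTIVE".

```python
import os
import time


def _state_name(file_obj):
    # The state may be an enum (with .name) or a plain string; handle both.
    state = file_obj.state
    return getattr(state, "name", state)


def upload_and_wait(client, video_path, poll_seconds=2.0, timeout=300.0,
                    sleep=time.sleep):
    """Upload a video, then poll until the File API reports it ACTIVE.

    Hypothetical helper: `timeout` and `sleep` are illustrative additions.
    """
    # 1. Validate that the file exists before spending an API call.
    if not os.path.isfile(video_path):
        raise FileNotFoundError(video_path)

    # 2. Upload (assumed google-genai call shape).
    file_obj = client.files.upload(file=video_path)

    # 3. Poll every `poll_seconds` while the file is still processing.
    waited = 0.0
    while _state_name(file_obj) == "PROCESSING":
        if waited >= timeout:
            raise TimeoutError(f"{file_obj.name} still processing after {timeout}s")
        sleep(poll_seconds)
        waited += poll_seconds
        file_obj = client.files.get(name=file_obj.name)

    # 4. Only an ACTIVE file is ready for inference.
    if _state_name(file_obj) != "ACTIVE":
        raise RuntimeError(f"Upload ended in state {_state_name(file_obj)}")
    return file_obj
```

Bounding the loop with a timeout keeps a stuck upload from blocking the pipeline indefinitely; whether editor.py does this is not stated here.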

get_ffmpeg_filter

def get_ffmpeg_filter(
    self, 
    video_file_obj: File,
    duration: float,
    fps: int = 30,
    width: Optional[int] = None,
    height: Optional[int] = None,
    transcript: Optional[dict] = None
) -> Optional[dict]
Generates an FFmpeg filter string using Gemini’s video analysis.
Parameters:
  • video_file_obj (File): Uploaded video file from upload_video()
  • duration (float): Video duration in seconds
  • fps (int): Framerate (default: 30)
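  • width (Optional[int]): Video width in pixels (signature default is None; 1080 is the documented fallback)
  • height (Optional[int]): Video height in pixels (signature default is None; 1920 is the documented fallback)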
  • transcript (dict): Optional transcript for context-aware effects
Returns:
  • dict with key "filter_string" containing raw FFmpeg filter
  • None if parsing fails
Example Output:
{
  "filter_string": "zoompan=z='1.1*between(on,0,75)+1.3*between(on,76,150)':s=1080x1920:fps=30:d=1,eq=contrast=1.2:enable='between(t,0,3)',hue=s=0:enable='between(t,10,12)'"
}
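
To illustrate the return contract (a dict with "filter_string", or None when parsing fails), here is a small sketch of how a model reply might be parsed. parse_filter_response is a hypothetical name, and the assumption that the reply may wrap the JSON object in extra text or a code fence is ours, not something the module documents.

```python
import json
import re
from typing import Optional


def parse_filter_response(text: str) -> Optional[dict]:
    """Extract {"filter_string": ...} from a model reply, or return None."""
    # Grab the outermost {...} span so surrounding prose or fences are ignored.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Require a non-empty string under the expected key.
    value = data.get("filter_string") if isinstance(data, dict) else None
    if not isinstance(value, str) or not value.strip():
        return None
    return {"filter_string": value.strip()}
```

Callers should check for None before passing the result on to apply_edits().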
Prompt Strategy: The method sends a detailed prompt instructing Gemini to:
  1. Analyze video content and transcript context
  2. Apply effects selectively (not randomly):
    • Punch-in zooms for key moments/jokes
    • Slow zooms during speech
    • Visual effects (contrast, saturation) for mood changes
  3. Use safe syntax:
    • Avoid comparison operators (<, >, <=, >=)
    • Use FFmpeg expression functions: between(), lt(), gte(), etc.
    • Use enable='between(t,start,end)' for timeline-based effects
    • Always set zoompan output size to exact input resolution
Supported Filters:
  • zoompan: Dynamic zoom/pan effects
  • eq: Brightness, contrast, saturation adjustments
  • hue: Color hue and saturation (black & white via s=0)
  • unsharp: Sharpening
  • curves: Advanced color grading

apply_edits

def apply_edits(self, input_path: str, output_path: str, filter_data: dict) -> None
Executes FFmpeg with the generated filter string.
Parameters:
  • input_path (str): Input video path
  • output_path (str): Output video path
  • filter_data (dict): Output from get_ffmpeg_filter()
Process:
  1. Validates filter data
  2. Probes input dimensions using ffprobe
  3. Sanitizes filter string (converts comparisons to functions)
  4. Enforces zoompan output size to preserve aspect ratio
  5. Adds setsar=1 for square pixel aspect ratio
  6. Executes FFmpeg with -vf filter
FFmpeg Command:
ffmpeg -y -i input.mp4 \
  -vf "<generated_filter_string>" \
  -c:v libx264 -preset fast -crf 22 \
  -c:a copy \
  output.mp4
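
The command above could be assembled and run from Python along these lines. This is a sketch under assumptions: probe_dimensions, with_square_pixels, build_ffmpeg_command, and run_ffmpeg are hypothetical names, and the module's real error handling is not shown in this reference. Passing the arguments as a list avoids shell quoting problems with the single quotes inside filter expressions.

```python
import subprocess


def probe_dimensions(path: str) -> tuple[int, int]:
    """Read width and height of the first video stream via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height", "-of", "csv=s=x:p=0", path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    width, height = out.split("x")[:2]
    return int(width), int(height)


def with_square_pixels(filter_string: str) -> str:
    """Append setsar=1 unless the chain already sets the sample aspect ratio."""
    if "setsar=" in filter_string:
        return filter_string
    return f"{filter_string},setsar=1"


def build_ffmpeg_command(input_path: str, output_path: str,
                         filter_string: str) -> list[str]:
    """Mirror the documented command: libx264, preset fast, CRF 22, audio copied."""
    return [
        "ffmpeg", "-y", "-i", input_path,
        "-vf", with_square_pixels(filter_string),
        "-c:v", "libx264", "-preset", "fast", "-crf", "22",
        "-c:a", "copy",
        output_path,
    ]


def run_ffmpeg(input_path: str, output_path: str, filter_string: str) -> None:
    # check=True raises CalledProcessError if FFmpeg rejects the filter.
    subprocess.run(build_ffmpeg_command(input_path, output_path, filter_string),
                   check=True, capture_output=True, text=True)
```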

Helper Methods (Static)

_split_filter_chain

@staticmethod
def _split_filter_chain(filter_string: str) -> list[str]
Splits an FFmpeg filter chain on commas while respecting single-quoted substrings.
Parameters:
  • filter_string (str): Raw filter string
Returns:
  • list[str]: Individual filter components
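
A quote-aware split of this kind might look like the sketch below. It is an illustration only: split_filter_chain is a stand-in name, and it assumes quotes are balanced and ignores backslash escapes, which the real helper may treat differently.

```python
def split_filter_chain(filter_string: str) -> list[str]:
    """Split on commas that are not inside single quotes."""
    parts: list[str] = []
    current: list[str] = []
    in_quote = False
    for ch in filter_string:
        if ch == "'":
            # Toggle quote state; keep the quote character in the output.
            in_quote = not in_quote
            current.append(ch)
        elif ch == "," and not in_quote:
            # A top-level comma ends the current filter.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    # Drop empty fragments produced by stray or trailing commas.
    return [p.strip() for p in parts if p.strip()]
```

A plain str.split(",") would break expressions such as between(on,0,75), which is why the quote state has to be tracked.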

_enforce_zoompan_output_size

@classmethod
def _enforce_zoompan_output_size(cls, filter_string: str, width: int, height: int) -> str
Forces any zoompan filter to output the exact input dimensions.
Parameters:
  • filter_string (str): Raw filter string
  • width (int): Target width
  • height (int): Target height
Returns:
  • str: Modified filter string with :s=WIDTHxHEIGHT
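
One way to implement this rewrite is sketched below. The names are hypothetical, and the sketch assumes a simple comma-separated chain without stream labels and with balanced single quotes. Any existing s= or size= option on a zoompan filter is dropped and the target size is appended.

```python
import re


def _split_outside_quotes(text: str, sep: str) -> list[str]:
    # Match runs of non-separator characters or whole single-quoted spans.
    pattern = rf"(?:[^{re.escape(sep)}']|'[^']*')+"
    return re.findall(pattern, text)


def enforce_zoompan_output_size(filter_string: str, width: int, height: int) -> str:
    """Rewrite every zoompan filter so its output size is WIDTHxHEIGHT."""
    rewritten = []
    for part in _split_outside_quotes(filter_string, ","):
        part = part.strip()
        name, _, args = part.partition("=")
        if name.strip() != "zoompan":
            rewritten.append(part)
            continue
        # Keep every option except an existing size, then append ours.
        options = [opt for opt in _split_outside_quotes(args, ":")
                   if not re.match(r"\s*(s|size)\s*=", opt)]
        options.append(f"s={width}x{height}")
        rewritten.append("zoompan=" + ":".join(options))
    return ",".join(rewritten)
```

For example, a zoompan filter carrying s=720x1280 comes back with s=1080x1920 when called with width=1080 and height=1920, while other filters in the chain are left untouched.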

_sanitize_filter_string

@staticmethod
def _sanitize_filter_string(filter_string: str) -> str
Converts comparison operators to FFmpeg expression functions.
Conversions:
  • t >= 3 → gte(t,3)
  • t <= 10 → lte(t,10)
  • on > 75 → gt(on,75)
  • on < 150 → lt(on,150)
Parameters:
  • filter_string (str): Gemini-generated filter
Returns:
  • str: Sanitized filter (compatible with all FFmpeg builds)
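
The conversions listed above can be approximated with a single regular expression. The sketch below is an assumption about the approach, not the module's code: sanitize_filter_string is a stand-in name, and it only handles simple operands (an identifier or a number on each side). Chained comparisons such as 0 < t < 5 or parenthesized sub-expressions would need a real expression parser.

```python
import re

# Map each comparison operator to its FFmpeg expression function.
_FUNCTIONS = {">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}

# An operand is an identifier (t, on, iw, ...) or a decimal number.
_OPERAND = r"[A-Za-z_]\w*|\d+(?:\.\d+)?"

# Two-character operators are listed first so ">=" is not read as ">".
_COMPARISON = re.compile(rf"({_OPERAND})\s*(>=|<=|>|<)\s*({_OPERAND})")


def sanitize_filter_string(filter_string: str) -> str:
    """Rewrite 'a OP b' comparisons as FFmpeg function calls."""
    def to_function(match: re.Match) -> str:
        left, op, right = match.groups()
        return f"{_FUNCTIONS[op]}({left},{right})"

    return _COMPARISON.sub(to_function, filter_string)
```

Strings that already use between(), gte(), and similar functions contain no comparison operators, so they pass through unchanged.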

Example Usage

from editor import VideoEditor

# Initialize editor
editor = VideoEditor(api_key="your-gemini-key")

# Upload video for analysis
video_file = editor.upload_video("clip.mp4")

# Generate contextual effects
filter_data = editor.get_ffmpeg_filter(
    video_file_obj=video_file,
    duration=30.0,
    fps=30,
    width=1080,
    height=1920,
    transcript={"text": "This is amazing!", "segments": [...]}
)

# Apply effects (get_ffmpeg_filter() returns None if parsing fails)
if filter_data is not None:
    editor.apply_edits(
        input_path="clip.mp4",
        output_path="clip_edited.mp4",
        filter_data=filter_data
    )

Filter Generation Constraints

  1. Resolution Preservation: Output must match input resolution exactly
  2. No Manual Editing: Filter strings are auto-generated (no user modification)
  3. Context-Aware: Effects align with speech rhythm and visual action
  4. Safe Syntax: Uses FFmpeg expression functions (not raw operators)
  5. Timeline Editing: Uses enable option instead of dynamic parameter expressions

Dependencies

  • google-genai: Gemini API client
  • ffmpeg: Video processing (subprocess)
  • ffprobe: Video metadata extraction
  • json: JSON parsing
  • re: Regex for filter sanitization
