Overview
The main.py module is the heart of OpenShorts’ video processing pipeline. It handles:
- YouTube video downloads
- Audio transcription with word-level timestamps
- Scene detection and strategic framing
- AI-powered viral clip identification
- Vertical video reframing with subject tracking
Key Functions
process_video_to_vertical
Parameters:
- input_video (str): Path to input video file
- final_output_video (str): Path for output vertical video
Returns:
- bool: True if successful, False otherwise
- Detects scenes using PySceneDetect
- Analyzes each scene (single subject vs. group)
- Applies appropriate framing strategy:
  - TRACK mode: MediaPipe face detection + YOLOv8 fallback with stabilization
  - GENERAL mode: Blurred background layout for groups/landscapes
- Merges processed video with original audio
transcribe_video
Parameters:
- video_path (str): Path to video file
Returns:
- dict with keys:
  - text (str): Full transcript text
  - segments (list): Segment objects with word timestamps
  - language (str): Detected language code
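As an illustration of how the word-level timestamps might be consumed, the sketch below collects the words that fall inside a time window. The helper name `words_in_range` and the per-word keys (`"words"`, `"word"`, `"start"`, `"end"`) are assumptions made for this example. The source only states that segments carry word timestamps, so the real structure may differ.

```python
def words_in_range(transcript_result, start, end):
    """Return the words fully contained in [start, end] (times in seconds).

    Assumes each segment is a dict with a "words" list whose items have
    "word", "start" and "end" keys. This layout is hypothetical.
    """
    selected = []
    for segment in transcript_result["segments"]:
        for word in segment.get("words", []):
            if word["start"] >= start and word["end"] <= end:
                selected.append(word)
    return selected


# Hand-written stand-in for a transcribe_video() result.
example = {
    "text": "hi there you",
    "language": "en",
    "segments": [
        {"words": [{"word": "hi", "start": 0.0, "end": 0.4},
                   {"word": "there", "start": 0.5, "end": 0.9}]},
        {"words": [{"word": "you", "start": 1.2, "end": 1.5}]},
    ],
}
clip_words = words_in_range(example, 0.45, 1.6)
```

A helper like this could be used to pull the transcript text for a single clip once its start and end times are known.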
get_viral_clips
Parameters:
- transcript_result (dict): Output from transcribe_video()
- video_duration (float): Total video duration in seconds
Returns:
- dict containing:
  - shorts (list): Array of clip objects
  - cost_analysis (dict): Token usage and cost breakdown
- Clips are 15-60 seconds each
- Returns 3-15 clips ranked by predicted performance
- Uses ABSOLUTE timestamps (not relative)
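The constraints above can be checked on the caller's side. The following is a minimal sketch, assuming each clip object exposes numeric `"start"` and `"end"` fields in seconds. Those field names and the helper `validate_clips` are hypothetical, since the source does not list the clip schema.

```python
def validate_clips(shorts, video_duration, min_len=15.0, max_len=60.0):
    """Keep clips whose absolute timestamps fit the video and the 15-60 s window.

    "start" and "end" are assumed field names, not confirmed by the source.
    """
    valid = []
    for clip in shorts:
        start, end = float(clip["start"]), float(clip["end"])
        if start < 0 or end > video_duration:
            continue  # timestamps must be absolute positions inside the video
        if not (min_len <= end - start <= max_len):
            continue  # outside the 15-60 second duration window
        valid.append(clip)
    return valid


candidates = [
    {"start": 10, "end": 40},    # 30 s, inside the video
    {"start": 0, "end": 5},      # too short
    {"start": 100, "end": 130},  # runs past the end of a 120 s video
]
kept = validate_clips(candidates, video_duration=120.0)
```

Because the timestamps are absolute, they can be compared directly against `video_duration` without adding any segment offset.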
detect_scenes
Parameters:
- video_path (str): Path to video file
Returns:
- tuple of:
  - scene_list (list): List of (start_time, end_time) tuples
  - fps (float): Video framerate
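Frame-by-frame processing usually needs frame indices instead of times. The sketch below converts a scene list into frame ranges using the returned fps. It assumes the start and end times are plain floats in seconds. If the real values are PySceneDetect timecode objects, they would need converting first. The helper name `scenes_to_frame_ranges` is hypothetical.

```python
def scenes_to_frame_ranges(scene_list, fps):
    """Convert (start_s, end_s) pairs in seconds to (first, last_exclusive) frames."""
    ranges = []
    for start_s, end_s in scene_list:
        first = int(round(start_s * fps))
        last = int(round(end_s * fps))  # exclusive upper bound
        if last > first:                # skip zero-length scenes
            ranges.append((first, last))
    return ranges


frame_ranges = scenes_to_frame_ranges([(0.0, 2.0), (2.0, 2.0), (2.0, 4.5)], fps=30.0)
```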
download_youtube_video
Parameters:
- url (str): YouTube video URL
- output_dir (str): Directory to save video (default: current directory)
Returns:
- tuple of:
  - downloaded_file (str): Path to downloaded MP4 file
  - sanitized_title (str): Sanitized video title
- Automatically selects H.264 codec for compatibility
- Uses multiple player clients (tv_embed, android) to bypass restrictions
- Supports cookie authentication via YOUTUBE_COOKIES env var
- Sanitizes filenames (removes special characters, limits length to 100 chars)
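The filename sanitization step could look roughly like the sketch below. The exact character set that main.py strips is not documented here, so the regular expressions, the underscore replacement, and the `"video"` fallback are all assumptions. Only the 100-character limit comes from the description above.

```python
import re


def sanitize_title(title, max_len=100):
    """Strip special characters and cap the length (illustrative rules only)."""
    # Keep word characters, whitespace and hyphens; drop everything else.
    cleaned = re.sub(r"[^\w\s-]", "", title)
    # Collapse runs of whitespace into single underscores.
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Fall back to a placeholder name if nothing survives (an assumption).
    return cleaned[:max_len] or "video"


safe_name = sanitize_title("My Video: Part 1/2?")
```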
Core Classes
SmoothedCameraman
- Safe zone logic: only moves when subject leaves center zone (25% of crop width)
- Adaptive speed: slow pan (3px/frame) or fast reframe (15px/frame)
- Prevents oscillation with overshoot detection
update_target
[x, y, w, h].
get_crop_box
(x1, y1, x2, y2) for the frame.
Parameters:
- force_snap (bool): If True, immediately snap to target (used on scene changes)
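To make the safe-zone and adaptive-speed behavior concrete, here is a simplified horizontal-only sketch. It is not the SmoothedCameraman implementation: the class name `CameramanSketch`, the constructor arguments, and the rule for choosing the fast speed (subject more than half a crop width away) are assumptions. Overshoot prevention is approximated by clamping each step to the remaining distance.

```python
class CameramanSketch:
    """Horizontal-only illustration of safe-zone panning (hypothetical class)."""

    def __init__(self, frame_width, frame_height, crop_width,
                 slow=3.0, fast=15.0, safe_ratio=0.25):
        self.frame_width = frame_width
        self.frame_height = frame_height
        self.crop_width = crop_width
        self.slow, self.fast = slow, fast
        # Safe zone spans safe_ratio of the crop width, centered on the crop.
        self.safe_half = crop_width * safe_ratio / 2
        self.center = frame_width / 2
        self.target = self.center

    def update_target(self, box):
        x, _y, w, _h = box
        self.target = x + w / 2  # follow the horizontal center of the subject

    def get_crop_box(self, force_snap=False):
        diff = self.target - self.center
        if force_snap:
            self.center = self.target
        elif abs(diff) > self.safe_half:
            # Assumed rule: use the fast speed only when the subject is far away.
            speed = self.fast if abs(diff) > self.crop_width / 2 else self.slow
            step = min(speed, abs(diff))  # never step past the target
            self.center += step if diff > 0 else -step
        # Keep the crop window inside the frame.
        half = self.crop_width / 2
        self.center = max(half, min(self.frame_width - half, self.center))
        x1 = int(self.center - half)
        return (x1, 0, x1 + self.crop_width, self.frame_height)


cam = CameramanSketch(frame_width=1920, frame_height=1080, crop_width=608)
cam.update_target([1000, 0, 40, 40])   # subject center 1020, inside the safe zone
box_idle = cam.get_crop_box()
cam.update_target([1080, 0, 40, 40])   # subject center 1100, just outside
box_slow = cam.get_crop_box()
cam.update_target([1480, 0, 40, 40])   # subject center 1500, far away
box_fast = cam.get_crop_box()
box_snap = cam.get_crop_box(force_snap=True)
```

In this sketch the crop stays still while the subject is in the safe zone, pans 3 px per call for small offsets, 15 px per call for large ones, and jumps straight to the target when `force_snap` is set.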
SpeakerTracker
- Face ID assignment using proximity matching
- Score decay system (0.85x per frame)
- Hysteresis: 3x score bonus for current active speaker
- Cooldown period (30 frames) before switching to new speaker
get_target
Parameters:
- face_candidates (list): List of dicts with {"box": [x,y,w,h], "score": float}
- frame_number (int): Current frame index
- width (int): Video width for distance calculations
Returns:
- list: Bounding box [x, y, w, h] of target speaker, or None
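The interplay of score decay, hysteresis, and cooldown can be sketched as follows. This is an illustration under assumptions, not the SpeakerTracker code: the class name `SpeakerTrackerSketch`, the matching threshold (10% of the frame width, horizontal distance only), and the way candidate scores accumulate are all invented for the example. The 0.85 decay, 3x bonus, and 30-frame cooldown come from the list above.

```python
class SpeakerTrackerSketch:
    DECAY = 0.85     # per-frame score decay
    BONUS = 3.0      # hysteresis multiplier for the active speaker
    COOLDOWN = 30    # frames that must pass between speaker switches

    def __init__(self):
        self.faces = {}            # face id -> {"box": [...], "score": float}
        self.next_id = 0
        self.active_id = None
        self.last_switch = -self.COOLDOWN

    def _match(self, box, width):
        """Nearest known face by horizontal distance (threshold is assumed)."""
        cx = box[0] + box[2] / 2
        best_id, best_dist = None, 0.1 * width
        for face_id, face in self.faces.items():
            dist = abs(cx - (face["box"][0] + face["box"][2] / 2))
            if dist < best_dist:
                best_id, best_dist = face_id, dist
        return best_id

    def get_target(self, face_candidates, frame_number, width):
        for face in self.faces.values():
            face["score"] *= self.DECAY
        for cand in face_candidates:
            face_id = self._match(cand["box"], width)
            if face_id is None:
                face_id, self.next_id = self.next_id, self.next_id + 1
                self.faces[face_id] = {"box": cand["box"], "score": 0.0}
            self.faces[face_id]["box"] = cand["box"]
            self.faces[face_id]["score"] += cand["score"]
        if not self.faces:
            return None

        def effective(face_id):
            score = self.faces[face_id]["score"]
            return score * self.BONUS if face_id == self.active_id else score

        best = max(self.faces, key=effective)
        can_switch = frame_number - self.last_switch >= self.COOLDOWN
        if self.active_id is None or (best != self.active_id and can_switch):
            self.active_id = best
            self.last_switch = frame_number
        return self.faces[self.active_id]["box"]


tracker = SpeakerTrackerSketch()
face_a, face_b = [100, 100, 50, 50], [800, 100, 50, 50]
first = tracker.get_target(
    [{"box": face_a, "score": 0.9}, {"box": face_b, "score": 0.5}], 0, 1280)
second = tracker.get_target(
    [{"box": face_a, "score": 0.2}, {"box": face_b, "score": 0.9}], 1, 1280)
```

In the second call, face B has the higher raw score, but the 3x bonus keeps face A selected. Even once B clearly dominates, the sketch holds A until 30 frames have passed since the last switch. It never expires stale faces, which a real tracker would need to handle.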
Constants
ASPECT_RATIO
GEMINI_PROMPT_TEMPLATE
Comprehensive prompt template for viral clip detection. Includes:
- FFmpeg timestamp format requirements
- Clip duration constraints (15-60s)
- Natural cut point guidelines
- Output JSON schema with social media metadata
Dependencies
- faster-whisper: CPU-optimized transcription (base model, int8 quantization)
- google-genai: Gemini API client
- ultralytics: YOLOv8 person detection
- mediapipe: Face detection (BlazeFace)
- opencv-python (cv2): Video frame processing
- scenedetect: Scene boundary detection
- yt-dlp: YouTube video downloads
- ffmpeg: Video encoding/merging (subprocess)