OpenShorts uses a sophisticated 10-step pipeline to transform long-form videos into viral vertical clips. The entire workflow is powered by AI and runs automatically once you submit a video.
After detection, each scene is analyzed to determine optimal framing:
```python
# main.py:375-421
def analyze_scenes_strategy(video_path, scenes):
    # Samples 3 frames per scene (start, middle, end)
    # Counts faces using MediaPipe
    # Returns 'TRACK' or 'GENERAL' strategy for each scene
    if avg_faces > 1.2 or avg_faces < 0.5:
        strategies.append('GENERAL')  # Multiple people or no faces
    else:
        strategies.append('TRACK')    # Single subject tracking
```
Performance Tip: Scene detection runs at a downscaled resolution for speed. The ContentDetector uses its default sensitivity, which works well for most content.
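The thresholds above can be isolated into a standalone decision function. A minimal sketch, assuming per-scene face counts from the three sampled frames; the `decide_strategy` name is hypothetical, not the repo's API:

```python
def decide_strategy(face_counts):
    """Pick a framing strategy from per-frame face counts.

    face_counts: face tallies from the 3 sampled frames
    (start, middle, end) of one scene.
    """
    avg_faces = sum(face_counts) / len(face_counts)
    if avg_faces > 1.2 or avg_faces < 0.5:
        return 'GENERAL'  # multiple people, or no clear subject
    return 'TRACK'        # single subject: crop and follow

print(decide_strategy([1, 1, 1]))  # → TRACK
print(decide_strategy([2, 3, 2]))  # → GENERAL
print(decide_strategy([0, 0, 1]))  # → GENERAL
```

The asymmetric band (0.5–1.2 average faces) means a scene must show one face fairly consistently before the cropper commits to tracking it.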
Gemini analyzes the transcript using a strict prompt contract:
```python
# main.py:31-68 - Gemini Prompt Template
GEMINI_PROMPT_TEMPLATE = """You are a senior short-form video editor.
Read the ENTIRE transcript and word-level timestamps to choose the
3–15 MOST VIRAL moments for TikTok/IG Reels/YouTube Shorts.
Each clip must be between 15 and 60 seconds long.

⚠️ FFMPEG TIME CONTRACT — STRICT REQUIREMENTS:
- Return timestamps in ABSOLUTE SECONDS from the start of the video
- Only NUMBERS with decimal point, up to 3 decimals (examples: 0, 1.250, 17.350)
- Ensure 0 ≤ start < end ≤ VIDEO_DURATION_SECONDS
- Each clip between 15 and 60 s (inclusive)
- Prefer starting 0.2–0.4 s BEFORE the hook and ending 0.2–0.4 s AFTER the payoff
- Use silence moments for natural cuts; never cut mid-word"""
```
Important: The AI analysis requires the GEMINI_API_KEY environment variable. If it is missing, the system fails gracefully and converts the entire video instead of selecting clips.
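A contract like this is only useful if it is enforced before cutting with FFmpeg. A minimal validation sketch, assuming the model's response has been parsed into `(start, end)` pairs; the `validate_clips` helper is hypothetical:

```python
def validate_clips(clips, video_duration):
    """Keep only clips that satisfy the FFmpeg time contract."""
    valid = []
    for start, end in clips:
        # Normalize to numbers with at most 3 decimals
        start, end = round(float(start), 3), round(float(end), 3)
        if not (0 <= start < end <= video_duration):
            continue  # outside the video, or reversed
        if not (15 <= end - start <= 60):
            continue  # too short or too long for Shorts/Reels
        valid.append((start, end))
    return valid

clips = [(0, 17.35), (10.0, 12.0), (50.0, 120.0), (30.5, 88.0)]
print(validate_clips(clips, video_duration=100.0))
# → [(0.0, 17.35), (30.5, 88.0)]
```

Dropping invalid clips rather than clamping them keeps the "never cut mid-word" guarantee: a clamped boundary would land at an arbitrary point the model never chose.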
```python
# app.py:176-188
def enqueue_output(out, job_id):
    for line in iter(out.readline, b''):
        decoded_line = line.decode('utf-8').strip()
        if decoded_line:
            print(f"📝 [Job Output] {decoded_line}")
            if job_id in jobs:
                jobs[job_id]['logs'].append(decoded_line)
```
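A reader function like this is typically run on its own thread against a `subprocess.Popen` pipe, so the web process never blocks on the job's stdout. A self-contained sketch under that assumption; the `jobs` dict and the stand-in child process are illustrative, not the app's real job runner:

```python
import subprocess
import sys
import threading

jobs = {'job-1': {'logs': []}}

def enqueue_output(out, job_id):
    # readline returns b'' at EOF, which stops the iterator
    for line in iter(out.readline, b''):
        decoded_line = line.decode('utf-8').strip()
        if decoded_line and job_id in jobs:
            jobs[job_id]['logs'].append(decoded_line)

# Stand-in for the long-running pipeline process
proc = subprocess.Popen(
    [sys.executable, '-c', "print('step 1'); print('step 2')"],
    stdout=subprocess.PIPE,
)
t = threading.Thread(target=enqueue_output, args=(proc.stdout, 'job-1'), daemon=True)
t.start()
proc.wait()
t.join()
print(jobs['job-1']['logs'])  # → ['step 1', 'step 2']
```

Because the pipe is opened in binary mode, `readline` yields bytes; the sentinel must therefore be `b''`, not `''`, or the loop would never terminate.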