claude-real-video gives any LLM — Claude, ChatGPT, Gemini, or any other — genuine video comprehension. Instead of sampling frames at a fixed interval (which over-samples static screencasts and under-samples fast edits), it detects every scene change, collapses near-duplicate shots with a sliding-window deduplicator, and produces a clean folder of key frames plus a transcript that any model can read.
Installation
Install via pip with optional Whisper transcription support
Quickstart
Run your first video analysis in under two minutes
CLI Reference
Every flag and option for the crv command
Python API
Call process() directly from your own scripts
Why claude-real-video?
Most “let an LLM watch a video” approaches grab frames at a fixed rate, such as one per second, and ignore the audio. That means a ten-minute screencast with no cuts sends hundreds of nearly identical frames (600 at one frame per second), while a fast-cut trailer misses visual changes that fall between samples.
claude-real-video is different:
Scene-change detection
Frames are selected at every scene cut plus a configurable density floor — not a fixed quota.
Sliding-window dedup
A-B-A cuts and repeated shots are collapsed so the model only sees each unique shot once.
Smart transcription
Uses existing subtitles (.srt/.vtt or embedded) first; falls back to Whisper only when needed.
Fully local
Runs entirely on your machine. No video is uploaded to any third-party cloud service.
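The scene-change detection approach (every cut plus a density floor, in one chronological pass) can be approximated with ffmpeg's select filter. The sketch below only builds the command line. crv's actual filter expression and defaults are not given on this page, so the 0.3 scene threshold, the 2-second floor, and the helper name build_keyframe_command are illustrative assumptions.

```python
from pathlib import Path


def build_keyframe_command(video: str, out_dir: str,
                           scene_threshold: float = 0.3,
                           floor_seconds: float = 2.0) -> list[str]:
    """Hypothetical helper: ffmpeg argv that keeps scene cuts plus a time floor."""
    # The select filter keeps a frame whenever the summed terms are non-zero:
    #   gt(scene,T)               -> scene-change score above the threshold T
    #   isnan(prev_selected_t)    -> nothing selected yet, so keep the first frame
    #   gte(t-prev_selected_t,F)  -> F seconds since the last kept frame (density floor)
    expr = (
        f"gt(scene,{scene_threshold})"
        "+isnan(prev_selected_t)"
        f"+gte(t-prev_selected_t,{floor_seconds})"
    )
    return [
        "ffmpeg", "-hide_banner", "-i", video,
        # Single quotes stop ffmpeg's filtergraph parser from splitting on the commas.
        "-vf", f"select='{expr}'",
        # Write only the selected frames instead of padding to a constant frame rate.
        "-fps_mode", "vfr",
        str(Path(out_dir) / "frame_%05d.jpg"),
    ]
```

To run it, create the output directory first and pass the list to subprocess.run, for example subprocess.run(build_keyframe_command("talk.mp4", "frames"), check=True). The -fps_mode option needs ffmpeg 5.1 or newer; older builds use -vsync vfr instead.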
How it works
Fetch the video
Point crv at a YouTube, Instagram, or TikTok URL (via yt-dlp) or a local file path. The video stays on your machine.
Extract meaningful frames
ffmpeg runs a single chronological pass that captures every scene change plus a density floor, so you get the right frames whether the video is a static slideshow or a rapid-fire reel.
Deduplicate with a sliding window
Each candidate frame is compared against the last N kept frames using actual pixel differences. Repeated shots are dropped automatically, even when they reappear after a cutaway.
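The sliding-window comparison can be sketched as follows, assuming frames have already been decoded to equal-size grayscale pixel lists. The metric (mean absolute pixel difference), the window size of 5, the threshold of 8.0, and the function names are illustrative assumptions, not crv's actual implementation or defaults.

```python
from collections import deque
from typing import Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute per-pixel difference between two equal-size grayscale frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sliding_window_dedup(frames: Sequence[Sequence[int]],
                         window: int = 5,
                         threshold: float = 8.0) -> list[int]:
    """Return the indices of frames to keep (hypothetical helper).

    A candidate is dropped if it is within `threshold` of ANY of the last
    `window` kept frames, so in an A-B-A sequence the second A is dropped.
    """
    recent: deque = deque(maxlen=window)  # pixels of the most recently kept frames
    kept: list[int] = []
    for i, frame in enumerate(frames):
        if any(mean_abs_diff(frame, prev) <= threshold for prev in recent):
            continue  # near-duplicate of a recently kept shot
        kept.append(i)
        recent.append(frame)
    return kept
```

Comparing against several kept frames, rather than only the previous one, is what catches A-B-A cuts: with window=1, the returning A shot differs from B and would be kept again.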
Transcribe the audio
If the video already has subtitles (sidecar .srt/.vtt or an embedded stream), those are used verbatim. Otherwise Whisper transcribes the audio track.
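The subtitle-first fallback order can be sketched as a small decision function. crv's exact lookup rules are not described here, so the file-matching rule (talk.srt, or a language-tagged name such as talk.en.vtt, the pattern yt-dlp typically uses for downloaded subtitles) and all function names are assumptions. Embedded subtitle streams are detected with ffprobe; the Whisper call itself is left out.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional

SUBTITLE_EXTS = (".srt", ".vtt")


def find_sidecar_subtitles(video: Path) -> Optional[Path]:
    """Return a sidecar subtitle file next to the video, if any (hypothetical rule)."""
    matches = sorted(
        p for p in video.parent.iterdir()
        if p.is_file()
        and p.suffix.lower() in SUBTITLE_EXTS
        # Accept "talk.srt" as well as language-tagged names like "talk.en.vtt".
        and (p.stem == video.stem or p.stem.startswith(video.stem + "."))
    )
    return matches[0] if matches else None


def ffprobe_has_subtitles(video: Path) -> bool:
    """True if ffprobe lists at least one subtitle stream in the container."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "s",
         "-show_entries", "stream=index", "-of", "csv=p=0", str(video)],
        capture_output=True, text=True, check=False,
    )
    return result.returncode == 0 and bool(result.stdout.strip())


def choose_transcript_source(
    video: Path,
    has_embedded: Callable[[Path], bool] = ffprobe_has_subtitles,
) -> tuple[str, Optional[Path]]:
    """Prefer sidecar subtitles, then embedded ones, and only then Whisper."""
    sidecar = find_sidecar_subtitles(video)
    if sidecar is not None:
        return "sidecar", sidecar
    if has_embedded(video):
        return "embedded", None
    return "whisper", None
```

Passing the embedded-stream check in as a parameter keeps the ordering logic testable without ffprobe installed.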