This page walks through installing claude-real-video, analysing your first video (a YouTube URL and a local file), exploring the output folder, and handing the results to an LLM. Each step takes under a minute once ffmpeg is installed.
Install
Choose the tier that matches your needs. The core install gives you frame extraction and deduplication. Add the [whisper] extra for audio transcription.
You also need ffmpeg on your PATH. See Installation for platform-specific instructions.
Analyse a YouTube video
Pass any URL that yt-dlp supports, such as YouTube, Instagram Reels, TikTok, and hundreds of other sites. crv will:
- Download the video via yt-dlp
- Extract frames at every scene change plus a density floor (at least one frame per second)
- Deduplicate near-identical frames using real pixel differences against a sliding window
- Use embedded subtitles if present, otherwise transcribe the audio with Whisper (if installed)
- Write MANIFEST.txt summarising everything for the model
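The frame-selection steps above can be sketched in a few lines of Python. This is a hypothetical illustration, not crv's implementation. Frames are modelled as flat lists of grayscale pixel values. The function names, the 8.0 mean-difference threshold and the 5-frame window are assumptions chosen for the example. The 1-second max_gap mirrors the documented density floor.

```python
# Hypothetical sketch of crv-style frame selection; NOT crv's actual code.
# Frames are modelled as flat lists of grayscale pixel values (0-255).
from typing import List, Sequence


def select_timestamps(scene_changes: Sequence[float], duration: float,
                      max_gap: float = 1.0) -> List[float]:
    """Merge scene-change times with a density floor so that no two
    consecutive timestamps are more than max_gap seconds apart."""
    points = sorted({0.0, *(t for t in scene_changes if 0.0 <= t <= duration)})
    selected: List[float] = []
    for t in points:
        if selected:
            prev = selected[-1]
            # Fill any gap wider than max_gap with evenly spaced floor frames.
            while t - prev > max_gap:
                prev += max_gap
                selected.append(round(prev, 3))
        selected.append(t)
    # Keep the floor going until the end of the video.
    while duration - selected[-1] > max_gap:
        selected.append(round(selected[-1] + max_gap, 3))
    return selected


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average absolute per-pixel difference between two same-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dedupe(frames: Sequence[Sequence[int]], threshold: float = 8.0,
           window: int = 5) -> List[int]:
    """Return indices of frames to keep. A frame is dropped if it is within
    threshold of any of the last `window` kept frames."""
    kept: List[int] = []
    for i, frame in enumerate(frames):
        recent = kept[-window:]
        if all(mean_abs_diff(frame, frames[j]) > threshold for j in recent):
            kept.append(i)
    return kept
```

The sketch compares each frame against a window of recent kept frames rather than only the previous one. This lets it drop a frame that cuts back to an earlier shot. With the frames [0]*4, [1]*4, [100]*4, [2]*4, [101]*4, dedupe keeps indices [0, 2], while dedupe(frames, window=1) keeps [0, 2, 3, 4].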
Explore the output
The output directory (crv-out by default) contains three items:
| File | Description |
|---|---|
| frames/frame_001.jpg … frame_NNN.jpg | The kept frames in chronological order, each scaled to 640 px wide. |
| transcript.txt | Plain-text transcript, from embedded subtitles if available, otherwise from Whisper. |
| MANIFEST.txt | A single summary file with source URL, duration, frame count, dedup stats, and the full transcript. This is what you hand to the LLM. |
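You can also inspect the output folder from Python to check what a run produced. This is a minimal sketch that assumes the default layout in the table: frames named frame_NNN.jpg, plus transcript.txt and MANIFEST.txt. summarise_output is a hypothetical helper, not part of claude_real_video.

```python
# Minimal sketch for inspecting a crv output folder (default layout assumed).
# summarise_output is a hypothetical helper, not part of claude_real_video.
from pathlib import Path
from typing import Dict, Union


def summarise_output(out_dir: Union[str, Path] = "crv-out") -> Dict[str, object]:
    out = Path(out_dir)
    # Sort by the numeric suffix so frame_010 comes after frame_009
    # even if the zero-padding width ever changes.
    frames = sorted((out / "frames").glob("frame_*.jpg"),
                    key=lambda p: int(p.stem.rsplit("_", 1)[-1]))
    return {
        "frame_count": len(frames),
        "first_frame": frames[0].name if frames else None,
        "last_frame": frames[-1].name if frames else None,
        "has_transcript": (out / "transcript.txt").is_file(),
        "has_manifest": (out / "MANIFEST.txt").is_file(),
    }
```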
Hand it to an LLM
Open Claude, ChatGPT, Gemini, or any vision-capable model and attach:
- All JPEG files from crv-out/frames/
- crv-out/MANIFEST.txt
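If you script the hand-off, for example to upload files through your own tooling, you can gather the attachment set as shown below. This sketch assumes the default crv-out layout. attachment_list is hypothetical, and putting the manifest first is a choice made here, not a documented requirement.

```python
# Minimal sketch: collect the files to attach to an LLM chat.
# attachment_list is hypothetical; the default crv-out layout is assumed.
from pathlib import Path
from typing import List, Union


def attachment_list(out_dir: Union[str, Path] = "crv-out") -> List[Path]:
    out = Path(out_dir)
    manifest = out / "MANIFEST.txt"
    if not manifest.is_file():
        raise FileNotFoundError(f"{manifest} not found; run crv first")
    # Keep frames in chronological order by their numeric suffix.
    frames = sorted((out / "frames").glob("frame_*.jpg"),
                    key=lambda p: int(p.stem.rsplit("_", 1)[-1]))
    # Manifest first so the model reads the context (and any --why intent)
    # before the frames; this ordering is this sketch's choice.
    return [manifest, *frames]
```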
Analyse a local file
Pass a file path instead of a URL. Use -o to choose a custom output directory and --lang to tell Whisper the spoken language, which skips language detection and improves accuracy.
Any format that ffmpeg can read is supported, including .mp4, .mov, .mkv, .webm, .avi, and more.
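To call crv from Python on a local file, build the argument list and run it with subprocess. This sketch uses only the -o and --lang flags described above. The following are assumptions: placing the source path before the flags, the language code format ("en"), and the helper names build_crv_command and run_crv.

```python
# Minimal sketch: drive the crv CLI from Python for a local file.
# Only the -o and --lang flags come from this page; the helper names and
# the "source first, flags after" argument order are assumptions.
import shutil
import subprocess
from typing import List, Optional


def build_crv_command(source: str, out_dir: Optional[str] = None,
                      lang: Optional[str] = None) -> List[str]:
    cmd = ["crv", source]
    if out_dir:
        cmd += ["-o", out_dir]    # custom output directory
    if lang:
        cmd += ["--lang", lang]   # spoken language, so Whisper skips detection
    return cmd


def run_crv(cmd: List[str]) -> None:
    # Fail early with a clear message if the crv executable is missing.
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    print(build_crv_command("talk.mp4", out_dir="talk-out", lang="en"))
```

Keeping command construction separate from execution makes the command easy to log or test without invoking crv.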
Focus the analysis with --why
Tell crv why you are watching the video. The intent is written into MANIFEST.txt as the first line, instructing the LLM to analyse the video through that lens rather than produce a generic summary.
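Here is a sketch of passing an intent with --why and confirming it reached the manifest. The exact wording crv writes on the first line is not shown here, so the check only looks for your intent text as a substring of that line. The helper names and the argument order are assumptions.

```python
# Minimal sketch for --why: build the command, then check the manifest's
# first line. Helper names and argument order are assumptions.
from pathlib import Path
from typing import List, Union


def crv_with_intent(source: str, why: str) -> List[str]:
    # Passing argv as a list (e.g. to subprocess.run) avoids shell quoting,
    # so the intent can contain spaces.
    return ["crv", source, "--why", why]


def manifest_mentions_intent(out_dir: Union[str, Path], why: str) -> bool:
    manifest = Path(out_dir) / "MANIFEST.txt"
    with manifest.open(encoding="utf-8") as fh:
        first_line = fh.readline()  # only the first line carries the intent
    return why in first_line
```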
Frames only (skip transcription)
Use --no-transcribe to skip the Whisper step entirely. This is useful when you only need visual frames or when the video has no audio.
The output then contains frames/ and MANIFEST.txt but no transcript.txt. The manifest notes that transcription was skipped.
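Downstream code can branch on whether a transcript exists instead of assuming one. This minimal sketch relies only on the documented behaviour that --no-transcribe leaves out transcript.txt. frames_only_command and load_transcript are hypothetical helpers.

```python
# Minimal sketch: frames-only run plus transcript-optional loading.
# frames_only_command and load_transcript are hypothetical helpers.
from pathlib import Path
from typing import List, Optional, Union


def frames_only_command(source: str) -> List[str]:
    return ["crv", source, "--no-transcribe"]


def load_transcript(out_dir: Union[str, Path]) -> Optional[str]:
    """Return the transcript text, or None when no transcript.txt exists
    (for example, after a --no-transcribe run)."""
    path = Path(out_dir) / "transcript.txt"
    return path.read_text(encoding="utf-8") if path.is_file() else None
```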
CLI Reference
Full list of flags, defaults, and usage examples for the crv command.
Python API
Use claude_real_video.process() to integrate frame extraction and transcription directly into your Python code.