Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/HUANGCHIHHUNGLeo/claude-real-video/llms.txt

Use this file to discover all available pages before exploring further.

This page walks through installing claude-real-video, analysing your first video (a YouTube URL and a local file), exploring the output folder, and handing the results to an LLM. Each step takes under a minute once ffmpeg is installed.
1

Install

Choose the tier that matches your needs. The core install gives you frame extraction and deduplication. Add the [whisper] extra for audio transcription.
pip install claude-real-video
You also need ffmpeg on your PATH. See Installation for platform-specific instructions.
2

Analyse a YouTube video

Pass any URL that yt-dlp supports — YouTube, Instagram Reels, TikTok, and hundreds of other sites:
crv "https://www.youtube.com/watch?v=..."
crv will:
  1. Download the video via yt-dlp
  2. Extract frames at every scene change plus a density floor (at least one frame per second)
  3. Deduplicate near-identical frames using real pixel differences against a sliding window
  4. Transcribe the audio with Whisper (if installed), or use embedded subtitles if present
  5. Write MANIFEST.txt summarising everything for the model
Expected terminal output:
✓ Done → crv-out
  42 frames  (deduped from 187 extracted)  in crv-out/frames
  manifest:   crv-out/MANIFEST.txt
  transcript: crv-out/transcript.txt
3

Explore the output

The output directory (crv-out by default) contains three items:
FileDescription
frames/frame_001.jpgframe_NNN.jpgThe kept frames in chronological order, each scaled to 640 px wide.
transcript.txtPlain-text transcript — from embedded subtitles if available, otherwise from Whisper.
MANIFEST.txtA single summary file with source URL, duration, frame count, dedup stats, and the full transcript. This is what you hand to the LLM.
4

Hand it to an LLM

Open Claude, ChatGPT, Gemini, or any vision-capable model and attach:
  • All JPEG files from crv-out/frames/
  • crv-out/MANIFEST.txt
The manifest tells the model what the video is, how long it is, how many frames were extracted and deduplicated, and includes the full transcript — so the model can reason about both what it sees and what was said. Then ask your question as you normally would.

Analyse a local file

Pass a file path instead of a URL. Use -o to choose a custom output directory and --lang to tell Whisper the spoken language (skips language-detection and improves accuracy):
crv lecture.mp4 -o out --lang en
Any format ffmpeg can read is supported — .mp4, .mov, .mkv, .webm, .avi, and more.

Focus the analysis with —why

Tell crv why you are watching the video. The intent is written into MANIFEST.txt as the first line, instructing the LLM to analyse with that lens rather than producing a generic summary:
crv "https://youtu.be/..." --why "find the pricing strategy"
The resulting MANIFEST.txt opens with:
viewing intent: find the pricing strategy
(reader: analyse the frames and transcript with this intent as the lens — surface what serves it first, skip what doesn't)
When you hand the output folder to a model, it immediately knows what you care about.

Frames only (skip transcription)

Use --no-transcribe to skip the Whisper step entirely — useful when you only need visual frames or when the video has no audio:
crv clip.mp4 --no-transcribe
The output folder will contain frames/ and MANIFEST.txt but no transcript.txt. The manifest notes that transcription was skipped.

CLI Reference

Full list of flags, defaults, and usage examples for the crv command.

Python API

Use claude_real_video.process() to integrate frame extraction and transcription directly into your Python code.

Build docs developers (and LLMs) love