Quickstart: Analyse Your First Video with claude-real-video

This page walks through installing claude-real-video, analysing your first video (a YouTube URL and a local file), exploring the output folder, and handing the results to an LLM. Each step takes under a minute once ffmpeg is installed.

Install

Choose the tier that matches your needs. The core install gives you frame extraction and deduplication. Add the [whisper] extra for audio transcription.

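pip install claude-real-video                 # core: frame extraction + deduplication
pip install "claude-real-video[whisper]"      # core + Whisper audio transcription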

You also need ffmpeg on your PATH. See Installation for platform-specific instructions.
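Before the first run, you can confirm that ffmpeg is reachable. The following is a minimal Python sketch using only the standard library's `shutil.which`. The helper names `ffmpeg_available` and `ffmpeg_version_line` are hypothetical and are not part of claude-real-video.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH (hypothetical helper)."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version_line() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", ffmpeg_version_line())
    else:
        print("ffmpeg not found on PATH; see Installation")
```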

Analyse a YouTube video

Pass any URL that yt-dlp supports — YouTube, Instagram Reels, TikTok, and hundreds of other sites:

crv "https://www.youtube.com/watch?v=..."

crv will:

Download the video via yt-dlp
Extract frames at every scene change plus a density floor (at least one frame per second)
Deduplicate near-identical frames using real pixel differences against a sliding window
Use embedded subtitles if present; otherwise transcribe the audio with Whisper (if installed)
Write MANIFEST.txt summarising everything for the model
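The frame-selection and deduplication steps can be pictured with a small, self-contained Python sketch. It illustrates the ideas under stated assumptions and is not crv's actual implementation: frames are modelled as flat lists of grayscale pixel values (0 to 255), the one-frame-per-second floor is met by splitting any longer gap between scene changes into equal pieces, and the `threshold` and `window` defaults are hypothetical.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def select_timestamps(scene_changes: Sequence[float], duration: float,
                      max_gap: float = 1.0) -> List[float]:
    """Merge scene-change times with a density floor (hypothetical logic).

    Returns sorted timestamps such that no gap between neighbours, or between
    the last timestamp and the end of the video, exceeds `max_gap` seconds.
    """
    cuts = sorted({0.0, *(t for t in scene_changes if 0.0 < t < duration)})
    bounds = cuts + [duration]
    selected: List[float] = []
    for start, end in zip(bounds, bounds[1:]):
        selected.append(start)
        # Split the segment into equal pieces so none is longer than max_gap.
        pieces = math.ceil((end - start) / max_gap)
        selected.extend(start + i * (end - start) / pieces for i in range(1, pieces))
    return selected


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute per-pixel difference between two equal-size frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dedup_frames(frames: Sequence[Sequence[int]], threshold: float = 8.0,
                 window: int = 3) -> List[int]:
    """Return indices of frames to keep.

    A frame is dropped if it differs by at most `threshold` from any of the
    last `window` kept frames (the sliding window). Defaults are hypothetical.
    """
    kept: List[int] = []
    recent: Deque[Sequence[int]] = deque(maxlen=window)
    for i, frame in enumerate(frames):
        # all() is True for an empty window, so the first frame is always kept.
        if all(mean_abs_diff(frame, prev) > threshold for prev in recent):
            kept.append(i)
            recent.append(frame)
    return kept
```

Comparing against a window of recent kept frames, rather than only the previous one, lets a sketch like this drop a shot that the video cuts back to shortly after showing it.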

Expected terminal output:

✓ Done → crv-out
  42 frames  (deduped from 187 extracted)  in crv-out/frames
  manifest:   crv-out/MANIFEST.txt
  transcript: crv-out/transcript.txt

Explore the output

The output directory (crv-out by default) contains three items:

File	Description
`frames/frame_001.jpg` … `frame_NNN.jpg`	The kept frames in chronological order, each scaled to 640 px wide.
`transcript.txt`	Plain-text transcript — from embedded subtitles if available, otherwise from Whisper.
`MANIFEST.txt`	A single summary file with source URL, duration, frame count, dedup stats, and the full transcript. This is what you hand to the LLM.
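To check a run from a script, a short standard-library Python sketch can inspect the folder layout described in the table. The helper name `summarise_output` is hypothetical; it relies only on the file names listed above. `transcript.txt` is expected to be absent when transcription is skipped.

```python
from pathlib import Path
from typing import Dict, Union


def summarise_output(out_dir: Union[str, Path] = "crv-out") -> Dict[str, object]:
    """Report what a crv output folder contains (hypothetical helper)."""
    out = Path(out_dir)
    frames = sorted((out / "frames").glob("frame_*.jpg"))
    return {
        "frame_count": len(frames),
        "first_frame": frames[0].name if frames else None,
        "has_manifest": (out / "MANIFEST.txt").is_file(),
        "has_transcript": (out / "transcript.txt").is_file(),
    }
```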

Hand it to an LLM

Open Claude, ChatGPT, Gemini, or any vision-capable model and attach:

All JPEG files from crv-out/frames/
crv-out/MANIFEST.txt

The manifest tells the model what the video is, how long it is, how many frames were extracted and deduplicated, and includes the full transcript — so the model can reason about both what it sees and what was said. Then ask your question as you normally would.
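If you send the files to a model through code rather than a chat window, you need the same set of files in chronological order. The sketch below is a hypothetical helper, `collect_attachments`, not part of claude-real-video. It assumes frame names follow the `frame_NNN.jpg` pattern and sorts on the numeric suffix, so the order stays correct even if the number of digits changes.

```python
from pathlib import Path
from typing import List, Union


def _frame_number(path: Path) -> int:
    # Assumes names like frame_001.jpg; the numeric suffix gives chronological order.
    return int(path.stem.rsplit("_", 1)[-1])


def collect_attachments(out_dir: Union[str, Path] = "crv-out") -> List[Path]:
    """Return every frame in chronological order, followed by MANIFEST.txt."""
    out = Path(out_dir)
    frames = sorted((out / "frames").glob("frame_*.jpg"), key=_frame_number)
    manifest = out / "MANIFEST.txt"
    if not manifest.is_file():
        raise FileNotFoundError(f"{manifest} not found; did crv finish?")
    return frames + [manifest]
```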

Analyse a local file

Pass a file path instead of a URL. Use -o to choose a custom output directory and --lang to tell Whisper the spoken language (this skips language detection and improves accuracy):

crv lecture.mp4 -o out --lang en

Any format ffmpeg can read is supported — .mp4, .mov, .mkv, .webm, .avi, and more.
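To process a folder of local files, you can generate one crv command per video. This Python sketch uses only the `-o` and `--lang` flags shown above. The helper names, the `out/<file stem>` output layout, and the extension set (an illustrative subset of what ffmpeg reads) are all assumptions.

```python
import shlex
from pathlib import Path
from typing import List, Optional, Union

# Illustrative subset of the formats named above; ffmpeg reads many more.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}


def crv_command(source: str, out_dir: Optional[str] = None,
                lang: Optional[str] = None) -> List[str]:
    """Build a crv argv using only the -o and --lang flags (hypothetical helper)."""
    cmd = ["crv", source]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if lang is not None:
        cmd += ["--lang", lang]
    return cmd


def batch_commands(folder: Union[str, Path], lang: Optional[str] = None) -> List[List[str]]:
    """One crv command per video in `folder`, each writing to out/<file stem>."""
    videos = sorted(p for p in Path(folder).iterdir()
                    if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS)
    return [crv_command(str(p), out_dir=str(Path("out") / p.stem), lang=lang) for p in videos]


if __name__ == "__main__":
    for cmd in batch_commands(".", lang="en"):
        print(shlex.join(cmd))  # or subprocess.run(cmd, check=True) to execute
```

Separate output folders keep each video's frames and manifest apart, so every folder can be handed to a model on its own.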

Focus the analysis with --why

Tell crv why you are watching the video. The intent is written into MANIFEST.txt as the first line, instructing the LLM to analyse with that lens rather than producing a generic summary:

crv "https://youtu.be/..." --why "find the pricing strategy"

The resulting MANIFEST.txt opens with:

viewing intent: find the pricing strategy
(reader: analyse the frames and transcript with this intent as the lens — surface what serves it first, skip what doesn't)

When you hand the output folder to a model, it immediately knows what you care about.
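If you later need to recover the intent from a finished run, for example to label results, you can read it back from the manifest. This is a minimal sketch. It assumes the first line uses exactly the `viewing intent:` prefix shown in the sample above and that the file is UTF-8; `read_viewing_intent` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional, Union

# Prefix as it appears in the sample manifest above (assumed to be stable).
INTENT_PREFIX = "viewing intent:"


def read_viewing_intent(manifest: Union[str, Path]) -> Optional[str]:
    """Return the --why text from the manifest's first line, or None if absent."""
    with open(manifest, encoding="utf-8") as fh:
        first_line = fh.readline().strip()
    if first_line.startswith(INTENT_PREFIX):
        return first_line[len(INTENT_PREFIX):].strip()
    return None
```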

Frames only (skip transcription)

Use --no-transcribe to skip the Whisper step entirely — useful when you only need visual frames or when the video has no audio:

crv clip.mp4 --no-transcribe

The output folder will contain frames/ and MANIFEST.txt but no transcript.txt. The manifest notes that transcription was skipped.
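When you are unsure whether a clip has an audio track, you could check with ffprobe (which usually ships alongside ffmpeg) and add --no-transcribe only when no audio stream is found. This is a sketch under that assumption. The helper names are hypothetical; the ffprobe options select audio streams and print one line per stream, so empty output means no audio.

```python
import subprocess
from typing import List


def has_audio_stream(path: str) -> bool:
    """Ask ffprobe whether the file contains at least one audio stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    )
    # One output line per audio stream; no output means no audio.
    return bool(result.stdout.strip())


def crv_args(path: str, audio: bool) -> List[str]:
    """Add --no-transcribe when there is no audio to transcribe."""
    return ["crv", path] if audio else ["crv", path, "--no-transcribe"]


# Usage (requires ffprobe and crv on PATH):
#   clip = "clip.mp4"
#   subprocess.run(crv_args(clip, has_audio_stream(clip)), check=True)
```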

CLI Reference

Full list of flags, defaults, and usage examples for the crv command.

Python API

Use claude_real_video.process() to integrate frame extraction and transcription directly into your Python code.
