The Python API exposes the same pipeline as the CLI as importable functions and a typed dataclass. Use it when you want to integrate video analysis into your own scripts, notebooks, or applications — for example, to batch-process multiple videos, post-process theDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/HUANGCHIHHUNGLeo/claude-real-video/llms.txt
Use this file to discover all available pages before exploring further.
Result object programmatically, or wire the output directly into an LLM call without touching the command line. For one-off, interactive use the CLI is usually more convenient.
Installation
Install the package and then import the two public names:process and Result are the only names listed in __all__; everything else is internal.
process()
process() is the single entrypoint that runs the full pipeline: download or copy the video, extract scene-aware frames, deduplicate, transcribe, and write MANIFEST.txt.
Parameters
URL of a video (any site supported by yt-dlp, e.g. YouTube, Instagram, TikTok) or a local file path such as
"lecture.mp4" or "/home/user/clips/talk.mov".Output directory. Created automatically (including any missing parents) if it does not already exist. Re-running overwrites existing contents.
Scene-change sensitivity on a 0–1 scale. Lower values capture more frames on each visual cut; higher values require a more dramatic change. Passed directly to ffmpeg’s
select filter.Minimum frame density expressed in seconds. At least one frame is guaranteed every
fps_floor seconds regardless of scene changes, preventing long static segments from producing no frames at all.Hard cap on the total frame count after deduplication. Survivors are thinned uniformly so coverage remains spread across the full video timeline.
Whisper language code for audio transcription. Use
"auto" to let Whisper detect the language, or pass a specific code such as "en", "zh", or "fr". Ignored when do_transcribe=False or when existing subtitles are used.Absolute or relative path to a Netscape-format cookie file. Pass this for login-gated video sources that require authentication. Use only your own authorised credentials.
Whether to attempt audio transcription. When
False, no transcript.txt is written. Maps to the inverse of the CLI’s --no-transcribe flag.Percentage of pixels (in a downscaled RGB thumbnail) that must differ from all frames in the dedup window for a frame to be retained. Higher values keep fewer frames; lower values preserve more visual variation.
Number of recently kept frames to compare each candidate against. Using a window larger than 1 prevents A-B-A editing patterns from re-introducing shots the model has already seen after a cutaway. Set to
1 for classic consecutive-only deduplication.When
True, the full original soundtrack (music + speech + effects) is saved as audio.m4a in the output directory and its path is returned in Result.audio_path. Losslessly stream-copied when the source codec allows; otherwise re-encoded to AAC 192 kbps.When
True, writes a self-contained report.html deduplication visualiser and moves dropped frames into a dropped/ subdirectory. The report path is returned in Result.report_path.A short description of your viewing intent that is written into
MANIFEST.txt. When provided, the manifest instructs the LLM to focus its analysis through this lens rather than producing a generic summary.Returns
AResult dataclass instance.
Raises
RuntimeError— ifyt-dlpis not installed for a URL source, or if the download fails (e.g. private video).FileNotFoundError— ifsrcis a local path that does not exist.
Result
Result is a dataclasses.dataclass returned by process(). All paths are absolute strings.
Absolute path to the output directory that was created (or reused) for this run.
Absolute path to the downloaded or copied source video file (always
source.mp4 inside out_dir).Video duration in whole seconds, as reported by
ffprobe. Returns 0 if the duration cannot be determined.Absolute path to the
frames/ subdirectory containing the kept JPEG frames.Number of frames remaining after deduplication and the
max_frames cap — the frames the LLM should consume.Number of raw frames extracted by ffmpeg before deduplication. Always
>= frame_count.Absolute path to
transcript.txt, or None if transcription was skipped, failed, or the video has no audio. The file contains plain UTF-8 text with timecodes stripped.Absolute path to
MANIFEST.txt, the primary summary file to hand to an LLM.Human-readable explanation of the transcript status. Examples:
"(skipped: --no-transcribe)", "/abs/path/transcript.txt (transcribed by whisper)", "(none — this video has no subtitles and no audio track)".Absolute path to
audio.m4a if keep_audio=True and the video has an audio stream, otherwise None.Absolute path to
report.html if report=True, otherwise None.Usage example
Pipeline helpers
The following functions are defined inclaude_real_video.core and form the individual stages of the pipeline. They are not part of the public __all__ export.
These are internal helpers exposed for advanced use cases. Most users should call
process() directly — it orchestrates all of the stages below in the correct order.| Function | Description |
|---|---|
fetch_video(src, out_dir, cookies) | Downloads a URL via yt-dlp or copies a local file to source.mp4 in the output directory. |
extract_frames(video, frames_dir, scene, fps_floor) | Runs a single chronological ffmpeg pass to capture every scene change plus one frame per fps_floor seconds; returns the raw extracted frame count. |
dedup_frames(frames_dir, threshold, window, max_frames, dropped_dir) | Drops near-duplicate frames using real RGB pixel-diff against a sliding window; applies the max_frames cap; returns (kept_count, per_frame_records). |
transcribe(video, out_dir, lang) | Extracts audio to a temporary WAV and runs the Whisper CLI; returns the path to transcript.txt or None. |
existing_subtitles(src, video, out_dir) | Checks for a sidecar .srt/.vtt next to a local file, then an embedded subtitle stream; returns a plain-text transcript path or None. |
extract_full_audio(video, out_dir) | Saves the full original soundtrack as audio.m4a; stream-copies losslessly when possible, otherwise re-encodes to AAC 192 kbps. |
save_to_kb(kb_dir, manifest_path, src) | Copies the manifest into a knowledge-base directory as a dated Markdown note (YYYY-MM-DD-<slug>.md). |
write_report(out_dir, records, threshold, window) | Writes a self-contained report.html showing every extracted frame with its pixel-diff percentage and keep/drop/cap status. |