whisper_cpp uses the optional pywhispercpp package to run Whisper models through the whisper.cpp C++ runtime. It is useful when you want local CPU transcription with ggml model files and a smaller Python dependency surface than PyTorch-based engines. No CUDA installation is required.
Install
Basic Usage
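A minimal sketch of basic usage follows. It assumes the engine is selected with a transcription_engine keyword on AudioToTextRecorder; that keyword name is inferred from the transcription_engine_options parameter described on this page and should be checked against your installed version. The RealtimeSTT import is deferred so the configuration can be inspected without the package installed.

```python
def basic_recorder_kwargs(model="base.en"):
    """Keyword arguments for a whisper.cpp-backed recorder (sketch)."""
    return {
        # Assumed keyword name; the engine name itself comes from this page.
        "transcription_engine": "whisper_cpp",
        # A model name or a path accepted by pywhispercpp.
        "model": model,
    }


def transcribe_once():
    """Record one utterance from the microphone and return its text.

    Requires RealtimeSTT plus the optional pywhispercpp package.
    """
    from RealtimeSTT import AudioToTextRecorder  # deferred import

    recorder = AudioToTextRecorder(**basic_recorder_kwargs())
    try:
        return recorder.text()
    finally:
        recorder.shutdown()
```

Calling transcribe_once() blocks until one utterance has been recorded and transcribed.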
Model Handling
model can be a name or path accepted by pywhispercpp. For known model names, pywhispercpp may download the matching ggml model automatically. Use download_root to keep model files in a predictable directory:
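As a rough illustration, the helper below builds recorder keyword arguments with an absolute download_root. The helper name and the default directory are hypothetical; only model and download_root come from this page.

```python
from pathlib import Path


def model_dir_kwargs(model="base.en", root="models/whisper_cpp"):
    """Return recorder kwargs that pin model files to one directory."""
    # Resolve to an absolute path so the location does not depend on
    # the current working directory at call time.
    model_dir = Path(root).expanduser().resolve()
    return {"model": model, "download_root": str(model_dir)}
```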
You can also set pywhispercpp’s models_dir option through transcription_engine_options:
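A sketch of that alternative, assuming models_dir belongs in the "model" bucket because that bucket is passed to pywhispercpp.model.Model. The helper name is hypothetical.

```python
def models_dir_options(models_dir):
    """Route models_dir to pywhispercpp through the "model" bucket."""
    return {
        "transcription_engine_options": {
            # Entries here are passed to pywhispercpp.model.Model.
            "model": {"models_dir": str(models_dir)},
        }
    }
```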
CPU Tuning
For realtime CPU transcription, use greedy decoding and streaming-friendly pywhispercpp options. A complete realtime configuration with both final and realtime engines looks like this:
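The sketch below is one possible shape for such a configuration, not the project's reference example. The transcription_engine keyword is an assumed name, and the single_segment and no_context entries are illustrative whisper.cpp parameters chosen here as streaming-friendly; this page does not confirm which options the author recommends or whether the realtime model needs its own engine selection.

```python
def realtime_cpu_kwargs(final_model="tiny.en", realtime_model="tiny.en"):
    """Recorder kwargs for greedy realtime CPU transcription (sketch)."""
    return {
        "transcription_engine": "whisper_cpp",  # assumed keyword name
        "model": final_model,
        "beam_size": 1,  # greedy decoding for the final pass
        "enable_realtime_transcription": True,
        "realtime_model_type": realtime_model,
        "beam_size_realtime": 1,  # greedy decoding for realtime updates
        "transcription_engine_options": {
            # Merged into Model.transcribe(...); illustrative values only.
            "transcribe": {"single_segment": True, "no_context": True},
        },
    }
```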
| Profile | Model | Final beam_size | Realtime beam_size_realtime |
|---|---|---|---|
| Fast | tiny.en or base.en-q5_1 | 1 | 1 |
| Balanced | small.en-q5_1 | 3 | 1 |
| More accurate CPU | small.en | 5 | 1 |
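The profiles in the table can be captured as a small lookup. This is only a convenience sketch: the profile keys and the helper name are hypothetical, and the Fast profile picks tiny.en out of the two models listed.

```python
CPU_PROFILES = {
    "fast": {"model": "tiny.en", "beam_size": 1, "beam_size_realtime": 1},
    "balanced": {"model": "small.en-q5_1", "beam_size": 3, "beam_size_realtime": 1},
    "accurate": {"model": "small.en", "beam_size": 5, "beam_size_realtime": 1},
}


def profile_kwargs(name):
    """Return a copy of the settings for one CPU profile."""
    try:
        return dict(CPU_PROFILES[name])
    except KeyError:
        known = ", ".join(sorted(CPU_PROFILES))
        raise ValueError(f"unknown profile {name!r}; expected one of: {known}") from None
```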
medium.en and larger models can be too slow for interactive CPU use.
Engine-Specific Options
Pass backend-specific configuration through transcription_engine_options:
| Option bucket | Meaning |
|---|---|
| transcription_engine_options["model"] | Passed to pywhispercpp.model.Model. |
| transcription_engine_options["transcribe"] | Merged into Model.transcribe(...). |
| download_root | Passed as models_dir. |
| beam_size | Uses whisper.cpp beam search when greater than 1; otherwise greedy decoding. |
| initial_prompt | String prompts become initial_prompt; token iterables become prompt token fields. |
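To make the table concrete, here is an illustrative sketch of the mapping it describes. It is not the adapter's actual code: the function name and the "greedy" / "beam_search" labels are hypothetical, and letting explicit bucket entries take precedence over the top-level arguments is an assumption. Token-iterable prompts are left out because this page does not name the fields they map to.

```python
def to_pywhispercpp_options(download_root=None, beam_size=1,
                            initial_prompt=None, engine_options=None):
    """Split recorder settings into model kwargs, transcribe kwargs,
    and a decoding label, following the table above (sketch only)."""
    engine_options = engine_options or {}
    model_kwargs = dict(engine_options.get("model", {}))
    transcribe_kwargs = dict(engine_options.get("transcribe", {}))

    # download_root is passed as models_dir (explicit entry wins: assumption).
    if download_root is not None:
        model_kwargs.setdefault("models_dir", download_root)

    # Beam search only when beam_size is greater than 1.
    decoding = "beam_search" if beam_size > 1 else "greedy"

    # String prompts become initial_prompt.
    if isinstance(initial_prompt, str):
        transcribe_kwargs.setdefault("initial_prompt", initial_prompt)

    return model_kwargs, transcribe_kwargs, decoding
```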
Current Adapter Limitations
- compute_type, batch_size, faster_whisper_vad_filter, and suppress_tokens do not map to equivalent whisper.cpp behavior and are ignored.
- Language probability is not reported like faster-whisper; explicit languages are returned with probability 1.0.
- Native whisper.cpp output may still appear in the console depending on package behavior and options.
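Because ignored options are easy to miss, a small pre-flight check can list them before a recorder is created. This helper is hypothetical and simply mirrors the four option names given above.

```python
IGNORED_BY_WHISPER_CPP = (
    "compute_type",
    "batch_size",
    "faster_whisper_vad_filter",
    "suppress_tokens",
)


def ignored_options(kwargs):
    """Return the names in kwargs that the whisper_cpp engine ignores."""
    return sorted(name for name in kwargs if name in IGNORED_BY_WHISPER_CPP)
```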
When to Use whisper.cpp
Good fit
- CPU-only machines without a CUDA-capable GPU
- Environments where you want to avoid PyTorch as a dependency
- Low-memory setups using quantized ggml models (e.g. q5_1 variants)
- Containers or servers where binary size matters
Consider faster-whisper instead
- GPU inference with CTranslate2 quantization
- Production use cases requiring language probability scores
- Batched inference pipelines (batch_size > 0)
- Richer option pass-through (VAD filter, suppress_tokens, etc.)
Troubleshooting
Model cannot be found
Set download_root to a writable directory, or pass an absolute path to the ggml model file as model.
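The two checks can be automated with a short diagnostic. This is a sketch under stated assumptions: the helper name is hypothetical, and treating a value as a file path when it ends in .bin or contains a directory separator is a heuristic chosen here, not pywhispercpp's rule.

```python
import os
from pathlib import Path


def check_model_location(model, download_root=None):
    """Return a list of human-readable problems; empty means none found."""
    problems = []

    # Heuristic: a ".bin" suffix or several path parts suggests a file path
    # rather than a known model name.
    path = Path(model).expanduser()
    looks_like_path = path.suffix == ".bin" or len(path.parts) > 1
    if looks_like_path and not path.is_file():
        problems.append(f"model file does not exist: {path}")

    if download_root is not None:
        root = Path(download_root).expanduser()
        if not root.is_dir():
            problems.append(f"download_root does not exist: {root}")
        elif not os.access(root, os.W_OK):
            problems.append(f"download_root is not writable: {root}")

    return problems
```

An empty list only means these two checks passed; it does not confirm that pywhispercpp can load the model.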