
whisper_cpp uses the optional pywhispercpp package to run Whisper models through the whisper.cpp C++ runtime. It is useful when you want local CPU transcription with ggml model files and a smaller Python dependency surface than PyTorch-based engines. No CUDA installation is required.

Install

pip install "RealtimeSTT[whisper-cpp]"

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="whisper_cpp",
    model="tiny.en",
    device="cpu",
)
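Constructing the recorder does not transcribe anything by itself. A minimal loop might look like the sketch below. It assumes, as in RealtimeSTT's general examples, that AudioToTextRecorder.text() accepts a callback for each finished transcription and that shutdown() releases the recorder's resources. format_final is a hypothetical helper added for illustration.

```python
def format_final(text: str) -> str:
    """Hypothetical helper: tidy a finished transcription for display."""
    return f"Final: {text.strip()}"


def on_text(text: str) -> None:
    # Called with each completed phrase.
    print(format_final(text))


def main() -> None:
    # Imported lazily so a missing install produces a readable hint.
    try:
        from RealtimeSTT import AudioToTextRecorder
    except ImportError:
        print('RealtimeSTT is not installed: pip install "RealtimeSTT[whisper-cpp]"')
        return

    recorder = AudioToTextRecorder(
        transcription_engine="whisper_cpp",
        model="tiny.en",
        device="cpu",
    )
    try:
        while True:
            # Blocks until a phrase is detected and transcribed.
            recorder.text(on_text)
    except KeyboardInterrupt:
        pass
    finally:
        recorder.shutdown()


if __name__ == "__main__":
    # RealtimeSTT uses multiprocessing, so keep startup code under this guard.
    main()
```

Press Ctrl+C to stop the loop. The recorder is shut down in the finally clause, so cleanup runs even if an unexpected error occurs.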

Model Handling

model can be a name or path accepted by pywhispercpp. For known model names, pywhispercpp may download the matching ggml model automatically. Use download_root to keep model files in a predictable directory:
recorder = AudioToTextRecorder(
    transcription_engine="whisper_cpp",
    model="small.en-q5_1",
    download_root="models/whispercpp",
    device="cpu",
)
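Model files are fetched into download_root on first use. It can therefore help to create that directory up front and fail early if it is not writable. prepare_models_dir is a hypothetical helper. It is not part of RealtimeSTT or pywhispercpp.

```python
import os
from pathlib import Path


def prepare_models_dir(path: str) -> str:
    """Hypothetical helper: create a model directory and confirm it is writable.

    Returns the absolute path, suitable for passing as download_root.
    """
    models_dir = Path(path).expanduser().resolve()
    models_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(models_dir, os.W_OK):
        raise PermissionError(f"Model directory is not writable: {models_dir}")
    return str(models_dir)


# Usage (requires RealtimeSTT):
# recorder = AudioToTextRecorder(
#     transcription_engine="whisper_cpp",
#     model="small.en-q5_1",
#     download_root=prepare_models_dir("models/whispercpp"),
#     device="cpu",
# )
```

Passing an absolute path means model files do not move when the script is started from a different working directory.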
If you download model files manually, pass the model path directly or configure pywhispercpp’s models_dir option through transcription_engine_options:
recorder = AudioToTextRecorder(
    transcription_engine="whisper_cpp",
    model="tiny.en",
    transcription_engine_options={
        "model": {
            "models_dir": "/path/to/your/ggml/models",
        },
    },
)
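When you manage the model files yourself, a small helper can prefer a local file and otherwise fall back to the model name, leaving pywhispercpp to resolve or download it. This sketch assumes the conventional whisper.cpp file naming ggml-<name>.bin (for example ggml-tiny.en.bin). resolve_model is hypothetical, so adjust the pattern to match your files.

```python
from pathlib import Path


def resolve_model(name: str, models_dir: str) -> str:
    """Hypothetical helper: return a local ggml file path if present, else the name.

    Assumes whisper.cpp's conventional file naming: ggml-<name>.bin.
    """
    candidate = Path(models_dir).expanduser() / f"ggml-{name}.bin"
    if candidate.is_file():
        # An absolute path avoids ambiguity about the working directory.
        return str(candidate.resolve())
    # Fall back to the name so pywhispercpp can resolve or download it.
    return name


# Usage (requires RealtimeSTT):
# recorder = AudioToTextRecorder(
#     transcription_engine="whisper_cpp",
#     model=resolve_model("tiny.en", "/path/to/your/ggml/models"),
# )
```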

CPU Tuning

For realtime CPU transcription, use greedy decoding (beam_size_realtime=1) and streaming-friendly pywhispercpp options on the realtime engine. The final engine runs less often and can afford a wider beam. A complete configuration with both final and realtime engines looks like this:
recorder = AudioToTextRecorder(
    transcription_engine="whisper_cpp",
    model="tiny.en",
    device="cpu",
    beam_size=5,
    transcription_engine_options={
        "model": {
            "n_threads": 8,
            "redirect_whispercpp_logs_to": None,
        },
    },
    enable_realtime_transcription=True,
    realtime_transcription_engine="whisper_cpp",
    realtime_model_type="tiny.en",
    beam_size_realtime=1,
    realtime_processing_pause=0.15,
    realtime_transcription_engine_options={
        "model": {
            "n_threads": 8,
            "redirect_whispercpp_logs_to": None,
        },
        "transcribe": {
            "single_segment": True,
            "no_context": True,
            "print_timestamps": False,
        },
    },
)
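The example hardcodes n_threads=8. The sketch below instead derives a thread count from os.cpu_count() and builds the option buckets used above. The cap of 8 and the single reserved core are assumptions to tune per machine. os.cpu_count() reports logical CPUs, which can exceed the useful thread count. whispercpp_model_options is a hypothetical helper.

```python
import os


def pick_n_threads(reserve: int = 1, cap: int = 8) -> int:
    """Choose a thread count: logical CPUs minus a reserve, clamped to [1, cap]."""
    logical = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(cap, logical - reserve))


def whispercpp_model_options(n_threads: int) -> dict:
    """Hypothetical helper: build the "model" bucket from the example above."""
    return {
        "model": {
            "n_threads": n_threads,
            # Same value as in the example configuration.
            "redirect_whispercpp_logs_to": None,
        },
    }


threads = pick_n_threads()
final_options = whispercpp_model_options(threads)
realtime_options = {
    **whispercpp_model_options(threads),
    "transcribe": {
        "single_segment": True,
        "no_context": True,
        "print_timestamps": False,
    },
}
```

Pass final_options as transcription_engine_options and realtime_options as realtime_transcription_engine_options. When both engines run at once they compete for the same cores, so you may want to split the available threads between them.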
Good starting profiles:
| Profile | Model | Final (beam_size) | Realtime (beam_size_realtime) |
| --- | --- | --- | --- |
| Fast | tiny.en or base.en-q5_1 | 1 | 1 |
| Balanced | small.en-q5_1 | 3 | 1 |
| More accurate CPU | small.en | 5 | 1 |
medium.en and larger models can be too slow for interactive CPU use.
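To switch between these profiles without editing each argument, you can keep them in a dictionary. The sketch below copies the values from the table. profile_kwargs is a hypothetical helper. Its realtime_model default of tiny.en is an assumption, because the table lists only one model per profile.

```python
# Values copied from the "Good starting profiles" table.
PROFILES = {
    "fast": {"model": "tiny.en", "beam_size": 1, "beam_size_realtime": 1},
    "balanced": {"model": "small.en-q5_1", "beam_size": 3, "beam_size_realtime": 1},
    "accurate": {"model": "small.en", "beam_size": 5, "beam_size_realtime": 1},
}


def profile_kwargs(name: str, realtime_model: str = "tiny.en") -> dict:
    """Hypothetical helper: expand a profile into AudioToTextRecorder keyword arguments.

    realtime_model is an assumption: a small model keeps realtime updates
    responsive on CPU.
    """
    profile = PROFILES[name]  # raises KeyError for unknown profile names
    return {
        "transcription_engine": "whisper_cpp",
        "model": profile["model"],
        "device": "cpu",
        "beam_size": profile["beam_size"],
        "enable_realtime_transcription": True,
        "realtime_transcription_engine": "whisper_cpp",
        "realtime_model_type": realtime_model,
        "beam_size_realtime": profile["beam_size_realtime"],
    }
```

Usage: recorder = AudioToTextRecorder(**profile_kwargs("balanced")).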

Engine-Specific Options

Pass backend-specific configuration through transcription_engine_options. The adapter also maps a few general recorder parameters onto pywhispercpp:

| Option | Effect |
| --- | --- |
| transcription_engine_options["model"] | Passed to pywhispercpp.model.Model. |
| transcription_engine_options["transcribe"] | Merged into the Model.transcribe(...) arguments. |
| download_root | Passed to pywhispercpp as models_dir. |
| beam_size | Values greater than 1 use whisper.cpp beam search; otherwise greedy decoding is used. |
| initial_prompt | A string is passed as initial_prompt; an iterable of token IDs is passed through the prompt-token fields. |
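The table can be read as a function from recorder parameters to the two pywhispercpp calls. The sketch below is an illustrative model of that mapping, not the adapter's actual code. The "decoding" key is hypothetical. The prompt_tokens and prompt_n_tokens names mirror whisper.cpp's whisper_full_params fields, but exactly how the adapter passes tokens is an assumption. So is the choice to let explicit "transcribe" options take precedence.

```python
from typing import Iterable, Optional, Tuple, Union


def map_options(
    engine_options: dict,
    download_root: Optional[str] = None,
    beam_size: int = 1,
    initial_prompt: Union[str, Iterable[int], None] = None,
) -> Tuple[dict, dict]:
    """Illustrative model of the option table; not the adapter's actual code.

    Returns (model_kwargs, transcribe_kwargs).
    """
    model_kwargs = dict(engine_options.get("model", {}))
    if download_root is not None:
        model_kwargs["models_dir"] = download_root

    # Hypothetical key recording which whisper.cpp decoding strategy applies.
    transcribe_kwargs = {"decoding": "beam_search" if beam_size > 1 else "greedy"}

    if isinstance(initial_prompt, str):
        transcribe_kwargs["initial_prompt"] = initial_prompt
    elif initial_prompt is not None:
        tokens = list(initial_prompt)
        # whisper.cpp's prompt fields take a token array plus its length.
        transcribe_kwargs["prompt_tokens"] = tokens
        transcribe_kwargs["prompt_n_tokens"] = len(tokens)

    # Assumed precedence: explicit "transcribe" options override derived values.
    transcribe_kwargs.update(engine_options.get("transcribe", {}))
    return model_kwargs, transcribe_kwargs
```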

Current Adapter Limitations

  • compute_type, batch_size, faster_whisper_vad_filter, and suppress_tokens do not map to equivalent whisper.cpp behavior and are ignored.
  • Language probability is not reported the way faster-whisper reports it; when a language is set explicitly, it is returned with probability 1.0.
  • Native whisper.cpp log output may still appear in the console, depending on pywhispercpp's behavior and options such as redirect_whispercpp_logs_to.
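If you port a faster-whisper configuration, the ignored parameters fail silently. A small pre-flight check like this sketch can surface them before the recorder is built. ignored_options and warn_ignored are hypothetical helpers, and the set of names is copied from the list above.

```python
import warnings

# Parameters listed under "Current Adapter Limitations".
IGNORED_BY_WHISPER_CPP = frozenset(
    {"compute_type", "batch_size", "faster_whisper_vad_filter", "suppress_tokens"}
)


def ignored_options(recorder_kwargs: dict) -> list:
    """Return, sorted, the keyword arguments the whisper_cpp adapter ignores."""
    return sorted(k for k in recorder_kwargs if k in IGNORED_BY_WHISPER_CPP)


def warn_ignored(recorder_kwargs: dict) -> None:
    """Emit one warning per ignored option before building the recorder."""
    for name in ignored_options(recorder_kwargs):
        warnings.warn(
            f"{name} has no whisper.cpp equivalent and will be ignored",
            stacklevel=2,
        )
```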

When to Use whisper.cpp

Good fit

  • CPU-only machines without a CUDA-capable GPU
  • Environments where you want to avoid PyTorch as a dependency
  • Low-memory setups using quantized ggml models (e.g. q5_1 variants)
  • Containers or servers where binary size matters

Consider faster-whisper instead

  • GPU inference with CTranslate2 quantization
  • Production use cases requiring language probability scores
  • Batched inference pipelines (batch_size > 0)
  • Richer option pass-through (VAD filter, suppress_tokens, etc.)
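The checklist above can be condensed into a small decision function. This is a sketch. It assumes faster-whisper is selected with the engine identifier "faster_whisper", so check RealtimeSTT's engine list for the exact value. choose_engine is a hypothetical helper.

```python
def choose_engine(
    has_cuda_gpu: bool,
    needs_language_probability: bool = False,
    batch_size: int = 0,
    needs_vad_or_suppress_tokens: bool = False,
) -> str:
    """Hypothetical helper encoding the "When to Use whisper.cpp" checklist."""
    if (
        has_cuda_gpu
        or needs_language_probability
        or batch_size > 0
        or needs_vad_or_suppress_tokens
    ):
        return "faster_whisper"  # assumed engine identifier
    return "whisper_cpp"
```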

Troubleshooting

1. Import fails

Ensure pywhispercpp is installed in the active environment: pip install pywhispercpp.

2. Model cannot be found

Set download_root to a writable directory, or pass an absolute path to the ggml model file as model.

3. Realtime updates fall behind speech

Reduce model size, increase n_threads up to the CPU’s useful limit, keep beam_size_realtime=1, and increase realtime_processing_pause.
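The three checks above can be run together with a short diagnostic such as this sketch. diagnose is a hypothetical helper. It only inspects the environment and does not load a model. os.cpu_count() reports logical CPUs, which is an upper bound rather than the ideal n_threads.

```python
import importlib.util
import os


def diagnose(download_root: str = "models/whispercpp") -> dict:
    """Hypothetical helper: quick checks matching the troubleshooting steps."""
    path = os.path.abspath(download_root)
    # If the directory does not exist yet, check its nearest existing parent.
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return {
        # Step 1: is pywhispercpp importable from this interpreter?
        "pywhispercpp_installed": importlib.util.find_spec("pywhispercpp") is not None,
        # Step 2: can model files be written to (or created under) download_root?
        "download_root_writable": os.access(path, os.W_OK),
        # Step 3: upper bound for n_threads.
        "logical_cpus": os.cpu_count() or 1,
    }


if __name__ == "__main__":
    for check, value in diagnose().items():
        print(f"{check}: {value}")
```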
