Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/KoljaB/RealtimeSTT/llms.txt

Use this file to discover all available pages before exploring further.

RealtimeSTT routes all speech recognition through a lazy-loaded engine factory. AudioToTextRecorder selects the main final-transcription backend with the transcription_engine parameter. A separate backend can optionally handle realtime transcription through realtime_transcription_engine. When realtime_transcription_engine is None, realtime transcription uses the same backend as the final transcription engine. The default backend is faster_whisper, which covers most local GPU and CPU Whisper use cases out of the box. Engine modules are imported only when the selected engine is first instantiated, so unused optional dependencies are never loaded.

Choosing an Engine

Use caseStart withWhy
Default local GPU/CPU Whisper pathfaster_whisperInstall with RealtimeSTT[faster-whisper]; mature, supports common Whisper model names and CTranslate2 models.
CPU-only experiments with small Whisper modelswhisper_cppUses whisper.cpp through pywhispercpp; good for low-dependency CPU testing.
Compatibility with OpenAI’s local Whisper packageopenai_whisperUses the original openai-whisper Python package.
English CPU server with manually downloaded ONNX modelssherpa_onnx_moonshineOffline CPU INT8 path with predictable local model files.
CPU Parakeet without NeMo runtimesherpa_onnx_parakeetOffline CPU INT8 Parakeet through sherpa-onnx.
Kroko/Banafo .data streaming modelskroko_onnxOptional Kroko-ONNX runtime with Community or licensed Pro models and realtime streaming previews.
NVIDIA Parakeet on Linux/WSL2parakeetUses NVIDIA NeMo ASR for the Parakeet checkpoint.
Meta Omnilingual ASR on Linux/WSL2 Python 3.11.xomnilingual_asrUses Meta’s Omnilingual ASR package; native Windows and Python 3.12.x are not practical install targets for the current upstream dependency stack.
Hugging Face speech-language modelsgranite_speech, qwen3_asr, moonshine, cohere_transcribeThin adapters around model-family packages and Transformers.

Supported Engine Names

Engine names are normalized by replacing - with _ before lookup, so both Python-style and CLI-style names work interchangeably. Unsupported names raise an error that lists all available engines.
Both faster-whisper and faster_whisper resolve to the same engine. The normalization is applied automatically, so you may use whichever style you prefer.
Engine name(s)StatusReference
faster_whisperDefault production backendfaster-whisper
whisper_cppOptional production backendwhisper.cpp
openai_whisperOptional production backendOpenAI Whisper
moonshine, moonshine_streamingExperimental Transformers backend; English-only adapter
sherpa_onnx_moonshine, sherpa_moonshine, moonshine_sherpa_onnxCPU INT8 sherpa-onnx backendsherpa-onnx
kroko_onnx, kroko, banafo_krokoOptional Kroko-ONNX backend
parakeet, nvidia_parakeetExperimental NVIDIA NeMo backend
sherpa_onnx_parakeet, sherpa_parakeet, parakeet_sherpa_onnxCPU INT8 sherpa-onnx backendsherpa-onnx
omnilingual_asr, omnilingual, meta_omnilingual_asr, omni_asrExperimental Meta Omnilingual ASR backend for Linux/WSL2 Python 3.11.x
granite_speech, graniteExperimental Transformers backend
qwen3_asr, qwen_asrExperimental Qwen ASR backend
cohere_transcribe, cohereExperimental Transformers backend, requires language
funasrExperimental FunASR backend
openai_apiPlaceholder, not wired yetNot available

Selecting a Backend

Pass the engine name as the transcription_engine argument. The default is faster_whisper when the argument is omitted.
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    model="small.en",
    transcription_engine="faster_whisper",
)

Different Engines for Final and Realtime

You can assign a lightweight engine to realtime transcription and a higher-quality engine to the final pass. This keeps the realtime display responsive while preserving accuracy on the committed result.
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="faster_whisper",
    model="small.en",
    enable_realtime_transcription=True,
    realtime_transcription_engine="whisper_cpp",
    realtime_model_type="tiny.en",
    realtime_transcription_engine_options={
        "model": {"n_threads": 8},
        "transcribe": {"single_segment": True, "no_context": True},
    },
)
If realtime_transcription_engine is None, realtime transcription uses the same backend as transcription_engine.

Engine-Specific Options

Use transcription_engine_options and realtime_transcription_engine_options to pass backend-specific dictionaries. These are intentionally per-engine — a key that is meaningful for one backend may be ignored or invalid for another.
recorder = AudioToTextRecorder(
    transcription_engine="sherpa_onnx_moonshine",
    model="models/sherpa-onnx-moonshine-tiny-en-int8",
    device="cpu",
    language="en",
    transcription_engine_options={
        "num_threads": 2,
        "provider": "cpu",
    },
)

Model Download Behavior

Engine familyAutomatic downloadManual placement
faster_whisperYes, for known Hugging Face/CTranslate2 model ids.Local CTranslate2 model directories may be passed as model.
whisper_cppUsually yes for model names supported by pywhispercpp.Local ggml model paths or download_root/models_dir may be used.
openai_whisperYes, through openai-whisper.Local model names/paths supported by that package.
moonshine, granite_speech, qwen3_asr, cohere_transcribeYes, through Hugging Face or the engine package, subject to access.download_root maps to cache options where supported.
parakeet NeMoYes, through NeMo model loading.NeMo cache/model options may be passed in transcription_engine_options.
omnilingual_asrYes, through Omnilingual/fairseq2/Hugging Face cache paths in Linux or WSL2 with Python 3.11.x.Pass an Omnilingual model card such as omniASR_CTC_1B_v2.
sherpa_onnx_*No.Download and extract the sherpa-onnx model bundle, then pass the extracted directory.
kroko_onnxYes, for known public Community .data files when enabled.Pro/private models need an existing .data path, direct URL, or explicit repo/token options.

Extending Engines

RealtimeSTT’s engine system is designed to be extended. New engines should:
  1. Implement BaseTranscriptionEngine from .base.
  2. Return a TranscriptionResult (with a TranscriptionInfo field) from the transcribe method.
  3. Be registered in RealtimeSTT/transcription_engines/factory.py under a normalized name string.
Imports should remain lazy inside the engine module so that optional dependencies are only imported when that engine is actually selected. The factory uses importlib.import_module to load engine classes on demand:
# factory.py excerpt — how engines are registered
ENGINE_CLASS_PATHS = {
    "faster_whisper": (".faster_whisper_engine", "FasterWhisperEngine"),
    "whisper_cpp":    (".whisper_cpp_engine",    "WhisperCppEngine"),
    # ... other engines
}
Contract tests should cover missing dependency messages, parameter mapping, audio normalization, result conversion, and factory selection. Tests that load real models should remain opt-in.

Build docs developers (and LLMs) love