RealtimeSTT routes all speech recognition through a lazy-loaded engine factory.
AudioToTextRecorder selects the main final-transcription backend with the transcription_engine parameter. A separate backend can optionally handle realtime transcription through realtime_transcription_engine. When realtime_transcription_engine is None, realtime transcription uses the same backend as the final transcription engine. The default backend is faster_whisper, which covers most local GPU and CPU Whisper use cases out of the box.
Engine modules are imported only when the selected engine is first instantiated, so unused optional dependencies are never loaded.
Choosing an Engine
| Use case | Start with | Why |
|---|---|---|
| Default local GPU/CPU Whisper path | faster_whisper | Install with RealtimeSTT[faster-whisper]; mature, supports common Whisper model names and CTranslate2 models. |
| CPU-only experiments with small Whisper models | whisper_cpp | Uses whisper.cpp through pywhispercpp; good for low-dependency CPU testing. |
| Compatibility with OpenAI’s local Whisper package | openai_whisper | Uses the original openai-whisper Python package. |
| English CPU server with manually downloaded ONNX models | sherpa_onnx_moonshine | Offline CPU INT8 path with predictable local model files. |
| CPU Parakeet without NeMo runtime | sherpa_onnx_parakeet | Offline CPU INT8 Parakeet through sherpa-onnx. |
| Kroko/Banafo .data streaming models | kroko_onnx | Optional Kroko-ONNX runtime with Community or licensed Pro models and realtime streaming previews. |
| NVIDIA Parakeet on Linux/WSL2 | parakeet | Uses NVIDIA NeMo ASR for the Parakeet checkpoint. |
| Meta Omnilingual ASR on Linux/WSL2 Python 3.11.x | omnilingual_asr | Uses Meta’s Omnilingual ASR package; native Windows and Python 3.12.x are not practical install targets for the current upstream dependency stack. |
| Hugging Face speech-language models | granite_speech, qwen3_asr, moonshine, cohere_transcribe | Thin adapters around model-family packages and Transformers. |
Supported Engine Names
Engine names are normalized by replacing - with _ before lookup, so both Python-style and CLI-style names work interchangeably. Unsupported names raise an error that lists all available engines.
Both faster-whisper and faster_whisper resolve to the same engine. The normalization is applied automatically, so you may use whichever style you prefer.

| Engine name(s) | Status | Reference |
|---|---|---|
| faster_whisper | Default production backend | faster-whisper |
| whisper_cpp | Optional production backend | whisper.cpp |
| openai_whisper | Optional production backend | OpenAI Whisper |
| moonshine, moonshine_streaming | Experimental Transformers backend; English-only adapter | - |
| sherpa_onnx_moonshine, sherpa_moonshine, moonshine_sherpa_onnx | CPU INT8 sherpa-onnx backend | sherpa-onnx |
| kroko_onnx, kroko, banafo_kroko | Optional Kroko-ONNX backend | - |
| parakeet, nvidia_parakeet | Experimental NVIDIA NeMo backend | - |
| sherpa_onnx_parakeet, sherpa_parakeet, parakeet_sherpa_onnx | CPU INT8 sherpa-onnx backend | sherpa-onnx |
| omnilingual_asr, omnilingual, meta_omnilingual_asr, omni_asr | Experimental Meta Omnilingual ASR backend for Linux/WSL2 Python 3.11.x | - |
| granite_speech, granite | Experimental Transformers backend | - |
| qwen3_asr, qwen_asr | Experimental Qwen ASR backend | - |
| cohere_transcribe, cohere | Experimental Transformers backend, requires language | - |
| funasr | Experimental FunASR backend | - |
| openai_api | Placeholder, not wired yet | Not available |
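The name normalization and alias lookup described in this section can be sketched in a few lines. This is a minimal illustration, not the library's actual code: the `ENGINE_ALIASES` dictionary, the `normalize_engine_name` helper, the choice of `ValueError`, and the idea that aliases resolve to the first name in each table row are all assumptions made for the example.

```python
# Illustrative subset of the names in the table above. The real registry
# lives inside the package and may be organised differently.
ENGINE_ALIASES = {
    "faster_whisper": "faster_whisper",
    "whisper_cpp": "whisper_cpp",
    "kroko_onnx": "kroko_onnx",
    "kroko": "kroko_onnx",
    "banafo_kroko": "kroko_onnx",
    "parakeet": "parakeet",
    "nvidia_parakeet": "parakeet",
}


def normalize_engine_name(name: str) -> str:
    """Replace '-' with '_' and resolve an alias to one canonical name."""
    key = name.replace("-", "_")
    if key not in ENGINE_ALIASES:
        # Assumed error type; the text only says the error lists all engines.
        available = ", ".join(sorted(set(ENGINE_ALIASES.values())))
        raise ValueError(
            f"Unsupported engine {name!r}. Available engines: {available}"
        )
    return ENGINE_ALIASES[key]
```

Under these assumptions, `normalize_engine_name("nvidia-parakeet")` and `normalize_engine_name("parakeet")` resolve to the same canonical name.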
Selecting a Backend
Pass the engine name as the transcription_engine argument. The default is faster_whisper when the argument is omitted.
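A minimal sketch of backend selection follows. The helper names `build_recorder_kwargs` and `create_recorder` are hypothetical and exist only for this example; `AudioToTextRecorder` and `transcription_engine` are the names used in the text above. The import is placed inside the function so the sketch can be loaded on a machine where RealtimeSTT is not installed.

```python
def build_recorder_kwargs(engine=None, **extra):
    """Return keyword arguments for AudioToTextRecorder.

    When engine is None the key is left out, so the library default
    (faster_whisper, per the text above) applies.
    """
    kwargs = dict(extra)
    if engine is not None:
        kwargs["transcription_engine"] = engine
    return kwargs


def create_recorder(engine=None, **extra):
    """Construct a recorder with the selected backend (hypothetical helper)."""
    # Imported lazily so this sketch loads without RealtimeSTT installed.
    from RealtimeSTT import AudioToTextRecorder

    return AudioToTextRecorder(**build_recorder_kwargs(engine, **extra))
```

With RealtimeSTT and the chosen backend's optional dependencies installed, `create_recorder("whisper_cpp")` would request the whisper.cpp backend, while `create_recorder()` leaves the default in place.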
Different Engines for Final and Realtime
You can assign a lightweight engine to realtime transcription and a higher-quality engine to the final pass. This keeps the realtime display responsive while preserving accuracy on the committed result. When realtime_transcription_engine is None, realtime transcription uses the same backend as transcription_engine.
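The fallback rule can be written out as a small helper. This is a sketch only: `resolve_engines` is a hypothetical function that mirrors the documented behavior, and the engine pairing in `split_kwargs` is an arbitrary illustration, since which backend counts as lightweight depends on the models and hardware in use. The `enable_realtime_transcription` flag is an `AudioToTextRecorder` parameter that this page does not otherwise discuss.

```python
def resolve_engines(transcription_engine="faster_whisper",
                    realtime_transcription_engine=None):
    """Return (final_engine, realtime_engine) after applying the None fallback."""
    realtime = (
        transcription_engine
        if realtime_transcription_engine is None
        else realtime_transcription_engine
    )
    return transcription_engine, realtime


# Illustrative pairing only; pick engines that suit your models and hardware.
split_kwargs = {
    "transcription_engine": "faster_whisper",
    "realtime_transcription_engine": "whisper_cpp",
    "enable_realtime_transcription": True,
}
```

These keyword arguments would be passed as `AudioToTextRecorder(**split_kwargs)`; leaving out `realtime_transcription_engine` makes both passes share one backend.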
Engine-Specific Options
Use transcription_engine_options and realtime_transcription_engine_options to pass backend-specific dictionaries. These are intentionally per-engine: a key that is meaningful for one backend may be ignored or invalid for another.
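Because option keys are not shared across backends, it helps to keep the two dictionaries separate and attach each one only to the engine it was written for. The sketch below shows one way to assemble the arguments. The helper `build_engine_kwargs` is hypothetical, and the keys `example_final_key` and `example_realtime_key` are placeholders, not real options; consult each engine's own documentation for the keys it accepts.

```python
def build_engine_kwargs(final_engine, final_options=None,
                        realtime_engine=None, realtime_options=None):
    """Assemble recorder keyword arguments, keeping options per engine."""
    kwargs = {"transcription_engine": final_engine}
    if final_options:
        # Copy so later edits to the caller's dict do not leak in.
        kwargs["transcription_engine_options"] = dict(final_options)
    if realtime_engine is not None:
        kwargs["realtime_transcription_engine"] = realtime_engine
    if realtime_options:
        kwargs["realtime_transcription_engine_options"] = dict(realtime_options)
    return kwargs


# Placeholder keys for illustration only; they are not real engine options.
kwargs = build_engine_kwargs(
    "faster_whisper",
    final_options={"example_final_key": 1},
    realtime_engine="whisper_cpp",
    realtime_options={"example_realtime_key": 2},
)
```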
Model Download Behavior
| Engine family | Automatic download | Manual placement |
|---|---|---|
| faster_whisper | Yes, for known Hugging Face/CTranslate2 model ids. | Local CTranslate2 model directories may be passed as model. |
| whisper_cpp | Usually yes for model names supported by pywhispercpp. | Local ggml model paths or download_root/models_dir may be used. |
| openai_whisper | Yes, through openai-whisper. | Local model names/paths supported by that package. |
| moonshine, granite_speech, qwen3_asr, cohere_transcribe | Yes, through Hugging Face or the engine package, subject to access. | download_root maps to cache options where supported. |
| parakeet NeMo | Yes, through NeMo model loading. | NeMo cache/model options may be passed in transcription_engine_options. |
| omnilingual_asr | Yes, through Omnilingual/fairseq2/Hugging Face cache paths in Linux or WSL2 with Python 3.11.x. | Pass an Omnilingual model card such as omniASR_CTC_1B_v2. |
| sherpa_onnx_* | No. | Download and extract the sherpa-onnx model bundle, then pass the extracted directory. |
| kroko_onnx | Yes, for known public Community .data files when enabled. | Pro/private models need an existing .data path, direct URL, or explicit repo/token options. |
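For engines without automatic download, such as the sherpa_onnx_* family, a missing model directory is easier to diagnose before the recorder starts. The following is a small sketch of such a check. `require_model_dir` is a hypothetical helper, and passing the resulting directory through the model parameter is an assumption here, since the table only says to pass the extracted directory.

```python
from pathlib import Path


def require_model_dir(path):
    """Return the model directory, or raise if no bundle was extracted there."""
    model_dir = Path(path).expanduser()
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"Model directory not found: {model_dir}. "
            "Download and extract the model bundle first."
        )
    return model_dir
```

A caller could then pass `str(require_model_dir(...))` to the recorder along with the matching engine name.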
Extending Engines
RealtimeSTT’s engine system is designed to be extended. New engines should:
- Implement BaseTranscriptionEngine from .base.
- Return a TranscriptionResult (with a TranscriptionInfo field) from the transcribe method.
- Be registered in RealtimeSTT/transcription_engines/factory.py under a normalized name string.
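The shape of a custom engine can be sketched as follows. The real BaseTranscriptionEngine, TranscriptionResult, and TranscriptionInfo are defined in the package, and their exact fields and method signatures are not listed on this page, so the example defines local stand-ins with assumed fields (`text`, `info`, `language`) and an assumed `transcribe(audio, language=None)` signature. `EchoEngine` is a toy class invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionInfo:  # stand-in; the real fields are assumed
    language: str = "en"


@dataclass
class TranscriptionResult:  # stand-in; the real fields are assumed
    text: str
    info: TranscriptionInfo = field(default_factory=TranscriptionInfo)


class BaseTranscriptionEngine:  # stand-in for the class in .base
    def transcribe(self, audio, language=None):
        raise NotImplementedError


class EchoEngine(BaseTranscriptionEngine):
    """Toy engine that reports the sample count instead of decoding speech."""

    def transcribe(self, audio, language=None):
        info = TranscriptionInfo(language=language or "en")
        return TranscriptionResult(text=f"{len(audio)} samples", info=info)
```

In a real engine, the import would come from the package's own base module and `transcribe` would run the model on the audio buffer.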
The factory uses importlib.import_module to load engine classes on demand.
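A lazy-loading factory of this kind can be sketched with the standard library alone. This is not the contents of factory.py: the `_REGISTRY` layout, the `load_engine_class` name, and the `ValueError` are assumptions, and the registry entries point at standard-library classes purely as stand-ins so the sketch runs without RealtimeSTT installed.

```python
import importlib

# name -> (module path, class name). Stand-in entries only; a real registry
# would point at engine modules inside the package.
_REGISTRY = {
    "demo_fraction": ("fractions", "Fraction"),
    "demo_decimal": ("decimal", "Decimal"),
}


def load_engine_class(name):
    """Import the module for `name` on first use and return its class."""
    key = name.replace("-", "_")
    try:
        module_path, class_name = _REGISTRY[key]
    except KeyError:
        available = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"Unsupported engine {name!r}. Available engines: {available}"
        ) from None
    # The import happens here, not at module load time, so entries that are
    # never requested never pull in their dependencies.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

Because nothing is imported until `load_engine_class` is called, registering an engine whose optional dependency is missing does not break users who never select it.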
