RealtimeSTT Transcription Engines: Comparison and Selection

RealtimeSTT routes all speech recognition through a lazy-loaded engine factory. AudioToTextRecorder selects the main final-transcription backend with the transcription_engine parameter. A separate backend can optionally handle realtime transcription through realtime_transcription_engine. When realtime_transcription_engine is None, realtime transcription uses the same backend as the final transcription engine. The default backend is faster_whisper, which covers most local GPU and CPU Whisper use cases out of the box. Engine modules are imported only when the selected engine is first instantiated, so unused optional dependencies are never loaded.

Choosing an Engine

Use case	Start with	Why
Default local GPU/CPU Whisper path	`faster_whisper`	Install with `RealtimeSTT[faster-whisper]`; mature, supports common Whisper model names and CTranslate2 models.
CPU-only experiments with small Whisper models	`whisper_cpp`	Uses whisper.cpp through `pywhispercpp`; good for low-dependency CPU testing.
Compatibility with OpenAI’s local Whisper package	`openai_whisper`	Uses the original `openai-whisper` Python package.
English CPU server with manually downloaded ONNX models	`sherpa_onnx_moonshine`	Offline CPU INT8 path with predictable local model files.
CPU Parakeet without NeMo runtime	`sherpa_onnx_parakeet`	Offline CPU INT8 Parakeet through sherpa-onnx.
Kroko/Banafo `.data` streaming models	`kroko_onnx`	Optional Kroko-ONNX runtime with Community or licensed Pro models and realtime streaming previews.
NVIDIA Parakeet on Linux/WSL2	`parakeet`	Uses NVIDIA NeMo ASR for the Parakeet checkpoint.
Meta Omnilingual ASR on Linux/WSL2 Python 3.11.x	`omnilingual_asr`	Uses Meta’s Omnilingual ASR package; native Windows and Python 3.12.x are not practical install targets for the current upstream dependency stack.
Hugging Face speech-language models	`granite_speech`, `qwen3_asr`, `moonshine`, `cohere_transcribe`	Thin adapters around model-family packages and Transformers.

Supported Engine Names

Engine names are normalized by replacing - with _ before lookup, so CLI-style and Python-style names are interchangeable: faster-whisper and faster_whisper resolve to the same engine. Unsupported names raise an error that lists all available engines.
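
The normalization rule can be sketched in a few lines. This is a minimal illustration, not the library's actual code: the helper names (normalize_engine_name, resolve_engine_name), the ValueError type, and the three-entry AVAILABLE_ENGINES set are assumptions made for the example.

```python
# Hypothetical stand-in registry; the real factory knows many more names.
AVAILABLE_ENGINES = {"faster_whisper", "whisper_cpp", "openai_whisper"}


def normalize_engine_name(name: str) -> str:
    """Map CLI-style names (faster-whisper) to Python-style (faster_whisper)."""
    return name.replace("-", "_")


def resolve_engine_name(name: str) -> str:
    """Return the normalized name, or raise an error listing known engines."""
    normalized = normalize_engine_name(name)
    if normalized not in AVAILABLE_ENGINES:
        known = ", ".join(sorted(AVAILABLE_ENGINES))
        # The exception type is an assumption; the source only says "an error".
        raise ValueError(f"Unsupported engine {name!r}. Available engines: {known}")
    return normalized


print(resolve_engine_name("faster-whisper"))
print(resolve_engine_name("whisper_cpp"))
```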

Engine name(s)	Status	Reference
`faster_whisper`	Default production backend	faster-whisper
`whisper_cpp`	Optional production backend	whisper.cpp
`openai_whisper`	Optional production backend	OpenAI Whisper
`moonshine`, `moonshine_streaming`	Experimental Transformers backend; English-only adapter	—
`sherpa_onnx_moonshine`, `sherpa_moonshine`, `moonshine_sherpa_onnx`	CPU INT8 sherpa-onnx backend	sherpa-onnx
`kroko_onnx`, `kroko`, `banafo_kroko`	Optional Kroko-ONNX backend	—
`parakeet`, `nvidia_parakeet`	Experimental NVIDIA NeMo backend	—
`sherpa_onnx_parakeet`, `sherpa_parakeet`, `parakeet_sherpa_onnx`	CPU INT8 sherpa-onnx backend	sherpa-onnx
`omnilingual_asr`, `omnilingual`, `meta_omnilingual_asr`, `omni_asr`	Experimental Meta Omnilingual ASR backend for Linux/WSL2 Python 3.11.x	—
`granite_speech`, `granite`	Experimental Transformers backend	—
`qwen3_asr`, `qwen_asr`	Experimental Qwen ASR backend	—
`cohere_transcribe`, `cohere`	Experimental Transformers backend, requires language	—
`funasr`	Experimental FunASR backend	—
`openai_api`	Placeholder, not wired yet	Not available

Selecting a Backend

Pass the engine name as the transcription_engine argument. The default is faster_whisper when the argument is omitted.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    model="small.en",
    transcription_engine="faster_whisper",
)

Different Engines for Final and Realtime

You can assign a lightweight engine to realtime transcription and a higher-quality engine to the final pass. This keeps the realtime display responsive while preserving accuracy on the committed result.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="faster_whisper",
    model="small.en",
    enable_realtime_transcription=True,
    realtime_transcription_engine="whisper_cpp",
    realtime_model_type="tiny.en",
    realtime_transcription_engine_options={
        "model": {"n_threads": 8},
        "transcribe": {"single_segment": True, "no_context": True},
    },
)

If realtime_transcription_engine is None, realtime transcription uses the same backend as transcription_engine.
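
The fallback rule amounts to a one-line decision. The function below is a hypothetical helper written for illustration; it mirrors the documented behavior but is not part of RealtimeSTT.

```python
from typing import Optional


def effective_realtime_engine(
    transcription_engine: str = "faster_whisper",
    realtime_transcription_engine: Optional[str] = None,
) -> str:
    """Return the backend that would serve realtime transcription."""
    if realtime_transcription_engine is None:
        # No dedicated realtime engine: reuse the final-transcription backend.
        return transcription_engine
    return realtime_transcription_engine


print(effective_realtime_engine())
print(effective_realtime_engine("faster_whisper", "whisper_cpp"))
```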

Engine-Specific Options

Use transcription_engine_options and realtime_transcription_engine_options to pass backend-specific dictionaries. These options are intentionally per-engine: a key that is meaningful for one backend may be ignored or invalid for another.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="sherpa_onnx_moonshine",
    model="models/sherpa-onnx-moonshine-tiny-en-int8",
    device="cpu",
    language="en",
    transcription_engine_options={
        "num_threads": 2,
        "provider": "cpu",
    },
)
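
Because option keys are not portable across backends, one way to avoid sending the wrong keys is to keep a separate options dictionary per engine and select it by name. The sketch below assumes a hypothetical helper, recorder_kwargs, and a hypothetical ENGINE_OPTIONS mapping; the option values are the ones shown in the examples on this page.

```python
from typing import Any, Dict

# Per-engine option sets, keyed by normalized engine name (illustrative only).
ENGINE_OPTIONS: Dict[str, Dict[str, Any]] = {
    "sherpa_onnx_moonshine": {"num_threads": 2, "provider": "cpu"},
    "whisper_cpp": {
        "model": {"n_threads": 8},
        "transcribe": {"single_segment": True, "no_context": True},
    },
}


def recorder_kwargs(engine: str, model: str) -> Dict[str, Any]:
    """Build keyword arguments that carry only the selected engine's options."""
    kwargs: Dict[str, Any] = {"transcription_engine": engine, "model": model}
    options = ENGINE_OPTIONS.get(engine)
    if options is not None:
        kwargs["transcription_engine_options"] = dict(options)
    return kwargs


kwargs = recorder_kwargs(
    "sherpa_onnx_moonshine", "models/sherpa-onnx-moonshine-tiny-en-int8"
)
print(kwargs)
# With RealtimeSTT and the model files installed, these could be passed on:
# recorder = AudioToTextRecorder(**kwargs)
```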

Model Download Behavior

Engine family	Automatic download	Manual placement
`faster_whisper`	Yes, for known Hugging Face/CTranslate2 model ids.	Local CTranslate2 model directories may be passed as `model`.
`whisper_cpp`	Usually yes for model names supported by `pywhispercpp`.	Local ggml model paths or `download_root`/`models_dir` may be used.
`openai_whisper`	Yes, through `openai-whisper`.	Local model names/paths supported by that package.
`moonshine`, `granite_speech`, `qwen3_asr`, `cohere_transcribe`	Yes, through Hugging Face or the engine package, subject to access.	`download_root` maps to cache options where supported.
`parakeet` NeMo	Yes, through NeMo model loading.	NeMo cache/model options may be passed in `transcription_engine_options`.
`omnilingual_asr`	Yes, through Omnilingual/fairseq2/Hugging Face cache paths in Linux or WSL2 with Python 3.11.x.	Pass an Omnilingual model card such as `omniASR_CTC_1B_v2`.
`sherpa_onnx_*`	No.	Download and extract the sherpa-onnx model bundle, then pass the extracted directory.
`kroko_onnx`	Yes, for known public Community `.data` files when enabled.	Pro/private models need an existing `.data` path, direct URL, or explicit repo/token options.
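
Since the sherpa_onnx_* engines do not download anything, it can help to check that the extracted model directory exists before constructing the recorder. The helper below, require_model_dir, is a hypothetical convenience function and not part of RealtimeSTT.

```python
from pathlib import Path


def require_model_dir(path: str) -> Path:
    """Return the model directory as a Path, or fail with an actionable message."""
    model_dir = Path(path)
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"Model directory {model_dir} not found. sherpa_onnx_* engines do not "
            "download models; download and extract the bundle first."
        )
    return model_dir


try:
    require_model_dir("models/sherpa-onnx-moonshine-tiny-en-int8")
except FileNotFoundError as exc:
    print(exc)
```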

Extending Engines

RealtimeSTT’s engine system is designed to be extended. New engines should:

Implement BaseTranscriptionEngine from .base.
Return a TranscriptionResult (with a TranscriptionInfo field) from the transcribe method.
Be registered in RealtimeSTT/transcription_engines/factory.py under a normalized name string.
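
The shape of a new engine can be sketched as follows. The classes here are local stand-ins so the example runs on its own: the real BaseTranscriptionEngine, TranscriptionResult, and TranscriptionInfo live in the package's base module, and their field names and the transcribe signature used below are assumptions, not the library's confirmed interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


# Stand-ins for the real classes in the package's base module.
# Field names and the transcribe signature are assumptions.
@dataclass
class TranscriptionInfo:
    language: Optional[str] = None


@dataclass
class TranscriptionResult:
    text: str
    info: TranscriptionInfo


class BaseTranscriptionEngine(ABC):
    @abstractmethod
    def transcribe(
        self, audio: Sequence[float], language: Optional[str] = None
    ) -> TranscriptionResult:
        """Convert audio samples into a TranscriptionResult."""


class EchoEngine(BaseTranscriptionEngine):
    """Toy engine that reports the sample count instead of running a model."""

    def transcribe(self, audio, language=None):
        text = f"<{len(audio)} samples>"
        return TranscriptionResult(text=text, info=TranscriptionInfo(language=language))


result = EchoEngine().transcribe([0.0, 0.1, -0.1], language="en")
print(result.text, result.info.language)
```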

Imports should remain lazy inside the engine module so that optional dependencies are only imported when that engine is actually selected. The factory uses importlib.import_module to load engine classes on demand:

# factory.py excerpt — how engines are registered
ENGINE_CLASS_PATHS = {
    "faster_whisper": (".faster_whisper_engine", "FasterWhisperEngine"),
    "whisper_cpp":    (".whisper_cpp_engine",    "WhisperCppEngine"),
    # ... other engines
}
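
A self-contained sketch of the same lazy-loading pattern is shown below. Standard-library modules stand in for engine modules so the example runs anywhere; the function name load_engine_class and the registry entries are illustrative assumptions. For relative module paths such as ".faster_whisper_engine", importlib.import_module additionally needs a package argument.

```python
import importlib

# Stand-in registry: stdlib modules play the role of engine modules.
ENGINE_CLASS_PATHS = {
    "json_decoder": ("json", "JSONDecoder"),
    "csv_sniffer": ("csv", "Sniffer"),
}


def load_engine_class(name: str):
    """Import the module for the named entry on demand and return its class."""
    module_path, class_name = ENGINE_CLASS_PATHS[name]
    # The import happens here, at lookup time, not when the registry is defined.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


decoder_cls = load_engine_class("json_decoder")
print(decoder_cls.__name__)
```

Keeping only strings in the registry means that defining it costs nothing, and a missing optional dependency surfaces only when the corresponding entry is requested.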

Contract tests should cover missing dependency messages, parameter mapping, audio normalization, result conversion, and factory selection. Tests that load real models should remain opt-in.
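
As an illustration of a missing-dependency contract test, the sketch below wraps a lazy import so that failure produces an install hint, then asserts on the message. The helper import_optional and the message wording are assumptions for the example; the install extra RealtimeSTT[faster-whisper] is the one named earlier on this page.

```python
import importlib


def import_optional(module_name: str, install_hint: str):
    """Import a module lazily, turning failure into an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Engine dependency {module_name!r} is not installed. "
            f"Install it with: pip install {install_hint}"
        ) from exc


def test_missing_dependency_message():
    """The error for a missing package should name the install command."""
    try:
        import_optional(
            "definitely_missing_module_for_test", "RealtimeSTT[faster-whisper]"
        )
    except ImportError as exc:
        assert "pip install RealtimeSTT[faster-whisper]" in str(exc)
    else:
        raise AssertionError("expected ImportError")


test_missing_dependency_message()
```

A test like this needs no model files or optional packages, so it can run by default while model-loading tests stay opt-in.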
