RealtimeSTT requires Python 3.11 or newer and uses a pip extras system so each environment installs only the transcription engines and wake-word backends it actually needs. The core package includes microphone support, WebRTC VAD, recorder logic, and websocket utilities — you layer on transcription and VAD backends from there. This page walks through the recommended install flow, lists every available extra, covers platform-specific requirements, and explains CUDA setup for low-latency inference.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/KoljaB/RealtimeSTT/llms.txt
Use this file to discover all available pages before exploring further.
Recommended Install Flow
Create a virtual environment
Always isolate RealtimeSTT in a dedicated virtual environment to avoid
dependency conflicts with other Python projects.
- Linux / macOS
- Windows (PowerShell)
Install RealtimeSTT with your chosen extras
The For the default local transcription stack plus a self-contained Silero VAD
backend (no Torch Hub downloads at runtime):To add a wake-word backend alongside transcription:
faster-whisper extra is the recommended default for local Whisper
transcription. Combine extras with commas when you need more than one
backend.Always wrap runnable scripts in an
if __name__ == "__main__": guard.
RealtimeSTT spawns worker processes for model inference, and Python’s
multiprocessing module requires this guard to start child processes safely —
especially on Windows, where the default start method is spawn rather than
fork.Install Extras Reference
Extras can be combined freely using commas:| Extra | Installs | Use When |
|---|---|---|
faster-whisper | faster-whisper | Default CTranslate2 Whisper backend. Best general-purpose choice for local inference. |
whisper-cpp / whispercpp | pywhispercpp | whisper.cpp backend. Good for CPU-focused setups with no CUDA. |
openai-whisper | openai-whisper | Original OpenAI Whisper Python backend. |
sherpa-onnx / sherpa | sherpa-onnx | CPU INT8 sherpa-onnx backend for Parakeet or Moonshine engines without CUDA. |
silero-vad / silero | silero-vad>=6.2.1 | Silero VAD with PyTorch wrapper. Uses Torch Hub for model download. |
silero-onnx-cpu / silero-onnx / vad-onnx | silero-vad[onnx-cpu]>=6.2.1 | Fastest self-contained Silero VAD on CPU via ONNX Runtime. No Torch Hub required. |
silero-onnx-gpu / vad-onnx-gpu | silero-vad[onnx-gpu]>=6.2.1 | Silero VAD via ONNX Runtime with GPU acceleration. |
transformers | transformers | Shared Transformers dependency for Moonshine, Granite, and Cohere engines. |
moonshine / granite / cohere | transformers | Convenience aliases that pull in the Transformers dependency set for the corresponding engine. |
parakeet / nvidia-parakeet | nemo_toolkit[asr] | NVIDIA NeMo Parakeet backend. Best on Linux or WSL2. |
omnilingual / omnilingual-asr / meta-omnilingual-asr | omnilingual-asr>=0.2.0, matching torch==2.8.0 / torchaudio==2.8.0 | Meta Omnilingual ASR. Linux or WSL2 with Python 3.11.x only — native Windows has no fairseq2n wheel and Python 3.12.x cannot resolve the package. |
qwen / qwen3-asr | qwen-asr | Qwen ASR backend. |
qwen-vllm | qwen-asr[vllm] | Qwen ASR with vLLM acceleration. |
kroko-builder | RealtimeSTT builder helper, huggingface_hub | Exposes stt-install-kroko, which builds and installs Kroko-ONNX from upstream and downloads public Community models. On Windows requires CPython 3.12 x64 and a running Docker Desktop engine. |
porcupine / pvporcupine / pvp | pvporcupine | Porcupine wake-word backend. Required when setting wake_words without an explicit wakeword_backend. |
openwakeword / oww | openwakeword | OpenWakeWord wake-word backend. Use with wakeword_backend="oww". |
wakewords / wake-words | pvporcupine, openwakeword | Installs both wake-word backends in one extra. |
recommended / default | faster-whisper, silero-vad[onnx-cpu] | Default local transcription plus the fast CPU ONNX Silero VAD path. Best starting point for most installs. |
minimal | (none) | Core package only, with no transcription engine or wake-word backend. |
all | All PyPI-installable optional backends | Broad development or experimentation environments where you want every backend available. |
Platform Notes
- Linux
- macOS
- Windows
PyAudio requires PortAudio headers at build time. Install the system
packages before running Some examples and tests also use
pip install:ffmpeg and libsndfile:CUDA Setup
RealtimeSTT can run on CPU with small models, but CUDA is strongly preferred for low-latency realtime transcription with larger Whisper-family models. To use CUDA, install a matching NVIDIA driver, CUDA runtime, and cuDNN version for your PyTorch build. Install a CUDA-enabled PyTorch and torchaudio before installing RealtimeSTT so the CUDA wheels are selected:cu121 index URL to match your installed CUDA version — use the PyTorch install selector for the exact command. Keep device="cuda" in your AudioToTextRecorder constructor for GPU inference, or device="cpu" for CPU-only engine stacks.
VAD Dependencies
WebRTC VAD is installed automatically with the core package and requires no extra. TheAudioToTextRecorder also initialises a Silero VAD path for more accurate speech boundary detection. By default, Silero downloads its model at runtime through Torch Hub; to avoid that download and use a fully self-contained local backend instead, install the silero-onnx-cpu extra (or the recommended / default meta-extra, which includes it):
wake_words parameter without also setting wakeword_backend, RealtimeSTT defaults to Porcupine for backward compatibility — so install the porcupine extra in that case:
