Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/KoljaB/RealtimeSTT/llms.txt

Use this file to discover all available pages before exploring further.

RealtimeSTT requires Python 3.11 or newer and uses a pip extras system so each environment installs only the transcription engines and wake-word backends it actually needs. The core package includes microphone support, WebRTC VAD, recorder logic, and websocket utilities — you layer on transcription and VAD backends from there. This page walks through the recommended install flow, lists every available extra, covers platform-specific requirements, and explains CUDA setup for low-latency inference.
1

Create a virtual environment

Always isolate RealtimeSTT in a dedicated virtual environment to avoid dependency conflicts with other Python projects.
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
2

Install RealtimeSTT with your chosen extras

The faster-whisper extra is the recommended default for local Whisper transcription. Combine extras with commas when you need more than one backend.
pip install "RealtimeSTT[faster-whisper]"
For the default local transcription stack plus a self-contained Silero VAD backend (no Torch Hub downloads at runtime):
pip install "RealtimeSTT[recommended]"
To add a wake-word backend alongside transcription:
pip install "RealtimeSTT[faster-whisper,porcupine]"
3

Verify the install

Run this snippet to confirm the recorder initialises and can capture audio. Speak into your microphone when prompted.
from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    with AudioToTextRecorder(model="tiny", device="cpu") as recorder:
        print("Speak now")
        print(recorder.text())
A transcript of your speech should appear. If initialisation errors appear, check the platform notes and CUDA sections below.
Always wrap runnable scripts in an if __name__ == "__main__": guard. RealtimeSTT spawns worker processes for model inference, and Python’s multiprocessing module requires this guard to start child processes safely — especially on Windows, where the default start method is spawn rather than fork.

Install Extras Reference

Extras can be combined freely using commas:
pip install "RealtimeSTT[faster-whisper,porcupine]"
pip install "RealtimeSTT[whisper-cpp,openwakeword]"
pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
ExtraInstallsUse When
faster-whisperfaster-whisperDefault CTranslate2 Whisper backend. Best general-purpose choice for local inference.
whisper-cpp / whispercpppywhispercppwhisper.cpp backend. Good for CPU-focused setups with no CUDA.
openai-whisperopenai-whisperOriginal OpenAI Whisper Python backend.
sherpa-onnx / sherpasherpa-onnxCPU INT8 sherpa-onnx backend for Parakeet or Moonshine engines without CUDA.
silero-vad / silerosilero-vad>=6.2.1Silero VAD with PyTorch wrapper. Uses Torch Hub for model download.
silero-onnx-cpu / silero-onnx / vad-onnxsilero-vad[onnx-cpu]>=6.2.1Fastest self-contained Silero VAD on CPU via ONNX Runtime. No Torch Hub required.
silero-onnx-gpu / vad-onnx-gpusilero-vad[onnx-gpu]>=6.2.1Silero VAD via ONNX Runtime with GPU acceleration.
transformerstransformersShared Transformers dependency for Moonshine, Granite, and Cohere engines.
moonshine / granite / coheretransformersConvenience aliases that pull in the Transformers dependency set for the corresponding engine.
parakeet / nvidia-parakeetnemo_toolkit[asr]NVIDIA NeMo Parakeet backend. Best on Linux or WSL2.
omnilingual / omnilingual-asr / meta-omnilingual-asromnilingual-asr>=0.2.0, matching torch==2.8.0 / torchaudio==2.8.0Meta Omnilingual ASR. Linux or WSL2 with Python 3.11.x only — native Windows has no fairseq2n wheel and Python 3.12.x cannot resolve the package.
qwen / qwen3-asrqwen-asrQwen ASR backend.
qwen-vllmqwen-asr[vllm]Qwen ASR with vLLM acceleration.
kroko-builderRealtimeSTT builder helper, huggingface_hubExposes stt-install-kroko, which builds and installs Kroko-ONNX from upstream and downloads public Community models. On Windows requires CPython 3.12 x64 and a running Docker Desktop engine.
porcupine / pvporcupine / pvppvporcupinePorcupine wake-word backend. Required when setting wake_words without an explicit wakeword_backend.
openwakeword / owwopenwakewordOpenWakeWord wake-word backend. Use with wakeword_backend="oww".
wakewords / wake-wordspvporcupine, openwakewordInstalls both wake-word backends in one extra.
recommended / defaultfaster-whisper, silero-vad[onnx-cpu]Default local transcription plus the fast CPU ONNX Silero VAD path. Best starting point for most installs.
minimal(none)Core package only, with no transcription engine or wake-word backend.
allAll PyPI-installable optional backendsBroad development or experimentation environments where you want every backend available.

Platform Notes

PyAudio requires PortAudio headers at build time. Install the system packages before running pip install:
sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev
pip install "RealtimeSTT[faster-whisper]"
Some examples and tests also use ffmpeg and libsndfile:
sudo apt-get install ffmpeg libsndfile1

CUDA Setup

RealtimeSTT can run on CPU with small models, but CUDA is strongly preferred for low-latency realtime transcription with larger Whisper-family models. To use CUDA, install a matching NVIDIA driver, CUDA runtime, and cuDNN version for your PyTorch build. Install a CUDA-enabled PyTorch and torchaudio before installing RealtimeSTT so the CUDA wheels are selected:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "RealtimeSTT[faster-whisper]"
Adjust the cu121 index URL to match your installed CUDA version — use the PyTorch install selector for the exact command. Keep device="cuda" in your AudioToTextRecorder constructor for GPU inference, or device="cpu" for CPU-only engine stacks.

VAD Dependencies

WebRTC VAD is installed automatically with the core package and requires no extra. The AudioToTextRecorder also initialises a Silero VAD path for more accurate speech boundary detection. By default, Silero downloads its model at runtime through Torch Hub; to avoid that download and use a fully self-contained local backend instead, install the silero-onnx-cpu extra (or the recommended / default meta-extra, which includes it):
pip install "RealtimeSTT[silero-onnx-cpu]"
Wake-word dependencies are entirely optional. If you do not use wake words, install no wake-word extra. If you set the wake_words parameter without also setting wakeword_backend, RealtimeSTT defaults to Porcupine for backward compatibility — so install the porcupine extra in that case:
pip install "RealtimeSTT[faster-whisper,porcupine]"

Build docs developers (and LLMs) love