Install RealtimeSTT and Choose Your Engine Stack

RealtimeSTT requires Python 3.11 or newer and uses a pip extras system so each environment installs only the transcription engines and wake-word backends it actually needs. The core package includes microphone support, WebRTC VAD, recorder logic, and websocket utilities — you layer on transcription and VAD backends from there. This page walks through the recommended install flow, lists every available extra, covers platform-specific requirements, and explains CUDA setup for low-latency inference.

Recommended Install Flow

Create a virtual environment

Always isolate RealtimeSTT in a dedicated virtual environment to avoid dependency conflicts with other Python projects.

Linux / macOS
Windows (PowerShell)

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel

Install RealtimeSTT with your chosen extras

The faster-whisper extra is the recommended default for local Whisper transcription. Combine extras with commas when you need more than one backend.

pip install "RealtimeSTT[faster-whisper]"

For the default local transcription stack plus a self-contained Silero VAD backend (no Torch Hub downloads at runtime):

pip install "RealtimeSTT[recommended]"

To add a wake-word backend alongside transcription:

pip install "RealtimeSTT[faster-whisper,porcupine]"

Verify the install

Run this snippet to confirm the recorder initialises and can capture audio. Speak into your microphone when prompted.

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    with AudioToTextRecorder(model="tiny", device="cpu") as recorder:
        print("Speak now")
        print(recorder.text())

A transcript of your speech should appear. If initialisation errors appear, check the platform notes and CUDA sections below.

Always wrap runnable scripts in an if __name__ == "__main__": guard. RealtimeSTT spawns worker processes for model inference, and Python’s multiprocessing module requires this guard to start child processes safely — especially on Windows, where the default start method is spawn rather than fork.

Install Extras Reference

Extras can be combined freely using commas:

pip install "RealtimeSTT[faster-whisper,porcupine]"
pip install "RealtimeSTT[whisper-cpp,openwakeword]"
pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"

Extra	Installs	Use When
`faster-whisper`	`faster-whisper`	Default CTranslate2 Whisper backend. Best general-purpose choice for local inference.
`whisper-cpp` / `whispercpp`	`pywhispercpp`	whisper.cpp backend. Good for CPU-focused setups with no CUDA.
`openai-whisper`	`openai-whisper`	Original OpenAI Whisper Python backend.
`sherpa-onnx` / `sherpa`	`sherpa-onnx`	CPU INT8 sherpa-onnx backend for Parakeet or Moonshine engines without CUDA.
`silero-vad` / `silero`	`silero-vad>=6.2.1`	Silero VAD with PyTorch wrapper. Uses Torch Hub for model download.
`silero-onnx-cpu` / `silero-onnx` / `vad-onnx`	`silero-vad[onnx-cpu]>=6.2.1`	Fastest self-contained Silero VAD on CPU via ONNX Runtime. No Torch Hub required.
`silero-onnx-gpu` / `vad-onnx-gpu`	`silero-vad[onnx-gpu]>=6.2.1`	Silero VAD via ONNX Runtime with GPU acceleration.
`transformers`	`transformers`	Shared Transformers dependency for Moonshine, Granite, and Cohere engines.
`moonshine` / `granite` / `cohere`	`transformers`	Convenience aliases that pull in the Transformers dependency set for the corresponding engine.
`parakeet` / `nvidia-parakeet`	`nemo_toolkit[asr]`	NVIDIA NeMo Parakeet backend. Best on Linux or WSL2.
`omnilingual` / `omnilingual-asr` / `meta-omnilingual-asr`	`omnilingual-asr>=0.2.0`, matching `torch==2.8.0` / `torchaudio==2.8.0`	Meta Omnilingual ASR. Linux or WSL2 with Python 3.11.x only — native Windows has no `fairseq2n` wheel and Python 3.12.x cannot resolve the package.
`qwen` / `qwen3-asr`	`qwen-asr`	Qwen ASR backend.
`qwen-vllm`	`qwen-asr[vllm]`	Qwen ASR with vLLM acceleration.
`kroko-builder`	RealtimeSTT builder helper, `huggingface_hub`	Exposes `stt-install-kroko`, which builds and installs Kroko-ONNX from upstream and downloads public Community models. On Windows requires CPython 3.12 x64 and a running Docker Desktop engine.
`porcupine` / `pvporcupine` / `pvp`	`pvporcupine`	Porcupine wake-word backend. Required when setting `wake_words` without an explicit `wakeword_backend`.
`openwakeword` / `oww`	`openwakeword`	OpenWakeWord wake-word backend. Use with `wakeword_backend="oww"`.
`wakewords` / `wake-words`	`pvporcupine`, `openwakeword`	Installs both wake-word backends in one extra.
`recommended` / `default`	`faster-whisper`, `silero-vad[onnx-cpu]`	Default local transcription plus the fast CPU ONNX Silero VAD path. Best starting point for most installs.
`minimal`	(none)	Core package only, with no transcription engine or wake-word backend.
`all`	All PyPI-installable optional backends	Broad development or experimentation environments where you want every backend available.

Platform Notes

Linux
macOS
Windows

PyAudio requires PortAudio headers at build time. Install the system packages before running pip install:

sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev
pip install "RealtimeSTT[faster-whisper]"

Some examples and tests also use ffmpeg and libsndfile:

sudo apt-get install ffmpeg libsndfile1

Install PortAudio through Homebrew before installing PyAudio:

brew install portaudio
pip install "RealtimeSTT[faster-whisper]"

Install from a normal terminal or PowerShell session. No compiler is required for the default stack because the project uses webrtcvad-wheels to ship a pre-built WebRTC VAD wheel, avoiding the older source-only install path:

pip install "RealtimeSTT[faster-whisper]"

Meta Omnilingual ASR is not available for native Windows inference — the omnilingual extra is guarded by a non-Windows package marker because fairseq2n has no Windows wheel. Use Linux or WSL2 with Python 3.11.x for that engine.

CUDA Setup

RealtimeSTT can run on CPU with small models, but CUDA is strongly preferred for low-latency realtime transcription with larger Whisper-family models. To use CUDA, install a matching NVIDIA driver, CUDA runtime, and cuDNN version for your PyTorch build. Install a CUDA-enabled PyTorch and torchaudio before installing RealtimeSTT so the CUDA wheels are selected:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "RealtimeSTT[faster-whisper]"

Adjust the cu121 index URL to match your installed CUDA version — use the PyTorch install selector for the exact command. Keep device="cuda" in your AudioToTextRecorder constructor for GPU inference, or device="cpu" for CPU-only engine stacks.

VAD Dependencies

WebRTC VAD is installed automatically with the core package and requires no extra. The AudioToTextRecorder also initialises a Silero VAD path for more accurate speech boundary detection. By default, Silero downloads its model at runtime through Torch Hub; to avoid that download and use a fully self-contained local backend instead, install the silero-onnx-cpu extra (or the recommended / default meta-extra, which includes it):

pip install "RealtimeSTT[silero-onnx-cpu]"

Wake-word dependencies are entirely optional. If you do not use wake words, install no wake-word extra. If you set the wake_words parameter without also setting wakeword_backend, RealtimeSTT defaults to Porcupine for backward compatibility — so install the porcupine extra in that case:

pip install "RealtimeSTT[faster-whisper,porcupine]"

Get Started

Guides

Transcription Engines

Resources

Install RealtimeSTT and Choose Your Engine Stack

Recommended Install Flow

Install Extras Reference

Platform Notes

CUDA Setup

VAD Dependencies

Build docs developers (and LLMs) love

Get Started

Guides

Transcription Engines

Resources

Documentation Index

​Recommended Install Flow

​Install Extras Reference

​Platform Notes

​CUDA Setup

​VAD Dependencies

Build docs developers (and LLMs) love

Recommended Install Flow

Install Extras Reference

Platform Notes

CUDA Setup

VAD Dependencies