

RealtimeSTT is a Python library that converts speech to text with low latency. It provides voice activity detection, optional wake word activation, and support for a wide range of transcription backends, from local Whisper models to streaming ONNX engines. It is designed for voice assistants, dictation tools, browser streaming servers, and any application that needs fast, reliable speech recognition.

Quickstart

Get from zero to working speech-to-text in under five minutes.

Installation

Install the right extras for your platform and engine stack.

Configuration

Full parameter reference for AudioToTextRecorder.

Transcription Engines

Compare all supported backends and choose the right one.

Why RealtimeSTT?

RealtimeSTT handles the hard parts of production speech recognition: detecting when someone starts and stops speaking, buffering pre-roll audio so the first word is never clipped, running interim transcription updates while speech is still in progress, and routing through the engine backend that best fits your hardware and latency requirements.
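
To make the start/stop detection and pre-roll ideas concrete, here is a minimal, library-independent sketch. It is not RealtimeSTT's implementation: `segment_speech`, its parameters, and the `is_speech` callback are hypothetical names chosen for illustration, and the callback stands in for a real voice activity detector.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def segment_speech(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],  # hypothetical stand-in for a real VAD
    pre_roll_frames: int = 3,
    silence_frames_to_stop: int = 2,
) -> Iterator[List[bytes]]:
    """Yield one list of frames per utterance, including pre-roll audio."""
    # Ring buffer holding the most recent frames seen before speech starts.
    pre_roll: deque = deque(maxlen=pre_roll_frames)
    utterance: List[bytes] = []
    silent_run = 0
    recording = False

    for frame in frames:
        voiced = is_speech(frame)
        if not recording:
            if voiced:
                # Speech started: prepend buffered audio so the onset is kept.
                utterance = list(pre_roll) + [frame]
                pre_roll.clear()
                silent_run = 0
                recording = True
            else:
                pre_roll.append(frame)
        else:
            utterance.append(frame)
            silent_run = 0 if voiced else silent_run + 1
            if silent_run >= silence_frames_to_stop:
                # Enough trailing silence: the utterance is complete.
                yield utterance
                utterance, recording = [], False

    if recording:
        yield utterance  # flush an utterance cut off by the end of the stream
```

The bounded `deque` discards old silence automatically, so memory stays constant while the recorder waits for speech.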

Voice Activity Detection

Dual-layer VAD with WebRTC and Silero. Detects speech start/stop with minimal false positives.

Multiple Engines

faster-whisper, whisper.cpp, Kroko-ONNX, sherpa-onnx, Parakeet, and more.

Wake Words

Activate recording only after a trigger phrase using Porcupine or OpenWakeWord.

External Audio

Feed audio from files, websockets, or any stream instead of the microphone.
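
As a rough sketch of the external-audio path, the helper below streams a WAV file into any object that exposes a `feed_audio(chunk)` method. It assumes the recorder was created with `use_microphone=False` and accepts raw PCM bytes through `feed_audio`; check the API reference for the exact signature and the expected sample format. `feed_wav`, `AcceptsAudio`, and `frames_per_chunk` are hypothetical names introduced here.

```python
import wave
from typing import Protocol


class AcceptsAudio(Protocol):
    """Anything with a feed_audio(chunk) method, such as a recorder."""

    def feed_audio(self, chunk: bytes) -> None: ...


def feed_wav(recorder: AcceptsAudio, path: str, frames_per_chunk: int = 1024) -> int:
    """Stream a WAV file into the recorder chunk by chunk; return chunks sent."""
    sent = 0
    with wave.open(path, "rb") as wav:
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            recorder.feed_audio(chunk)
            sent += 1
    return sent
```

Because the helper only depends on the `feed_audio` method, the same loop would work for chunks arriving from a websocket or another stream in place of `wave.readframes`.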

Get Started in Seconds

1. Install the library

Install RealtimeSTT with the default faster-whisper backend:
pip install "RealtimeSTT[faster-whisper]"
2. Write your first script

Create a Python script with the if __name__ == "__main__": guard (required for multiprocessing on Windows):
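from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    # The context manager shuts the recorder down when the block exits.
    with AudioToTextRecorder() as recorder:
        print("Speak now...")
        # text() blocks until an utterance has been recorded and transcribed.
        print(recorder.text())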
3. Run it

Speak into your microphone. RealtimeSTT detects your voice, waits for silence, then prints the transcription.
4. Explore further

See the Quickstart guide for continuous dictation, real-time interim text, and more patterns.
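
One possible shape for continuous dictation is a loop around the same blocking `recorder.text()` call used in the first script. The sketch below is an assumption-laden illustration, not the Quickstart's code: `dictate`, `on_text`, and the `stop_phrase` convention are hypothetical, and the recorder is passed in so the loop can be exercised without a microphone.

```python
from typing import Callable, Protocol


class Transcriber(Protocol):
    """Anything with a blocking text() method that returns one utterance."""

    def text(self) -> str: ...


def dictate(
    recorder: Transcriber,
    on_text: Callable[[str], None],
    stop_phrase: str = "stop dictation",  # hypothetical exit convention
) -> int:
    """Pass each transcribed utterance to on_text until the stop phrase is heard."""
    count = 0
    while True:
        sentence = recorder.text().strip()
        if not sentence:
            continue  # ignore empty transcriptions
        # Compare without case or trailing punctuation added by the model.
        if sentence.lower().rstrip(".!?") == stop_phrase:
            return count
        on_text(sentence)
        count += 1
```

With the real library, this would be called as `dictate(recorder, print)` inside the `with AudioToTextRecorder() as recorder:` block of the first script, keeping the `if __name__ == "__main__":` guard.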

Explore the Docs

Guides

Practical patterns for common use cases.

API Reference

Complete class and parameter documentation.

Troubleshooting

Fix common install, audio, and runtime issues.
