RealtimeSTT: Low-Latency Speech-to-Text for Python

RealtimeSTT is a Python library that converts speech to text with low latency, voice activity detection, optional wake word activation, and support for a wide range of transcription backends — from local Whisper models to streaming ONNX engines. It is designed for voice assistants, dictation tools, browser streaming servers, and any application that needs fast, reliable speech recognition.

Quickstart

Get from zero to working speech-to-text in under five minutes.

Installation

Install the right extras for your platform and engine stack.

Configuration

Full parameter reference for AudioToTextRecorder.

Transcription Engines

Compare all supported backends and choose the right one.

Why RealtimeSTT?

RealtimeSTT handles the hard parts of production speech recognition: detecting when someone starts and stops speaking, buffering pre-roll audio so the first word is never clipped, running interim transcription updates while speech is still in progress, and routing through the engine backend that best fits your hardware and latency requirements.
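To make the pre-roll idea concrete, here is a minimal sketch of how buffered audio can be prepended to an utterance once speech is detected. It does not reproduce RealtimeSTT's internals: the function name `segment_with_preroll`, its parameters, and the `is_speech` callable are all hypothetical and chosen for illustration only.

```python
from collections import deque
from typing import Callable, Iterable, List


def segment_with_preroll(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    preroll_frames: int = 3,
    max_silence_frames: int = 2,
) -> List[List[bytes]]:
    """Group audio frames into utterances, prepending buffered pre-roll audio.

    Hypothetical helper for illustration; not part of RealtimeSTT.
    """
    # Ring buffer holding the most recent frames seen while not recording.
    preroll = deque(maxlen=preroll_frames)
    utterances: List[List[bytes]] = []
    current: List[bytes] = []
    silence_run = 0
    recording = False

    for frame in frames:
        if not recording:
            if is_speech(frame):
                # Speech started: keep the frames captured just before it,
                # so the onset of the first word is not cut off.
                current = list(preroll) + [frame]
                preroll.clear()
                silence_run = 0
                recording = True
            else:
                preroll.append(frame)
        else:
            current.append(frame)
            silence_run = 0 if is_speech(frame) else silence_run + 1
            if silence_run >= max_silence_frames:
                # Enough trailing silence: close the utterance.
                utterances.append(current)
                current = []
                recording = False

    if recording:
        # The stream ended while someone was still speaking.
        utterances.append(current)
    return utterances
```

The bounded `deque` discards the oldest frame automatically, so memory stays constant however long the silence before speech lasts.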

Voice Activity Detection

Dual-layer VAD with WebRTC and Silero. Detects speech start/stop with minimal false positives.
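This page does not say how the two detectors are combined. One common arrangement for a two-layer detector, sketched below as an assumption, is to run a cheap check on every frame and consult a more accurate check only when the first one fires. `fast_check` and `confirm_check` are hypothetical stand-ins, not actual WebRTC or Silero calls.

```python
from typing import Callable


def make_two_stage_detector(
    fast_check: Callable[[bytes], bool],
    confirm_check: Callable[[bytes], bool],
) -> Callable[[bytes], bool]:
    """Combine two detectors so a frame counts as speech only if both agree.

    Hypothetical helper for illustration; not part of RealtimeSTT.
    """

    def is_speech(frame: bytes) -> bool:
        # `and` short-circuits: the second detector runs only when the
        # first one has already flagged the frame.
        return fast_check(frame) and confirm_check(frame)

    return is_speech
```

Requiring agreement trades a little sensitivity for fewer false positives, and the short-circuit keeps the costlier check off frames the cheap one has already rejected.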

Multiple Engines

faster-whisper, whisper.cpp, Kroko-ONNX, sherpa-onnx, Parakeet, and more.

Wake Words

Activate recording only after a trigger phrase using Porcupine or OpenWakeWord.

External Audio

Feed audio from files, websockets, or any stream instead of the microphone.
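As a rough sketch of the file case, the helper below reads a WAV file with the standard library and hands its PCM data to a consumer in fixed-size chunks. The name `stream_wav` and the `sink` callable are hypothetical; how the recorder accepts external audio is covered in the library's own documentation, not here.

```python
import wave
from typing import Callable


def stream_wav(
    path: str,
    sink: Callable[[bytes], None],
    frames_per_chunk: int = 512,
) -> int:
    """Read a WAV file and pass its PCM data to `sink` chunk by chunk.

    Hypothetical helper for illustration; returns the number of chunks sent.
    """
    sent = 0
    with wave.open(path, "rb") as wav:
        while True:
            # readframes returns an empty bytes object at end of file.
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            sink(chunk)
            sent += 1
    return sent
```

Any callable that accepts raw bytes can serve as `sink`, for example `list.append` while testing or a method that forwards chunks to a recorder.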

Get Started in Seconds

Install the library

Install RealtimeSTT with the default faster-whisper backend:

pip install "RealtimeSTT[faster-whisper]"

Write your first script

Create a Python script with the if __name__ == "__main__": guard (required for multiprocessing on Windows):

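from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    # The with block releases the recorder's resources when it exits.
    with AudioToTextRecorder() as recorder:
        print("Speak now...")
        # text() waits for speech followed by silence, then returns the transcription.
        print(recorder.text())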

Run it

Run the script, then speak into your microphone. RealtimeSTT detects your voice, waits for silence, then prints the transcription.

Explore further

See the Quickstart guide for continuous dictation, real-time interim text, and more patterns.

Explore the Docs

Guides

Practical patterns for common use cases.

API Reference

Complete class and parameter documentation.

Troubleshooting

Fix common install, audio, and runtime issues.
