This guide covers the four main usage patterns you will reach for when building with RealtimeSTT: capturing a single utterance, running a continuous dictation loop, streaming real-time interim text, and feeding audio from an external source instead of a microphone. Each pattern builds on the previous one, so read through in order if you are new to the library.
Single Utterance
Install RealtimeSTT
Install the library with the faster-whisper backend, which is the recommended default for local Whisper transcription.
Speak into your microphone
Open the recorder with a context manager. RealtimeSTT starts listening for voice activity immediately. Call text() and it blocks until a full utterance is detected and transcribed.
Continuous Dictation Loop
For applications that need to keep listening across multiple utterances, pass a callback to text() instead of collecting the return value directly.
When given a callback, text() dispatches the transcript to process_text asynchronously as transcription finishes and returns immediately, so the while True loop can resume listening right away. This is the preferred form for continuous dictation because the recorder keeps buffering incoming audio during the brief transcription window, so you lose no speech between utterances. Without a callback, text() blocks until the transcript is ready and the loop pauses.
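As a rough sketch of the callback form described above, the loop might look like the following. The process_text name comes from the description; run_dictation and its max_utterances bound are hypothetical additions so the loop can be stopped and exercised without a microphone, and main() assumes RealtimeSTT is installed.

```python
def process_text(text):
    """Callback that receives each finished transcript."""
    print(f"Transcript: {text}")


def run_dictation(recorder, callback=process_text, max_utterances=None):
    """Keep calling recorder.text(callback) until the optional bound is reached.

    max_utterances is a hypothetical convenience, not a RealtimeSTT option;
    leave it as None for an endless, while True-style loop.
    """
    handled = 0
    while max_utterances is None or handled < max_utterances:
        recorder.text(callback)  # the transcript is delivered to the callback
        handled += 1
    return handled


def main():
    # Imported here so the helpers above stay usable without the library.
    from RealtimeSTT import AudioToTextRecorder

    with AudioToTextRecorder() as recorder:
        print("Speak now; press Ctrl+C to stop.")
        run_dictation(recorder)
```

Call main() from your script's entry point to start listening. Passing a different callback to run_dictation is enough to route transcripts somewhere other than the console.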
Real-time Interim Transcription
Set enable_realtime_transcription=True to receive live text updates while the user is still speaking. A fast, lightweight model handles the interim updates; a larger, more accurate model produces the final result once the utterance ends.
on_realtime_transcription_update fires repeatedly as each new interim chunk is produced. text() still returns (or delivers to a callback) the single authoritative final transcript once the utterance is complete. Using a smaller realtime_model_type than model keeps the interim updates fast without sacrificing accuracy in the final result.
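The split between interim updates and the final transcript can be sketched as below. LiveCaption and realtime_kwargs are hypothetical helpers, the model sizes are illustrative choices, and the sketch assumes each interim update carries the full text heard so far, so only the newest one is kept.

```python
class LiveCaption:
    """Tracks the latest interim text and the list of finalized utterances."""

    def __init__(self):
        self.interim = ""
        self.finals = []

    def on_update(self, text):
        # Assumption: each update replaces the previous interim text.
        self.interim = text

    def on_final(self, text):
        self.finals.append(text)
        self.interim = ""  # the final transcript supersedes the interim line


def realtime_kwargs(caption, final_model="small", realtime_model="tiny"):
    """Build constructor keyword arguments for interim plus final transcription."""
    return {
        "model": final_model,                   # larger model for the final text
        "enable_realtime_transcription": True,
        "realtime_model_type": realtime_model,  # smaller model for interim text
        "on_realtime_transcription_update": caption.on_update,
    }


def main():
    from RealtimeSTT import AudioToTextRecorder  # assumes the library is installed

    caption = LiveCaption()
    with AudioToTextRecorder(**realtime_kwargs(caption)) as recorder:
        while True:
            recorder.text(caption.on_final)
```

Keeping the display state in one object makes it easy to render the interim line in a muted style and promote it once the final transcript arrives.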
External Audio (No Microphone)
Set use_microphone=False when audio arrives from a file, websocket, process pipeline, or any other non-microphone source. Feed raw 16-bit mono PCM chunks with feed_audio().
Pass original_sample_rate when your source audio is not already at 16 kHz; RealtimeSTT resamples to 16 kHz internally before processing. When not using the context manager, call recorder.shutdown() explicitly to release audio and model resources.
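A minimal sketch of feeding a WAV file, assuming the file already holds 16-bit mono PCM. read_mono_pcm16, feed_pcm, and the chunk size are hypothetical choices for illustration; only use_microphone, feed_audio(), original_sample_rate, text(), and shutdown() are taken from the text above.

```python
import wave


def read_mono_pcm16(wav_file):
    """Return (pcm_bytes, sample_rate) from a 16-bit mono WAV path or file object."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        return wav.readframes(wav.getnframes()), wav.getframerate()


def feed_pcm(recorder, pcm, sample_rate, samples_per_chunk=1024):
    """Feed PCM bytes to the recorder in sample-aligned chunks; return the chunk count."""
    step = samples_per_chunk * 2  # 2 bytes per 16-bit sample
    chunks = 0
    for start in range(0, len(pcm), step):
        recorder.feed_audio(pcm[start:start + step], original_sample_rate=sample_rate)
        chunks += 1
    return chunks


def main(path):
    from RealtimeSTT import AudioToTextRecorder  # assumes the library is installed

    recorder = AudioToTextRecorder(use_microphone=False)
    try:
        pcm, rate = read_mono_pcm16(path)
        feed_pcm(recorder, pcm, rate)
        # Assumption: the audio ends with enough silence for the end of the
        # utterance to be detected; otherwise text() keeps waiting.
        print(recorder.text())
    finally:
        recorder.shutdown()  # no context manager here, so shut down explicitly
```

The same feed_pcm helper would work for chunks arriving from a socket or pipe, as long as each chunk is raw 16-bit mono PCM.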
Context Manager vs. Manual Shutdown
Both forms are equivalent; choose based on whether the recorder lifetime matches a single code block.
Context manager: use when the recorder starts and stops in the same block. Shutdown is automatic and exception-safe.
Manual shutdown: use when the recorder outlives a single block, and call recorder.shutdown() explicitly when you are done with it.
start() and stop() let your application explicitly control when recording begins and ends, rather than relying on automatic VAD-triggered onset detection.
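The two lifetimes, plus explicit start() and stop(), can be sketched side by side. The helper names and the make_recorder factory argument are hypothetical; in real use the factory would be the AudioToTextRecorder class itself.

```python
def transcribe_once(make_recorder):
    """Context-manager form: shutdown happens when the block exits."""
    with make_recorder() as recorder:
        return recorder.text()


def transcribe_once_manual(make_recorder):
    """Manual form: try/finally gives the same exception safety by hand."""
    recorder = make_recorder()
    try:
        return recorder.text()
    finally:
        recorder.shutdown()


def push_to_talk(recorder, wait_for_release):
    """Record between explicit start() and stop() calls, then transcribe."""
    recorder.start()
    wait_for_release()  # e.g. lambda: input("Press Enter to stop recording...")
    recorder.stop()
    return recorder.text()


def main():
    from RealtimeSTT import AudioToTextRecorder  # assumes the library is installed

    print(transcribe_once(AudioToTextRecorder))
```

The manual form is the one to reach for when the recorder is created in one place, such as application startup, and released in another.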
Common Configuration Parameters
The table below covers the parameters most useful during initial development. A full reference for every constructor parameter is in the configuration guide.

| Parameter | Default | Effect |
|---|---|---|
| model | "tiny" | Whisper model size for final transcription (tiny, base, small, medium, large-v2, etc.). Smaller models are faster but less accurate. |
| language | "" (auto-detect) | ISO 639-1 language code (e.g. "en", "de", "fr"). Set explicitly to skip auto-detection overhead. |
| enable_realtime_transcription | False | Enables live interim text updates via on_realtime_transcription_update. |
| post_speech_silence_duration | 0.6 | Seconds of silence after speech ends before the utterance is considered complete and transcription begins. Lower values feel more responsive; higher values reduce false end-of-speech cuts. |
| silero_sensitivity | 0.4 | Silero VAD sensitivity (0.0–1.0). Higher values are more sensitive and trigger recording on less confident speech; lower values require more confident speech. |
| transcription_engine | "faster_whisper" | Selects the transcription backend. Other values include "whisper_cpp", "openai_whisper", "moonshine", and "kroko_onnx". |
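
One way to keep these settings in a single place is a small helper that starts from the defaults listed in the table and applies overrides. TABLE_DEFAULTS and recorder_kwargs are hypothetical names, and the sketch only knows about the parameters in the table, not the full constructor signature.

```python
# Defaults copied from the table above.
TABLE_DEFAULTS = {
    "model": "tiny",
    "language": "",
    "enable_realtime_transcription": False,
    "post_speech_silence_duration": 0.6,
    "silero_sensitivity": 0.4,
    "transcription_engine": "faster_whisper",
}


def recorder_kwargs(**overrides):
    """Merge overrides into the table defaults, rejecting obvious mistakes."""
    unknown = set(overrides) - set(TABLE_DEFAULTS)
    if unknown:
        raise KeyError(f"parameters not in the table: {sorted(unknown)}")
    merged = {**TABLE_DEFAULTS, **overrides}
    if not 0.0 <= merged["silero_sensitivity"] <= 1.0:
        raise ValueError("silero_sensitivity must be between 0.0 and 1.0")
    if merged["post_speech_silence_duration"] < 0:
        raise ValueError("post_speech_silence_duration must not be negative")
    return merged


def main():
    from RealtimeSTT import AudioToTextRecorder  # assumes the library is installed

    settings = recorder_kwargs(model="base", language="en")
    with AudioToTextRecorder(**settings) as recorder:
        print(recorder.text())
```

Catching a misspelled parameter name before the recorder is constructed avoids waiting for models to load only to hit a TypeError.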
