AudioToTextRecorder is the primary entry point into RealtimeSTT. A single instance manages the full pipeline: opening the audio input stream (or accepting externally fed audio), running voice activity detection (VAD) to segment speech, buffering pre-roll audio, coordinating the main and realtime transcription engines, and delivering results either synchronously or through callbacks. All threads are owned and managed internally — your application interacts with the class through a handful of well-defined methods.
Always protect the recorder instantiation with an if __name__ == "__main__": guard. RealtimeSTT starts background processes using Python’s multiprocessing module, and skipping this guard on Windows or in certain shell environments causes processes to spawn recursively.
Constructor
Import the class and create an instance with the parameters that match your use case. All parameters are optional; the defaults are designed for the quickest possible out-of-the-box experience.
By default, use_microphone=True. Initialization is synchronous: by the time __init__ returns, the recorder is fully operational.
See Configuration for the complete parameter reference, and Callbacks for every event callback that can be passed at construction time.
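As a minimal sketch of construction, the snippet below wraps the constructor call in a helper and keeps the entry point separate. The helper names create_recorder and main are illustrative and not part of RealtimeSTT; the import is placed inside the helper only so that loading the module has no side effects.

```python
def create_recorder(use_microphone=True):
    """Construct a recorder; per the text above, __init__ returns once it is ready."""
    # Imported here so that importing this module has no side effects.
    from RealtimeSTT import AudioToTextRecorder

    return AudioToTextRecorder(use_microphone=use_microphone)


def main():
    recorder = create_recorder()  # all other parameters keep their defaults
    print("Recorder ready.")
    recorder.shutdown()


# In a real script, call main() only from under the guard. It is left
# commented out here so the snippet can be pasted without opening a microphone:
#
#     if __name__ == "__main__":
#         main()
```

Keeping the constructor call inside a function that is only reached from the guarded entry point means child processes that re-import the module never construct a recorder of their own.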
text()
Waits for an utterance to end (speech followed by silence lasting post_speech_silence_duration), performs final transcription, applies any text-formatting rules, and then returns.
Optional callback. When supplied, the call returns immediately (non-blocking) and the callback is invoked with the transcript string once it is ready. Use this pattern in continuous dictation loops so that the main thread can call text() again without waiting.
Blocking usage
Calling text() without a callback blocks the calling thread until a full utterance has been transcribed. This is the simplest pattern for one-shot or sequential transcription.
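A sketch of the blocking pattern follows. The helper transcribe_utterances is a hypothetical name; it accepts any object with a text() method, so the recorder is assumed to be constructed elsewhere (under the __main__ guard).

```python
def transcribe_utterances(recorder, count):
    """Collect `count` transcripts by calling the blocking text() in sequence."""
    transcripts = []
    for _ in range(count):
        # Each call blocks until one full utterance has been transcribed.
        transcripts.append(recorder.text())
    return transcripts
```

With a real recorder this might be called as transcribe_utterances(recorder, 3) to capture three utterances one after another.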
Callback / dictation loop
Pass a callback to run a continuous dictation loop without blocking the main thread between utterances. The callback is called from a recorder-internal thread (or a new thread if start_callback_in_new_thread=True).
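The loop described above could look roughly like this. The names run_dictation and stop_event are assumptions made for the example; the callback is passed positionally because this section does not name the parameter.

```python
import threading


def run_dictation(recorder, on_text, stop_event: threading.Event):
    """Keep requesting transcripts until stop_event is set."""
    while not stop_event.is_set():
        # on_text receives the transcript string once it is ready.
        recorder.text(on_text)
```

A stop event (set from a signal handler or another thread, for instance) gives the loop a clean way to end instead of running forever.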
feed_audio()
Use this method only when the recorder was created with use_microphone=False; calling it while the microphone stream is active has no effect.
Raw audio data as a bytes object. Expected format: 16-bit signed PCM, mono channel, little-endian. Chunks can be any length; the recorder buffers and processes them in order.
The sample rate of the supplied chunk. When 16000, no resampling is applied. Any other value causes the chunk to be resampled to 16 kHz before it enters the VAD/transcription pipeline.
feed_audio() does not reorder or timestamp chunks.
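To illustrate the expected byte format, the sketch below builds 16-bit signed, mono, little-endian PCM with the standard library and feeds it in order. Both helpers (sine_pcm16, feed_in_chunks) are hypothetical, the sine tone is only a placeholder for real speech audio, and passing the sample rate as the second positional argument is an assumption based on the parameter order described above.

```python
import math
import struct


def sine_pcm16(freq_hz, duration_s, sample_rate=16000, amplitude=0.3):
    """Return a mono sine tone as 16-bit signed little-endian PCM bytes."""
    n = int(duration_s * sample_rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{n}h", *samples)


def feed_in_chunks(recorder, pcm, sample_rate=16000, chunk_bytes=4096):
    """Feed PCM bytes to the recorder in order, in fixed-size chunks."""
    for start in range(0, len(pcm), chunk_bytes):
        recorder.feed_audio(pcm[start:start + chunk_bytes], sample_rate)
```

An even chunk_bytes value keeps every chunk aligned to whole 2-byte samples, which is a conservative choice even though chunks may be of any length.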
shutdown()
After shutdown() returns, the recorder object should not be used again. Any in-progress text() call will unblock and return an empty string.
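When the recorder is managed by hand, a try/finally block is one way to make sure shutdown() always runs. The helper name transcribe_once is illustrative only.

```python
def transcribe_once(recorder):
    """Return one transcript and always release the recorder afterwards."""
    try:
        return recorder.text()
    finally:
        # Runs on success and on error; the recorder must not be reused after this.
        recorder.shutdown()
```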
Context Manager
AudioToTextRecorder implements the context manager protocol. Using with is the preferred pattern because shutdown() is called automatically on exit, even if an exception is raised inside the block.
The with statement calls __exit__ → shutdown() regardless of whether the block exits normally or via an exception, making it safe to use in scripts and services alike.
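A short sketch of the with pattern: the hypothetical helper dictate takes a factory (for example the AudioToTextRecorder class itself) and a handler, so the recorder only exists inside the block.

```python
def dictate(make_recorder, handle, utterances=3):
    """Open a recorder, pass `utterances` transcripts to handle(), then close it."""
    with make_recorder() as recorder:
        for _ in range(utterances):
            handle(recorder.text())
    # Leaving the block has already triggered shutdown() via __exit__.
```

From under the __main__ guard this might be invoked as dictate(AudioToTextRecorder, print).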
Thread Safety
AudioToTextRecorder is not fully thread-safe for concurrent text() calls. The intended usage model is a single producer/consumer loop on one thread.
Callbacks, however, are invoked from the recorder’s internal threads by default. If a callback performs I/O, updates UI state, or does anything that may block, set start_callback_in_new_thread=True at construction time so that the recorder flow is not stalled:
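A minimal sketch of that setting, assuming on_recording_start as the example callback (see the Callbacks page for the exact callback names). The functions slow_on_recording_start and create_recorder_with_slow_callback are hypothetical.

```python
import time


def slow_on_recording_start():
    """Stand-in for a callback that blocks, e.g. on disk or network I/O."""
    time.sleep(0.2)
    print("recording started")


def create_recorder_with_slow_callback():
    # Imported here so that importing this module has no side effects.
    from RealtimeSTT import AudioToTextRecorder

    return AudioToTextRecorder(
        on_recording_start=slow_on_recording_start,
        # Run callbacks in their own thread so the recorder flow is not stalled.
        start_callback_in_new_thread=True,
    )
```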
