AudioToTextRecorder: Main RealtimeSTT Class Reference

AudioToTextRecorder is the primary entry point into RealtimeSTT. A single instance manages the full pipeline: opening the audio input stream (or accepting externally fed audio), running voice activity detection (VAD) to segment speech, buffering pre-roll audio, coordinating the main and realtime transcription engines, and delivering results either synchronously or through callbacks. All threads are owned and managed internally — your application interacts with the class through a handful of well-defined methods.

Always protect the recorder instantiation with an if __name__ == "__main__": guard. RealtimeSTT starts background processes using Python’s multiprocessing module, and skipping this guard on Windows or in certain shell environments causes processes to spawn recursively.

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder()
    print(recorder.text())
    recorder.shutdown()

Constructor

Import the class and create an instance with the parameters that match your use case. All parameters are optional; the defaults are designed for the quickest possible out-of-the-box experience.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    model="small.en",
    language="en",
    enable_realtime_transcription=True,
)

The constructor immediately initializes the selected transcription engine (downloading the model if necessary), warms up VAD, and begins listening for speech if use_microphone=True. Initialization is synchronous: by the time __init__ returns the recorder is fully operational. See Configuration for the complete parameter reference, and Callbacks for every event callback that can be passed at construction time.

text()

recorder.text(on_transcription_finished=None) -> str

Waits for one complete utterance and returns the transcript as a plain string. The method drives the full recording lifecycle: it listens for speech onset, records until post-speech silence crosses post_speech_silence_duration, performs final transcription, applies any text-formatting rules, and then returns.

on_transcription_finished (callable | None, default: None)

Optional callback that receives the transcript string once it is ready. When supplied, text() still blocks while the utterance is detected and recorded, but the transcript is handed to the callback instead of being returned. Use this pattern in continuous dictation loops so that the loop can go straight back to listening for the next utterance.
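The end-of-utterance rule described for text() can be pictured with a toy model. The sketch below is not RealtimeSTT's implementation: the function name, the per-frame boolean VAD output, and the millisecond units are assumptions made purely for illustration. It shows the idea that recording stops once the silence following speech reaches a threshold, and that any new speech resets the silence timer.

```python
def utterance_end_index(frames, frame_ms, post_speech_silence_ms):
    """Return the index of the frame at which recording would stop, or None.

    frames: iterable of booleans, True where a (hypothetical) VAD saw speech.
    frame_ms: duration of one frame in milliseconds.
    post_speech_silence_ms: silence required after speech to end the utterance.
    """
    speech_started = False
    silence_ms = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            speech_started = True
            silence_ms = 0  # any speech resets the silence timer
        elif speech_started:
            silence_ms += frame_ms
            if silence_ms >= post_speech_silence_ms:
                return index
        # silence before the first speech frame is ignored
    return None


if __name__ == "__main__":
    vad_output = [False, False, True, True, False, True, False, False, False]
    print(utterance_end_index(vad_output, frame_ms=100, post_speech_silence_ms=300))
```

A short pause in the middle of a sentence (the single silent frame in the example) does not end the utterance, which is why a larger threshold tolerates hesitant speakers at the cost of a later return from text().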

Blocking usage

Calling text() without a callback blocks the calling thread until a full utterance has been transcribed. This is the simplest pattern for one-shot or sequential transcription.

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder()
    print("Say something...")
    result = recorder.text()   # blocks here
    print("You said:", result)
    recorder.shutdown()

Callback / dictation loop

Pass a callback to run a continuous dictation loop without blocking the main thread between utterances. The callback is called from a recorder-internal thread (or a new thread if start_callback_in_new_thread=True).

from RealtimeSTT import AudioToTextRecorder

def handle_text(text):
    print("Transcribed:", text)

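if __name__ == "__main__":
    with AudioToTextRecorder() as recorder:
        try:
            while True:
                recorder.text(on_transcription_finished=handle_text)
        except KeyboardInterrupt:
            pass  # Ctrl+C ends the loop; leaving the with block calls shutdown()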

feed_audio()

recorder.feed_audio(chunk, original_sample_rate=16000)

Submits a raw audio chunk to the recorder’s internal input queue. This method is only valid when the recorder was created with use_microphone=False; calling it while the microphone stream is active has no effect.

chunk (bytes, required)

Raw audio data as a bytes object. Expected format: 16-bit signed PCM, mono channel, little-endian. Chunks can be any length; the recorder buffers and processes them in order.

original_sample_rate (int, default: 16000)

The sample rate of the supplied chunk. When 16000, no resampling is applied. Any other value causes the chunk to be resampled to 16 kHz before it enters the VAD/transcription pipeline.

from RealtimeSTT import AudioToTextRecorder

CHUNK_BYTES = 3200  # 100 ms of 16 kHz 16-bit mono PCM

if __name__ == "__main__":
    recorder = AudioToTextRecorder(use_microphone=False)

    with open("speech.pcm", "rb") as f:
        while True:
            chunk = f.read(CHUNK_BYTES)
            if not chunk:
                break
            recorder.feed_audio(chunk, original_sample_rate=16000)

    print(recorder.text())
    recorder.shutdown()

Feed smaller, time-aligned chunks (roughly 20–200 ms) so that VAD and realtime updates can react with low latency. Preserve ordering — feed_audio() does not reorder or timestamp chunks.
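Chunk sizes for a target duration follow from simple arithmetic: samples per chunk times bytes per sample times channels. The helpers below are a minimal sketch with hypothetical names (chunk_size_bytes, iter_chunks); they are not part of RealtimeSTT and only show how to size and split a PCM buffer while preserving order.

```python
def chunk_size_bytes(duration_ms, sample_rate=16000, sample_width=2, channels=1):
    """Bytes needed for duration_ms of PCM audio.

    The sample count is computed first so the result is always a whole
    number of frames (never half a 16-bit sample).
    """
    samples = sample_rate * duration_ms // 1000
    return samples * sample_width * channels


def iter_chunks(pcm, size):
    """Yield consecutive slices of pcm in order; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    size = chunk_size_bytes(100)             # 100 ms of 16 kHz 16-bit mono
    print(size)
    one_second_of_silence = bytes(32000)     # 16000 samples * 2 bytes
    print(len(list(iter_chunks(one_second_of_silence, size))))
```

Each slice produced by iter_chunks could then be passed to feed_audio() in sequence.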

For browser or telephony sources that deliver audio at other rates (for example 8 kHz, 44.1 kHz, or 48 kHz), pass the original sample rate and let RealtimeSTT handle resampling internally. No resampling is needed on your side.
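That covers sample rate; the sample format described for chunk (16-bit signed, mono, little-endian) still applies. Browser audio in particular often arrives as floating-point samples, sometimes in stereo. The following is a sketch using only the standard library, with hypothetical helper names, of how such data might be brought into the expected format before calling feed_audio(). It assumes floats in the range -1.0 to 1.0 and interleaved left/right stereo frames.

```python
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # clip out-of-range values
        ints.append(int(round(clipped * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


def stereo_to_mono(pcm):
    """Average interleaved 16-bit stereo frames (L, R, L, R, ...) into mono."""
    count = len(pcm) // 2                      # number of 16-bit values
    values = struct.unpack("<%dh" % count, pcm[:count * 2])
    mono = [(values[i] + values[i + 1]) // 2 for i in range(0, count - 1, 2)]
    return struct.pack("<%dh" % len(mono), *mono)


if __name__ == "__main__":
    print(float_to_pcm16([0.0, 1.0, -1.0]).hex())
```

For large buffers a vectorized library would be faster; the loop form is used here only to keep the example dependency-free.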

shutdown()

recorder.shutdown()

Stops all internal threads: the audio capture thread, the VAD worker, the realtime transcription worker, and the final transcription pool. After shutdown() returns, the recorder object should not be used again. Any in-progress text() call will unblock and return an empty string.

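from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder()
    try:
        result = recorder.text()
        print(result)
    finally:
        recorder.shutdown()  # runs even if text() raises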

Omitting shutdown() (or failing to exit a with block) may leave daemon threads running until the Python process exits. In long-running services this can accumulate resources over repeated recorder instantiations.

Context Manager

AudioToTextRecorder implements the context manager protocol. Using with is the preferred pattern because shutdown() is called automatically on exit, even if an exception is raised inside the block.

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    with AudioToTextRecorder(model="small.en") as recorder:
        print(recorder.text())
        print(recorder.text())
    # shutdown() has already been called here

Because __exit__ calls shutdown() on every exit path, the same pattern works in short scripts and long-running services alike.
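The protocol itself can be illustrated with a stand-in class. FakeRecorder below is hypothetical and contains no RealtimeSTT code; it only demonstrates the general Python mechanism by which a with block triggers cleanup when an exception escapes.

```python
class FakeRecorder:
    """Hypothetical stand-in used to illustrate the context manager protocol."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.shutdown()
        return False  # do not swallow the exception


def run_and_fail(recorder):
    """Raise inside the with block to exercise the error path."""
    with recorder:
        raise RuntimeError("simulated failure inside the block")


if __name__ == "__main__":
    recorder = FakeRecorder()
    try:
        run_and_fail(recorder)
    except RuntimeError:
        pass
    print(recorder.is_shut_down)
```

Returning False from __exit__ lets the original exception propagate to the caller after cleanup has run.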

Thread Safety

AudioToTextRecorder is not fully thread-safe for concurrent text() calls. The intended usage model is a single producer/consumer loop on one thread. Callbacks, however, are invoked from the recorder’s internal threads by default. If a callback performs I/O, updates UI state, or does anything that may block, set start_callback_in_new_thread=True at construction time so that the recorder flow is not stalled:

recorder = AudioToTextRecorder(
    on_realtime_transcription_update=my_streaming_callback,
    start_callback_in_new_thread=True,
)

See Callbacks for the complete list of event callbacks and recommended threading patterns.
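One common way to keep callbacks short, whichever thread they run on, is to have them enqueue results and let a single consumer thread do the real work. The sketch below uses only the standard library; the names on_text and drain are hypothetical, and the threads in the demo merely simulate callback invocations rather than using a recorder.

```python
import queue
import threading

transcripts = queue.Queue()  # thread-safe handoff between threads


def on_text(text):
    """Callback body: only enqueue, so the calling thread is never stalled."""
    transcripts.put(text)


def drain(q):
    """Return everything currently queued, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


if __name__ == "__main__":
    # Simulate three callback invocations arriving from different threads.
    workers = [
        threading.Thread(target=on_text, args=("utterance %d" % i,))
        for i in range(3)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(sorted(drain(transcripts)))
```

A function like on_text could be passed as on_transcription_finished or as any other event callback, with the main thread calling drain() (or a blocking get()) at its own pace.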
