RealtimeSTT's default mode reads audio from the local microphone. When audio comes from somewhere else (a file, a WebSocket connection, a browser stream, a telephony server, another process, or a test fixture), set
use_microphone=False
and push audio into the recorder yourself by calling feed_audio(). The
recorder queues each chunk, runs VAD, and produces transcriptions exactly as it
would with a live microphone.
Basic Setup
Construct the recorder with use_microphone=False, then call feed_audio() with
raw PCM bytes. Call recorder.text() to block until the next final utterance is
ready, and call recorder.shutdown() when the stream is finished.
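A minimal sketch of that sequence, assuming `pcm_bytes` already holds 16 kHz mono 16-bit PCM that ends in enough silence for the utterance to be finalized. The helper names `make_recorder` and `transcribe_buffer` are illustrative and not part of RealtimeSTT; the import is deferred so the second helper can be tried with a stand-in object.

```python
def make_recorder():
    """Build a recorder that expects externally fed audio (requires RealtimeSTT)."""
    # Imported here so the rest of this sketch loads without the library installed.
    from RealtimeSTT import AudioToTextRecorder
    return AudioToTextRecorder(use_microphone=False)


def transcribe_buffer(recorder, pcm_bytes: bytes) -> str:
    """Feed one PCM buffer, wait for the final utterance, then shut down."""
    recorder.feed_audio(pcm_bytes)  # queue raw 16-bit mono 16 kHz PCM
    text = recorder.text()          # blocks until the next final utterance
    recorder.shutdown()             # the stream is finished
    return text
```

Typical use would be `print(transcribe_buffer(make_recorder(), pcm_bytes))`.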
Audio Format Requirements
feed_audio() expects raw PCM audio in the following format:
| Property | Required value |
|---|---|
| Encoding | 16-bit signed PCM |
| Channels | Mono (1 channel) |
| Sample rate | 16 000 Hz |
| Byte order | Little-endian |
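Sources that deliver another layout need converting first. As an illustration using only the standard library, the sketch below turns interleaved little-endian float32 stereo (an assumed input layout, chosen because audio APIs often produce it) into the 16-bit mono little-endian PCM listed in the table. The function name is hypothetical and not part of RealtimeSTT.

```python
import sys
from array import array


def float32_stereo_to_int16_mono(raw: bytes) -> bytes:
    """Convert interleaved little-endian float32 stereo to int16 mono PCM."""
    samples = array("f")
    samples.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; array uses native order

    mono = array("h")
    # Step through frames of (left, right); a dangling half-frame is dropped.
    for i in range(0, len(samples) - 1, 2):
        value = (samples[i] + samples[i + 1]) / 2.0  # average the two channels
        value = max(-1.0, min(1.0, value))           # clip to the valid range
        mono.append(int(round(value * 32767)))       # scale to signed 16-bit

    if sys.byteorder == "big":
        mono.byteswap()     # emit little-endian regardless of host order
    return mono.tobytes()
```

The returned bytes can be passed to feed_audio() directly when the source is already at 16 kHz.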
If your audio uses a different sample rate, pass the original_sample_rate argument and RealtimeSTT will resample the chunk before
processing it. For example, browser microphone audio is commonly 48 kHz:
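For instance, a 48 kHz chunk might be passed along like this. The wrapper name `feed_browser_chunk` and the constant are illustrative; `recorder` is assumed to be a recorder created with use_microphone=False, and the chunk is assumed to be mono 16-bit PCM already.

```python
BROWSER_SAMPLE_RATE = 48_000  # assumed capture rate; check what your client sends


def feed_browser_chunk(recorder, chunk: bytes) -> None:
    """Feed a mono 16-bit PCM chunk that was captured at 48 kHz."""
    # The recorder resamples from original_sample_rate before processing.
    recorder.feed_audio(chunk, original_sample_rate=BROWSER_SAMPLE_RATE)
```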
The original_sample_rate parameter only handles sample-rate conversion.
Your audio must still be mono and 16-bit PCM before calling feed_audio().
Stereo or floating-point audio will produce garbled transcriptions.
Feeding Audio from a File
A binary PCM file can be read in 100 ms chunks, with each chunk fed to the recorder in turn.
If text() never returns, try appending a second or two of silence
to the stream, or lower post_speech_silence_duration in the constructor.
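The file-feeding loop could be sketched as follows, assuming a headerless file of 16 kHz mono 16-bit PCM. The helper names and the `trailing_silence_s` parameter are illustrative; the zero-filled padding follows the tip about appending silence so the last utterance can be finalized.

```python
SAMPLE_RATE = 16_000  # Hz, the rate feed_audio() expects by default
SAMPLE_WIDTH = 2      # bytes per 16-bit sample


def iter_pcm_chunks(stream, chunk_ms: int = 100):
    """Yield successive chunk_ms-sized blocks from a binary file object."""
    chunk_bytes = SAMPLE_RATE * SAMPLE_WIDTH * chunk_ms // 1000  # 3200 at 100 ms
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk  # the final chunk may be shorter than chunk_bytes


def feed_pcm_file(recorder, path: str, trailing_silence_s: float = 1.0) -> None:
    """Feed a raw PCM file chunk by chunk, then append digital silence."""
    with open(path, "rb") as stream:
        for chunk in iter_pcm_chunks(stream):
            recorder.feed_audio(chunk)
    # Zero-valued samples are silence in signed 16-bit PCM.
    silence = b"\x00" * (int(SAMPLE_RATE * trailing_silence_s) * SAMPLE_WIDTH)
    recorder.feed_audio(silence)
```

After `feed_pcm_file(recorder, path)` returns, a call to recorder.text() retrieves the transcription.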
Streaming Audio
For live sources such as WebSocket connections or inter-process pipes, feed chunks as they arrive and call text() on whatever application thread should
wait for each final utterance:
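One way to arrange this is a feeder thread alongside the thread that waits on text(). In this sketch `read_next_chunk_somehow` is a caller-supplied callable that returns the next PCM chunk, with an empty value assumed to mean end of stream; `feed_stream` and `start_feeder` are illustrative names, not RealtimeSTT functions.

```python
import threading


def feed_stream(recorder, read_next_chunk_somehow, stop_event: threading.Event) -> None:
    """Push chunks into the recorder in arrival order until the source ends."""
    while not stop_event.is_set():
        chunk = read_next_chunk_somehow()
        if not chunk:  # assumption: b"" or None signals end of stream
            break
        recorder.feed_audio(chunk)


def start_feeder(recorder, read_next_chunk_somehow):
    """Run feed_stream on a background thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=feed_stream,
        args=(recorder, read_next_chunk_somehow, stop_event),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

The application thread can then call recorder.text() in a loop while the feeder runs, and set `stop_event` before shutting the recorder down.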
Replace read_next_chunk_somehow() with your pipe, socket, queue, or media
framework integration. feed_audio() does not reorder chunks, so preserve
arrival order when reading from parallel sources.
Realtime Updates with External Audio
Enable enable_realtime_transcription to receive live interim text while audio
is still arriving:
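A sketch of collecting interim text. It assumes the constructor accepts an `on_realtime_transcription_update` callback that receives the current interim string; that parameter name is not stated on this page, so verify it against your installed version. `InterimLog` and `make_realtime_recorder` are illustrative names.

```python
class InterimLog:
    """Collects interim transcripts delivered by the realtime callback."""

    def __init__(self):
        self.updates = []

    def __call__(self, text: str) -> None:
        self.updates.append(text)

    @property
    def latest(self) -> str:
        """Most recent interim text, or an empty string before any update."""
        return self.updates[-1] if self.updates else ""


def make_realtime_recorder(on_update):
    """Build a fed-audio recorder with interim updates (requires RealtimeSTT)."""
    from RealtimeSTT import AudioToTextRecorder
    return AudioToTextRecorder(
        use_microphone=False,
        enable_realtime_transcription=True,
        on_realtime_transcription_update=on_update,  # assumed callback parameter
    )
```

Final utterances still come from recorder.text(); the callback only reports text while audio is arriving.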
Shutdown
When you use a context manager, __exit__ calls shutdown() automatically:
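For example, with the recorder used as a context manager. This is a sketch: `transcribe_in_context` is an illustrative helper, and `recorder` is assumed to be an already constructed recorder with use_microphone=False.

```python
def transcribe_in_context(recorder, chunks) -> str:
    """Feed chunks inside a with block; shutdown() runs when the block exits."""
    with recorder:
        for chunk in chunks:
            recorder.feed_audio(chunk)
        return recorder.text()  # __exit__ still runs after this return
```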
If you do not use a with block (as is typical when
feeding audio in a loop), call shutdown() explicitly when the stream ends:
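A try/finally block is one way to make sure the explicit call happens even if the feeding loop raises. In this sketch `run_stream` is an illustrative name and `chunks` is any iterable of PCM byte chunks.

```python
def run_stream(recorder, chunks) -> None:
    """Feed every chunk, then always shut the recorder down."""
    try:
        for chunk in chunks:
            recorder.feed_audio(chunk)
    finally:
        # Runs on normal completion and on errors alike.
        recorder.shutdown()
```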
Forgetting to call shutdown() will leave background threads running. Use one recorder
per independent stream or session unless you are building a shared-engine
server that injects its own executor callables.