AudioToTextRecorder exposes a rich set of optional event callbacks that let your application respond to every meaningful state transition in the recording and transcription pipeline. Every callback is passed as a keyword argument to the constructor, and none are required. By default, callbacks are invoked directly in the recorder’s internal processing thread. If any callback may block (due to I/O, network calls, or heavy computation), set start_callback_in_new_thread=True so that each callback runs in its own thread and the recorder flow is not stalled.
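```python
from RealtimeSTT import AudioToTextRecorder

def my_callback():
    print("recording started")

recorder = AudioToTextRecorder(
    on_recording_start=my_callback,
    start_callback_in_new_thread=True,  # recommended when callbacks may block
)
```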
Set start_callback_in_new_thread=True whenever your callbacks perform file I/O, network requests, GUI updates on a non-main thread, or any other operation that could take non-trivial time. Without it, a slow callback delays VAD processing and can cause audio buffering problems.
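To make the difference concrete, here is a minimal, library-independent sketch of what running a callback "in a new thread" means for the caller. The `dispatch` helper is hypothetical and only models the behavior described above; it is not RealtimeSTT's internal code.

```python
import threading
from typing import Any, Callable, Optional


def dispatch(callback: Optional[Callable[..., Any]], *args: Any,
             in_new_thread: bool = False) -> Optional[threading.Thread]:
    """Invoke a user callback inline or on its own daemon thread.

    Hypothetical helper that models start_callback_in_new_thread;
    it is not RealtimeSTT's implementation.
    """
    if callback is None:
        return None
    if in_new_thread:
        worker = threading.Thread(target=callback, args=args, daemon=True)
        worker.start()
        return worker  # the caller continues while the callback runs
    callback(*args)  # inline: the caller waits until the callback returns
    return None


if __name__ == "__main__":
    release = threading.Event()
    log = []

    def slow_callback(tag: str) -> None:
        release.wait(timeout=5)  # stand-in for blocking I/O
        log.append(tag)

    worker = dispatch(slow_callback, "callback finished", in_new_thread=True)
    log.append("processing loop continued")  # runs while the callback is blocked
    release.set()
    worker.join()
    print(log)  # ['processing loop continued', 'callback finished']
```

The trade-off is that threaded callbacks may run concurrently and finish out of order, so any state they share needs its own synchronization.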
Recording Lifecycle Callbacks
These callbacks fire at the boundaries of individual recording segments — one utterance from speech onset to final transcript delivery.
| Callback | Signature | Called when |
|---|---|---|
| `on_recording_start` | `() -> None` | A recording segment begins (speech onset confirmed). |
| `on_recording_stop` | `() -> None` | A recording segment ends (post-speech silence reached). |
| `on_transcription_start` | `() -> None` | Final transcription of the buffered audio begins. |
```python
from RealtimeSTT import AudioToTextRecorder

def on_start():
    print("[recording started]")

def on_stop():
    print("[recording stopped, transcribing...]")

def on_transcription():
    print("[transcription in progress]")

if __name__ == "__main__":
    with AudioToTextRecorder(
        on_recording_start=on_start,
        on_recording_stop=on_stop,
        on_transcription_start=on_transcription,
    ) as recorder:
        text = recorder.text()
        print("Result:", text)
```
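The lifecycle callbacks take no arguments, so any per-segment state has to live in your own code. As a sketch, the hypothetical `SegmentTimer` class below measures how long each segment lasted by pairing `on_recording_start` with `on_recording_stop`; the clock is injectable so the logic can be checked without a microphone.

```python
import threading
import time
from typing import Callable, List, Optional


class SegmentTimer:
    """Hypothetical helper: records the duration of each recording segment."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()  # callbacks may run on different threads
        self._started_at: Optional[float] = None
        self.durations: List[float] = []

    def on_start(self) -> None:
        with self._lock:
            self._started_at = self._clock()

    def on_stop(self) -> None:
        with self._lock:
            if self._started_at is None:
                return  # a stop without a matching start is ignored
            self.durations.append(self._clock() - self._started_at)
            self._started_at = None
```

The bound methods would be passed as `on_recording_start=timer.on_start, on_recording_stop=timer.on_stop`. The lock prevents data races when `start_callback_in_new_thread=True`, but it cannot restore ordering if the two callback threads happen to run out of order.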
Voice Activity Detection Callbacks
VAD callbacks give fine-grained visibility into the voice detection state machine. They are useful for driving UI indicators, logging, and debugging detection quality.
| Callback | Signature | Called when |
|---|---|---|
| `on_vad_start` | `() -> None` | Voice activity is detected in the audio stream. |
| `on_vad_stop` | `() -> None` | Voice activity ends in the audio stream. |
| `on_vad_detect_start` | `() -> None` | The recorder begins actively listening for voice activity. |
| `on_vad_detect_stop` | `() -> None` | The recorder stops actively listening for voice activity. |
| `on_turn_detection_start` | `() -> None` | Turn detection starts (the recorder is determining whether a new turn has begun). |
| `on_turn_detection_stop` | `() -> None` | Turn detection stops. |
```python
from RealtimeSTT import AudioToTextRecorder

indicator = {"active": False}

def vad_on():
    indicator["active"] = True
    print("🎙 speaking")

def vad_off():
    indicator["active"] = False
    print("🔇 silence")

if __name__ == "__main__":
    with AudioToTextRecorder(
        on_vad_start=vad_on,
        on_vad_stop=vad_off,
    ) as recorder:
        while True:
            recorder.text(on_transcription_finished=print)
```
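For debugging detection quality, it can help to timestamp every VAD-related event and inspect the sequence afterwards. A minimal sketch, assuming a hypothetical `make_event_logger` factory that builds one zero-argument callback per keyword name from the table above:

```python
import threading
import time
from typing import Callable, Dict, Iterable, List, Tuple


def make_event_logger(
    names: Iterable[str], clock: Callable[[], float] = time.monotonic
) -> Tuple[List[Tuple[float, str]], Dict[str, Callable[[], None]]]:
    """Hypothetical factory: one zero-argument callback per name.

    Each callback appends (timestamp, name) to a shared, lock-protected list.
    """
    events: List[Tuple[float, str]] = []
    lock = threading.Lock()

    def make(name: str) -> Callable[[], None]:
        def callback() -> None:
            with lock:
                events.append((clock(), name))
        return callback

    return events, {name: make(name) for name in names}


VAD_CALLBACKS = [
    "on_vad_start", "on_vad_stop",
    "on_vad_detect_start", "on_vad_detect_stop",
    "on_turn_detection_start", "on_turn_detection_stop",
]
```

Because the dictionary keys are the documented keyword names, `events, callbacks = make_event_logger(VAD_CALLBACKS)` could be unpacked straight into the constructor with `AudioToTextRecorder(**callbacks)`; after a session, `events` holds the ordered, timestamped transition log.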
Realtime Transcription Callbacks
Realtime callbacks deliver interim text while the speaker is still talking. They require enable_realtime_transcription=True.
| Callback | Signature | Called when |
|---|---|---|
| `on_realtime_transcription_update` | `(text: str) -> None` | New raw interim text is available. Fires frequently; text may be unstable. |
| `on_realtime_transcription_stabilized` | `(text: str) -> None` | A higher-quality, smoothed version of the interim text is available. |
| `on_realtime_text_stabilization_update` | `(data) -> None` | A structured realtime stabilization event is available. Receives a data object with stabilization details (advanced use). |
on_realtime_transcription_update fires on every new interim result and is the right hook for displaying a live “typing” transcript. on_realtime_transcription_stabilized fires less often and produces smoother output by buffering and re-evaluating recent results — prefer it when you want low-flicker live captions.
```python
from RealtimeSTT import AudioToTextRecorder

def on_update(text):
    # Overwrite the current line in the terminal
    print(f"\r{text} ", end="", flush=True)

def on_stabilized(text):
    # Higher-quality interim text: update a display widget, for example
    pass

def on_final(text):
    print(f"\n✓ {text}")

if __name__ == "__main__":
    with AudioToTextRecorder(
        enable_realtime_transcription=True,
        realtime_model_type="tiny.en",
        on_realtime_transcription_update=on_update,
        on_realtime_transcription_stabilized=on_stabilized,
        start_callback_in_new_thread=True,
    ) as recorder:
        while True:
            recorder.text(on_transcription_finished=on_final)
```
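The difference between the two realtime hooks comes down to stability: raw interim results can rewrite their own tail from one update to the next. The sketch below shows one common way to separate settled text from tentative text: take the longest prefix shared by the most recent results and cut it back to a word boundary. It is an illustrative technique built around a hypothetical `PrefixStabilizer` class, not necessarily how RealtimeSTT produces `on_realtime_transcription_stabilized`.

```python
import os
from typing import List, Tuple


class PrefixStabilizer:
    """Hypothetical helper: split interim transcripts into stable and tentative text.

    Text that the last `window` interim results agree on (cut back to a whole
    word) is treated as stable; the rest of the newest result is tentative.
    """

    def __init__(self, window: int = 2) -> None:
        self.window = window
        self._recent: List[str] = []

    def update(self, text: str) -> Tuple[str, str]:
        self._recent.append(text)
        del self._recent[:-self.window]  # keep only the newest `window` results
        if len(self._recent) < self.window:
            return "", text  # not enough history yet: everything is tentative
        prefix = os.path.commonprefix(self._recent)
        # Does every recent result end at the prefix or continue with whitespace?
        on_boundary = all(
            len(t) == len(prefix) or t[len(prefix)].isspace() for t in self._recent
        )
        if not on_boundary:
            cut = prefix.rfind(" ")
            prefix = prefix[: cut + 1] if cut >= 0 else ""
        return prefix, text[len(prefix):]
```

Fed with `"hello wor"`, then `"hello world"`, then `"hello world how"`, it returns `("", "hello wor")`, `("hello ", "world")`, and `("hello world", " how")`, so a UI could render the stable part normally and the tentative part dimmed.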
Wake Word Callbacks
Wake word callbacks let you respond to keyword detection events and implement custom UI states such as a listening indicator or a timeout warning.
| Callback | Signature | Called when |
|---|---|---|
| `on_wakeword_detected` | `() -> None` | A wake word is detected and recording is about to begin. |
| `on_wakeword_timeout` | `() -> None` | A wake word was detected but no speech arrived before `wake_word_timeout` expired. |
| `on_wakeword_detection_start` | `() -> None` | The recorder starts listening for the wake word. |
| `on_wakeword_detection_end` | `() -> None` | The recorder stops listening for the wake word (e.g., after detection or shutdown). |
```python
from RealtimeSTT import AudioToTextRecorder

def on_wake():
    print("Wake word detected, listening for command...")

def on_timeout():
    print("No speech heard after wake word. Going back to sleep.")

if __name__ == "__main__":
    with AudioToTextRecorder(
        wakeword_backend="pvporcupine",
        wake_words="jarvis",
        wake_word_timeout=5.0,
        on_wakeword_detected=on_wake,
        on_wakeword_timeout=on_timeout,
    ) as recorder:
        while True:
            text = recorder.text()
            if text:
                print("Command:", text)
```
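A UI indicator of the kind mentioned above mostly needs to remember which wake word callback fired last. A minimal sketch, assuming a hypothetical `WakeWordStatus` class whose `callbacks()` method returns the four documented keyword arguments:

```python
import threading
from typing import Callable, Dict, List


class WakeWordStatus:
    """Hypothetical status model for a UI, driven by the wake word callbacks."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.state = "idle"
        self.timeouts = 0
        self.transitions: List[str] = []

    def _set(self, state: str) -> None:
        with self._lock:
            self.state = state
            self.transitions.append(state)

    def on_detection_start(self) -> None:
        self._set("waiting for wake word")

    def on_detected(self) -> None:
        self._set("listening for command")

    def on_timeout(self) -> None:
        with self._lock:
            self.timeouts += 1  # count timeouts, e.g. for a warning badge
        self._set("timed out")

    def on_detection_end(self) -> None:
        self._set("idle")

    def callbacks(self) -> Dict[str, Callable[[], None]]:
        """Map the documented callback keyword names to this object's methods."""
        return {
            "on_wakeword_detection_start": self.on_detection_start,
            "on_wakeword_detected": self.on_detected,
            "on_wakeword_timeout": self.on_timeout,
            "on_wakeword_detection_end": self.on_detection_end,
        }
```

It could be wired in with `AudioToTextRecorder(wakeword_backend="pvporcupine", wake_words="jarvis", **status.callbacks())`. The sketch makes no assumption about the exact order in which RealtimeSTT fires these callbacks; it simply reflects the most recent one.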
Audio Chunk Callback
| Callback | Signature | Called when |
|---|---|---|
| `on_recorded_chunk` | `(chunk: bytes) -> None` | Each raw recorded PCM audio chunk is available. |
on_recorded_chunk fires for every audio chunk that passes through the recorder’s input path, regardless of VAD state. The chunk argument contains raw 16-bit mono PCM bytes at the recorder’s configured sample_rate. Use this callback to mirror the audio to a file, forward chunks to another system, or feed a secondary pipeline.
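```python
import wave

from RealtimeSTT import AudioToTextRecorder

OUTPUT_PATH = "recorded.wav"
SAMPLE_RATE = 16000  # must match the recorder's sample_rate

if __name__ == "__main__":
    # Open the WAV file before the recorder starts and close it only after the
    # recorder has shut down, so no chunk is ever written to a closed file.
    with wave.open(OUTPUT_PATH, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(SAMPLE_RATE)

        def on_chunk(chunk: bytes) -> None:
            wav_file.writeframes(chunk)  # append raw PCM frames as they arrive

        with AudioToTextRecorder(
            sample_rate=SAMPLE_RATE,
            on_recorded_chunk=on_chunk,
        ) as recorder:
            text = recorder.text()
            print("Transcribed:", text)
```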
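Since each chunk is raw 16-bit mono PCM, its duration and loudness can be computed directly from the bytes, which is enough for a simple input level meter. A sketch with a hypothetical `chunk_stats` helper, assuming little-endian samples (the usual layout for PCM audio) and a 16000 Hz sample rate as in the WAV example above:

```python
import array
import math
import sys
from typing import Tuple


def chunk_stats(chunk: bytes, sample_rate: int = 16000) -> Tuple[float, float]:
    """Hypothetical helper: (duration in seconds, RMS level in 0..1) of a PCM chunk.

    Assumes 16-bit signed little-endian mono samples; the level is relative
    to full scale (32768).
    """
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()  # convert little-endian input to native order
    duration = len(samples) / sample_rate
    if not samples:
        return 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms / 32768.0
```

Inside `on_recorded_chunk`, this might drive a crude meter such as `print("#" * int(level * 50))`. Keep such work cheap, because the callback runs for every chunk.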
Full Multi-Callback Example
The example below wires up the lifecycle, VAD, and realtime transcription callbacks together to show how they compose in a real application.
```python
from RealtimeSTT import AudioToTextRecorder

def on_recording_start(): print("[REC  ] started")
def on_recording_stop(): print("[REC  ] stopped")
def on_transcription_start(): print("[TRANS] starting...")
def on_vad_start(): print("[VAD  ] voice detected")
def on_vad_stop(): print("[VAD  ] voice ended")
def on_rt_update(text): print(f"\r[RT   ] {text} ", end="", flush=True)
def on_rt_stable(text): pass  # used for a display widget in real apps

def on_final(text):
    print(f"\n[FINAL] {text}")

if __name__ == "__main__":
    with AudioToTextRecorder(
        model="small.en",
        enable_realtime_transcription=True,
        on_recording_start=on_recording_start,
        on_recording_stop=on_recording_stop,
        on_transcription_start=on_transcription_start,
        on_vad_start=on_vad_start,
        on_vad_stop=on_vad_stop,
        on_realtime_transcription_update=on_rt_update,
        on_realtime_transcription_stabilized=on_rt_stable,
        start_callback_in_new_thread=True,
    ) as recorder:
        while True:
            recorder.text(on_transcription_finished=on_final)
```
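Each callback keyword accepts a single callable, so attaching several handlers to the same event (for instance, printing realtime text and also forwarding it to a UI) needs a small adapter. A sketch of a hypothetical `fan_out` helper, which is not part of RealtimeSTT:

```python
import logging
from typing import Any, Callable


def fan_out(*callbacks: Callable[..., Any]) -> Callable[..., None]:
    """Hypothetical helper: combine several handlers into one callable.

    Every handler receives the same arguments; an exception in one handler
    is logged and does not prevent the remaining handlers from running.
    """
    def combined(*args: Any) -> None:
        for callback in callbacks:
            try:
                callback(*args)
            except Exception:
                logging.exception("callback %r failed", callback)
    return combined
```

In the example above, this might look like `on_realtime_transcription_update=fan_out(on_rt_update, send_to_ui)`, where `send_to_ui` stands for a hypothetical handler of your own. Isolating exceptions keeps one faulty handler from silently disabling the others.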