Every parameter that AudioToTextRecorder accepts at construction time is listed here, grouped by purpose so you can quickly find and tune the settings that matter for your use case. All parameters are keyword arguments with defaults, so start with a minimal constructor call and add only what you need.
```python
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    model="small.en",
    language="en",
    enable_realtime_transcription=True,
)
```
Model and Engine Parameters
These parameters control which transcription backend is loaded, which model weights are used, and hardware placement.
| Parameter | Default | Description |
|---|---|---|
| model | "tiny" | Main transcription model name or model path. Interpretation depends on transcription_engine. |
| transcription_engine | "faster_whisper" | Main transcription backend. See the transcription engines guide for valid values. |
| transcription_engine_options | None | Engine-specific dictionary passed only to the main backend. |
| download_root | None | Directory for model downloads or lookup. Behavior is engine-specific. |
| language | "" | Language code. Empty string lets engines auto-detect when they support it. Some engines require a language. |
| compute_type | "default" | Numeric precision/quantization hint. For faster-whisper, see CTranslate2 quantization. Other engines map this where possible. |
| gpu_device_index | 0 | GPU id, or a list of GPU ids for compatible engines. |
| device | "cuda" | Device hint, usually "cuda" or "cpu". CPU-only engines ignore GPU settings. |
| batch_size | 16 | Main transcription batch size. Set 0 to disable batched faster-whisper inference. |
| beam_size | 5 | Main transcription beam size where supported. |
| initial_prompt | None | String or token iterable passed to the main engine as prompt/context where supported. |
| suppress_tokens | [-1] | Token ids suppressed by Whisper-family engines where supported. |
| faster_whisper_vad_filter | True | Enables faster-whisper’s own VAD filter during transcription in addition to recorder VAD. |
| normalize_audio | False | Normalizes audio peak before transcription in engine adapters that use the shared normalization helper. |
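The device, compute_type, and batch_size settings are usually chosen together. The sketch below shows one possible pairing for GPU and CPU hosts. The helper names engine_kwargs and make_recorder are hypothetical, and the "float16" and "int8" values are CTranslate2 quantization names assumed to suit the faster_whisper backend, so check them against the engine you actually use.

```python
def engine_kwargs(use_gpu: bool) -> dict:
    """Return constructor keyword arguments for a GPU or CPU host.

    Hypothetical helper: the value pairings are illustrative assumptions,
    not recommendations taken from the RealtimeSTT documentation.
    """
    if use_gpu:
        return {
            "model": "small.en",
            "device": "cuda",
            "gpu_device_index": 0,
            "compute_type": "float16",  # assumed CTranslate2 type for GPUs
            "batch_size": 16,           # documented default
        }
    return {
        "model": "tiny",                # documented default model
        "device": "cpu",
        "compute_type": "int8",         # assumed CTranslate2 type for CPUs
        "batch_size": 0,                # 0 disables batched faster-whisper inference
    }


def make_recorder(use_gpu: bool):
    """Build a recorder from the keyword arguments above."""
    # Imported lazily so engine_kwargs() can be inspected without the package.
    from RealtimeSTT import AudioToTextRecorder

    return AudioToTextRecorder(**engine_kwargs(use_gpu))
```

Keeping the settings in a plain dictionary makes it easy to log or override individual values before the recorder is constructed.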
Audio Input Parameters
These parameters control how audio is captured from the microphone or accepted via feed_audio().
| Parameter | Default | Description |
|---|---|---|
| input_device_index | None | PyAudio input device index. None lets PyAudio choose the default device. |
| use_microphone | True | When False, audio must be supplied through feed_audio(). |
| buffer_size | 512 | Recorder audio buffer size. Changing this can affect VAD behavior. |
| sample_rate | 16000 | Recorder sample rate. WebRTC VAD is sensitive to sample rate changes. |
| handle_buffer_overflow | platform-dependent | Logs and drops overflowed microphone input. Defaults to True except on macOS. |
| allowed_latency_limit | 100 | Maximum unprocessed input chunks before old chunks may be discarded. |
| on_recorded_chunk | None | Callback receiving each recorded audio chunk. |
To enumerate available input devices and their supported sample rates, use AudioInput.list_devices() or call recorder_client.list_devices() from AudioToTextRecorderClient. Device indices are stable within a session but may change across reboots.
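When use_microphone is False, the application is responsible for pushing audio into the recorder. The following is a minimal sketch of that pattern. It assumes feed_audio() accepts raw 16-bit mono PCM bytes at the default 16 kHz sample rate; the helper names iter_pcm_chunks and feed_pcm are hypothetical.

```python
from typing import Iterator


def iter_pcm_chunks(pcm: bytes, frames_per_chunk: int = 512,
                    sample_width: int = 2) -> Iterator[bytes]:
    """Yield consecutive chunks of raw mono PCM, frames_per_chunk frames each.

    The last chunk may be shorter than the others.
    """
    step = frames_per_chunk * sample_width
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def feed_pcm(recorder, pcm: bytes) -> int:
    """Push PCM audio into a recorder created with use_microphone=False.

    Assumption: feed_audio() accepts raw 16-bit mono PCM bytes at 16 kHz.
    Returns the number of chunks that were fed.
    """
    count = 0
    for chunk in iter_pcm_chunks(pcm):
        recorder.feed_audio(chunk)
        count += 1
    return count
```

A typical call site would construct the recorder with AudioToTextRecorder(use_microphone=False), pass it to feed_pcm() together with the audio bytes, and then read the transcript from the recorder as usual.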
Text Formatting and Lifecycle
These parameters control transcript post-processing, console output, and logging behavior.
| Parameter | Default | Description |
|---|---|---|
| ensure_sentence_starting_uppercase | True | Capitalizes detected sentence starts. |
| ensure_sentence_ends_with_period | True | Adds a final period when final text does not end in punctuation. |
| spinner | True | Shows the console state spinner. |
| level | logging.WARNING | Logger level used by the recorder. |
| debug_mode | False | Prints additional debug information. |
| print_transcription_time | False | Logs main transcription processing time. |
| no_log_file | False | Skips the debug log file. |
| use_extended_logging | False | Enables more detailed recording worker logs. |
| start_callback_in_new_thread | False | Runs callbacks in new threads instead of the recorder thread. |
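To make the effect of the two text formatting flags concrete, here is a rough sketch of the kind of post-processing they describe. It is an illustration only, not the library's implementation: the function name format_final_text is hypothetical, it capitalizes only the first character, and it treats any alphanumeric final character as "missing punctuation".

```python
def format_final_text(text: str,
                      ensure_sentence_starting_uppercase: bool = True,
                      ensure_sentence_ends_with_period: bool = True) -> str:
    """Approximate the documented formatting flags on a final transcript."""
    text = text.strip()
    if not text:
        return text
    if ensure_sentence_starting_uppercase:
        # Simplification: only the first character is capitalized.
        text = text[0].upper() + text[1:]
    if ensure_sentence_ends_with_period and text[-1].isalnum():
        # Simplification: a letter or digit at the end means no punctuation.
        text += "."
    return text


print(format_final_text("hello world"))
print(format_final_text("is it done?"))
```

Disabling both flags returns the stripped text unchanged, which can be useful when a downstream component applies its own punctuation model.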
Recording and VAD Parameters
These parameters govern how speech activity is detected, when recordings start and stop, and how audio is buffered around speech events.
| Parameter | Default | Description |
|---|---|---|
| silero_sensitivity | 0.4 | Silero VAD sensitivity, from 0 to 1. |
| silero_use_onnx | None | Legacy Silero backend switch. True forces the ONNX path, False forces the PyTorch path, None defers to silero_backend. |
| silero_deactivity_detection | False | Uses Silero for end-of-speech detection instead of the default WebRTC end detection path. |
| deactivity_silence_confirmation_duration | 0.16 | Required continuous VAD silence before end-of-speech silence is confirmed. |
| webrtc_sensitivity | 3 | WebRTC VAD aggressiveness from 0 to 3; higher is more aggressive and less sensitive. |
| warmup_vad | True | Runs a small VAD warmup during initialization to avoid first-chunk lazy setup cost. |
| post_speech_silence_duration | 0.6 | Required silence in seconds after speech before a recording is considered complete. |
| min_length_of_recording | 0.5 | Minimum recording duration in seconds. |
| min_gap_between_recordings | 0 | Minimum gap in seconds between recordings. |
| pre_recording_buffer_duration | 1.0 | Amount of pre-roll audio in seconds to keep before detected speech. |
| pre_recording_buffer_trim_config | None | Optional dictionary of trim configuration for the pre-recording buffer. None disables trimming. |
| early_transcription_on_silence | 0 | Starts an early final transcription after this many milliseconds of silence; the result is discarded if speech resumes. |
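The duration settings above are easier to reason about when converted into audio chunks. The back-of-the-envelope sketch below assumes the recorder processes audio in chunks of buffer_size samples at sample_rate, using the documented defaults of 512 and 16000. The function names are hypothetical, and the library may round differently internally.

```python
import math

SAMPLE_RATE = 16000   # documented default for sample_rate
BUFFER_SIZE = 512     # documented default for buffer_size


def chunk_duration_ms(sample_rate: int = SAMPLE_RATE,
                      buffer_size: int = BUFFER_SIZE) -> float:
    """Duration of one audio chunk in milliseconds."""
    return 1000.0 * buffer_size / sample_rate


def chunks_needed(seconds: float, sample_rate: int = SAMPLE_RATE,
                  buffer_size: int = BUFFER_SIZE) -> int:
    """Number of whole chunks required to cover the given duration."""
    return math.ceil(seconds * sample_rate / buffer_size)


print(chunk_duration_ms())   # length of one chunk in ms
print(chunks_needed(0.6))    # chunks covering post_speech_silence_duration
print(chunks_needed(1.0))    # chunks covering pre_recording_buffer_duration
```

Under these assumptions one chunk lasts 32 ms, so the default post_speech_silence_duration of 0.6 spans 19 chunks and the default pre_recording_buffer_duration of 1.0 spans 32 chunks.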
Tuning VAD sensitivity
The two VAD engines work in combination: WebRTC performs a fast binary speech/non-speech classification, and Silero adds a confidence score on top. Their scales run in opposite directions. Raising webrtc_sensitivity (toward 3) makes WebRTC more aggressive, so the recorder is less likely to trigger on background noise. Raising silero_sensitivity (toward 1) makes Silero more sensitive, so it picks up quieter speech.
```python
recorder = AudioToTextRecorder(
    webrtc_sensitivity=2,
    silero_sensitivity=0.5,
    post_speech_silence_duration=0.4,
)
```
Realtime Transcription Parameters
Realtime transcription delivers interim text while the speaker is still talking. Enable it with enable_realtime_transcription=True and supply at least an on_realtime_transcription_update callback to receive the interim text.
| Parameter | Default | Description |
|---|---|---|
| enable_realtime_transcription | False | Enables interim transcription while recording is still active. |
| use_main_model_for_realtime | False | Reuses the main model for realtime updates instead of loading a separate realtime model. |
| realtime_transcription_engine | None | Realtime backend. None uses transcription_engine. |
| realtime_transcription_engine_options | None | Engine-specific options for realtime. None reuses transcription_engine_options. |
| realtime_model_type | "tiny" | Realtime model name or path. |
| realtime_processing_pause | 0.2 | Seconds between realtime transcription attempts. Lower values increase load. |
| init_realtime_after_seconds | 0.2 | Initial delay after recording starts before the first realtime update. |
| realtime_batch_size | 16 | Realtime transcription batch size. |
| beam_size_realtime | 3 | Realtime beam size where supported. |
| initial_prompt_realtime | None | Prompt/context for the realtime model where supported. |
| realtime_transcription_use_syllable_boundaries | False | Schedules realtime updates from a lightweight acoustic boundary detector instead of only a fixed timer. |
| realtime_boundary_detector_sensitivity | 0.6 | Boundary detector sensitivity, from conservative 0 to eager 1. |
| realtime_boundary_followup_delays | (0.05, 0.2) | Extra realtime update delays after a detected boundary. None or empty disables follow-ups. |
```python
recorder = AudioToTextRecorder(
    enable_realtime_transcription=True,
    realtime_model_type="tiny.en",
    realtime_processing_pause=0.1,
    on_realtime_transcription_update=lambda text: print("\r" + text, end=""),
)
```
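A lambda that prints with a carriage return works until an interim update is shorter than the previous one, at which point stale characters stay on the line. The sketch below shows one way to handle that with a small callable object. RealtimeDisplay is a hypothetical name, and the only assumption about the library is that on_realtime_transcription_update is called with the interim text as a single string argument, as in the lambda above.

```python
import sys
from typing import TextIO


class RealtimeDisplay:
    """Callback object that redraws interim text on a single console line."""

    def __init__(self, stream: TextIO = sys.stdout) -> None:
        self.stream = stream
        self.latest = ""  # most recent interim text received

    def __call__(self, text: str) -> None:
        # Pad with spaces so a shorter update fully overwrites a longer one.
        padding = " " * max(0, len(self.latest) - len(text))
        self.stream.write("\r" + text + padding)
        self.stream.flush()
        self.latest = text
```

An instance can be passed directly as on_realtime_transcription_update=RealtimeDisplay(), and its latest attribute gives other parts of the application access to the current interim text.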
Wake Word Parameters
Wake word mode keeps the recorder idle until the configured keyword is detected. The recorder then waits up to wake_word_timeout seconds for speech to begin; if none arrives, it returns to wake word mode.
| Parameter | Default | Description |
|---|---|---|
| wakeword_backend | "" | Wake word backend. Use "pvporcupine" / "pvp" or "oww" / "openwakeword". |
| wake_words | "" | Comma-separated Porcupine keywords. Also enables wake word mode. |
| wake_words_sensitivity | 0.6 | Wake word sensitivity from 0 to 1. |
| wake_word_activation_delay | 0.0 | Delay in seconds before switching from normal voice activation to wake word activation. |
| wake_word_timeout | 5.0 | Seconds after wake word detection to wait for speech before returning to wake word mode. |
| wake_word_buffer_duration | 0.1 | Seconds of audio removed/buffered around wake word detection so the wake word is not included in the transcription. |
| openwakeword_model_paths | None | Comma-separated OpenWakeWord .onnx or .tflite model paths. |
| openwakeword_inference_framework | "onnx" | OpenWakeWord inference framework: "onnx" or "tflite". |
```python
# Porcupine example
recorder = AudioToTextRecorder(
    wakeword_backend="pvporcupine",
    wake_words="jarvis",
    wake_words_sensitivity=0.5,
    wake_word_timeout=5.0,
)

# OpenWakeWord example
recorder = AudioToTextRecorder(
    wakeword_backend="openwakeword",
    openwakeword_model_paths="/path/to/hey_mycroft.onnx",
)
```
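Because the backend accepts several aliases and each backend uses different parameters, it can help to validate the combination before constructing the recorder. The sketch below is one way to do that. The helper wake_word_kwargs is hypothetical, and the rule that Porcupine requires a non-empty wake_words value is an assumption of this example rather than documented library behavior.

```python
from typing import Optional

# Aliases taken from the wakeword_backend description in the table above.
_BACKEND_ALIASES = {
    "pvporcupine": "pvporcupine",
    "pvp": "pvporcupine",
    "openwakeword": "openwakeword",
    "oww": "openwakeword",
}


def wake_word_kwargs(backend: str, wake_words: str = "",
                     model_paths: Optional[str] = None) -> dict:
    """Return validated wake word keyword arguments for the recorder."""
    canonical = _BACKEND_ALIASES.get(backend.strip().lower())
    if canonical is None:
        raise ValueError(f"unknown wake word backend: {backend!r}")

    kwargs = {"wakeword_backend": canonical}
    if canonical == "pvporcupine":
        if not wake_words:
            # Assumption of this sketch: Porcupine needs at least one keyword.
            raise ValueError("pvporcupine requires wake_words")
        kwargs["wake_words"] = wake_words
    elif model_paths:
        kwargs["openwakeword_model_paths"] = model_paths
    return kwargs
```

The returned dictionary can be unpacked into the constructor, for example AudioToTextRecorder(**wake_word_kwargs("pvp", wake_words="jarvis")).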
Executor Injection
These parameters are intended for advanced server integration and testing. They replace the default transcription execution path with a custom callable, allowing a shared model to be reused across sessions without duplication.
| Parameter | Default | Description |
|---|---|---|
| transcription_executor | None | Optional callable used instead of the default main transcription execution path. Primarily used by tests and server integration. |
| realtime_transcription_executor | None | Optional callable used instead of the default realtime transcription execution path. Primarily used by tests and shared-model server integration. |
Silero Backend Parameters
These parameters control the low-level Silero VAD backend selection and ONNX runtime configuration.
| Parameter | Default | Description |
|---|---|---|
| silero_backend | "auto" | Silero backend selector. "auto" picks the best available option. |
| silero_onnx_model_path | None | Path to a custom Silero ONNX model file. None uses the bundled model. |
| silero_onnx_threads | 2 | Number of ONNX inference threads for the Silero ONNX backend. |
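On small machines it may be worth capping the Silero ONNX thread count at the number of available cores. The sketch below shows one way to derive the Silero settings; the helper silero_kwargs is hypothetical, and the capping policy is an assumption of this example, not a recommendation from the library.

```python
import os


def silero_kwargs(max_threads: int = 2) -> dict:
    """Return Silero backend keyword arguments sized to the host CPU."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return {
        "silero_backend": "auto",        # documented default
        "silero_onnx_model_path": None,  # None uses the bundled model
        "silero_onnx_threads": max(1, min(max_threads, cores)),
    }
```

The result can be merged with other settings before the recorder is created, for example AudioToTextRecorder(**silero_kwargs()).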