Every parameter that AudioToTextRecorder accepts at construction time is listed here, grouped by purpose so you can find and tune the settings that matter for your use case. All parameters are keyword arguments with defaults, so start with a minimal constructor and add only what you need.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    model="small.en",
    language="en",
    enable_realtime_transcription=True,
)

Model and Engine Parameters

These parameters control which transcription backend is loaded, which model weights are used, and hardware placement.

| Parameter | Default | Description |
| --- | --- | --- |
| model | "tiny" | Main transcription model name or model path. Interpretation depends on transcription_engine. |
| transcription_engine | "faster_whisper" | Main transcription backend. See the transcription engines guide for valid values. |
| transcription_engine_options | None | Engine-specific dictionary passed only to the main backend. |
| download_root | None | Directory for model downloads or lookup. Behavior is engine-specific. |
| language | "" | Language code. Empty string lets engines auto-detect when they support it. Some engines require a language. |
| compute_type | "default" | Numeric precision/quantization hint. For faster-whisper, see CTranslate2 quantization. Other engines map this where possible. |
| gpu_device_index | 0 | GPU id, or a list of GPU ids for compatible engines. |
| device | "cuda" | Device hint, usually "cuda" or "cpu". CPU-only engines ignore GPU settings. |
| batch_size | 16 | Main transcription batch size. Set 0 to disable batched faster-whisper inference. |
| beam_size | 5 | Main transcription beam size where supported. |
| initial_prompt | None | String or token iterable passed to the main engine as prompt/context where supported. |
| suppress_tokens | [-1] | Token ids suppressed by Whisper-family engines where supported. |
| faster_whisper_vad_filter | True | Enables faster-whisper’s own VAD filter during transcription in addition to recorder VAD. |
| normalize_audio | False | Normalizes audio peak before transcription in engine adapters that use the shared normalization helper. |
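
Because device defaults to "cuda", it can help to choose device and compute_type at startup instead of hard-coding them. The following is a minimal sketch, not part of RealtimeSTT: pick_device_settings is a hypothetical helper, probing CUDA through PyTorch is an assumption, and the compute types "float16" and "int8" are CTranslate2 values chosen here for illustration.

```python
def pick_device_settings():
    """Return constructor kwargs for device and compute_type.

    Hypothetical helper: probes CUDA through PyTorch when it is installed
    and falls back to CPU settings otherwise.
    """
    try:
        import torch  # assumption: PyTorch is how CUDA is detected here
        cuda_available = torch.cuda.is_available()
    except Exception:
        cuda_available = False

    if cuda_available:
        return {"device": "cuda", "compute_type": "float16", "gpu_device_index": 0}
    # int8 is a CTranslate2 quantization type often used for CPU inference.
    return {"device": "cpu", "compute_type": "int8"}


settings = pick_device_settings()
print(settings)

# Usage (requires RealtimeSTT and a working audio setup):
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(model="small.en", language="en", **settings)
```

Keeping the decision in one function means the rest of the application can pass the same keyword arguments regardless of the machine it runs on.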

Audio Input Parameters

These parameters control how audio is captured from the microphone or accepted via feed_audio().

| Parameter | Default | Description |
| --- | --- | --- |
| input_device_index | None | PyAudio input device index. None lets PyAudio choose the default device. |
| use_microphone | True | When False, audio must be supplied through feed_audio(). |
| buffer_size | 512 | Recorder audio buffer size. Changing this can affect VAD behavior. |
| sample_rate | 16000 | Recorder sample rate. WebRTC VAD is sensitive to sample rate changes. |
| handle_buffer_overflow | platform-dependent | Logs and drops overflowed microphone input. Defaults to True except on macOS. |
| allowed_latency_limit | 100 | Maximum unprocessed input chunks before old chunks may be discarded. |
| on_recorded_chunk | None | Callback receiving each recorded audio chunk. |

To enumerate available input devices and their supported sample rates, use AudioInput.list_devices() or call recorder_client.list_devices() from AudioToTextRecorderClient. Device indices are stable within a session but may change across reboots.
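
When use_microphone is False, the application is responsible for delivering audio through feed_audio(). The sketch below shows one way to slice a raw buffer into chunks that match the default buffer_size. It assumes 16-bit mono PCM at the recorder sample rate and assumes feed_audio() accepts such bytes; iter_pcm_chunks and feed_recorder are hypothetical helpers, not library functions.

```python
SAMPLE_RATE = 16000   # matches the sample_rate default
BUFFER_SIZE = 512     # matches the buffer_size default (samples per chunk)
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def iter_pcm_chunks(pcm_bytes, samples_per_chunk=BUFFER_SIZE):
    """Yield consecutive chunks of raw PCM; the last chunk may be shorter."""
    step = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm_bytes), step):
        yield pcm_bytes[start:start + step]


# One second of silence as stand-in audio.
pcm = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
chunks = list(iter_pcm_chunks(pcm))
chunk_seconds = BUFFER_SIZE / SAMPLE_RATE
print(len(chunks), chunk_seconds)


def feed_recorder(pcm_bytes):
    """Feed a PCM buffer to a recorder (requires RealtimeSTT; not run here)."""
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(use_microphone=False)
    for chunk in iter_pcm_chunks(pcm_bytes):
        recorder.feed_audio(chunk)
    return recorder
```

With the defaults, one chunk covers 512 / 16000 = 0.032 seconds of audio, so an allowed_latency_limit of 100 chunks corresponds to roughly 3.2 seconds of backlog.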

Text Formatting and Lifecycle

These parameters control transcript post-processing, console output, and logging behavior.

| Parameter | Default | Description |
| --- | --- | --- |
| ensure_sentence_starting_uppercase | True | Capitalizes detected sentence starts. |
| ensure_sentence_ends_with_period | True | Adds a final period when final text does not end in punctuation. |
| spinner | True | Shows the console state spinner. |
| level | logging.WARNING | Logger level used by the recorder. |
| debug_mode | False | Prints additional debug information. |
| print_transcription_time | False | Logs main transcription processing time. |
| no_log_file | False | Skips the debug log file. |
| use_extended_logging | False | Enables more detailed recording worker logs. |
| start_callback_in_new_thread | False | Runs callbacks in new threads instead of the recorder thread. |
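
To make the effect of the two formatting flags concrete, here is a small approximation in plain Python. It is an illustration only, not the recorder's actual code: format_final_text is a hypothetical function, it capitalizes only the first character, and it treats "does not end in punctuation" as "ends in a letter or digit", which is an assumption.

```python
def format_final_text(text, uppercase_start=True, end_with_period=True):
    """Approximate ensure_sentence_starting_uppercase and
    ensure_sentence_ends_with_period on a final transcript."""
    text = text.strip()
    if not text:
        return text
    if uppercase_start:
        text = text[0].upper() + text[1:]
    # Assumption: only append a period when the last character is alphanumeric.
    if end_with_period and text[-1].isalnum():
        text += "."
    return text


print(format_final_text("hello world"))
print(format_final_text("is it working?"))
```

Disabling both flags leaves the text unchanged apart from whitespace trimming, which is useful when a downstream component applies its own punctuation rules.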

Recording and VAD Parameters

These parameters govern how speech activity is detected, when recordings start and stop, and how audio is buffered around speech events.

| Parameter | Default | Description |
| --- | --- | --- |
| silero_sensitivity | 0.4 | Silero VAD sensitivity, from 0 to 1. |
| silero_use_onnx | None | Legacy Silero backend switch. True forces the ONNX path, False forces the PyTorch path, None defers to silero_backend. |
| silero_deactivity_detection | False | Uses Silero for end-of-speech detection instead of the default WebRTC end detection path. |
| deactivity_silence_confirmation_duration | 0.16 | Required continuous VAD silence before end-of-speech silence is confirmed. |
| webrtc_sensitivity | 3 | WebRTC VAD aggressiveness from 0 to 3; higher is more aggressive and less sensitive. |
| warmup_vad | True | Runs a small VAD warmup during initialization to avoid first-chunk lazy setup cost. |
| post_speech_silence_duration | 0.6 | Required silence in seconds after speech before a recording is considered complete. |
| min_length_of_recording | 0.5 | Minimum recording duration in seconds. |
| min_gap_between_recordings | 0 | Minimum gap in seconds between recordings. |
| pre_recording_buffer_duration | 1.0 | Amount of pre-roll audio in seconds to keep before detected speech. |
| pre_recording_buffer_trim_config | None | Optional dictionary of trim configuration for the pre-recording buffer. None disables trimming. |
| early_transcription_on_silence | 0 | Starts an early final transcription after this many milliseconds of silence; the result is discarded if speech resumes. |

Tuning VAD sensitivity

The two VAD engines work in combination. WebRTC performs a fast binary speech/non-speech classification; Silero adds a confidence score on top. Raising webrtc_sensitivity (toward 3) makes the recorder less likely to trigger on background noise. Raising silero_sensitivity (toward 1) makes Silero more sensitive to quiet speech.

recorder = AudioToTextRecorder(
    webrtc_sensitivity=2,
    silero_sensitivity=0.5,
    post_speech_silence_duration=0.4,
)
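
Since the two sensitivities use different scales, a small validation step before construction can catch out-of-range values early. The sketch below is an assumption-laden illustration: vad_settings is a hypothetical helper, and the two presets are arbitrary starting points derived from the ranges documented above, not recommended values.

```python
def vad_settings(webrtc_sensitivity=3, silero_sensitivity=0.4,
                 post_speech_silence_duration=0.6):
    """Validate VAD values against the documented ranges and return kwargs."""
    if webrtc_sensitivity not in (0, 1, 2, 3):
        raise ValueError("webrtc_sensitivity must be an integer from 0 to 3")
    if not 0.0 <= silero_sensitivity <= 1.0:
        raise ValueError("silero_sensitivity must be between 0 and 1")
    if post_speech_silence_duration < 0:
        raise ValueError("post_speech_silence_duration must not be negative")
    return {
        "webrtc_sensitivity": webrtc_sensitivity,
        "silero_sensitivity": silero_sensitivity,
        "post_speech_silence_duration": post_speech_silence_duration,
    }


# Illustrative presets (hypothetical values, tune for your environment).
noisy_room = vad_settings(webrtc_sensitivity=3, silero_sensitivity=0.3)
quiet_room = vad_settings(webrtc_sensitivity=1, silero_sensitivity=0.6)
print(noisy_room)
print(quiet_room)

# Usage (requires RealtimeSTT):
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**noisy_room)
```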

Realtime Transcription Parameters

Realtime transcription delivers interim text while the speaker is still talking. Enable it with enable_realtime_transcription=True and supply at least on_realtime_transcription_update.

| Parameter | Default | Description |
| --- | --- | --- |
| enable_realtime_transcription | False | Enables interim transcription while recording is still active. |
| use_main_model_for_realtime | False | Reuses the main model for realtime updates instead of loading a separate realtime model. |
| realtime_transcription_engine | None | Realtime backend. None uses transcription_engine. |
| realtime_transcription_engine_options | None | Engine-specific options for realtime. None reuses transcription_engine_options. |
| realtime_model_type | "tiny" | Realtime model name or path. |
| realtime_processing_pause | 0.2 | Seconds between realtime transcription attempts. Lower values increase load. |
| init_realtime_after_seconds | 0.2 | Initial delay after recording starts before the first realtime update. |
| realtime_batch_size | 16 | Realtime transcription batch size. |
| beam_size_realtime | 3 | Realtime beam size where supported. |
| initial_prompt_realtime | None | Prompt/context for the realtime model where supported. |
| realtime_transcription_use_syllable_boundaries | False | Schedules realtime updates from a lightweight acoustic boundary detector instead of only a fixed timer. |
| realtime_boundary_detector_sensitivity | 0.6 | Boundary detector sensitivity, from conservative 0 to eager 1. |
| realtime_boundary_followup_delays | (0.05, 0.2) | Extra realtime update delays after a detected boundary. None or empty disables follow-ups. |

recorder = AudioToTextRecorder(
    enable_realtime_transcription=True,
    realtime_model_type="tiny.en",
    realtime_processing_pause=0.1,
    on_realtime_transcription_update=lambda text: print("\r" + text, end=""),
)
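
Interim updates replace one another, while final transcripts accumulate, so an application usually needs to track both. The following sketch keeps that state in a small class. LiveTranscript is a hypothetical name, and passing a callback to recorder.text() for the final transcript is an assumption about the recorder API that this page does not describe.

```python
class LiveTranscript:
    """Holds finished sentences plus the latest interim text."""

    def __init__(self):
        self.finals = []
        self.interim = ""

    def on_realtime_update(self, text):
        # Each interim update supersedes the previous one.
        self.interim = text

    def on_final(self, text):
        # A final transcript replaces whatever interim text was showing.
        self.finals.append(text)
        self.interim = ""

    def render(self):
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)


live = LiveTranscript()
live.on_realtime_update("hello wor")
live.on_realtime_update("hello world")
live.on_final("Hello world.")
live.on_realtime_update("how are")
print(live.render())

# Usage (requires RealtimeSTT; recorder.text(callback) is an assumption):
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(
#     enable_realtime_transcription=True,
#     on_realtime_transcription_update=live.on_realtime_update,
# )
# recorder.text(live.on_final)
```

If start_callback_in_new_thread is enabled, the two callbacks may run concurrently, in which case the shared state would need a lock.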

Wake Word Parameters

Wake word mode keeps the recorder idle until the configured keyword is detected, then waits up to wake_word_timeout seconds for speech before returning to wake word mode.

| Parameter | Default | Description |
| --- | --- | --- |
| wakeword_backend | "" | Wake word backend. Use "pvporcupine" / "pvp" or "oww" / "openwakeword". |
| wake_words | "" | Comma-separated Porcupine keywords. Also enables wake word mode. |
| wake_words_sensitivity | 0.6 | Wake word sensitivity from 0 to 1. |
| wake_word_activation_delay | 0.0 | Delay before switching from normal voice activation to wake word activation. |
| wake_word_timeout | 5.0 | Seconds after wake word detection to wait for speech before returning to wake word mode. |
| wake_word_buffer_duration | 0.1 | Audio removed/buffered around wake word detection so the wake word is not included in the transcription. |
| openwakeword_model_paths | None | Comma-separated OpenWakeWord .onnx or .tflite model paths. |
| openwakeword_inference_framework | "onnx" | OpenWakeWord inference framework: "onnx" or "tflite". |

# Porcupine example
recorder = AudioToTextRecorder(
    wakeword_backend="pvporcupine",
    wake_words="jarvis",
    wake_words_sensitivity=0.5,
    wake_word_timeout=5.0,
)

# OpenWakeWord example
recorder = AudioToTextRecorder(
    wakeword_backend="openwakeword",
    openwakeword_model_paths="/path/to/hey_mycroft.onnx",
)
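
Both wake_words and openwakeword_model_paths take comma-separated strings, which is easy to get wrong when the values come from a list in application config. A minimal sketch of building the Porcupine arguments from a list follows; wake_word_settings is a hypothetical helper, and the keyword "computer" is used only as an illustrative second entry.

```python
def wake_word_settings(keywords, sensitivity=0.6, timeout=5.0):
    """Build Porcupine wake word kwargs from a list of keywords."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one wake word is required")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("wake_words_sensitivity must be between 0 and 1")
    return {
        "wakeword_backend": "pvporcupine",
        # The recorder expects a single comma-separated string.
        "wake_words": ",".join(cleaned),
        "wake_words_sensitivity": sensitivity,
        "wake_word_timeout": timeout,
    }


settings = wake_word_settings(["jarvis", " computer "], sensitivity=0.5)
print(settings)

# Usage (requires RealtimeSTT and the Porcupine backend):
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**settings)
```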

Executor Injection

These parameters are intended for advanced server integration and testing. They replace the default transcription execution path with a custom callable, allowing a shared model to be reused across sessions without duplication.

| Parameter | Default | Description |
| --- | --- | --- |
| transcription_executor | None | Optional callable used instead of the default main transcription execution path. Primarily used by tests and server integration. |
| realtime_transcription_executor | None | Optional callable used instead of the default realtime transcription execution path. Primarily used by tests and shared-model server integration. |

Silero Backend Parameters

These parameters control the low-level Silero VAD backend selection and ONNX runtime configuration.

| Parameter | Default | Description |
| --- | --- | --- |
| silero_backend | "auto" | Silero backend selector. "auto" picks the best available option. |
| silero_onnx_model_path | None | Path to a custom Silero ONNX model file. None uses the bundled model. |
| silero_onnx_threads | 2 | Number of ONNX inference threads for the Silero ONNX backend. |
