AudioToTextRecorder Configuration Parameter Reference

Every parameter that AudioToTextRecorder accepts at construction time is listed here, organized by purpose so you can quickly find and tune the settings that matter for your use case. All parameters are keyword arguments with sensible defaults — start with the minimal constructor and add only what you need.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    model="small.en",
    language="en",
    enable_realtime_transcription=True,
)

Model and Engine Parameters

These parameters control which transcription backend is loaded, which model weights are used, and hardware placement.

Parameter	Default	Description
`model`	`"tiny"`	Main transcription model name or model path. Interpretation depends on `transcription_engine`.
`transcription_engine`	`"faster_whisper"`	Main transcription backend. See the transcription engines guide for valid values.
`transcription_engine_options`	`None`	Engine-specific dictionary passed only to the main backend.
`download_root`	`None`	Directory for model downloads or lookup. Behavior is engine-specific.
`language`	`""`	Language code. Empty string lets engines auto-detect when they support it. Some engines require a language.
`compute_type`	`"default"`	Numeric precision/quantization hint. For faster-whisper, see CTranslate2 quantization. Other engines map this where possible.
`gpu_device_index`	`0`	GPU id, or a list of GPU ids for compatible engines.
`device`	`"cuda"`	Device hint, usually `"cuda"` or `"cpu"`. CPU-only engines ignore GPU settings.
`batch_size`	`16`	Main transcription batch size. Set `0` to disable batched faster-whisper inference.
`beam_size`	`5`	Main transcription beam size where supported.
`initial_prompt`	`None`	String or token iterable passed to the main engine as prompt/context where supported.
`suppress_tokens`	`[-1]`	Token ids suppressed by Whisper-family engines where supported.
`faster_whisper_vad_filter`	`True`	Enables faster-whisper’s own VAD filter during transcription in addition to recorder VAD.
`normalize_audio`	`False`	Normalizes audio peak before transcription in engine adapters that use the shared normalization helper.
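
As a rough illustration of how the hardware-related parameters in this table fit together, the sketch below builds keyword arguments for a GPU or a CPU setup. The helper name `engine_kwargs` and the model choices are assumptions made for this example; `"float16"` and `"int8"` are CTranslate2 quantization types often paired with faster-whisper on GPU and CPU respectively.

```python
def engine_kwargs(use_gpu: bool) -> dict:
    """Return constructor keyword arguments for a GPU or CPU setup (hypothetical helper)."""
    if use_gpu:
        return {
            "model": "small.en",
            "device": "cuda",
            "compute_type": "float16",  # half precision on GPU
            "gpu_device_index": 0,
        }
    return {
        "model": "tiny.en",             # smaller model to keep CPU latency manageable
        "device": "cpu",
        "compute_type": "int8",         # 8-bit quantization on CPU
    }


cpu_config = engine_kwargs(use_gpu=False)
gpu_config = engine_kwargs(use_gpu=True)
print(cpu_config)

# With RealtimeSTT installed, the dictionary can be unpacked into the constructor:
# recorder = AudioToTextRecorder(**cpu_config)
```

Keeping the choice in one function makes it easy to switch between development machines without editing every constructor call.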

Audio Input Parameters

These parameters control how audio is captured from the microphone or accepted via feed_audio().

Parameter	Default	Description
`input_device_index`	`None`	PyAudio input device index. `None` lets PyAudio choose the default device.
`use_microphone`	`True`	When `False`, audio must be supplied through `feed_audio()`.
`buffer_size`	`512`	Recorder audio buffer size. Changing this can affect VAD behavior.
`sample_rate`	`16000`	Recorder sample rate. WebRTC VAD is sensitive to sample rate changes.
`handle_buffer_overflow`	platform-dependent	Logs and drops overflowed microphone input. Defaults to `True` except on macOS.
`allowed_latency_limit`	`100`	Maximum unprocessed input chunks before old chunks may be discarded.
`on_recorded_chunk`	`None`	Callback receiving each recorded audio chunk.
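
To make the buffering numbers concrete, the sketch below works out how much audio one chunk holds and how much backlog `allowed_latency_limit` tolerates, then splits raw PCM bytes into chunks that could be passed to `feed_audio()`. It assumes that `buffer_size` counts samples and that the audio is 16-bit mono PCM; `split_pcm` is a hypothetical helper, not part of the library.

```python
SAMPLE_RATE = 16000          # sample_rate default
BUFFER_SIZE = 512            # buffer_size default, assumed to be in samples
ALLOWED_LATENCY_LIMIT = 100  # allowed_latency_limit default, in chunks
BYTES_PER_SAMPLE = 2         # assumption: 16-bit mono PCM

chunk_seconds = BUFFER_SIZE / SAMPLE_RATE
max_backlog_seconds = ALLOWED_LATENCY_LIMIT * chunk_seconds


def split_pcm(pcm: bytes, samples_per_chunk: int = BUFFER_SIZE) -> list[bytes]:
    """Split raw PCM bytes into chunks of samples_per_chunk samples (the last may be shorter)."""
    step = samples_per_chunk * BYTES_PER_SAMPLE
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]


one_second_of_silence = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
chunks = split_pcm(one_second_of_silence)
print(f"{chunk_seconds * 1000:.0f} ms per chunk, "
      f"{max_backlog_seconds:.1f} s of backlog, {len(chunks)} chunks")

# With use_microphone=False, each chunk could then be handed to the recorder:
# for chunk in chunks:
#     recorder.feed_audio(chunk)
```

Under these assumptions the defaults give 32 ms per chunk and about 3.2 seconds of tolerated backlog before old chunks may be discarded.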

To enumerate available input devices and their supported sample rates, use AudioInput.list_devices() or call recorder_client.list_devices() from AudioToTextRecorderClient. Device indices are stable within a session but may change across reboots.

Text Formatting and Lifecycle

These parameters control transcript post-processing, console output, and logging behavior.

Parameter	Default	Description
`ensure_sentence_starting_uppercase`	`True`	Capitalizes detected sentence starts.
`ensure_sentence_ends_with_period`	`True`	Adds a final period when final text does not end in punctuation.
`spinner`	`True`	Shows the console state spinner.
`level`	`logging.WARNING`	Logger level used by the recorder.
`debug_mode`	`False`	Prints additional debug information.
`print_transcription_time`	`False`	Logs main transcription processing time.
`no_log_file`	`False`	Skips the debug log file.
`use_extended_logging`	`False`	Enables more detailed recording worker logs.
`start_callback_in_new_thread`	`False`	Runs callbacks in new threads instead of the recorder thread.
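
The effect of the two formatting flags can be approximated in a few lines. This is an illustrative sketch only, not the recorder's actual implementation; `format_final_text` is a hypothetical function, and treating "ends in punctuation" as "last character is not a letter or digit" is an assumption.

```python
def format_final_text(text: str,
                      ensure_sentence_starting_uppercase: bool = True,
                      ensure_sentence_ends_with_period: bool = True) -> str:
    """Approximate the two formatting flags on a final transcript (illustrative only)."""
    text = text.strip()
    if not text:
        return text
    if ensure_sentence_starting_uppercase:
        text = text[0].upper() + text[1:]
    # Add a period only when the text ends in a letter or digit, i.e. no punctuation yet.
    if ensure_sentence_ends_with_period and text[-1].isalnum():
        text += "."
    return text


print(format_final_text("  hello world"))
print(format_final_text("is it working?"))
```

Disabling both flags leaves the transcript untouched apart from whitespace trimming, which can be preferable when the text is fed to another program rather than shown to a user.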

Recording and VAD Parameters

These parameters govern how speech activity is detected, when recordings start and stop, and how audio is buffered around speech events.

Parameter	Default	Description
`silero_sensitivity`	`0.4`	Silero VAD sensitivity, from `0` to `1`.
`silero_use_onnx`	`None`	Legacy Silero backend switch. `True` forces the ONNX path, `False` forces the PyTorch path, `None` defers to `silero_backend`.
`silero_deactivity_detection`	`False`	Uses Silero for end-of-speech detection instead of the default WebRTC end detection path.
`deactivity_silence_confirmation_duration`	`0.16`	Required continuous VAD silence before end-of-speech silence is confirmed.
`webrtc_sensitivity`	`3`	WebRTC VAD aggressiveness from `0` to `3`; higher is more aggressive and less sensitive.
`warmup_vad`	`True`	Runs a small VAD warmup during initialization to avoid first-chunk lazy setup cost.
`post_speech_silence_duration`	`0.6`	Required silence after speech before a recording is considered complete.
`min_length_of_recording`	`0.5`	Minimum recording duration in seconds.
`min_gap_between_recordings`	`0`	Minimum gap in seconds between recordings.
`pre_recording_buffer_duration`	`1.0`	Amount of pre-roll audio to keep before detected speech.
`pre_recording_buffer_trim_config`	`None`	Optional dictionary of trim configuration for the pre-recording buffer. `None` disables trimming.
`early_transcription_on_silence`	`0`	Starts an early final transcription after this many milliseconds of silence; the result is discarded if speech resumes.
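
The interaction between `post_speech_silence_duration` and `min_length_of_recording` can be pictured with a simplified model. The sketch below is not the recorder's actual logic: it assumes one VAD decision per 32 ms chunk, and `find_recording_end` is a hypothetical function written for this illustration.

```python
def find_recording_end(speech_flags, chunk_seconds=0.032,
                       post_speech_silence_duration=0.6,
                       min_length_of_recording=0.5):
    """Return the index of the chunk at which a recording would stop, or None.

    speech_flags holds one boolean per chunk (True = VAD reports speech),
    starting at the chunk where the recording began.
    """
    silent_chunks = 0
    for index, is_speech in enumerate(speech_flags):
        # Any speech chunk resets the silence counter.
        silent_chunks = 0 if is_speech else silent_chunks + 1
        silence = silent_chunks * chunk_seconds
        elapsed = (index + 1) * chunk_seconds
        if silence >= post_speech_silence_duration and elapsed >= min_length_of_recording:
            return index
    return None


# Ten chunks of speech followed by a long pause.
flags = [True] * 10 + [False] * 30
print(find_recording_end(flags))
```

In this model a short pause that is interrupted by new speech never ends the recording, which is why lowering `post_speech_silence_duration` makes the recorder respond faster but also more likely to split a sentence at a hesitation.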

Tuning VAD sensitivity

The two VAD engines work in combination. WebRTC performs a fast binary speech/non-speech classification; Silero adds a confidence score on top. Raising webrtc_sensitivity (toward 3) makes the recorder less likely to trigger on background noise. Raising silero_sensitivity (toward 1) makes Silero more sensitive to quiet speech.

recorder = AudioToTextRecorder(
    webrtc_sensitivity=2,
    silero_sensitivity=0.5,
    post_speech_silence_duration=0.4,
)
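
One way to keep such tuning manageable is to collect settings into named presets and validate them against the documented ranges. The preset names and values below are assumptions meant as starting points, not recommendations from the library; `vad_kwargs` is a hypothetical helper.

```python
# Hypothetical presets; the values are starting points to adjust by ear.
VAD_PRESETS = {
    "noisy_room": {"webrtc_sensitivity": 3, "silero_sensitivity": 0.3,
                   "post_speech_silence_duration": 0.6},
    "quiet_room": {"webrtc_sensitivity": 1, "silero_sensitivity": 0.6,
                   "post_speech_silence_duration": 0.4},
}


def vad_kwargs(preset: str, **overrides) -> dict:
    """Merge a preset with overrides and check the documented value ranges."""
    kwargs = {**VAD_PRESETS[preset], **overrides}
    if kwargs["webrtc_sensitivity"] not in (0, 1, 2, 3):
        raise ValueError("webrtc_sensitivity must be an integer from 0 to 3")
    if not 0 <= kwargs["silero_sensitivity"] <= 1:
        raise ValueError("silero_sensitivity must be between 0 and 1")
    return kwargs


print(vad_kwargs("noisy_room"))
print(vad_kwargs("quiet_room", silero_sensitivity=0.7))

# With RealtimeSTT installed:
# recorder = AudioToTextRecorder(**vad_kwargs("noisy_room"))
```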

Realtime Transcription Parameters

Realtime transcription delivers interim text while the speaker is still talking. Enable it with enable_realtime_transcription=True and supply at least the on_realtime_transcription_update callback, which receives each interim text.

Parameter	Default	Description
`enable_realtime_transcription`	`False`	Enables interim transcription while recording is still active.
`use_main_model_for_realtime`	`False`	Reuses the main model for realtime updates instead of loading a separate realtime model.
`realtime_transcription_engine`	`None`	Realtime backend. `None` uses `transcription_engine`.
`realtime_transcription_engine_options`	`None`	Engine-specific options for realtime. `None` reuses `transcription_engine_options`.
`realtime_model_type`	`"tiny"`	Realtime model name or path.
`realtime_processing_pause`	`0.2`	Seconds between realtime transcription attempts. Lower values increase load.
`init_realtime_after_seconds`	`0.2`	Initial delay after recording starts before the first realtime update.
`realtime_batch_size`	`16`	Realtime transcription batch size.
`beam_size_realtime`	`3`	Realtime beam size where supported.
`initial_prompt_realtime`	`None`	Prompt/context for the realtime model where supported.
`realtime_transcription_use_syllable_boundaries`	`False`	Schedules realtime updates from a lightweight acoustic boundary detector instead of only a fixed timer.
`realtime_boundary_detector_sensitivity`	`0.6`	Boundary detector sensitivity, from conservative `0` to eager `1`.
`realtime_boundary_followup_delays`	`(0.05, 0.2)`	Extra realtime update delays after a detected boundary. `None` or empty disables follow-ups.

recorder = AudioToTextRecorder(
    enable_realtime_transcription=True,
    realtime_model_type="tiny.en",
    realtime_processing_pause=0.1,
    on_realtime_transcription_update=lambda text: print("\r" + text, end=""),
)
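
A callable object can stand in for the lambda when the application needs to keep state between updates. The sketch below assumes, as in the example above, that the callback receives the interim text as a single string; `RealtimeDisplay` is a hypothetical helper that remembers the latest text and the prefix it shares with the previous update.

```python
import os


class RealtimeDisplay:
    """Collects interim texts and tracks the prefix that stayed unchanged (hypothetical helper)."""

    def __init__(self):
        self.latest = ""
        self.stable_prefix = ""
        self.update_count = 0

    def __call__(self, text: str) -> None:
        if self.update_count:
            # commonprefix compares character by character, so it works on plain strings.
            self.stable_prefix = os.path.commonprefix([self.latest, text])
        self.latest = text
        self.update_count += 1


display = RealtimeDisplay()

# Simulated interim updates; in real use the recorder would call display(text).
for interim in ["hello", "hello wor", "hello world how"]:
    display(interim)

print(display.latest, "|", display.stable_prefix)

# With RealtimeSTT installed:
# recorder = AudioToTextRecorder(
#     enable_realtime_transcription=True,
#     on_realtime_transcription_update=display,
# )
```

Tracking the shared prefix is one simple way to decide which part of the interim text is unlikely to change and can be rendered without flicker.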

Wake Word Parameters

Wake word mode keeps the recorder idle until the configured keyword is detected, then waits up to wake_word_timeout seconds for speech to begin. If no speech follows within that window, the recorder returns to wake word mode.

Parameter	Default	Description
`wakeword_backend`	`""`	Wake word backend. Use `"pvporcupine"` / `"pvp"` or `"oww"` / `"openwakeword"`.
`wake_words`	`""`	Comma-separated Porcupine keywords. Also enables wake word mode.
`wake_words_sensitivity`	`0.6`	Wake word sensitivity from `0` to `1`.
`wake_word_activation_delay`	`0.0`	Delay before switching from normal voice activation to wake word activation.
`wake_word_timeout`	`5.0`	Seconds after wake word detection to wait for speech before returning to wake word mode.
`wake_word_buffer_duration`	`0.1`	Audio removed/buffered around wake word detection so the wake word is not included in the transcription.
`openwakeword_model_paths`	`None`	Comma-separated OpenWakeWord `.onnx` or `.tflite` model paths.
`openwakeword_inference_framework`	`"onnx"`	OpenWakeWord inference framework: `"onnx"` or `"tflite"`.

# Porcupine example
recorder = AudioToTextRecorder(
    wakeword_backend="pvporcupine",
    wake_words="jarvis",
    wake_words_sensitivity=0.5,
    wake_word_timeout=5.0,
)

# OpenWakeWord example
recorder = AudioToTextRecorder(
    wakeword_backend="openwakeword",
    openwakeword_model_paths="/path/to/hey_mycroft.onnx",
)
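
Because both `wake_words` and `openwakeword_model_paths` are comma-separated strings, a small builder can assemble them from Python lists and reject unknown backend names early. This is a sketch under the assumptions stated in the tables above; `wake_word_kwargs` is a hypothetical helper, and the keyword and path values are placeholders.

```python
PORCUPINE_NAMES = {"pvporcupine", "pvp"}
OPENWAKEWORD_NAMES = {"oww", "openwakeword"}


def wake_word_kwargs(backend, wake_words=(), model_paths=(), sensitivity=0.6):
    """Build wake word keyword arguments from lists (hypothetical helper)."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("wake_words_sensitivity must be between 0 and 1")
    if backend in PORCUPINE_NAMES:
        if not wake_words:
            raise ValueError("Porcupine needs at least one keyword")
        return {
            "wakeword_backend": backend,
            "wake_words": ",".join(wake_words),
            "wake_words_sensitivity": sensitivity,
        }
    if backend in OPENWAKEWORD_NAMES:
        kwargs = {"wakeword_backend": backend, "wake_words_sensitivity": sensitivity}
        if model_paths:
            kwargs["openwakeword_model_paths"] = ",".join(model_paths)
        return kwargs
    raise ValueError(f"unknown wake word backend: {backend!r}")


porcupine = wake_word_kwargs("pvporcupine", wake_words=["jarvis", "computer"])
oww = wake_word_kwargs("oww", model_paths=["/path/to/a.onnx", "/path/to/b.onnx"])
print(porcupine)
print(oww)
```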

Executor Injection

These parameters are intended for advanced server integration and testing. They replace the default transcription execution path with a custom callable, allowing a shared model to be reused across sessions without duplication.

Parameter	Default	Description
`transcription_executor`	`None`	Optional callable used instead of the default main transcription execution path. Primarily used by tests and server integration.
`realtime_transcription_executor`	`None`	Optional callable used instead of the default realtime transcription execution path. Primarily used by tests and shared-model server integration.

Silero Backend Parameters

These parameters control the low-level Silero VAD backend selection and ONNX runtime configuration.

Parameter	Default	Description
`silero_backend`	`"auto"`	Silero backend selector. `"auto"` picks the best available option.
`silero_onnx_model_path`	`None`	Path to a custom Silero ONNX model file. `None` uses the bundled model.
`silero_onnx_threads`	`2`	Number of ONNX inference threads for the Silero ONNX backend.
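
The precedence between the legacy `silero_use_onnx` switch and `silero_backend`, as described in the Recording and VAD table, can be summarized in a short function. This is a sketch of the documented rule, not the library's code; the returned labels `"onnx"` and `"pytorch"` are descriptive names chosen for this example and are not confirmed values of `silero_backend`.

```python
def resolve_silero_backend(silero_use_onnx=None, silero_backend="auto"):
    """Mirror the documented precedence of the legacy switch over silero_backend."""
    if silero_use_onnx is True:
        return "onnx"      # legacy switch forces the ONNX path
    if silero_use_onnx is False:
        return "pytorch"   # legacy switch forces the PyTorch path
    return silero_backend  # None defers to silero_backend


print(resolve_silero_backend())
print(resolve_silero_backend(silero_use_onnx=True))
```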
