By default, RealtimeSTT delivers a single final transcription after the speaker finishes talking. Real-time transcription mode adds a second stream of interim updates that arrive while the utterance is still in progress. This is useful for live captioning, streaming dictation UIs, and any application that should display text before the speaker pauses. Interim results are approximate; the final transcription is authoritative.

Basic Setup

Set enable_realtime_transcription=True and supply at least one of the two realtime callbacks:
from RealtimeSTT import AudioToTextRecorder


def on_update(text):
    print(f"\r{text}", end="", flush=True)


def on_stabilized(text):
    print(f"\r{text} [stabilized]", end="", flush=True)


if __name__ == "__main__":
    recorder = AudioToTextRecorder(
        enable_realtime_transcription=True,
        on_realtime_transcription_update=on_update,
        on_realtime_transcription_stabilized=on_stabilized,
    )

    # text() blocks until the utterance is complete and returns the final result
    final = recorder.text()
    print(f"\nFinal: {final}")
    recorder.shutdown()
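
One cosmetic caveat with `\r`-based printing: if an interim result is shorter than the one before it, the tail of the longer line stays on screen. A minimal sketch of one way to handle this is to pad each line to the previous length. The names `padded_line` and `LinePrinter` are hypothetical helpers for illustration, not part of RealtimeSTT.

```python
def padded_line(text: str, previous_length: int) -> str:
    """Return text prefixed with a carriage return and padded with spaces
    so that it fully overwrites a previous line of previous_length characters."""
    padding = " " * max(0, previous_length - len(text))
    return f"\r{text}{padding}"


class LinePrinter:
    """Callable that remembers the last printed length between invocations."""

    def __init__(self):
        self.last_length = 0

    def __call__(self, text: str) -> None:
        # Overwrite the current terminal line, blanking any leftover characters.
        print(padded_line(text, self.last_length), end="", flush=True)
        self.last_length = len(text)
```

An instance such as `LinePrinter()` could then be passed as `on_realtime_transcription_update` in place of `on_update`.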

Using a Separate Realtime Model

By default, RealtimeSTT loads a lightweight "tiny" Whisper model for interim updates so that the main transcription model stays free for final work. You can change the realtime model independently:
recorder = AudioToTextRecorder(
    model="small.en",                   # final transcription model
    realtime_model_type="base.en",      # interim model (default: "tiny")
    enable_realtime_transcription=True,
    realtime_processing_pause=0.2,      # seconds between interim attempts
)
realtime_processing_pause controls how often the realtime model runs. Lower values produce more frequent updates at the cost of higher CPU/GPU usage.
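The trade-off can be put in rough numbers. The back-of-the-envelope sketch below assumes that each cycle consists of one inference run followed by one pause of `realtime_processing_pause` seconds; that scheduling model is an assumption made for illustration, and `estimated_interim_runs` is a hypothetical helper, not a RealtimeSTT function.

```python
import math


def estimated_interim_runs(utterance_seconds, pause_seconds, inference_seconds=0.0):
    """Rough count of interim inference runs during one utterance.

    Assumes each cycle is one inference run plus one pause. This is a
    simplification, not RealtimeSTT's actual scheduler.
    """
    cycle = pause_seconds + inference_seconds
    if cycle <= 0:
        raise ValueError("pause_seconds + inference_seconds must be positive")
    # The small epsilon guards against floating-point results such as 19.999999.
    return math.floor(utterance_seconds / cycle + 1e-9)


# Compare a few pause values for a 4-second utterance.
for pause in (0.5, 0.2, 0.1):
    print(pause, estimated_interim_runs(4.0, pause))
```

Under this model, halving the pause roughly doubles the number of inference runs per utterance, which is where the extra CPU/GPU load comes from.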
Set use_main_model_for_realtime=True to skip loading a second model entirely. The main model handles both final and interim transcription, reducing memory usage at the cost of slightly higher latency on final results — useful on memory-constrained hardware.
recorder = AudioToTextRecorder(
    model="small.en",
    use_main_model_for_realtime=True,
    enable_realtime_transcription=True,
)
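
The choice between a separate realtime model and a shared main model can be made at startup. The sketch below builds the keyword arguments for either case using only the parameter names shown above; the `realtime_options` helper and its `low_memory` flag are hypothetical and exist only for illustration.

```python
def realtime_options(low_memory, final_model="small.en", interim_model="tiny"):
    """Build AudioToTextRecorder keyword arguments for realtime transcription.

    low_memory=True reuses the main model for interim updates;
    otherwise a separate, lighter interim model is requested.
    """
    options = {
        "model": final_model,
        "enable_realtime_transcription": True,
    }
    if low_memory:
        options["use_main_model_for_realtime"] = True
    else:
        options["realtime_model_type"] = interim_model
    return options


# Usage (requires RealtimeSTT and a microphone):
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**realtime_options(low_memory=True))
```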

Stabilized vs Raw Updates

Two callbacks expose different levels of interim text processing:
| Callback | What you receive |
| --- | --- |
| on_realtime_transcription_update | Raw interim transcript, updated every realtime_processing_pause seconds. Can change significantly between calls as context builds. |
| on_realtime_transcription_stabilized | Smoothed output that changes more conservatively. Earlier portions of the text are "locked in" as confidence grows, so the display flickers less. |
Use on_realtime_transcription_update when you want maximum immediacy. Use on_realtime_transcription_stabilized when display stability matters more than raw latency.
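
To build intuition for what "locked in" means, the toy example below treats the words shared at the start of two consecutive interim transcripts as stable. This is not RealtimeSTT's stabilization algorithm, which this page does not specify; `stable_prefix` is a hypothetical helper and the sample transcripts are made up.

```python
def stable_prefix(previous: str, current: str) -> str:
    """Return the leading words that two consecutive interim transcripts share."""
    stable = []
    for prev_word, cur_word in zip(previous.split(), current.split()):
        if prev_word != cur_word:
            break  # the first disagreement ends the stable region
        stable.append(cur_word)
    return " ".join(stable)


# Made-up interim transcripts: the third update revises "brown" to "crown".
updates = ["the quick", "the quick brown", "the quick crown fox"]
for before, after in zip(updates, updates[1:]):
    print(repr(stable_prefix(before, after)))
```

A display built on this idea would render the shared prefix as settled text and the remainder as tentative, which is the kind of reduced flicker the stabilized callback aims for.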

Syllable Boundary Scheduling

By default, interim transcription runs on a fixed timer (realtime_processing_pause). With syllable boundary detection enabled, updates are triggered at detected acoustic boundaries, and the timer becomes a fallback cadence. This reduces wasted inference runs when the speaker is in the middle of a word.
recorder = AudioToTextRecorder(
    enable_realtime_transcription=True,
    realtime_transcription_use_syllable_boundaries=True,
    realtime_boundary_detector_sensitivity=0.6,   # 0 = conservative, 1 = eager
    realtime_boundary_followup_delays=(0.05, 0.2), # extra checks after each boundary
)
realtime_boundary_detector_sensitivity controls how readily a pause is classified as a syllable boundary. Higher values trigger more frequent updates; lower values are more conservative and reduce false boundaries in fast speech.
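
As a toy model of the follow-up delays, the sketch below reads "extra checks after each boundary" as: one check at the boundary itself, plus one check at each delay after it. That reading is an assumption, not taken from the library's implementation, and `boundary_check_times` is a hypothetical helper.

```python
def boundary_check_times(boundaries, followup_delays=(0.05, 0.2)):
    """Return sorted check times (in seconds) for a list of boundary times.

    Toy model: each boundary triggers a check at the boundary and at each
    follow-up delay after it. Duplicate times are collapsed.
    """
    times = set()
    for boundary in boundaries:
        times.add(round(boundary, 3))
        for delay in followup_delays:
            # Round to milliseconds to avoid floating-point near-duplicates.
            times.add(round(boundary + delay, 3))
    return sorted(times)


# Two hypothetical boundaries, detected 1.0 s and 1.5 s into the recording.
print(boundary_check_times([1.0, 1.5]))
```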

Two-Engine Setup

You can use entirely different backends for final and realtime transcription. This is common when a high-accuracy model handles final results and a faster, lighter model handles interim updates:
from RealtimeSTT import AudioToTextRecorder


def on_update(text):
    print(f"\r[live] {text}", end="", flush=True)


if __name__ == "__main__":
    recorder = AudioToTextRecorder(
        # Final transcription: faster-whisper with a larger model
        transcription_engine="faster_whisper",
        model="small.en",
        # Realtime transcription: whisper.cpp with a tiny model for low latency
        realtime_transcription_engine="whisper_cpp",
        realtime_model_type="tiny.en",
        realtime_transcription_engine_options={
            "transcribe": {
                "single_segment": True,
                "no_context": True,
                "print_timestamps": False,
            }
        },
        enable_realtime_transcription=True,
        realtime_processing_pause=0.2,
        on_realtime_transcription_update=on_update,
    )

    final = recorder.text()
    print(f"\nFinal: {final}")
    recorder.shutdown()

Key Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| enable_realtime_transcription | False | Enables interim transcription while recording is active. |
| realtime_model_type | "tiny" | Model name or path used for interim transcription. |
| realtime_processing_pause | 0.2 | Seconds between realtime inference attempts. When realtime_transcription_use_syllable_boundaries is True, this becomes a fallback cadence. |
| init_realtime_after_seconds | 0.2 | Delay after recording starts before the first interim update fires. |
| realtime_batch_size | 16 | Batch size for realtime inference. |
| beam_size_realtime | 3 | Beam size for realtime inference where supported. Lower values are faster. |
| realtime_transcription_use_syllable_boundaries | False | Fires realtime updates at detected acoustic boundaries rather than only on a fixed timer. |
| realtime_boundary_detector_sensitivity | 0.6 | Boundary detector sensitivity from 0 (conservative) to 1 (eager). |
| use_main_model_for_realtime | False | Reuse the main model for realtime work rather than loading a second model. |
| realtime_transcription_engine | None | Backend for realtime transcription. None inherits transcription_engine. |
| realtime_transcription_engine_options | None | Engine-specific options for the realtime backend. |
