Enable Real-time Interim Transcription in RealtimeSTT

By default, RealtimeSTT delivers a single final transcription after the speaker finishes talking. Real-time transcription mode adds a second stream of interim updates that arrive while the utterance is still in progress — useful for live captioning, streaming dictation UIs, and any application that should display text before the speaker pauses. Interim results are approximate; the final transcription is authoritative.

Basic Setup

Set enable_realtime_transcription=True and supply at least one of the two realtime callbacks:

from RealtimeSTT import AudioToTextRecorder


def on_update(text):
    print(f"\r{text}", end="", flush=True)


def on_stabilized(text):
    print(f"\r{text} [stabilized]", end="", flush=True)


if __name__ == "__main__":
    recorder = AudioToTextRecorder(
        enable_realtime_transcription=True,
        on_realtime_transcription_update=on_update,
        on_realtime_transcription_stabilized=on_stabilized,
    )

    # text() blocks until the utterance is complete and returns the final result
    final = recorder.text()
    print(f"\nFinal: {final}")
    recorder.shutdown()
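The callbacks above redraw the line with `\r`, so a shorter update can leave characters from a longer previous one visible. The sketch below shows one fix and assumes a terminal that honours carriage returns. `LiveLine` is a hypothetical helper, not part of RealtimeSTT. It pads each redraw with spaces and moves to a new line when the final text arrives.

```python
import io
import sys


class LiveLine:
    """Redraw one terminal line in place, blanking leftovers from longer text.

    Hypothetical helper for interim captions; not part of RealtimeSTT.
    """

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        self._width = 0  # length of the text drawn last time

    def update(self, text):
        # Pad with spaces so a shorter update fully overwrites a longer one.
        padding = " " * max(0, self._width - len(text))
        self.stream.write(f"\r{text}{padding}")
        self.stream.flush()
        self._width = len(text)

    def commit(self, text):
        # Draw the final text once more, then start a fresh line.
        self.update(text)
        self.stream.write("\n")
        self.stream.flush()
        self._width = 0


if __name__ == "__main__":
    buffer = io.StringIO()
    line = LiveLine(buffer)
    line.update("hello world")
    line.update("hello")
    line.commit("Hello.")
    print(repr(buffer.getvalue()))
```

To use it with a recorder, pass `line.update` as `on_realtime_transcription_update` and call `line.commit(recorder.text())` for the final result.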

Using a Separate Realtime Model

By default, RealtimeSTT loads a lightweight "tiny" Whisper model for interim updates so that the main transcription model stays free for final work. You can change the realtime model independently:

recorder = AudioToTextRecorder(
    model="small.en",                   # final transcription model
    realtime_model_type="base.en",      # interim model (default: "tiny")
    enable_realtime_transcription=True,
    realtime_processing_pause=0.2,      # seconds between interim attempts
)

realtime_processing_pause controls how often the realtime model runs. Lower values produce more frequent updates at the cost of higher CPU/GPU usage.
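For a rough sense of the cost, the number of timer-driven passes in one utterance can be bounded with simple arithmetic. `max_realtime_passes` is a hypothetical helper, not part of RealtimeSTT. It assumes that the first pass fires at `init_realtime_after_seconds`, that later passes follow every `realtime_processing_pause` seconds, and that each pass takes no time. Real passes do take time, so actual counts will be lower.

```python
import math


def max_realtime_passes(utterance_seconds, realtime_processing_pause=0.2,
                        init_realtime_after_seconds=0.2):
    """Upper bound on timer-driven realtime passes during one utterance.

    Hypothetical helper. Assumes the first pass fires at
    init_realtime_after_seconds, later ones every realtime_processing_pause
    seconds, and zero inference time per pass.
    """
    if realtime_processing_pause <= 0:
        raise ValueError("realtime_processing_pause must be positive")
    active = utterance_seconds - init_realtime_after_seconds
    if active < 0:
        return 0  # the utterance ended before the first pass could fire
    # The small epsilon absorbs float error (2.8 / 0.2 evaluates to just under 14).
    return math.floor(active / realtime_processing_pause + 1e-9) + 1


if __name__ == "__main__":
    for pause in (0.1, 0.2, 0.5):
        print(pause, max_realtime_passes(3.0, realtime_processing_pause=pause))
```

Halving the pause roughly doubles the upper bound, which is the update-rate versus compute trade-off described above.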

Set use_main_model_for_realtime=True to skip loading a second model entirely. The main model handles both final and interim transcription, reducing memory usage at the cost of slightly higher latency on final results — useful on memory-constrained hardware.

recorder = AudioToTextRecorder(
    model="small.en",
    use_main_model_for_realtime=True,
    enable_realtime_transcription=True,
)
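If the same script has to run on both large and small machines, the choice between a separate realtime model and `use_main_model_for_realtime` can be kept in one place. `realtime_recorder_kwargs` is a hypothetical helper that only builds a keyword-argument dictionary. It uses the parameter names from this page, and when the main model is reused it omits `realtime_model_type`, since no second model is loaded.

```python
def realtime_recorder_kwargs(model="small.en", low_memory=False,
                             realtime_model_type="tiny"):
    """Hypothetical helper: keyword arguments for AudioToTextRecorder."""
    kwargs = {"model": model, "enable_realtime_transcription": True}
    if low_memory:
        # One model serves both final and interim work; no second model to pick.
        kwargs["use_main_model_for_realtime"] = True
    else:
        # A separate, lighter model handles interim updates.
        kwargs["realtime_model_type"] = realtime_model_type
    return kwargs


if __name__ == "__main__":
    print(realtime_recorder_kwargs(low_memory=True))
    print(realtime_recorder_kwargs(realtime_model_type="base.en"))
```

The result can be unpacked directly, for example `AudioToTextRecorder(**realtime_recorder_kwargs(low_memory=True))`.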

Stabilized vs Raw Updates

Two callbacks expose different levels of interim text processing:

| Callback | What you receive |
| --- | --- |
| `on_realtime_transcription_update` | Raw interim transcript, delivered on each realtime pass (every `realtime_processing_pause` seconds by default). It can change significantly between calls as context builds. |
| `on_realtime_transcription_stabilized` | Smoothed output that changes more conservatively. Earlier portions of the text are "locked in" as confidence grows, so the display flickers less. |

Use on_realtime_transcription_update when you want maximum immediacy. Use on_realtime_transcription_stabilized when display stability matters more than raw latency.
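The "locking in" idea can be illustrated with a small prefix-agreement stabilizer. This is a sketch of one common approach, not RealtimeSTT's internal algorithm, and `WordStabilizer` is a hypothetical name. Leading words on which two consecutive interim hypotheses agree become locked and are never rewritten. Only the tail after them follows the newest hypothesis.

```python
class WordStabilizer:
    """Illustrative prefix-locking stabilizer (hypothetical, not RealtimeSTT's code).

    Words on which two consecutive interim hypotheses agree are locked in
    and never change afterwards; only the tail after them can still move.
    """

    def __init__(self):
        self._locked = []    # words that will no longer change
        self._previous = []  # words of the previous hypothesis

    @property
    def locked(self):
        return " ".join(self._locked)

    def update(self, text):
        words = text.split()
        # Count leading words that the previous and current hypotheses share.
        agreed = 0
        for old, new in zip(self._previous, words):
            if old != new:
                break
            agreed += 1
        # Grow the locked region only if the new hypothesis still agrees with it.
        if agreed > len(self._locked) and words[:len(self._locked)] == self._locked:
            self._locked = words[:agreed]
        self._previous = words
        # Locked words always come first; the rest follows the newest hypothesis.
        return " ".join(self._locked + words[len(self._locked):])


if __name__ == "__main__":
    s = WordStabilizer()
    for raw in ("the quick", "the quick brown", "a quick brown fox"):
        print(f"{raw!r:24} -> {s.update(raw)!r}")
```

In the last step the raw hypothesis revises "the" to "a", but the stabilized output keeps the locked "the quick" and only extends the tail. This trades occasional stale words for a steadier display.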

Syllable Boundary Scheduling

By default, interim transcription fires on a fixed timer (realtime_processing_pause). Enabling syllable boundary detection triggers updates at acoustically natural boundary points instead, with the timer kept as a fallback cadence. This reduces wasted inference runs while the speaker is in the middle of a word.

recorder = AudioToTextRecorder(
    enable_realtime_transcription=True,
    realtime_transcription_use_syllable_boundaries=True,
    realtime_boundary_detector_sensitivity=0.6,   # 0 = conservative, 1 = eager
    realtime_boundary_followup_delays=(0.05, 0.2), # extra checks after each boundary
)

realtime_boundary_detector_sensitivity controls how readily a pause is classified as a syllable boundary. Higher values trigger more frequent updates; lower values are more conservative and reduce false boundaries in fast speech.
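The interplay between boundaries, follow-up delays, and the fallback timer can be modelled as a simple schedule. This is a hypothetical model, not RealtimeSTT's scheduler, and `schedule_realtime_passes` is an illustrative name. It assumes that every detected boundary triggers a pass, that one extra pass runs after each entry in `realtime_boundary_followup_delays`, and that the fallback timer fires only after `realtime_processing_pause` seconds without any pass and restarts after each pass. All times are in seconds.

```python
def schedule_realtime_passes(boundaries, followup_delays, fallback_pause, duration):
    """Return the times (seconds) at which a realtime pass would run.

    Hypothetical model: each boundary triggers a pass, plus one pass per
    follow-up delay after it; the fallback timer fires only if
    fallback_pause seconds pass without any pass, and restarts after each.
    """
    if fallback_pause <= 0:
        raise ValueError("fallback_pause must be positive")
    # All boundary-driven trigger times, deduplicated and sorted.
    triggers = sorted(
        {round(b + d, 6) for b in boundaries for d in (0.0, *followup_delays)
         if b + d <= duration}
    )
    passes = []
    last = 0.0  # time of the most recent pass (recording start counts as 0)
    i = 0       # index of the next unused boundary trigger
    while True:
        next_timer = round(last + fallback_pause, 6)
        next_trigger = triggers[i] if i < len(triggers) else float("inf")
        t = min(next_timer, next_trigger)
        if t > duration:
            break
        if t == next_trigger:
            i += 1  # this boundary trigger is consumed
        passes.append(t)
        last = t  # the fallback timer restarts after every pass
    return passes


if __name__ == "__main__":
    print(schedule_realtime_passes([0.5], (0.05, 0.2), 0.2, 1.0))
    print(schedule_realtime_passes([], (), 0.2, 1.0))
```

With no boundaries the model reduces to the plain fixed timer, which matches the default behaviour described above.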

Two-Engine Setup

You can use entirely different backends for final and realtime transcription. This is common when a high-accuracy model handles final results and a faster, lighter model handles interim updates:

from RealtimeSTT import AudioToTextRecorder


def on_update(text):
    print(f"\r[live] {text}", end="", flush=True)


if __name__ == "__main__":
    recorder = AudioToTextRecorder(
        # Final transcription: faster-whisper with a larger model
        transcription_engine="faster_whisper",
        model="small.en",
        # Realtime transcription: whisper.cpp with a tiny model for low latency
        realtime_transcription_engine="whisper_cpp",
        realtime_model_type="tiny.en",
        realtime_transcription_engine_options={
            "transcribe": {
                "single_segment": True,
                "no_context": True,
                "print_timestamps": False,
            }
        },
        enable_realtime_transcription=True,
        realtime_processing_pause=0.2,
        on_realtime_transcription_update=on_update,
    )

    final = recorder.text()
    print(f"\nFinal: {final}")
    recorder.shutdown()

Key Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `enable_realtime_transcription` | `False` | Enables interim transcription while recording is active. |
| `realtime_model_type` | `"tiny"` | Model name or path used for interim transcription. |
| `realtime_processing_pause` | `0.2` | Seconds between realtime inference attempts. When `realtime_transcription_use_syllable_boundaries` is `True`, this becomes a fallback cadence. |
| `init_realtime_after_seconds` | `0.2` | Delay after recording starts before the first interim update fires. |
| `realtime_batch_size` | `16` | Batch size for realtime inference. |
| `beam_size_realtime` | `3` | Beam size for realtime inference where supported. Lower values are faster. |
| `realtime_transcription_use_syllable_boundaries` | `False` | Fires realtime updates at detected acoustic boundaries instead of (only) on a fixed timer. |
| `realtime_boundary_detector_sensitivity` | `0.6` | Boundary detector sensitivity from `0` (conservative) to `1` (eager). |
| `use_main_model_for_realtime` | `False` | Reuse the main model for realtime work rather than loading a second model. |
| `realtime_transcription_engine` | `None` | Backend for realtime transcription. `None` inherits `transcription_engine`. |
| `realtime_transcription_engine_options` | `None` | Engine-specific options for the realtime backend. |
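To see how the defaults in this table combine, here is a hypothetical helper, `effective_realtime_settings`, which is not part of RealtimeSTT. It merges overrides onto the documented defaults and applies the one inheritance rule the table states: a `realtime_transcription_engine` of `None` falls back to `transcription_engine`. The range check on `realtime_boundary_detector_sensitivity` mirrors the documented 0 to 1 scale. It is an assumption of this sketch, not confirmed library behaviour.

```python
# Defaults copied from the table above.
REALTIME_DEFAULTS = {
    "enable_realtime_transcription": False,
    "realtime_model_type": "tiny",
    "realtime_processing_pause": 0.2,
    "init_realtime_after_seconds": 0.2,
    "realtime_batch_size": 16,
    "beam_size_realtime": 3,
    "realtime_transcription_use_syllable_boundaries": False,
    "realtime_boundary_detector_sensitivity": 0.6,
    "use_main_model_for_realtime": False,
    "realtime_transcription_engine": None,
    "realtime_transcription_engine_options": None,
}


def effective_realtime_settings(transcription_engine=None, **overrides):
    """Hypothetical helper: merge overrides onto the documented defaults."""
    unknown = set(overrides) - set(REALTIME_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown realtime parameter(s): {sorted(unknown)}")
    settings = {**REALTIME_DEFAULTS, **overrides}  # fresh dict on every call
    sensitivity = settings["realtime_boundary_detector_sensitivity"]
    if not 0 <= sensitivity <= 1:
        raise ValueError("realtime_boundary_detector_sensitivity must be in [0, 1]")
    # None means "inherit the final transcription engine".
    if settings["realtime_transcription_engine"] is None:
        settings["realtime_transcription_engine"] = transcription_engine
    return settings


if __name__ == "__main__":
    print(effective_realtime_settings(transcription_engine="faster_whisper"))
```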
