
faster_whisper is the default RealtimeSTT transcription engine. It wraps the faster-whisper package, which runs Whisper models through the CTranslate2 inference library. It supports the familiar Whisper model names alongside local CTranslate2 model directories, and covers both GPU and CPU inference through the same interface.

Install

Install the faster-whisper extra for RealtimeSTT:
pip install "RealtimeSTT[faster-whisper]"
If you are working from a source checkout:
python -m pip install -e ".[faster-whisper]"

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="faster_whisper",
    model="small.en",
    device="cuda",
    compute_type="default",
)

Model Names

Known model names are downloaded automatically by faster-whisper. Use download_root to control the cache directory:
recorder = AudioToTextRecorder(
    model="small.en",
    download_root="models/faster-whisper",
)
You can also pass a path to a locally converted CTranslate2 model directory as model.
| Model name | Notes |
| --- | --- |
| tiny | Smallest multilingual model |
| tiny.en | English-only, smallest |
| base | Multilingual base |
| base.en | English-only base |
| small | Multilingual small |
| small.en | English-only small |
| medium | Multilingual medium |
| medium.en | English-only medium |
| large-v1 | Large multilingual v1 |
| large-v2 | Large multilingual v2 |
| large-v3 | Large multilingual v3 |
| distil-* variants | Distilled models (e.g. distil-small.en, distil-medium.en, distil-large-v3) |
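
Because model accepts either a known name or a local directory, it can help to check which one a given string is before constructing the recorder. The sketch below is a hypothetical helper, not part of RealtimeSTT or faster-whisper. It assumes that a converted CTranslate2 directory contains a model.bin file, and it uses the names from the table above as its list of known models.

```python
from pathlib import Path

# Names taken from the table above; distil-* variants are matched by prefix.
KNOWN_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3",
}


def classify_model(model: str) -> str:
    """Return "local", "named", or "unknown" for a model argument.

    Hypothetical helper: "local" assumes a CTranslate2 directory that
    contains a model.bin weights file.
    """
    path = Path(model)
    if path.is_dir() and (path / "model.bin").is_file():
        return "local"
    if model in KNOWN_MODELS or model.startswith("distil-"):
        return "named"
    return "unknown"
```

A result of "unknown" does not prove the value is invalid; it only means the string matched neither the table nor the assumed directory layout, so it is worth a second look before a download is attempted.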

Compute Types

compute_type controls CTranslate2 precision and quantization. Choose based on your hardware:
| compute_type | Best for | Notes |
| --- | --- | --- |
| default | GPU or CPU | CTranslate2 picks the best available type automatically |
| float16 | GPU | Half-precision; requires sufficient VRAM |
| int8_float16 | GPU | INT8 weights, float16 compute; reduces VRAM usage |
| int8 | CPU | Integer quantization; fast on CPU |
| float32 | CPU reference / debugging | Full precision; slowest on CPU |
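
The table can be read as a small decision rule. The following is a minimal sketch of that rule, assuming a hypothetical helper named pick_compute_type and a low_vram flag that you set yourself; neither exists in RealtimeSTT.

```python
def pick_compute_type(device: str, low_vram: bool = False) -> str:
    """Suggest a compute_type from the table above (hypothetical helper)."""
    device = device.lower()
    if device == "cuda":
        # int8_float16 reduces VRAM usage; float16 is the plain GPU choice.
        return "int8_float16" if low_vram else "float16"
    if device == "cpu":
        # Integer quantization is the fast option on CPU.
        return "int8"
    # Anything else: let CTranslate2 pick the type itself.
    return "default"
```

The returned string can be passed as compute_type to AudioToTextRecorder. Use float32 explicitly only when you need a full-precision reference for debugging.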

GPU Setup

Use device="cuda" for GPU inference. gpu_device_index accepts an integer or a list of GPU ids for compatible multi-GPU loading:
recorder = AudioToTextRecorder(
    model="small.en",
    device="cuda",
    compute_type="float16",
    gpu_device_index=0,
)
If CUDA libraries fail to load, reinstall PyTorch and torchaudio for the CUDA version present on your machine before reinstalling faster-whisper.
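
If the same script has to run on machines with and without a GPU, one option is to probe for CUDA first and fall back to CPU. The sketch below assumes PyTorch is the probe (torch.cuda.is_available()); the helper names cuda_is_available and device_kwargs are hypothetical. It leaves compute_type untouched so that it can be chosen separately.

```python
from typing import Optional


def cuda_is_available() -> bool:
    """Return True only if PyTorch imports cleanly and reports a CUDA device."""
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except (ImportError, OSError):
        # Missing or broken installs are treated as "no GPU".
        return False
    return bool(torch.cuda.is_available())


def device_kwargs(use_cuda: Optional[bool] = None, gpu_index: int = 0) -> dict:
    """Build device-related keyword arguments for AudioToTextRecorder."""
    if use_cuda is None:
        use_cuda = cuda_is_available()
    if use_cuda:
        return {"device": "cuda", "gpu_device_index": gpu_index}
    return {"device": "cpu"}
```

The result can be unpacked into the constructor, for example AudioToTextRecorder(model="small.en", **device_kwargs()).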

Engine-Specific Options

The table below maps RealtimeSTT parameters to their underlying faster-whisper counterparts:
| RealtimeSTT parameter | faster-whisper mapping |
| --- | --- |
| model | WhisperModel(model_size_or_path=...) |
| download_root | WhisperModel(download_root=...) |
| device | WhisperModel(device=...) |
| compute_type | WhisperModel(compute_type=...) |
| gpu_device_index | WhisperModel(device_index=...) |
| beam_size | model.transcribe(beam_size=...) |
| batch_size | Enables BatchedInferencePipeline when greater than 0 |
| language | Passed as the transcription language when set |
| initial_prompt | Passed as initial_prompt |
| suppress_tokens | Passed as suppress_tokens |
| faster_whisper_vad_filter | Passed as vad_filter |
| normalize_audio | Normalizes audio before transcription when enabled |
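
To make the mapping concrete, the sketch below splits a dictionary of RealtimeSTT parameters into the arguments that would go to the WhisperModel constructor and those that would go to transcribe. It illustrates the table only and is not RealtimeSTT's internal code. The function name split_engine_kwargs is hypothetical, and the rule of dropping unset values (None or an empty string) is an assumption based on the "when set" wording for language.

```python
# RealtimeSTT parameter -> WhisperModel constructor argument
MODEL_KEYS = {
    "model": "model_size_or_path",
    "download_root": "download_root",
    "device": "device",
    "compute_type": "compute_type",
    "gpu_device_index": "device_index",
}

# RealtimeSTT parameter -> transcribe() argument
TRANSCRIBE_KEYS = {
    "beam_size": "beam_size",
    "language": "language",
    "initial_prompt": "initial_prompt",
    "suppress_tokens": "suppress_tokens",
    "faster_whisper_vad_filter": "vad_filter",
}


def split_engine_kwargs(params: dict) -> tuple:
    """Split RealtimeSTT-style parameters per the table (illustrative only)."""
    model_kwargs = {
        MODEL_KEYS[key]: value
        for key, value in params.items()
        if key in MODEL_KEYS
    }
    transcribe_kwargs = {
        TRANSCRIBE_KEYS[key]: value
        for key, value in params.items()
        # Assumption: unset values are not forwarded to transcribe().
        if key in TRANSCRIBE_KEYS and value is not None and value != ""
    }
    return model_kwargs, transcribe_kwargs
```

Parameters such as batch_size and normalize_audio are deliberately left out of both dictionaries, since the table describes them as behaviour switches rather than direct pass-through arguments.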

VAD Filter

faster-whisper includes a built-in voice activity detection filter. Enable it with faster_whisper_vad_filter:
recorder = AudioToTextRecorder(
    model="small.en",
    faster_whisper_vad_filter=True,
)
The VAD filter can reduce hallucinations on silent segments, but RealtimeSTT’s own VAD already gates audio before it reaches the engine. Enable faster_whisper_vad_filter only if you observe spurious output on near-silent segments.

Realtime Configuration

Use a smaller realtime_model_type than the final model to keep realtime updates responsive:
recorder = AudioToTextRecorder(
    model="small.en",
    enable_realtime_transcription=True,
    realtime_model_type="tiny.en",
    realtime_processing_pause=0.15,
)
To share a single model between final and realtime transcription, set use_main_model_for_realtime=True. This saves memory but can reduce responsiveness when final and realtime requests contend for the same model.
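
Realtime updates are usually consumed through a callback. The sketch below builds the configuration shown above and attaches a callback that collects partial text. It assumes the on_realtime_transcription_update callback parameter and the text() and shutdown() methods of AudioToTextRecorder, none of which are described on this page; make_realtime_config and run_once are hypothetical helper names.

```python
from typing import Callable, List


def make_realtime_config(on_update: Callable[[str], None]) -> dict:
    """Return recorder arguments for realtime transcription with a callback."""
    return {
        "model": "small.en",
        "enable_realtime_transcription": True,
        "realtime_model_type": "tiny.en",
        "realtime_processing_pause": 0.15,
        # Assumed parameter name: called with the latest partial text.
        "on_realtime_transcription_update": on_update,
    }


def run_once(config: dict) -> str:
    """Record one utterance and return the final transcription."""
    # Imported lazily so the configuration can be built without the package.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**config)
    try:
        return recorder.text()
    finally:
        recorder.shutdown()


partials: List[str] = []  # partial results arrive here while you speak
config = make_realtime_config(partials.append)
```

To try it with a microphone attached, call run_once(config) from under an if __name__ == "__main__": guard, which is the safe pattern for libraries that start worker processes.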

Troubleshooting

1. CUDA libraries fail to load

Reinstall PyTorch and torchaudio for the CUDA version on your machine, then reinstall faster-whisper. Verify with torch.cuda.is_available().

2. Model downloads fail

Set download_root to a writable directory and verify network access to the Hugging Face Hub. You can also pre-download models and pass the local CTranslate2 directory as model.

3. Realtime text lags behind speech

Use a smaller realtime_model_type, lower beam_size_realtime to 1, increase realtime_processing_pause, or switch realtime to a CPU-friendly engine such as whisper_cpp.
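
For the download failure case, a quick way to rule out permissions is to test whether the intended download_root can actually be written to. This is a minimal sketch using only the standard library; ensure_writable_dir is a hypothetical helper, and it creates the directory if it does not exist yet.

```python
import tempfile
from pathlib import Path


def ensure_writable_dir(path: str) -> bool:
    """Create path if needed and report whether a file can be written there."""
    target = Path(path)
    try:
        target.mkdir(parents=True, exist_ok=True)
        # Creating (and auto-deleting) a temp file proves write access.
        with tempfile.NamedTemporaryFile(dir=target):
            pass
    except OSError:
        return False
    return True
```

If ensure_writable_dir("models/faster-whisper") returns False, pick another directory before investigating network access.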
