faster_whisper is the default RealtimeSTT transcription engine. It wraps the faster-whisper package, which runs Whisper models through the CTranslate2 inference library. It supports the familiar Whisper model names alongside local CTranslate2 model directories, and covers both GPU and CPU inference through the same interface.
Install
Install the faster-whisper extra for RealtimeSTT:
Basic Usage
Basic usage differs between GPU (CUDA) and CPU inference mainly in the device and compute_type settings.
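Here is a minimal sketch of both setups. It assumes that AudioToTextRecorder accepts model, device, and compute_type keyword arguments, as in recent RealtimeSTT releases. The helper basic_kwargs and the model choices are illustrative. The RealtimeSTT import is deferred so the settings can be inspected without a microphone.

```python
def basic_kwargs(use_gpu: bool) -> dict:
    """Hypothetical helper: recorder settings for a GPU or a CPU setup."""
    if use_gpu:
        # CUDA inference with half precision (see Compute Types)
        return {"model": "small.en", "device": "cuda", "compute_type": "float16"}
    # CPU inference with INT8 quantization
    return {"model": "tiny.en", "device": "cpu", "compute_type": "int8"}


def run(use_gpu: bool = True) -> None:
    """Print each finished transcription; requires RealtimeSTT and a microphone."""
    from RealtimeSTT import AudioToTextRecorder  # imported lazily on purpose

    recorder = AudioToTextRecorder(**basic_kwargs(use_gpu))
    try:
        while True:
            # text() waits until speech ends, then hands the transcription to print
            recorder.text(print)
    finally:
        recorder.shutdown()
```

Call run(use_gpu=False) for CPU inference. RealtimeSTT's own examples create the recorder under an if __name__ == "__main__": guard because the recorder uses worker processes, so keep that guard in your script.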
Model Names
Known model names are downloaded automatically by faster-whisper. Use download_root to control the cache directory, or pass the path of a local CTranslate2 model directory as model.
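The sketch below covers both options. It assumes that the recorder forwards model and download_root to WhisperModel, as listed under Engine-Specific Options. model_source_kwargs is a hypothetical helper, and the paths you pass to it are placeholders.

```python
import os
from typing import Optional


def model_source_kwargs(model: str, download_root: Optional[str] = None) -> dict:
    """Hypothetical helper: settings for a named or a local faster-whisper model."""
    if os.path.isdir(model):
        # A local CTranslate2 model directory is loaded as-is; nothing is downloaded
        return {"model": model}
    kwargs = {"model": model}
    if download_root is not None:
        # Known model names are downloaded into (and cached in) this directory
        kwargs["download_root"] = download_root
    return kwargs
```

For example, AudioToTextRecorder(**model_source_kwargs("small.en", "./models")) caches small.en under ./models.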
| Model name | Notes |
|---|---|
| tiny | Smallest multilingual model |
| tiny.en | English-only, smallest |
| base | Multilingual base |
| base.en | English-only base |
| small | Multilingual small |
| small.en | English-only small |
| medium | Multilingual medium |
| medium.en | English-only medium |
| large-v1 | Large multilingual v1 |
| large-v2 | Large multilingual v2 |
| large-v3 | Large multilingual v3 |
| distil-* variants | Distilled models (e.g. distil-small.en, distil-medium.en, distil-large-v3) |
Compute Types
compute_type controls CTranslate2 precision and quantization. Choose based on your hardware:
| compute_type | Best for | Notes |
|---|---|---|
| default | GPU or CPU | CTranslate2 picks the best available type automatically |
| float16 | GPU | Half precision; requires sufficient VRAM |
| int8_float16 | GPU | INT8 weights, float16 compute; reduces VRAM usage |
| int8 | CPU | Integer quantization; fast on CPU |
| float32 | CPU reference / debugging | Full precision; slowest on CPU |
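The table can be turned into a small pre-flight check that flags a device/compute_type pairing it does not recommend. This is a hypothetical helper, not part of RealtimeSTT. It returns warnings rather than raising, because CTranslate2 can often still run a pairing the table lists as a poorer fit.

```python
from typing import List

# "Best for" column of the Compute Types table; "default" lets CTranslate2 choose
RECOMMENDED_DEVICES = {
    "default": {"cuda", "cpu"},
    "float16": {"cuda"},
    "int8_float16": {"cuda"},
    "int8": {"cpu"},
    "float32": {"cpu"},
}


def check_compute_type(device: str, compute_type: str) -> List[str]:
    """Hypothetical check: return warnings for pairings the table does not recommend."""
    if compute_type not in RECOMMENDED_DEVICES:
        return [f"{compute_type!r} is not one of the compute types listed in this guide"]
    best_for = RECOMMENDED_DEVICES[compute_type]
    if device not in best_for:
        return [f"{compute_type!r} is listed as best for {sorted(best_for)}, not {device!r}"]
    return []
```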
GPU Setup
Use device="cuda" for GPU inference. gpu_device_index accepts either an integer that selects one GPU or a list of GPU ids that loads the model on multiple GPUs.
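Here is a sketch of single-GPU and multi-GPU settings. gpu_kwargs and cuda_device_count are hypothetical helpers. cuda_device_count assumes PyTorch is installed and is not needed to build the settings.

```python
from typing import List, Union


def gpu_kwargs(gpu_device_index: Union[int, List[int]] = 0) -> dict:
    """Hypothetical helper: CUDA settings for one GPU or for several GPUs."""
    indices = [gpu_device_index] if isinstance(gpu_device_index, int) else list(gpu_device_index)
    if not indices or any(not isinstance(i, int) or i < 0 for i in indices):
        raise ValueError("gpu_device_index must be a non-negative int or a non-empty list of them")
    return {
        "device": "cuda",
        # An int selects one GPU; a list loads the model on every listed GPU
        "gpu_device_index": gpu_device_index,
        "compute_type": "float16",
    }


def cuda_device_count() -> int:
    """Number of visible CUDA devices, useful for choosing ids; requires PyTorch."""
    import torch

    return torch.cuda.device_count() if torch.cuda.is_available() else 0
```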
If CUDA libraries fail to load, reinstall PyTorch and torchaudio for the CUDA version present on your machine before reinstalling faster-whisper.
Engine-Specific Options
The table below maps RealtimeSTT parameters to their underlying faster-whisper counterparts:
| RealtimeSTT parameter | faster-whisper mapping |
|---|---|
| model | WhisperModel(model_size_or_path=...) |
| download_root | WhisperModel(download_root=...) |
| device | WhisperModel(device=...) |
| compute_type | WhisperModel(compute_type=...) |
| gpu_device_index | WhisperModel(device_index=...) |
| beam_size | model.transcribe(beam_size=...) |
| batch_size | Enables BatchedInferencePipeline when greater than 0 |
| language | Passed as the transcription language when set |
| initial_prompt | Passed as initial_prompt |
| suppress_tokens | Passed as suppress_tokens |
| faster_whisper_vad_filter | Passed as vad_filter |
| normalize_audio | Normalizes audio before transcription when enabled |
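One way to picture the mapping is as a split of one set of RealtimeSTT options into WhisperModel constructor arguments and per-call transcribe() arguments. This is a simplified illustration, not RealtimeSTT's actual code, and split_options and transcribe are hypothetical helpers. normalize_audio is left out because it acts on the audio rather than on faster-whisper arguments. The batched path assumes a faster-whisper release that provides BatchedInferencePipeline.

```python
from typing import Any, Dict, Tuple

# RealtimeSTT name -> WhisperModel constructor argument
_MODEL_ARGS = {
    "model": "model_size_or_path",
    "download_root": "download_root",
    "device": "device",
    "compute_type": "compute_type",
    "gpu_device_index": "device_index",
}
# RealtimeSTT name -> transcribe() argument
_TRANSCRIBE_ARGS = {
    "beam_size": "beam_size",
    "language": "language",
    "initial_prompt": "initial_prompt",
    "suppress_tokens": "suppress_tokens",
    "faster_whisper_vad_filter": "vad_filter",
}


def split_options(options: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any], int]:
    """Illustrative only: return (WhisperModel kwargs, transcribe kwargs, batch_size)."""
    model_kwargs = {dst: options[src] for src, dst in _MODEL_ARGS.items() if src in options}
    transcribe_kwargs = {dst: options[src] for src, dst in _TRANSCRIBE_ARGS.items() if src in options}
    if not transcribe_kwargs.get("language"):
        # language is only forwarded when it is set
        transcribe_kwargs.pop("language", None)
    return model_kwargs, transcribe_kwargs, int(options.get("batch_size", 0))


def transcribe(audio, options: Dict[str, Any]) -> str:
    """Run faster-whisper directly with the split options; requires faster-whisper."""
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    model_kwargs, transcribe_kwargs, batch_size = split_options(options)
    model = WhisperModel(**model_kwargs)
    if batch_size > 0:
        # batch_size > 0 switches to the batched pipeline
        pipeline = BatchedInferencePipeline(model=model)
        segments, _info = pipeline.transcribe(audio, batch_size=batch_size, **transcribe_kwargs)
    else:
        segments, _info = model.transcribe(audio, **transcribe_kwargs)
    return " ".join(segment.text.strip() for segment in segments)
```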
VAD Filter
faster-whisper includes a built-in voice activity detection filter. Enable it by setting faster_whisper_vad_filter=True.
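Here is a sketch of the recorder setting next to its direct faster-whisper equivalent. vad_recorder_kwargs and transcribe_file_with_vad are hypothetical helpers. The min_silence_duration_ms value comes from the faster-whisper README example, not from RealtimeSTT.

```python
def vad_recorder_kwargs(enabled: bool = True) -> dict:
    """Recorder setting that toggles faster-whisper's built-in VAD filter."""
    return {"faster_whisper_vad_filter": enabled}


def transcribe_file_with_vad(path: str, model_name: str = "small.en") -> list:
    """Direct faster-whisper equivalent; requires faster-whisper and downloads the model."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        path,
        vad_filter=True,  # skip non-speech audio before decoding
        vad_parameters={"min_silence_duration_ms": 500},
    )
    # Materialize the lazy segment generator as (start, end, text) tuples
    return [(s.start, s.end, s.text) for s in segments]
```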
Realtime Configuration
Use a smaller realtime_model_type than the final model to keep realtime updates responsive. To reuse the final model for realtime updates instead of loading a second one, set use_main_model_for_realtime=True. This saves memory but can reduce responsiveness when final and realtime requests contend for the same model.
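Here is a sketch of both realtime setups. It assumes the RealtimeSTT parameters enable_realtime_transcription and on_realtime_transcription_update, which this page does not list. realtime_kwargs and its default model names are hypothetical.

```python
from typing import Callable


def realtime_kwargs(
    on_update: Callable[[str], None],
    share_model: bool = False,
    final_model: str = "small.en",
    realtime_model: str = "tiny.en",
) -> dict:
    """Hypothetical helper: recorder settings for live partial transcriptions."""
    kwargs = {
        "model": final_model,
        "enable_realtime_transcription": True,
        "on_realtime_transcription_update": on_update,
    }
    if share_model:
        # One model serves final and realtime requests: less memory, more contention
        kwargs["use_main_model_for_realtime"] = True
    else:
        # A second, smaller model keeps realtime updates responsive
        kwargs["realtime_model_type"] = realtime_model
    return kwargs
```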
Troubleshooting
CUDA libraries fail to load
Reinstall PyTorch and torchaudio for the CUDA version on your machine, then reinstall faster-whisper. Verify with torch.cuda.is_available().
Model downloads fail
Set download_root to a writable directory and verify network access to the Hugging Face Hub. You can also pre-download models and pass the local CTranslate2 directory as model.
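Both troubleshooting checks can be scripted. In this sketch, cache_dir_is_writable, cuda_available, and predownload are hypothetical helpers. predownload assumes the download_model function exported by faster-whisper, which returns the local model directory that you can then pass as model.

```python
import os
import tempfile


def cache_dir_is_writable(path: str) -> bool:
    """True if files can be created in path (the directory is created if missing)."""
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def cuda_available() -> bool:
    """Mirror torch.cuda.is_available(); False if PyTorch or its CUDA libraries fail to load."""
    try:
        import torch
    except (ImportError, OSError):
        return False
    return bool(torch.cuda.is_available())


def predownload(model_name: str, output_dir: str) -> str:
    """Fetch a model ahead of time and return its local directory."""
    from faster_whisper import download_model

    return download_model(model_name, output_dir=output_dir)
```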