openai_whisper uses OpenAI’s original openai-whisper Python package as the transcription backend. It is useful for applications that specifically need compatibility with the original Whisper Python API rather than the faster-whisper CTranslate2 rewrite — for example, when you already have an openai-whisper model cache on disk or need to stay aligned with the upstream Whisper implementation.
Install
Install the openai-whisper package directly, for example with pip install openai-whisper.
Basic Usage
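The following is a minimal sketch of selecting this backend, not the canonical example. It assumes that RealtimeSTT's AudioToTextRecorder accepts a transcription_engine keyword (the keyword name is inferred from transcription_engine_options and is an assumption), alongside the model, device, and compute_type settings described on this page. build_recorder_kwargs and main are hypothetical helper names.

```python
def build_recorder_kwargs(model="base", device="cpu", compute_type="float32"):
    """Collect keyword arguments for AudioToTextRecorder (hypothetical helper)."""
    return {
        # Assumption: the backend is selected with a `transcription_engine` keyword.
        "transcription_engine": "openai_whisper",
        "model": model,
        "device": device,
        "compute_type": compute_type,
    }


def main():
    # Imported lazily so the helper above works without RealtimeSTT installed.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**build_recorder_kwargs())
    try:
        print("Speak now...")
        print(recorder.text())  # blocks until one utterance has been transcribed
    finally:
        recorder.shutdown()
```

Call main() from a script guarded by if __name__ == "__main__": to record and transcribe a single utterance with the base model on CPU.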
Model Names and Download Behavior
OpenAI Whisper downloads known model names automatically through its package cache. When set, download_root is forwarded to whisper.load_model.
A path to a local checkpoint file that openai-whisper can load can also be passed as model. The supported model names mirror the standard Whisper naming scheme: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large, large-v1, large-v2, large-v3.
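As an illustration of the forwarding rule above, the sketch below builds the arguments for whisper.load_model and includes download_root only when it was set. The helper names (is_known_model_name, load_model_kwargs, load) are hypothetical and are not part of RealtimeSTT; the adapter's actual code may differ.

```python
KNOWN_MODEL_NAMES = (
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large", "large-v1", "large-v2", "large-v3",
)


def is_known_model_name(model):
    """True if `model` is one of the names listed on this page."""
    return model in KNOWN_MODEL_NAMES


def load_model_kwargs(model, device="cpu", download_root=None):
    """Build keyword arguments for whisper.load_model (hypothetical helper)."""
    kwargs = {"name": model, "device": device}
    # download_root is forwarded only when the caller set it.
    if download_root is not None:
        kwargs["download_root"] = download_root
    return kwargs


def load(model, device="cpu", download_root=None):
    import whisper  # lazy import: requires the openai-whisper package

    return whisper.load_model(**load_model_kwargs(model, device, download_root))
```

For example, load("base", download_root="./models") would download the base model into ./models on first use and reuse it afterwards.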
CPU and GPU Behavior
Use device="cuda" for GPU inference and device="cpu" for CPU inference. The adapter maps the compute_type parameter to OpenAI Whisper’s fp16 flag:
| compute_type value | fp16 flag |
|---|---|
| float16, fp16, half | fp16=True |
| float32, fp32, int8 | fp16=False |
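The mapping in the table can be sketched as a small lookup. This is an illustration only: compute_type_to_fp16 is a hypothetical helper, and the page does not say how the adapter treats values outside the table, so raising an error here is an assumption of this sketch.

```python
FP16_TRUE = frozenset({"float16", "fp16", "half"})
FP16_FALSE = frozenset({"float32", "fp32", "int8"})


def compute_type_to_fp16(compute_type):
    """Translate a compute_type string into the value of Whisper's fp16 flag."""
    if compute_type in FP16_TRUE:
        return True
    if compute_type in FP16_FALSE:
        return False
    # Assumption: unlisted values are rejected; the real adapter may behave differently.
    raise ValueError(f"unsupported compute_type: {compute_type!r}")
```

The result would then be passed on as the fp16 argument of model.transcribe, for example fp16=compute_type_to_fp16("float32") for CPU inference.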
CPU inference should generally use tiny or base model sizes. Larger models are very slow on CPU with this backend; use faster_whisper with compute_type="int8" for faster CPU inference with medium or large models.
Engine-Specific Options
Pass backend-specific configuration through transcription_engine_options:
| Option bucket | Meaning |
|---|---|
| transcription_engine_options["model"] | Passed to whisper.load_model. |
| transcription_engine_options["load_model"] | Also merged into whisper.load_model. |
| transcription_engine_options["transcribe"] | Merged into model.transcribe(...). |
| beam_size | Passed to transcription when greater than 1. |
| initial_prompt | String prompts are supported. |
| suppress_tokens | Passed to transcription when set. |
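The rules in the table can be sketched as two merge helpers. Both function names are hypothetical, and the precedence used here (load_model overriding model, and the transcribe bucket overriding the individual parameters) is an assumption, since the page does not state the merge order.

```python
def merge_load_model_options(engine_options):
    """Combine the "model" and "load_model" buckets for whisper.load_model."""
    merged = dict(engine_options.get("model", {}))
    # Assumption: "load_model" entries win when both buckets set the same key.
    merged.update(engine_options.get("load_model", {}))
    return merged


def build_transcribe_kwargs(engine_options, beam_size=1,
                            initial_prompt=None, suppress_tokens=None):
    """Assemble keyword arguments for model.transcribe(...)."""
    kwargs = {}
    if beam_size is not None and beam_size > 1:
        kwargs["beam_size"] = beam_size  # only passed when greater than 1
    if isinstance(initial_prompt, str):
        kwargs["initial_prompt"] = initial_prompt  # only string prompts
    if suppress_tokens is not None:
        kwargs["suppress_tokens"] = suppress_tokens  # only when set
    # Assumption: the "transcribe" bucket is merged last and may override the above.
    kwargs.update(engine_options.get("transcribe", {}))
    return kwargs


example_options = {
    "load_model": {"in_memory": True},
    "transcribe": {"temperature": 0.0, "condition_on_previous_text": False},
}
```

Here in_memory, temperature, and condition_on_previous_text are existing openai-whisper parameters, chosen only as examples of what each bucket might carry.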
When to Prefer openai-whisper
Good fit
- You already have an openai-whisper model cache and want to reuse it
- You need to stay aligned with the upstream OpenAI Whisper Python API
- Compatibility with code that calls whisper.load_model directly
Consider faster-whisper instead
- Production latency-sensitive workloads
- GPU inference with CTranslate2 quantization (float16, int8_float16)
- CPU inference with int8 quantization for faster throughput
- Batched inference via BatchedInferencePipeline
Troubleshooting
ffmpeg errors
Install ffmpeg and ensure it is on your PATH. The openai-whisper package and some audio workflows depend on it for file handling.
CPU inference is too slow
Switch to a smaller model (tiny or base) or use faster_whisper with compute_type="int8" for significantly better CPU throughput.
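For the ffmpeg errors described above, a quick way to confirm whether the executable is visible on PATH is a standard-library lookup. This is a small diagnostic sketch; find_ffmpeg and ffmpeg_status are hypothetical helper names.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_status():
    """Describe whether ffmpeg can be found from the current environment."""
    path = find_ffmpeg()
    if path is None:
        return "ffmpeg not found on PATH"
    return f"ffmpeg found at {path}"
```

Printing ffmpeg_status() from the same environment that runs the recorder shows whether the PATH seen by Python includes ffmpeg.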