Overview
OpenWhispr supports local transcription using OpenAI’s Whisper models via whisper.cpp. These models run entirely on your device for maximum privacy.

Model Characteristics
All Whisper models use GGML quantization for efficient CPU/GPU inference. The models follow a speed vs. quality tradeoff:
- Smaller models (Tiny, Base): Faster transcription, lower accuracy
- Larger models (Large, Turbo): Slower transcription, higher accuracy
- Recommended: Base model for balanced performance
Storage Location
Models are cached locally at:

Available Models
Tiny (75MB)
ggml-tiny.bin
75MB (expected: 78,000,000 bytes)
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
Fastest, lower quality. Best for quick transcription when accuracy is not critical.
Base (142MB) — Recommended
ggml-base.bin
142MB (expected: 148,000,000 bytes)
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
Good balance between speed and quality. Recommended for most users.
Small (466MB)
ggml-small.bin
466MB (expected: 488,000,000 bytes)
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
Better quality, slower. Good for when accuracy matters more than speed.
Medium (1.5GB)
ggml-medium.bin
1.5GB (expected: 1,570,000,000 bytes)
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
High-quality transcription. Requires more processing time and disk space.
Large (3GB)
ggml-large-v3.bin
3GB (expected: 3,140,000,000 bytes)
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin
Best quality, slowest. Use when maximum accuracy is required.
Turbo (1.6GB)
ggml-large-v3-turbo.bin
1.6GB (expected: 1,670,000,000 bytes)
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
Fast with good quality. An optimized version of the Large model with better speed.
Downloading Models
Via UI
Models can be downloaded from Settings → Models → Whisper Models.

Programmatically
Checking Model Status
Listing All Models
Deleting Models
Transcription
Whisper models support 58 languages including English, Spanish, French, German, Chinese, Japanese, and more. Use auto for automatic language detection.

Model Selection Guide
When to use each model
- Tiny: Testing, low-resource devices, real-time feedback needed
- Base: Default choice for most users, good balance
- Small: When accuracy is important but Large is too slow
- Medium: Professional use, technical content, medical dictation
- Large: Maximum accuracy, critical transcription tasks
- Turbo: Near-Large quality without the full Large model's slowdown
All models are downloaded from HuggingFace’s official whisper.cpp repository. File integrity is validated after download using expected file sizes.
Custom Dictionary
Improve transcription accuracy for specific words by passing them in the initialPrompt parameter:
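The original snippet is not shown here; as a hedged sketch, this is how an initialPrompt value could be forwarded to whisper.cpp's --prompt flag (the arg builder is illustrative, not OpenWhispr's actual code):

```javascript
// Illustrative: bias recognition toward domain vocabulary by forwarding an
// initialPrompt value to whisper.cpp's --prompt flag.
function transcriptionArgs(modelPath, audioPath, initialPrompt) {
  const args = ["-m", modelPath, "-f", audioPath];
  if (initialPrompt) args.push("--prompt", initialPrompt);
  return args;
}

// Example: product names and jargon the model would otherwise mis-hear.
const args = transcriptionArgs(
  "ggml-base.bin",
  "meeting.wav",
  "OpenWhispr, whisper.cpp, GGML, Kubernetes"
);
```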
Server Pre-warming
OpenWhispr pre-warms the whisper-server process on startup to eliminate 2-5s cold-start delays. The server remains running in the background after the first transcription, providing instant transcription for subsequent recordings.