Model Selection
WhisperKit supports all official OpenAI Whisper model variants, from tiny to large-v3. Choosing the right model means balancing accuracy, speed, and memory usage against your application's requirements.
Available Models
Whisper models come in several sizes, each with multilingual and English-only variants:
Model Variants
Tiny (39M parameters)
Best for: Real-time streaming, constrained devices, quick prototyping
- Fastest inference
- Lowest memory footprint (~75 MB)
- Acceptable accuracy for clear audio
- Available: `tiny` (multilingual), `tiny.en` (English-only)
Base (74M parameters)
Best for: Mobile apps, moderate accuracy requirements
- Good balance of speed and accuracy
- Memory footprint ~140 MB
- Suitable for most mobile applications
- Available: `base`, `base.en`
Small (244M parameters)
Best for: Production applications, higher accuracy needs
- Good accuracy for production use
- Memory footprint ~460 MB
- Slower than base but more accurate
- Available: `small`, `small.en`
Medium (769M parameters)
Best for: High accuracy requirements, server-side processing
- Very good accuracy
- Memory footprint ~1.5 GB
- Slower inference
- Available: `medium`, `medium.en`
Large (1550M parameters)
Best for: Maximum accuracy, offline batch processing
- Best accuracy
- Memory footprint ~3 GB
- Slowest inference
- Available: `large`, `large-v2`, `large-v3`
ModelVariant Enum
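The original page lists a `ModelVariant` enum here. WhisperKit's actual cases and member names may differ between versions; an illustrative sketch of what such an enum looks like:

```swift
// Illustrative sketch only: WhisperKit's real `ModelVariant` enum may
// use different case names or additional variants.
enum ModelVariant: String, CaseIterable {
    case tiny, tinyEn
    case base, baseEn
    case small, smallEn
    case medium, mediumEn
    case large, largev2, largev3

    /// Model name as it appears in repository folder names (assumed convention).
    var modelName: String {
        switch self {
        case .tinyEn: return "tiny.en"
        case .baseEn: return "base.en"
        case .smallEn: return "small.en"
        case .mediumEn: return "medium.en"
        case .largev2: return "large-v2"
        case .largev3: return "large-v3"
        default: return rawValue
        }
    }
}
```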
Recommended Models
WhisperKit can recommend a model for the current device.
Get Recommended Models
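A sketch of querying the recommendation, assuming WhisperKit's `recommendedModels()` API and its `default`/`supported` fields (names may differ between versions):

```swift
import WhisperKit

// Ask WhisperKit which model fits the current device.
// `recommendedModels()` and its fields are assumed from WhisperKit's
// public API and may differ between versions.
let support = await WhisperKit.recommendedModels()
print("Default model for this device: \(support.default)")
print("All supported models: \(support.supported)")
```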
Device-Specific Recommendations
Recommendations are based on device hardware.
Downloading Models
Automatic Download
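Passing a model name at initialization is enough to trigger a download when needed; a minimal sketch (the `WhisperKitConfig` parameter name is an assumption):

```swift
import WhisperKit

// Initializing with a model name downloads the model on first use
// if it is not already cached locally.
let pipe = try await WhisperKit(WhisperKitConfig(model: "base"))
```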
By default, WhisperKit downloads models automatically the first time they are requested.
Manual Download
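Fetching model files ahead of time (for example during onboarding) might look like this; the `download(variant:progressCallback:)` signature is assumed from WhisperKit's public API:

```swift
import WhisperKit

// Download model files without creating a transcription pipeline.
// The `download` signature is an assumption and may differ by version.
let modelFolder = try await WhisperKit.download(variant: "small") { progress in
    print("Download progress: \(Int(progress.fractionCompleted * 100))%")
}
print("Model downloaded to \(modelFolder.path)")
```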
Models can also be downloaded without initializing a full WhisperKit instance.
List Available Models
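Listing what the repository offers, assuming a `fetchAvailableModels` API:

```swift
import WhisperKit

// Query the model repository for the variants it hosts.
// `fetchAvailableModels` is assumed from WhisperKit's public API.
let models = try await WhisperKit.fetchAvailableModels()
print("Available models: \(models)")
```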
Local Models
Use pre-downloaded or bundled models by pointing WhisperKit at a local folder.
Bundle Models in App
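Loading a model folder shipped inside the app bundle might look like this; the resource name and the `modelFolder` parameter are assumptions:

```swift
import Foundation
import WhisperKit

// Load a model folder bundled with the app instead of downloading.
// The resource name "openai_whisper-base" is a hypothetical example.
if let modelFolder = Bundle.main.path(forResource: "openai_whisper-base", ofType: nil) {
    let config = WhisperKitConfig(modelFolder: modelFolder)
    let pipe = try await WhisperKit(config)
}
```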
Model Repositories
WhisperKit downloads models from Hugging Face repositories.
Default Repository
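The default repository is `argmaxinc/whisperkit-coreml` on Hugging Face. Naming it explicitly (the `modelRepo` parameter is assumed) is equivalent to the default:

```swift
import WhisperKit

// Explicitly naming the default Hugging Face repository.
// `modelRepo` is assumed from WhisperKitConfig.
let config = WhisperKitConfig(model: "base", modelRepo: "argmaxinc/whisperkit-coreml")
let pipe = try await WhisperKit(config)
```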
Custom Repository
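Pointing at your own repository of converted Core ML models might look like this; the repository name is a placeholder:

```swift
import WhisperKit

// Use a custom Hugging Face repository hosting Core ML Whisper models.
// "your-org/custom-whisper-models" is a placeholder, not a real repo.
let config = WhisperKitConfig(model: "base", modelRepo: "your-org/custom-whisper-models")
let pipe = try await WhisperKit(config)
```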
Custom Endpoint
Download Configuration
Background Downloads
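A sketch of enabling a background URL session, assuming WhisperKitConfig exposes a `useBackgroundDownloadSession` flag:

```swift
import WhisperKit

// Use a background URLSession so large downloads can continue
// while the app is suspended. The parameter name is an assumption.
let config = WhisperKitConfig(model: "large-v3", useBackgroundDownloadSession: true)
let pipe = try await WhisperKit(config)
```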
Enable background downloads so large models can finish downloading while the app is suspended.
Custom Download Location
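Storing models in a custom directory, assuming a `downloadBase` parameter that sets the local destination folder:

```swift
import Foundation
import WhisperKit

// Save downloaded models under Application Support instead of the default.
// Interpreting `downloadBase` as a local destination folder is an assumption.
let appSupport = FileManager.default.urls(
    for: .applicationSupportDirectory, in: .userDomainMask
).first!
let config = WhisperKitConfig(model: "base", downloadBase: appSupport)
let pipe = try await WhisperKit(config)
```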
Model States and Loading
Prewarming Models
Prewarm models to reduce peak memory usage. Prewarming loads models one at a time to trigger Core ML specialization without a high memory peak; this roughly doubles load time but reduces memory pressure.
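A sketch, assuming `prewarm` and `load` flags on WhisperKitConfig:

```swift
import WhisperKit

// Prewarm (specialize) models at init to smooth out peak memory.
// The `prewarm`/`load` parameter names are assumptions.
let config = WhisperKitConfig(model: "small", prewarm: true, load: true)
let pipe = try await WhisperKit(config)
```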
Deferred Loading
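Downloading up front but deferring the memory-heavy load until transcription is about to start might look like this; `load: false` and `loadModels()` are assumed from WhisperKit's API:

```swift
import WhisperKit

// Set up the pipeline without loading model weights into memory yet.
let pipe = try await WhisperKit(WhisperKitConfig(model: "base", load: false))

// Later, just before transcription starts:
try await pipe.loadModels()
```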
Unload Models
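Releasing model memory once transcription is finished, assuming an `unloadModels()` method on the pipeline:

```swift
import WhisperKit

let pipe = try await WhisperKit(WhisperKitConfig(model: "base"))
// ... transcribe ...

// Free the memory held by loaded models; reload later if needed.
// `unloadModels()` is assumed from WhisperKit's public API.
await pipe.unloadModels()
```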
Multilingual vs English-only
When to Use Multilingual Models
- Transcribing content in multiple languages
- Language is unknown in advance
- Need automatic language detection
- Translation to English (the `.translate` task)
When to Use English-only Models
- Only transcribing English audio
- Slightly faster inference
- Marginally better English accuracy
Model Performance Comparison
Performance varies by device. These are approximate values for reference.
| Model | Size | Parameters | Relative Speed | Memory | Accuracy |
|---|---|---|---|---|---|
| tiny | 75 MB | 39M | 32x | ~150 MB | Good |
| base | 140 MB | 74M | 16x | ~250 MB | Better |
| small | 460 MB | 244M | 6x | ~600 MB | Very Good |
| medium | 1.5 GB | 769M | 2x | ~1.8 GB | Excellent |
| large-v3 | 3 GB | 1550M | 1x | ~3.2 GB | Best |
Selection Guidelines
Real-time Streaming
Recommended: tiny, base
Fast enough to transcribe live audio without lag on most devices.
Mobile Apps
Recommended: base, small
Balance of accuracy and app size. Consider on-demand download instead of bundling.
High Accuracy
Recommended: medium, large-v3
Best for offline processing, server deployments, or high-end devices.
Constrained Devices
Recommended: tiny
Often the only option for devices with limited memory or older hardware.
Next Steps
Configuration
Configure compute options and advanced settings
Transcription
Start transcribing with your selected model