General
What is WhisperKit?
- Real-time streaming transcription
- Word-level timestamps
- Voice activity detection
- Multiple language support
- Custom model deployment
What is TTSKit?
- Real-time streaming playback
- Multiple voices and languages
- Style control (1.7B model)
- No server required
- macOS and iOS support
What's the difference between WhisperKit and Argmax Pro SDK?
- 9x faster transcription with Nvidia Parakeet V3
- Real-time speaker diarization
- Deepgram-compatible WebSocket server
- Commercial support and SLAs
Is WhisperKit free to use?
Does WhisperKit require an internet connection?
- Initial model download requires internet
- Model updates require internet
- Benchmark result uploads require internet (optional)
Installation & Setup
What are the system requirements?
WhisperKit:
- macOS 14.0+ or iOS 16.0+
- Apple Silicon (M1/M2/M3/M4) or A12 Bionic+
- Xcode 16.0+
TTSKit:
- macOS 15.0+ or iOS 18.0+
- Apple Silicon required
Can I use WhisperKit on Intel Macs?
Does WhisperKit work in iOS Simulator?
How do I install WhisperKit?
- In Xcode: File > Add Package Dependencies
- Enter: https://github.com/argmaxinc/whisperkit
- Select WhisperKit and/or TTSKit products
How much storage do models require?
| Model | Size |
|---|---|
| tiny | ~40 MB |
| base | ~75 MB |
| small | ~250 MB |
| medium | ~800 MB |
| large-v3 | ~1.6 GB |
| distil-large-v3 | ~800 MB |
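The sizes in the table can be turned into a quick check before downloading. The following is a minimal Python sketch, not part of WhisperKit: it copies the approximate sizes from the table and returns the largest model that fits in a given amount of free space. The function name `largest_model_that_fits` and the 20% headroom factor are assumptions made for illustration.

```python
import shutil

# Approximate download sizes in MB, copied from the table above.
MODEL_SIZES_MB = {
    "tiny": 40,
    "base": 75,
    "small": 250,
    "medium": 800,
    "large-v3": 1600,
    "distil-large-v3": 800,
}


def largest_model_that_fits(free_mb, headroom=1.2):
    """Return the largest model whose size times `headroom` fits in `free_mb`.

    `headroom` is an illustrative safety margin, not a WhisperKit setting.
    Returns None when even the smallest model does not fit.
    """
    candidates = [
        (size, name)
        for name, size in MODEL_SIZES_MB.items()
        if size * headroom <= free_mb
    ]
    if not candidates:
        return None
    # max() compares by size first; equal sizes fall back to name order.
    return max(candidates)[1]


if __name__ == "__main__":
    # Free space on the current volume, in MB.
    free_mb = shutil.disk_usage(".").free / 1_000_000
    print(f"{free_mb:.0f} MB free ->", largest_model_that_fits(free_mb))
```

Size is only one constraint. A model that fits on disk may still be too large for the device's memory, so treat the result as an upper bound.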
Models & Performance
Which model should I use?
By device:
- iPhone 15 Pro: medium or smaller
- iPhone 14 Pro: small or smaller
- M1 Mac or newer: All models including large-v3
For best accuracy:
- Use large-v3 or distil-large-v3
For fastest transcription:
- Use tiny or base
What are distilled models?
Distilled models (such as distil-large-v3) are smaller, faster versions that maintain most of the accuracy of larger models. They’re created through knowledge distillation.
Benefits:
- 50% smaller than full models
- 2-3x faster inference
- ~95% of original accuracy
- Better for resource-constrained devices
Can I use custom fine-tuned models?
- Fine-tune Whisper on your data
- Convert to CoreML format
- Upload to HuggingFace
- Load in WhisperKit:
How fast is real-time transcription?
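Approximate real-time factors (processing time divided by audio duration; lower is faster):
- tiny: ~0.1 (10x faster than real-time)
- small: ~0.3
- medium: ~0.7
- large-v3: ~1.5 (slower than real-time)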
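Reading the figures above as real-time factors (processing time divided by audio duration), a value below 1.0 means transcription keeps up with incoming audio. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and the factors are the approximate values from the list, which will vary by device.

```python
# Approximate real-time factors from the list above
# (processing time / audio duration; lower is faster).
REAL_TIME_FACTORS = {
    "tiny": 0.1,
    "small": 0.3,
    "medium": 0.7,
    "large-v3": 1.5,
}


def estimated_processing_seconds(model, audio_seconds):
    """Estimate how long `model` needs to transcribe `audio_seconds` of audio."""
    return REAL_TIME_FACTORS[model] * audio_seconds


def keeps_up_with_real_time(model):
    """A model keeps up with live audio only if its factor is below 1.0."""
    return REAL_TIME_FACTORS[model] < 1.0


if __name__ == "__main__":
    for name in REAL_TIME_FACTORS:
        secs = estimated_processing_seconds(name, 60)
        print(f"{name}: ~{secs:.0f} s per minute of audio, "
              f"real-time: {keeps_up_with_real_time(name)}")
```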
How accurate is WhisperKit compared to cloud APIs?
Accuracy compared with cloud APIs:
- OpenAI Whisper API (same models)
- Deepgram (similar performance)
- AssemblyAI (competitive)
On-device advantages:
- No network latency
- Complete privacy
- Works offline
- No per-minute costs
Features & Usage
What audio formats are supported?
- WAV (recommended)
- MP3
- M4A
- FLAC
- Raw audio buffers
- Microphone input
How do I get word-level timestamps?
Use DecodingOptions with word timestamps enabled.
Can I transcribe in languages other than English?
How do I stream real-time transcription?
Does WhisperKit support speaker diarization?
Can I translate audio to English?
TTSKit Specific
Which TTSKit model should I use?
Smaller model:
- Runs on macOS and iOS
- ~1 GB download
- Fast inference
- 9 voices, 10 languages
1.7B model:
- macOS only
- ~2.2 GB download
- Higher quality
- Supports style instructions
- Same voices and languages
What voices are available?
- .ryan - Male, clear and professional
- .aiden - Male, warm and friendly
- .onoAnna - Female, bright and energetic
- .sohee - Female, calm and soothing
- .eric - Male, deep and authoritative
- .dylan - Male, young and casual
- .serena - Female, elegant and refined
- .vivian - Female, confident and dynamic
- .uncleFu - Male, wise and mature
How do I use style instructions?
Can I stream audio playback?
Use the play method for real-time streaming.
Local Server
What is the WhisperKit Local Server?
How do I use Python with WhisperKit?
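The sketch below shows one way a Python client might call a locally running server using only the standard library. It assumes the server accepts a multipart/form-data upload over HTTP. The host, port, endpoint path, and the `file` field name are assumptions, not confirmed values, so adjust them to match your server. The `stream` form field mirrors the stream=true parameter used for streaming.

```python
import urllib.request
import uuid

# Assumptions for illustration only: host, port, and endpoint path are
# placeholders. Check your server's startup output for the real values.
BASE_URL = "http://localhost:50060"
ENDPOINT = "/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, stream=False):
    """Build (but do not send) a multipart/form-data POST carrying audio."""
    boundary = uuid.uuid4().hex
    stream_value = "true" if stream else "false"
    # First part: a plain form field. Second part: the audio file itself.
    # The field names "stream" and "file" are assumed, not confirmed.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="stream"\r\n'
        "\r\n"
        f"{stream_value}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe_file(path, stream=False):
    """Send the file at `path` to the server and return the raw response text."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), path, stream=stream)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `transcribe_file("audio.wav")` would return whatever the server sends back. The response format is not assumed here, so the function returns the raw text rather than parsing it.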
Does the local server support streaming?
Use the stream=true parameter.
Troubleshooting
Model download fails or is very slow
- Install git-lfs
- Check disk space - Ensure sufficient storage for models
- Try a smaller model - Start with tiny to verify setup
- Clear cache - Delete ~/.cache/whisperkit/ and retry
Transcription is too slow
- Use a smaller model - Switch from large to medium/small
- Use distilled models - Try distil-large-v3
- Adjust compute units - Configure CoreML compute units
- Check thermal throttling - Device may be overheating
- Reduce precision - Use quantized models if available
Getting poor transcription accuracy
- Use a larger model - large-v3 is most accurate
- Specify the language - Don’t rely on auto-detection for best results
- Provide context - Use prompt parameter for domain-specific content
- Check audio quality - Ensure clear audio, low background noise
- Adjust VAD settings - Fine-tune voice activity detection
Build errors in Xcode
- Update Xcode - Ensure Xcode 16.0+
- Clean build folder - ⌘⇧K in Xcode
- Reset package cache - File > Packages > Reset Package Caches
- Check deployment target - macOS 14.0+, iOS 16.0+
- Update dependencies - File > Packages > Update to Latest Package Versions
Crashes on device but not Mac
- Memory usage - Large models may exceed device memory
- iOS version - Ensure iOS 16.0+ (18.0+ for TTSKit)
- Model size - Use smaller model for older devices
- Background processing - Check app lifecycle handling
- Permissions - Verify microphone permissions
Support & Community
Where can I get help?
- Discord: Join our community
- GitHub Issues: Report bugs
- Email: info@argmaxinc.com
- Documentation: Browse docs
How can I contribute?
- Fix bugs and add features
- Improve documentation
- Submit benchmark results
- Share example projects
Can I use WhisperKit commercially?
How do I report a bug?
- Check if the issue already exists
- Include:
  - Device and OS version
  - WhisperKit version
  - Steps to reproduce
  - Sample code if possible
- Add relevant logs