Overview
WhisperKit follows semantic versioning. This page documents major changes, new features, bug fixes, and breaking changes across versions. For detailed commit history, see GitHub Releases.
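Under semantic versioning, releases order by (major, minor, patch). A minimal Python sketch of that ordering (the version strings are illustrative, taken from the releases listed on this page):

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into comparable integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

# Tuples compare element-wise, which matches semantic-versioning order.
releases = ["0.7.0", "0.9.0", "0.8.0", "0.6.0"]
latest = max(releases, key=parse_version)
print(latest)  # 0.9.0
```

Comparing parsed tuples rather than raw strings avoids the classic pitfall where "0.10.0" sorts before "0.9.0" lexicographically.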
Version 0.9.0
Latest Release
Current stable version with TTSKit integration and local server
Release Date
Released: 2024

Major Features
TTSKit - Text-to-Speech Framework
New text-to-speech capabilities. See the TTSKit Guide for details.

Features:
- On-device TTS using Qwen3 models
- Two model sizes: 0.6B (iOS/macOS) and 1.7B (macOS)
- Real-time streaming playback
- 9 voices in 10 languages
- Style instructions (1.7B model)
- Audio export (WAV, M4A)
WhisperKit Local Server
An OpenAI-compatible HTTP server.

Features:
- Implements OpenAI Audio API
- Server-Sent Events (SSE) streaming
- Compatible with OpenAI SDKs
- Auto-generated OpenAPI specification
- Example clients (Python, Swift, curl)
API Endpoints:
- POST /v1/audio/transcriptions
- POST /v1/audio/translations
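The server streams results over Server-Sent Events. A minimal Python sketch of consuming SSE `data:` lines on the client side (the payload shapes and the `[DONE]` sentinel are assumptions modeled on the OpenAI streaming convention, not WhisperKit's documented wire format):

```python
import json

def parse_sse(stream_text: str) -> list[dict]:
    """Collect the JSON payload of each `data:` line in an SSE stream.

    Assumes the server ends the stream with a `data: [DONE]` sentinel,
    mirroring the OpenAI streaming convention.
    """
    events = []
    for line in stream_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alives, and event names
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        events.append(json.loads(payload))
    return events

# Example stream as a client might receive it (hypothetical payloads):
chunks = parse_sse(
    'data: {"text": "Hello"}\n'
    'data: {"text": " world"}\n'
    'data: [DONE]\n'
)
print("".join(chunk["text"] for chunk in chunks))  # Hello world
```

Because the server implements the OpenAI Audio API, existing OpenAI SDKs can point their base URL at the local server and reuse this streaming path unchanged.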
Improvements
- Enhanced streaming transcription performance
- Better memory management for large models
- Improved model loading and caching
- CLI enhancements with new commands
- Updated model repository structure
- Better error messages and debugging
Bug Fixes
- Fixed memory leaks in long-running transcription
- Resolved model download issues on slow connections
- Fixed timestamp alignment in certain edge cases
- Improved handling of corrupted audio files
- Fixed crashes when switching models rapidly
Dependencies
- Swift 5.9+
- macOS 14.0+ (WhisperKit), 15.0+ (TTSKit)
- iOS 16.0+ (WhisperKit), 18.0+ (TTSKit)
- Xcode 16.0+
Breaking Changes
Version 0.8.0
Release Date
Released: 2024

Major Features
Unified Configuration API
Centralized configuration through WhisperKitConfig.

Enhanced Model Selection
Support for glob patterns in model selection.
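Glob-style selection amounts to filename-pattern matching over model names. A minimal Python sketch of the idea (the model identifiers and matching logic are illustrative, not WhisperKit's actual resolution code):

```python
from fnmatch import fnmatch

# Hypothetical model identifiers; real repositories may differ.
available_models = [
    "openai_whisper-base",
    "openai_whisper-small",
    "openai_whisper-large-v3",
    "distil-whisper_distil-large-v3",
]

def select_models(pattern: str) -> list[str]:
    """Return every available model whose name matches the glob pattern."""
    return [name for name in available_models if fnmatch(name, pattern)]

print(select_models("*large-v3*"))
# ['openai_whisper-large-v3', 'distil-whisper_distil-large-v3']
```

A pattern like `*large-v3*` selects both the standard and distilled variants, so one configuration string can cover a family of models.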
Improved Streaming
Better real-time transcription with:
- Lower latency
- More accurate intermediate results
- Better VAD integration
- Reduced memory usage
Improvements
- Faster model loading from cache
- Better error handling and recovery
- Improved voice activity detection
- Enhanced word timestamp accuracy
- Better multilingual support
- Reduced peak memory usage
Bug Fixes
- Fixed race conditions in streaming mode
- Resolved model cache corruption issues
- Fixed timestamp drift in long audio
- Improved handling of silence
- Fixed crashes on certain audio formats
Deprecations
Version 0.7.0
Release Date
Released: 2023

Major Features
- Swift CLI tool for command-line transcription
- Enhanced model repository on HuggingFace
- Support for custom model repositories
- Improved benchmark suite
- Better documentation and examples
Improvements
- 20% faster transcription on M1 Macs
- Reduced model download size
- Better progress reporting
- Enhanced example applications
- Improved API documentation
Bug Fixes
- Fixed model loading on iOS devices
- Resolved audio buffer overflow issues
- Fixed language detection accuracy
- Improved error messages
Version 0.6.0
Release Date
Released: 2023

Major Features
- Support for Whisper large-v3 models
- Distilled model support
- Voice activity detection integration
- Real-time streaming transcription
- Word-level timestamps
Improvements
- 30% faster model loading
- Better memory efficiency
- Improved accuracy on noisy audio
- Enhanced iOS support
Earlier Versions
Version 0.5.0 and earlier
Version 0.5.0
- Initial public release
- Support for Whisper base, small, medium models
- iOS and macOS support
- Basic transcription API
Version 0.4.0 (Beta)
- Beta release for early adopters
- CoreML model optimization
- Basic streaming support
Version 0.3.0 (Alpha)
- Alpha release for testing
- Proof of concept implementation
Upcoming Features
These features are planned for future releases. Follow development on GitHub.
Version 1.0 (Planned)
Stable API
API stability guarantees
Enhanced Models
New optimized model variants
More Languages
Additional language support
Better Diarization
Improved speaker detection
Future Roadmap
- Enhanced Streaming: Lower latency, better accuracy
- More TTS Voices: Additional voice options
- Custom Wake Words: On-device wake word detection
- Noise Reduction: Advanced audio preprocessing
- Batch Processing: Efficient multi-file transcription
- Cloud Sync: Optional cloud backup and sync
Version Support
Releases move through three support stages:
- Current
- Maintenance
- End of Life
Active Support
Version 0.9.x
- ✅ Bug fixes
- ✅ Security updates
- ✅ New features
- ✅ Community support
Migration Guides
Migrate to 0.9.x
Upgrade from any previous version
Breaking Changes
Review breaking changes by version
Reporting Issues
Found a bug or have a feature request?

Check Existing Issues
Search GitHub Issues to avoid duplicates.
Gather Information
Collect:
- WhisperKit version
- Device and OS version
- Steps to reproduce
- Expected vs actual behavior
Create Issue
Create a new issue with details.
Release Notes Format
Each release includes:
- Features: New capabilities and functionality
- Improvements: Performance and quality enhancements
- Bug Fixes: Resolved issues
- Breaking Changes: API changes requiring code updates
- Deprecations: Features scheduled for removal
- Migration Guide: Steps to update from previous versions
Staying Updated
GitHub
Watch the repository for releases
Discord
Join for release announcements
RSS Feed
Subscribe to release feed
Next Steps
Migration Guide
Upgrade to the latest version
FAQ
Common questions answered
Contributing
Help shape future releases
Benchmarks
Compare versions