TTSKit is an on-device text-to-speech framework built on Core ML. It runs Qwen3 TTS models entirely on Apple silicon with real-time streaming playback, no server required.
Quick Start
`TTSKit()` automatically downloads the default 0.6B model on first run, loads the tokenizer and six Core ML models concurrently, and is then ready to generate speech.
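The concurrent loading step described above can be illustrated with Swift structured concurrency. This is a minimal sketch, not TTSKit's actual loader: `Component`, `load(_:)`, `loadAll(_:)`, and the component names are all hypothetical placeholders, and the sleep stands in for real Core ML model loading.

```swift
import Foundation

// Hypothetical stand-in for one loadable pipeline component.
struct Component: Sendable {
    let name: String
}

// Simulates loading a single component. A real loader would open or
// compile a Core ML model here instead of sleeping.
func load(_ name: String) async throws -> Component {
    try await Task.sleep(nanoseconds: 10_000_000)
    return Component(name: name)
}

// Starts one child task per component so all loads run concurrently,
// then collects the results keyed by component name.
func loadAll(_ names: [String]) async throws -> [String: Component] {
    try await withThrowingTaskGroup(of: Component.self) { group in
        for name in names {
            group.addTask { try await load(name) }
        }
        var loaded: [String: Component] = [:]
        for try await component in group {
            loaded[component.name] = component
        }
        return loaded
    }
}
```

Because the child tasks run concurrently, the total wait is roughly that of the slowest component rather than the sum of all of them. If any load throws, the group cancels the remaining tasks and the error propagates to the caller.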
Requirements
- macOS 15.0 or later
- iOS 18.0 or later
- Xcode 16.0 or later
Features
Real-Time Streaming
Generate and play audio frame-by-frame with adaptive buffering
Multiple Voices
9 built-in voices across 10 languages
Concurrent Generation
Automatic text chunking with parallel generation
Style Control
Natural-language prosody instructions (1.7B model)
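
The idea behind automatic chunking with parallel generation can be sketched as follows. This is an illustration under stated assumptions, not TTSKit's implementation: `splitIntoChunks(_:)`, `synthesize(_:)`, and `generate(_:)` are hypothetical names, the splitter is a naive punctuation-based one, and the synthesizer stub returns silence (one sample per character) so the example stays self-contained.

```swift
import Foundation

// Naively splits text into sentence-like chunks on '.', '!' and '?'.
func splitIntoChunks(_ text: String) -> [String] {
    var chunks: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if ".!?".contains(character) {
            let trimmed = current.trimmingCharacters(in: .whitespacesAndNewlines)
            if !trimmed.isEmpty { chunks.append(trimmed) }
            current = ""
        }
    }
    let tail = current.trimmingCharacters(in: .whitespacesAndNewlines)
    if !tail.isEmpty { chunks.append(tail) }
    return chunks
}

// Hypothetical synthesizer stub: one silent sample per input character.
func synthesize(_ piece: String) async -> [Float] {
    [Float](repeating: 0, count: piece.count)
}

// Synthesizes all chunks in parallel, then reassembles the audio in the
// original text order using each chunk's index.
func generate(_ text: String) async -> [Float] {
    let chunks = splitIntoChunks(text)
    return await withTaskGroup(of: (Int, [Float]).self) { group in
        for (index, piece) in chunks.enumerated() {
            group.addTask { (index, await synthesize(piece)) }
        }
        var parts = [[Float]](repeating: [], count: chunks.count)
        for await (index, samples) in group {
            parts[index] = samples
        }
        return parts.flatMap { $0 }
    }
}
```

Child tasks can finish in any order, so each result carries its chunk index and is written back into a pre-sized array before concatenation.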
Model Variants
TTSKit ships two model sizes:

| Model | Size | Platforms | Features |
|---|---|---|---|
| 0.6B | ~1 GB | macOS, iOS | Fast, runs on all devices |
| 1.7B | ~2.2 GB | macOS only | Higher quality, style instructions |
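
The trade-offs in the table can be expressed as a small selection helper. This is a sketch only: `Platform`, `ModelVariant`, and `pickVariant(for:wantsStyleControl:)` are hypothetical names used for illustration and are not part of TTSKit's API. The sizes and platform support are copied from the table.

```swift
// Hypothetical platform tag, used only by this sketch.
enum Platform {
    case macOS, iOS
}

// Hypothetical model of the two variants listed in the table.
enum ModelVariant: String, CaseIterable {
    case small = "0.6B"
    case large = "1.7B"

    // Approximate size in gigabytes, as given in the table.
    var approximateSizeGB: Double {
        switch self {
        case .small: return 1.0
        case .large: return 2.2
        }
    }

    // Platforms on which the variant is listed as available.
    var supportedPlatforms: Set<Platform> {
        switch self {
        case .small: return [.macOS, .iOS]
        case .large: return [.macOS]
        }
    }

    // Only the 1.7B variant is listed with style instructions.
    var supportsStyleInstructions: Bool { self == .large }
}

// Returns the smallest variant that runs on the platform and, if requested,
// supports style instructions. Returns nil when no variant qualifies.
func pickVariant(for platform: Platform, wantsStyleControl: Bool) -> ModelVariant? {
    let candidates = ModelVariant.allCases.filter {
        $0.supportedPlatforms.contains(platform)
    }
    if wantsStyleControl {
        return candidates.first(where: { $0.supportsStyleInstructions })
    }
    return candidates.first
}
```

Under these assumptions, asking for style control on iOS yields `nil`, which reflects the table: style instructions require the 1.7B model, and that model is listed as macOS only.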
Architecture
TTSKit follows the same component-based architecture as WhisperKit. The pipeline consists of six model components.
Model Lifecycle
TTSKit provides fine-grained control over model loading. The `modelState` property tracks the current lifecycle state.
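To make the idea of a tracked lifecycle state concrete, here is a minimal sketch of a state property that changes as a model is downloaded, loaded, and unloaded. The state names and the `ModelManager` class are assumptions made for illustration; TTSKit's real states and loading methods may differ.

```swift
// Hypothetical lifecycle states; the framework's actual cases may differ.
enum ModelState: String {
    case unloaded, downloading, downloaded, loading, loaded
}

// Hypothetical manager that exposes its current state via `modelState`
// and records every transition so it can be inspected afterwards.
final class ModelManager {
    private(set) var modelState: ModelState = .unloaded {
        didSet { history.append(modelState) }
    }
    private(set) var history: [ModelState] = [.unloaded]

    // Placeholder for fetching model files.
    func download() {
        modelState = .downloading
        modelState = .downloaded
    }

    // Downloads first if nothing is available yet, then loads.
    func load() {
        if modelState == .unloaded { download() }
        modelState = .loading
        modelState = .loaded
    }

    // Releases the model and returns to the initial state.
    func unload() {
        modelState = .unloaded
    }
}
```

An app could observe such a property to drive a progress indicator, or call `unload()` under memory pressure and `load()` again before the next generation.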
Next Steps
Generate Speech
Learn about generation options and chunking
Playback
Stream audio with real-time playback strategies
Voices & Languages
Explore available voices and language support
Configuration
Configure compute units and model variants