Quick Start
Calling TTSKit() downloads the default 0.6B model on first run, loads the tokenizer and six Core ML models concurrently, and is then ready to generate speech.
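The concurrent loading step described above can be pictured with Swift structured concurrency. The following is a minimal sketch, not TTSKit's implementation: `SketchComponent`, `loadComponent(named:)`, `loadAll(_:)`, and the component names are all hypothetical, and the loader only simulates work with a short sleep.

```swift
import Foundation

// Hypothetical stand-in for one loadable pipeline component; not a TTSKit type.
struct SketchComponent: Sendable {
    let name: String
}

// Pretend to load a component. A real loader would open a Core ML model here.
func loadComponent(named name: String) async throws -> SketchComponent {
    try await Task.sleep(nanoseconds: 10_000_000) // simulate I/O
    return SketchComponent(name: name)
}

// Load every component in parallel and return them keyed by name.
func loadAll(_ names: [String]) async throws -> [String: SketchComponent] {
    try await withThrowingTaskGroup(of: SketchComponent.self) { group in
        for name in names {
            group.addTask { try await loadComponent(named: name) }
        }
        var loaded: [String: SketchComponent] = [:]
        // Results arrive in completion order, so key them by name.
        for try await component in group {
            loaded[component.name] = component
        }
        return loaded
    }
}
```

Because the child tasks run concurrently, total load time in a design like this tends toward the slowest single component rather than the sum of all six.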
Requirements
- macOS 15.0 or later
- iOS 18.0 or later
- Xcode 16.0 or later
Features
- Real-Time Streaming: Generate and play audio frame-by-frame with adaptive buffering
- Multiple Voices: 9 built-in voices across 10 languages
- Concurrent Generation: Automatic text chunking with parallel generation
- Style Control: Natural-language prosody instructions (1.7B model)
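To make the Concurrent Generation feature more concrete, here is a rough sketch of chunking text and synthesizing the chunks in parallel while preserving their order. Everything in it is an assumption for illustration: `chunkText(_:)`, `synthesize(_:)`, and `generateParallel(_:)` are hypothetical names, the sentence splitter is deliberately naive, and the synthesizer is a stub that returns silent samples. TTSKit's actual chunking rules may differ.

```swift
import Foundation

// Split text into sentence-like chunks on ".", "!", or "?" (a deliberately simple rule).
func chunkText(_ text: String) -> [String] {
    var chunks: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if ".!?".contains(character) {
            let trimmed = current.trimmingCharacters(in: .whitespacesAndNewlines)
            if !trimmed.isEmpty { chunks.append(trimmed) }
            current = ""
        }
    }
    // Keep any trailing text that lacks closing punctuation.
    let rest = current.trimmingCharacters(in: .whitespacesAndNewlines)
    if !rest.isEmpty { chunks.append(rest) }
    return chunks
}

// Hypothetical synthesizer stub: returns one silent sample per character.
func synthesize(_ chunk: String) async -> [Float] {
    Array(repeating: 0, count: chunk.count)
}

// Generate all chunks in parallel, then reassemble the audio in text order.
func generateParallel(_ text: String) async -> [Float] {
    let chunks = chunkText(text)
    return await withTaskGroup(of: (Int, [Float]).self) { group in
        for (index, chunk) in chunks.enumerated() {
            group.addTask { (index, await synthesize(chunk)) }
        }
        // Tasks may finish out of order, so slot each result by its index.
        var parts = [[Float]](repeating: [], count: chunks.count)
        for await (index, samples) in group {
            parts[index] = samples
        }
        return parts.flatMap { $0 }
    }
}
```

Tagging each task with its chunk index is what keeps the final audio in reading order even when a later chunk finishes first.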
Model Variants
TTSKit ships two model sizes:

| Model | Size | Platforms | Features |
|---|---|---|---|
| 0.6B | ~1 GB | macOS, iOS | Fast, runs on all devices |
| 1.7B | ~2.2 GB | macOS only | Higher quality, style instructions |
Architecture
TTSKit follows the same component-based architecture as WhisperKit. The pipeline consists of six model components.
Model Lifecycle
TTSKit provides fine-grained control over model loading. The modelState property tracks the current lifecycle state.
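As an illustration of what tracking a lifecycle state can look like, the sketch below models a small state holder with an observer callback. The state names, `SketchModelState`, and `SketchModelHolder` are hypothetical and chosen only for this example; TTSKit's real modelState cases and loading methods may be different.

```swift
// Hypothetical lifecycle states; TTSKit's actual modelState cases may differ.
enum SketchModelState: String {
    case unloaded, downloading, downloaded, loading, loaded
}

// Tiny holder that advances through the states and notifies an observer on each change.
final class SketchModelHolder {
    private(set) var modelState: SketchModelState = .unloaded {
        didSet { onStateChange?(oldValue, modelState) }
    }
    var onStateChange: ((SketchModelState, SketchModelState) -> Void)?

    // Each step is a stub: it only updates the state, with no real work in between.
    func download() { modelState = .downloading; modelState = .downloaded }
    func load() { modelState = .loading; modelState = .loaded }
    func unload() { modelState = .unloaded }
}

// Usage: record every state the holder passes through.
let holder = SketchModelHolder()
var history: [String] = []
holder.onStateChange = { _, newState in history.append(newState.rawValue) }
holder.download()
holder.load()
```

Splitting download and load into separate steps, as assumed here, lets an app show progress for each phase and release memory with an unload step when speech generation is not needed.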
Next Steps
- Generate Speech: Learn about generation options and chunking
- Playback: Stream audio with real-time playback strategies
- Voices & Languages: Explore available voices and language support
- Configuration: Configure compute units and model variants