TTSKit is an on-device text-to-speech framework built on Core ML. It runs Qwen3 TTS models entirely on Apple silicon with real-time streaming playback, no server required.

Quick Start

import TTSKit

Task {
    let tts = try await TTSKit()
    let result = try await tts.generate(text: "Hello from TTSKit!")
    print("Generated \(result.audioDuration)s of audio at \(result.sampleRate)Hz")
}
TTSKit() automatically downloads the default 0.6B model on first run, loads the tokenizer and all six Core ML model components concurrently, and is then ready to generate.

Requirements

  • macOS 15.0 or later
  • iOS 18.0 or later
  • Xcode 16.0 or later

Features

Real-Time Streaming

Generate and play audio frame-by-frame with adaptive buffering

Multiple Voices

9 built-in voices across 10 languages

Concurrent Generation

Automatic text chunking with parallel generation

Style Control

Natural-language prosody instructions (1.7B model)
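As a sketch of how voice selection and style control might combine in one call (the `voice:` and `instruction:` parameter labels and the `.chelsie` voice identifier are assumptions for illustration, not confirmed API):

```swift
import TTSKit

Task {
    // Style instructions require the 1.7B model (macOS only)
    let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_1_7b))
    let result = try await tts.generate(
        text: "Welcome back!",
        voice: .chelsie,  // assumed voice identifier
        instruction: "Speak warmly, at a relaxed pace"  // assumed prosody parameter
    )
    print("Generated \(result.audioDuration)s of audio")
}
```

Check the Voices & Languages and Generate Speech pages for the exact parameter names and available voices.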

Model Variants

TTSKit ships two model sizes:
Model | Size    | Platforms  | Features
------|---------|------------|-----------------------------------
0.6B  | ~1 GB   | macOS, iOS | Fast, runs on all devices
1.7B  | ~2.2 GB | macOS only | Higher quality, style instructions
// Fast, runs on all platforms
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_0_6b))

// Higher quality, macOS only
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_1_7b))
Models are hosted on Hugging Face and cached locally after the first download.

Architecture

TTSKit follows the same component-based architecture as WhisperKit. The pipeline consists of six model components:
public class TTSKit {
    // Model components (protocol-typed, swappable)
    public var textProjector: any TextProjecting
    public var codeEmbedder: any CodeEmbedding
    public var multiCodeEmbedder: any MultiCodeEmbedding
    public var codeDecoder: any CodeDecoding
    public var multiCodeDecoder: any MultiCodeDecoding
    public var speechDecoder: any SpeechDecoding
    public var tokenizer: (any Tokenizer)?
}
Each component can be swapped at runtime:
let config = TTSKitConfig(load: false)
let tts = try await TTSKit(config)
tts.codeDecoder = MyOptimizedCodeDecoder()
try await tts.loadModels()
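A custom component only needs to conform to the relevant pipeline protocol. A minimal sketch, assuming a hypothetical `decode(codes:)` requirement on CodeDecoding (the protocol's real requirements are not shown in this document and may differ):

```swift
import TTSKit
import CoreML

// Sketch only: `decode(codes:)` is a hypothetical method signature used
// to illustrate conforming a custom component to a pipeline protocol.
final class MyOptimizedCodeDecoder: CodeDecoding {
    func decode(codes: MLMultiArray) async throws -> MLMultiArray {
        // e.g. run a hand-tuned Core ML model here, or pass through for testing
        return codes
    }
}
```

Because components are protocol-typed, the rest of the pipeline is unaffected by the swap as long as the replacement honors the protocol's contract.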

Model Lifecycle

TTSKit provides fine-grained control over model loading:
// Auto-load on init (default)
let tts = try await TTSKit()

// Manual control
let config = TTSKitConfig(load: false)
let tts = try await TTSKit(config)

// Prewarm: compile models sequentially to cap peak memory
try await tts.prewarmModels()

// Load: load all models concurrently
try await tts.loadModels()

// Unload to free memory
await tts.unloadModels()
The modelState property tracks the current lifecycle state:
public enum ModelState {
    case unloaded
    case downloading
    case downloaded
    case loading
    case loaded
    case prewarming
    case prewarmed
    case unloading
}
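The state enum makes it straightforward to defer work until the models are ready. A minimal sketch, assuming `modelState` is a synchronously readable property on TTSKit:

```swift
import TTSKit

// Guard generation on lifecycle state: load on demand, then generate.
func generateWhenReady(_ tts: TTSKit, text: String) async throws {
    if tts.modelState != .loaded {
        try await tts.loadModels()  // brings state to .loaded
    }
    let result = try await tts.generate(text: text)
    print("Generated \(result.audioDuration)s of audio")
}
```

In a UI, the same property can drive a progress indicator during the `.downloading` and `.loading` phases.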

Next Steps

Generate Speech

Learn about generation options and chunking

Playback

Stream audio with real-time playback strategies

Voices & Languages

Explore available voices and language support

Configuration

Configure compute units and model variants
