
On-Device Speech AI for Apple Silicon

Deploy state-of-the-art speech-to-text and text-to-speech models directly on iOS, macOS, watchOS, and visionOS with real-time streaming, voice activity detection, and more.

Quick start

Get up and running with WhisperKit in minutes

1. Install via Swift Package Manager

Add WhisperKit to your project by adding the package dependency in Xcode or your Package.swift file:
Package.swift
dependencies: [
    .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0")
]
Then add the products you need as target dependencies:
.target(
    name: "YourApp",
    dependencies: [
        "WhisperKit", // speech-to-text
        "TTSKit",     // text-to-speech
    ]
)
2. Initialize WhisperKit

Import and initialize WhisperKit in your Swift code. The framework automatically downloads the recommended model for your device:
import WhisperKit

Task {
    do {
        let pipe = try await WhisperKit()
        print("WhisperKit initialized and ready")
    } catch {
        print("WhisperKit failed to initialize: \(error)")
    }
}
WhisperKit automatically selects and downloads the optimal model for your device. For custom model selection, see the model selection guide.
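If you prefer to pin a specific model rather than rely on automatic selection, you can name one at initialization. A minimal sketch, assuming a `model` initializer parameter and using `"base"` as a placeholder model identifier (check the model selection guide for the identifiers available in your WhisperKit version):

```swift
import WhisperKit

Task {
    // Request a specific Whisper variant instead of the auto-selected one.
    // "base" is a placeholder identifier; see the model catalog for real names.
    let pipe = try await WhisperKit(model: "base")
    print("Model loaded")
}
```

The named model is downloaded on first use, just like the automatically selected one.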
3. Transcribe audio

Use the transcribe method to convert audio files to text:
let transcription = try await pipe.transcribe(
    audioPath: "path/to/audio.wav"
)?.text
print(transcription ?? "No transcription produced")
WhisperKit supports multiple audio formats including WAV, MP3, M4A, and FLAC.
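Transcription behavior can also be tuned with decoding options, for example to fix the language instead of auto-detecting it. A sketch assuming a `DecodingOptions` type with `language` and `task` fields and a `decodeOptions` parameter on `transcribe` (verify the exact shape against your WhisperKit version):

```swift
// Assumed API sketch: force English transcription rather than auto-detect.
let options = DecodingOptions(language: "en", task: .transcribe)
let result = try await pipe.transcribe(
    audioPath: "path/to/audio.wav",
    decodeOptions: options
)
```

Fixing the language skips detection, which can help with short or noisy clips.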
4. Generate speech with TTSKit

For text-to-speech, import TTSKit and generate audio from text:
import TTSKit

Task {
    let tts = try await TTSKit()
    // Generate audio data for playback or saving
    let result = try await tts.generate(text: "Hello from TTSKit!")
    
    // Or play directly with streaming
    try await tts.play(text: "Hello from TTSKit!")
}

Core features

Everything you need for on-device speech AI

Speech-to-Text

Deploy Whisper models on-device with real-time streaming transcription and word-level timestamps

Text-to-Speech

Generate natural speech with multiple voices and languages using Qwen3 TTS models

Voice Activity Detection

Automatically detect speech segments with built-in energy-based VAD

Multi-Platform

Run on iOS, macOS, watchOS, and visionOS with optimized Core ML models

Local Server

OpenAI-compatible API for local speech processing with streaming support

Model Management

Automatic model downloading from HuggingFace with custom model support

Resources

Additional resources to help you succeed

Model Catalog

Browse available Whisper and TTS models with performance benchmarks

Benchmarks

View performance metrics across different Apple Silicon devices

Contributing

Learn how to contribute to WhisperKit development

Discord Community

Join the community for support and discussions

Ready to get started?

Build powerful on-device speech applications with WhisperKit today
