WhisperKit Models
All WhisperKit models are hosted on HuggingFace in CoreML format, optimized for the Apple Neural Engine.
Model Repository
WhisperKit CoreML Models
Browse all available models on HuggingFace
Standard Whisper Models
Five standard sizes are available: tiny, base, small, medium, and large-v3.
Tiny
Model ID: openai_whisper-tiny
Best for:
- Quick testing and prototyping
- Resource-constrained devices
- When speed is more important than accuracy
- iPhone 13 and earlier devices
Performance:
- Real-time on all supported devices
- WER (Word Error Rate): ~15-20% on English
- RTF (real-time factor) < 0.2 on most devices
Usage:
```swift
let pipe = try await WhisperKit(WhisperKitConfig(model: "tiny"))
```
Base
Model ID: openai_whisper-base
Best for:
- Balance of speed and accuracy
- General-purpose transcription
- iPhone 13 and newer
- Apps requiring low latency
Performance:
- Real-time on all supported devices
- WER: ~10-15% on English
- RTF < 0.3 on most devices
Usage:
```swift
let pipe = try await WhisperKit(WhisperKitConfig(model: "base"))
```
Small
Model ID: openai_whisper-small
Best for:
- Good accuracy with reasonable speed
- iPhone 14 and newer
- M1 Macs and above
- Production applications
Performance:
- Real-time on iPhone 14 Pro and newer
- WER: ~8-12% on English
- RTF ~0.4-0.6 on modern devices
Usage:
```swift
let pipe = try await WhisperKit(WhisperKitConfig(model: "small"))
```
Medium
Model ID: openai_whisper-medium
Best for:
- High accuracy requirements
- iPhone 15 Pro and newer
- M1 Macs and above
- Offline transcription tasks
Performance:
- Real-time on iPhone 15 Pro, M1 Mac+
- WER: ~6-9% on English
- RTF ~0.7-1.0 on modern devices
Usage:
```swift
let pipe = try await WhisperKit(WhisperKitConfig(model: "medium"))
```
Large V3
Model ID: openai_whisper-large-v3
Best for:
- Maximum accuracy
- Desktop/server applications
- Mac Studio, MacBook Pro (M3 Pro+)
- Offline high-quality transcription
Performance:
- Real-time on M2 Pro and above
- WER: ~4-6% on English
- RTF ~1.2-2.0 depending on device
- Best multilingual support
Usage:
```swift
let pipe = try await WhisperKit(WhisperKitConfig(model: "large-v3"))
```
Distilled Models
Distilled models provide significant performance improvements with minimal accuracy loss through knowledge distillation.
Distil-Large-V3
Model ID: distil-whisper_distil-large-v3
Compared with large-v3: roughly 50% smaller and about 2x faster.
Advantages:
- Significantly faster than large-v3
- Much smaller download and memory footprint
- Near-identical accuracy to large-v3
- Real-time on iPhone 15 Pro
- Recommended for most use cases
Performance:
- WER: ~5-7% on English
- RTF ~0.6-0.9 on modern devices
- Runs well on iPhone 14 Pro and newer
Usage:
```swift
let pipe = try await WhisperKit(WhisperKitConfig(model: "distil*large-v3"))
// Glob pattern matches distil-whisper_distil-large-v3
```
Other Distilled Models
Several other distilled variants are available in the model repository:
- distil-whisper_distil-medium.en
- distil-whisper_distil-small.en
These are English-only models optimized for even faster inference.
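Following the pattern of the usage snippets above, an English-only distilled model can be requested by its full repository ID. A minimal sketch (the ID lookup is assumed to work the same way as the glob-style match shown for distil-large-v3):

```swift
// Load the English-only small distilled model by its full ID.
// Matching by full repository ID is an assumption based on the
// glob example shown for distil-large-v3 above.
let pipe = try await WhisperKit(WhisperKitConfig(model: "distil-whisper_distil-small.en"))
```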
Model Selection Guide
Recommendations are grouped below by device, by use case, and by language.
iPhone
| Device | Recommended | Real-Time |
|---|---|---|
| iPhone 15 Pro | distil-large-v3, medium | large-v3 |
| iPhone 14 Pro | medium, small | medium |
| iPhone 13 Pro | small, base | small |
| iPhone 12/13 | base, tiny | base |
iPad
| Device | Recommended | Real-Time |
|---|---|---|
| iPad Pro (M1+) | large-v3, distil-large-v3 | large-v3 |
| iPad Air (M1+) | medium, distil-large-v3 | medium |
| iPad (A14+) | small, base | small |
Mac
| Device | Recommended | Real-Time |
|---|---|---|
| Mac Studio (Ultra) | large-v3 | All models |
| MacBook Pro (M3 Pro+) | large-v3 | large-v3 |
| MacBook Air (M1+) | distil-large-v3, medium | medium |
| Mac mini (M1+) | medium, small | small |
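The device tables above can be collapsed into a small helper. This is an illustrative sketch only: the device labels are plain strings copied from the tables, not values returned by any system API, and a real app would map from hardware identifiers instead.

```swift
// Illustrative: pick a default model name from the recommendation tables above.
// Device labels are plain strings, not values from a system API.
func recommendedModel(for device: String) -> String {
    switch device {
    case "iPhone 15 Pro":
        return "distil-large-v3"
    case "iPhone 14 Pro":
        return "medium"
    case "iPhone 13 Pro":
        return "small"
    case "iPhone 12", "iPhone 13":
        return "base"
    case "iPad Pro (M1+)", "MacBook Pro (M3 Pro+)", "Mac Studio (Ultra)":
        return "large-v3"
    case "iPad Air (M1+)", "MacBook Air (M1+)", "Mac mini (M1+)":
        return "medium"
    default:
        return "base" // conservative fallback for unknown hardware
    }
}
```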
Real-Time Streaming
- Best: tiny, base, small
- Good: medium (on powerful devices)
- Use: distil-large-v3 for best accuracy/speed balance
Offline Transcription
- Best: large-v3, distil-large-v3
- Good: medium
- When speed matters: small, base
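For the offline case, a recorded file can be transcribed in a single call. A minimal sketch, assuming the transcribe(audioPath:) API and a placeholder file path ("recording.wav" is not a real asset):

```swift
// Sketch: offline transcription of a local file with a high-accuracy model.
// "recording.wav" is a placeholder path; transcribe(audioPath:) is assumed
// to return an array of transcription results with a .text property.
let pipe = try await WhisperKit(WhisperKitConfig(model: "distil-large-v3"))
let results = try await pipe.transcribe(audioPath: "recording.wav")
let text = results.map(\.text).joined(separator: " ")
print(text)
```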
Multilingual
- Best: large-v3 (supports 99+ languages)
- Good: medium (good multilingual support)
- Acceptable: small (limited multilingual)
- Not recommended: tiny, base (poor multilingual)
Low-Resource Devices
- Best: tiny, base
- Alternative: distil-small.en (English only)
- Consider: Server-based transcription for accuracy
High Accuracy
- Best: large-v3
- Nearly as good: distil-large-v3 (much faster)
- Good: medium
English Only
- Best accuracy: large-v3, distil-large-v3
- Best speed: distil-small.en, distil-medium.en
- Balanced: small, medium
Multilingual (99+ languages)
All standard models support multiple languages:
- Best: large-v3
- Good: medium
- Acceptable: small
- Limited: base, tiny
Large models perform better on:
- Chinese (Mandarin, Cantonese)
- Japanese
- Arabic
- Korean
- Indian languages
Small models are sufficient for:
- Spanish
- French
- German
- Italian
- Portuguese
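When the spoken language is known up front, it can be pinned at decode time rather than left to auto-detection. A sketch, assuming a DecodingOptions type with a language field and a decodeOptions parameter on transcribe (both names are assumptions; check the WhisperKit API):

```swift
// Sketch: force Spanish decoding instead of auto-detect.
// DecodingOptions(language:) and decodeOptions: are assumed parameter names.
let pipe = try await WhisperKit(WhisperKitConfig(model: "large-v3"))
let options = DecodingOptions(language: "es")
let results = try await pipe.transcribe(audioPath: "spanish.wav", decodeOptions: options)
```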
Custom Models
Creating Custom Models
Fine-tune Whisper
Use whisperkittools to fine-tune on your dataset:
```shell
python -m whisperkittools.train \
    --model large-v3 \
    --dataset your_dataset \
    --output-dir custom_model
```
Convert to CoreML
Convert the fine-tuned model to CoreML:
```shell
python -m whisperkittools.convert \
    --model custom_model \
    --output-dir coreml_model
```
Upload to HuggingFace
Upload to your HuggingFace repository:
```shell
huggingface-cli upload username/model-repo coreml_model
```
Use in WhisperKit
Load your custom model:
```swift
let config = WhisperKitConfig(
    model: "large-v3",
    modelRepo: "username/model-repo"
)
let pipe = try await WhisperKit(config)
```
Use Cases for Custom Models
- Domain-specific vocabulary (medical, legal, technical)
- Accents and dialects
- Background noise handling
- Custom wake words
- Language variants
TTSKit Models
Qwen3 TTS 0.6B
Model ID: qwen3TTS_0_6b
Features:
- 9 voices
- 10 languages
- Real-time streaming
- Runs on all platforms
Performance:
- Generates ~2-3s audio per second on M1
- Suitable for real-time playback
- Lower memory requirements
Usage:
```swift
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_0_6b))
let result = try await tts.generate(text: "Hello!")
```
Qwen3 TTS 1.7B
Model ID: qwen3TTS_1_7b
Features:
- 9 voices (same as 0.6B)
- 10 languages
- Style instructions (unique to 1.7B)
- Better prosody and naturalness
Performance:
- Generates ~1-2s audio per second on M1
- Requires more memory (~4 GB)
- macOS 15.0+ required
Usage:
```swift
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_1_7b))
var options = GenerationOptions()
options.instruction = "Speak warmly and slowly."
let result = try await tts.generate(
    text: "Hello!",
    options: options
)
```
TTSKit Voices
Both TTSKit models support the same 9 voices:
| Voice | Style | Best For |
|---|---|---|
| .ryan | Clear, professional | Business, narration |
| .aiden | Warm, friendly | Customer service |
| .onoAnna | Bright, energetic | Announcements |
| .sohee | Calm, soothing | Meditation, audiobooks |
| .eric | Deep, authoritative | News, presentations |
| .dylan | Young, casual | Social media, gaming |
| .serena | Elegant, refined | Luxury brands |
| .vivian | Confident, dynamic | Fitness, motivation |
| .uncleFu | Wise, mature | Storytelling, teaching |
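Voice selection presumably goes through the generation options. A sketch assuming a voice field on GenerationOptions (the field name is an assumption; only the voice identifiers come from the table above):

```swift
// Sketch: generate with a specific voice.
// options.voice is an assumed field name; .sohee is a voice ID from the table above.
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_0_6b))
var options = GenerationOptions()
options.voice = .sohee // calm, soothing: suits audiobook narration
let result = try await tts.generate(
    text: "Chapter one.",
    options: options
)
```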
TTSKit Languages
- English
- Chinese (Mandarin)
- Japanese
- Korean
- German
- French
- Russian
- Portuguese
- Spanish
- Italian
Model Download
Automatic Download
WhisperKit automatically downloads the recommended model on first use:
```swift
// Downloads default model for device
let pipe = try await WhisperKit()
```
Manual Download
Download specific models via CLI:
```shell
# Download single model
make download-model MODEL=large-v3

# Download all models
make download-models
```
Model Caching
Models are cached at:
- macOS: ~/.cache/whisperkit/
- iOS: the app's cache directory

To clear the cache:
```shell
rm -rf ~/.cache/whisperkit/
```
View Detailed Benchmarks
Compare performance across devices and models
Next Steps
Supported Devices
Check device compatibility
Benchmarks
Run performance tests
Quick Start
Start transcribing
Custom Models
Create fine-tuned models