
Model Selection

WhisperKit supports all official OpenAI Whisper model variants, from tiny to large-v3. Choosing the right model involves balancing accuracy, speed, and memory usage based on your application’s requirements.

Available Models

Whisper models come in different sizes, each with multilingual and English-only variants:

Model Variants

Tiny

Best for: Real-time streaming, constrained devices, quick prototyping
  • Fastest inference
  • Smallest model size (~75 MB)
  • Acceptable accuracy for clear audio
  • Available: tiny (multilingual), tiny.en (English-only)
let whisperKit = try await WhisperKit(model: "tiny")

Base

Best for: Mobile apps, moderate accuracy requirements
  • Good balance of speed and accuracy
  • Model size ~140 MB
  • Suitable for most mobile applications
  • Available: base, base.en
let whisperKit = try await WhisperKit(model: "base")

Small

Best for: Production applications, higher accuracy needs
  • Good accuracy for production use
  • Model size ~460 MB
  • Slower than base but more accurate
  • Available: small, small.en
let whisperKit = try await WhisperKit(model: "small")

Medium

Best for: High accuracy requirements, server-side processing
  • Very good accuracy
  • Model size ~1.5 GB
  • Slower inference
  • Available: medium, medium.en
let whisperKit = try await WhisperKit(model: "medium")

Large

Best for: Maximum accuracy, offline batch processing
  • Best accuracy
  • Model size ~3 GB
  • Slowest inference
  • Available: large, large-v2, large-v3
let whisperKit = try await WhisperKit(model: "large-v3")
See ModelVariant

ModelVariant Enum

public enum ModelVariant: CustomStringConvertible {
    case tiny
    case tinyEn
    case base
    case baseEn
    case small
    case smallEn
    case medium
    case mediumEn
    case large
    case largev2
    case largev3
    
    var isMultilingual: Bool {
        // English-only (.en) variants are not multilingual
        switch self {
        case .tinyEn, .baseEn, .smallEn, .mediumEn:
            return false
        default:
            return true
        }
    }
}
Recommended Models

WhisperKit provides device-specific recommendations:
// Get locally computed recommendations
let localSupport = WhisperKit.recommendedModels()
print("Default model: \(localSupport.default)")
print("Supported models: \(localSupport.supported)")

// Get recommendations from remote config
let remoteSupport = await WhisperKit.recommendedRemoteModels(
    from: "argmaxinc/whisperkit-coreml"
)
print("Recommended: \(remoteSupport.default)")
See WhisperKit.recommendedModels and WhisperKit.recommendedRemoteModels

Device-Specific Recommendations

Recommendations are based on device hardware:
let deviceName = WhisperKit.deviceName()
print("Running on: \(deviceName)")

// Example device identifiers:
// - "iPhone15,2" (iPhone 14 Pro)
// - "iPad13,16" (iPad Pro M2)
// - "Mac14,2" (Mac Studio M2)
See WhisperKit.deviceName

Downloading Models

Automatic Download

By default, WhisperKit downloads models automatically:
// Downloads and loads the default recommended model
let whisperKit = try await WhisperKit()

// Downloads a specific model
let whisperKit = try await WhisperKit(model: "base")
See WhisperKitConfig.download

Manual Download

Download a model without initializing WhisperKit:
let modelFolder = try await WhisperKit.download(
    variant: "large-v3",
    from: "argmaxinc/whisperkit-coreml",
    progressCallback: { progress in
        print("Downloaded: \(progress.fractionCompleted * 100)%")
    }
)

print("Model saved to: \(modelFolder.path)")
See WhisperKit.download

List Available Models

let availableModels = try await WhisperKit.fetchAvailableModels(
    from: "argmaxinc/whisperkit-coreml"
)

print("Available models:")
for model in availableModels {
    print("  - \(model)")
}
See WhisperKit.fetchAvailableModels

Local Models

Use pre-downloaded or bundled models:
// Use a local model folder
let whisperKit = try await WhisperKit(
    modelFolder: "/path/to/model/folder",
    download: false  // Disable automatic download
)
See WhisperKitConfig.modelFolder

Bundle Models in App

// Get bundled model path
guard let modelPath = Bundle.main.path(
    forResource: "openai_whisper-base",
    ofType: nil
) else {
    fatalError("Model not found in bundle")
}

let whisperKit = try await WhisperKit(
    modelFolder: modelPath,
    download: false
)
Bundling large models increases app size significantly. Consider downloading on first launch instead.
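One way to implement the download-on-first-launch pattern is to combine WhisperKit.download (shown above) with the modelFolder initializer. A minimal sketch, where the UserDefaults key and the overall caching flow are illustrative assumptions rather than WhisperKit API:

```swift
import Foundation
import WhisperKit

// Download the model on first launch, then reuse the saved folder.
// The "whisperModelFolder" key and this flow are assumptions.
func makeWhisperKit() async throws -> WhisperKit {
    let defaults = UserDefaults.standard

    // Reuse a previously downloaded model if it is still on disk
    if let savedPath = defaults.string(forKey: "whisperModelFolder"),
       FileManager.default.fileExists(atPath: savedPath) {
        return try await WhisperKit(modelFolder: savedPath, download: false)
    }

    // First launch: fetch the model, remember where it landed
    let modelFolder = try await WhisperKit.download(
        variant: "base",
        from: "argmaxinc/whisperkit-coreml",
        progressCallback: { progress in
            print("Downloaded: \(Int(progress.fractionCompleted * 100))%")
        }
    )
    defaults.set(modelFolder.path, forKey: "whisperModelFolder")
    return try await WhisperKit(modelFolder: modelFolder.path, download: false)
}
```

This keeps the app binary small while still working offline after the first successful download.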

Model Repositories

WhisperKit downloads models from Hugging Face repositories:

Default Repository

// Default: argmaxinc/whisperkit-coreml
let whisperKit = try await WhisperKit(model: "base")

Custom Repository

let whisperKit = try await WhisperKit(
    model: "base",
    modelRepo: "your-username/your-repo",
    modelToken: "hf_your_token_here"  // If repo is private
)
See WhisperKitConfig.modelRepo

Custom Endpoint

let config = WhisperKitConfig(
    model: "base",
    modelEndpoint: "https://your-custom-endpoint.com"
)

let whisperKit = try await WhisperKit(config)
See WhisperKitConfig.modelEndpoint

Download Configuration

Background Downloads

Enable background downloads for large models:
let whisperKit = try await WhisperKit(
    model: "large-v3",
    useBackgroundDownloadSession: true
)
See WhisperKitConfig.useBackgroundDownloadSession

Custom Download Location

let customBase = FileManager.default.urls(
    for: .documentDirectory,
    in: .userDomainMask
).first!

let whisperKit = try await WhisperKit(
    model: "base",
    downloadBase: customBase
)
See WhisperKitConfig.downloadBase

Model States and Loading

Prewarming Models

Prewarm models to reduce peak memory usage:
let whisperKit = try await WhisperKit(
    model: "medium",
    prewarm: true  // Load and unload models sequentially
)
See WhisperKitConfig.prewarm
Prewarming loads models one at a time to trigger Core ML specialization without high peak memory. This doubles load time but reduces memory pressure.

Deferred Loading

// Download but don't load models yet
let whisperKit = try await WhisperKit(
    model: "base",
    load: false
)

// Load later when needed
try await whisperKit.loadModels()
See WhisperKitConfig.load

Unload Models

// Free memory when models aren't needed
await whisperKit.unloadModels()

// Reload when needed
try await whisperKit.loadModels()
See WhisperKit.unloadModels
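On memory-constrained devices, these calls pair naturally with the system memory-pressure signal. A sketch of that pattern; the wrapper class, its loaded flag, and the pressure handling are assumptions, not WhisperKit API (only unloadModels and loadModels come from the docs above):

```swift
import Dispatch
import WhisperKit

// Sketch: release model memory under system pressure, reload lazily.
final class ModelMemoryManager {
    private let whisperKit: WhisperKit
    private var modelsLoaded = true
    private let pressure = DispatchSource.makeMemoryPressureSource(
        eventMask: [.warning, .critical],
        queue: .main
    )

    init(whisperKit: WhisperKit) {
        self.whisperKit = whisperKit
        pressure.setEventHandler { [weak self] in
            guard let self else { return }
            Task {
                // Free model memory when the system signals pressure
                await self.whisperKit.unloadModels()
                self.modelsLoaded = false
            }
        }
        pressure.resume()
    }

    // Call before transcribing to make sure models are in memory
    func ensureLoaded() async throws {
        if !modelsLoaded {
            try await whisperKit.loadModels()
            modelsLoaded = true
        }
    }
}
```

In production code you would also want to serialize access to the flag (for example with an actor), since the pressure handler and callers may race.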

Multilingual vs English-only

When to Use Multilingual Models

  • Transcribing content in multiple languages
  • Language is unknown in advance
  • Need automatic language detection
  • Translation to English (.translate task)
let whisperKit = try await WhisperKit(model: "base")  // Multilingual

let (language, _) = try await whisperKit.detectLanguage(
    audioPath: "audio.wav"
)
print("Detected: \(language)")

When to Use English-only Models

  • Only transcribing English audio
  • Slightly faster inference
  • Marginally better English accuracy
let whisperKit = try await WhisperKit(model: "base.en")

var options = DecodingOptions(language: "en")
let results = try await whisperKit.transcribe(
    audioPath: "audio.wav",
    decodeOptions: options
)

Model Performance Comparison

Performance varies by device. These are approximate values for reference.
Model      Size     Parameters  Relative Speed  Memory    Accuracy
tiny       75 MB    39M         32x             ~150 MB   Good
base       140 MB   74M         16x             ~250 MB   Better
small      460 MB   244M        6x              ~600 MB   Very Good
medium     1.5 GB   769M        2x              ~1.8 GB   Excellent
large-v3   3 GB     1550M       1x              ~3.2 GB   Best

Selection Guidelines

Real-time Streaming

Recommended: tiny, base
Fast enough to transcribe live audio without lag on most devices.

Mobile Apps

Recommended: base, small
Balance of accuracy and app size. Consider on-demand download instead of bundling.

High Accuracy

Recommended: medium, large-v3
Best for offline processing, server deployments, or high-end devices.

Constrained Devices

Recommended: tiny
The only option for devices with limited memory or older hardware.
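The guidelines above can be condensed into a small starting-point helper. The UseCase enum and the mapping below are illustrative assumptions; on real hardware, the device-aware WhisperKit.recommendedModels() shown earlier should take precedence:

```swift
// Illustrative mapping from use case to a starting model name.
// Refine with WhisperKit.recommendedModels() for the actual device.
enum TranscriptionUseCase {
    case realtimeStreaming
    case mobileApp
    case highAccuracy
    case constrainedDevice
}

func suggestedModel(for useCase: TranscriptionUseCase) -> String {
    switch useCase {
    case .realtimeStreaming:  return "tiny"
    case .mobileApp:          return "base"
    case .highAccuracy:       return "large-v3"
    case .constrainedDevice:  return "tiny"
    }
}
```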

Next Steps

Configuration

Configure compute options and advanced settings

Transcription

Start transcribing with your selected model
