Quick Start

WhisperKit makes it easy to transcribe audio files on-device. This example shows how to get started with basic transcription.

Initialize WhisperKit

import WhisperKit

// Initialize WhisperKit with default settings
let pipe = try await WhisperKit()

WhisperKit automatically downloads the recommended model for your device on first run.

Transcribe an Audio File

// Transcribe a local audio file
let transcription = try await pipe.transcribe(audioPath: "path/to/audio.wav")?.text
print(transcription ?? "No transcription produced")

Supported audio formats: .wav, .mp3, .m4a, .flac
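If your audio is already decoded in memory, WhisperKit can also accept raw samples directly. This sketch assumes a `transcribe(audioArray:)` overload taking 16 kHz mono `Float` samples, as in recent releases; verify the exact signature against your WhisperKit version, and note that `decodeMyAudio()` is a hypothetical placeholder for your own decoding step:

```swift
import WhisperKit

// Transcribe in-memory samples instead of a file path.
// `transcribe(audioArray:)` is assumed from the current API;
// check your WhisperKit version for the exact signature.
let pipe = try await WhisperKit()
let samples: [Float] = decodeMyAudio() // hypothetical helper: 16 kHz mono samples
let text = try await pipe.transcribe(audioArray: samples)?.text
print(text ?? "No transcription produced")
```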

Selecting a Model

Using a Specific Model

// Load a specific model
let pipe = try await WhisperKit(WhisperKitConfig(model: "large-v3"))

Using Wildcards

// Use glob search to select a model
let pipe = try await WhisperKit(WhisperKitConfig(model: "distil*large-v3"))

The wildcard search must resolve to exactly one model in the source repo; otherwise an error is thrown.
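To check which model names a pattern would match before relying on a wildcard, the static model-listing helper can be used. This sketch assumes `WhisperKit.fetchAvailableModels(from:)` as found in recent releases; verify the name and signature against your version:

```swift
import WhisperKit

// List model names in the default source repo, then filter locally
// (assumed helper; check your WhisperKit version for the exact signature).
let models = try await WhisperKit.fetchAvailableModels(from: "argmaxinc/whisperkit-coreml")
for name in models where name.contains("large-v3") {
    print(name)
}
```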

Available Models

For a complete list of available models, see the HuggingFace repo.
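To choose a model programmatically rather than browsing the repo, WhisperKit exposes a per-device recommendation. This sketch assumes `WhisperKit.recommendedModels()` returning a support struct with a `default` model name, as in recent releases; verify against your version:

```swift
import WhisperKit

// Ask WhisperKit which model it recommends for this device
// (assumed API; confirm against your WhisperKit version).
let support = WhisperKit.recommendedModels()
print("Recommended model: \(support.default)")

let pipe = try await WhisperKit(WhisperKitConfig(model: support.default))
```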

Custom Model Repository

If you’ve created your own fine-tuned model using whisperkittools, you can load it by specifying your repo:
let config = WhisperKitConfig(
    model: "large-v3",
    modelRepo: "username/your-model-repo"
)
let pipe = try await WhisperKit(config)
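If the converted model is already on disk (for example, bundled with your app), it can be loaded directly from a local directory instead of a remote repo. This sketch assumes a `modelFolder` parameter on `WhisperKitConfig`; confirm the parameter name in your version:

```swift
// Load a model from a local directory rather than downloading
// (`modelFolder` is an assumption; verify against your WhisperKit version).
let config = WhisperKitConfig(
    modelFolder: "path/to/openai_whisper-large-v3"
)
let pipe = try await WhisperKit(config)
```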

Full Transcription Example

Here’s a complete example with error handling:
import WhisperKit

Task {
    do {
        // Initialize WhisperKit
        let pipe = try await WhisperKit()
        
        // Transcribe audio file
        guard let result = try await pipe.transcribe(
            audioPath: "path/to/your/audio.wav"
        ) else {
            print("Transcription returned nil")
            return
        }
        
        // Print the transcription
        print("Transcription: \(result.text)")
        
        // Access segments with timestamps
        for segment in result.segments {
            print("[\(segment.start)s - \(segment.end)s]: \(segment.text)")
        }
        
    } catch {
        print("Error: \(error)")
    }
}

Command Line Usage

You can also use the WhisperKit CLI for quick testing:
# Install via Homebrew
brew install whisperkit-cli

# Transcribe an audio file
whisperkit-cli transcribe --audio-path audio.wav

Download Models First

If using the CLI from source:
# Clone the repository
git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit

# Setup environment
make setup

# Download a specific model
make download-model MODEL=large-v3

# Or download all models
make download-models

Make sure git-lfs is installed before running download-model.

Transcribe from Command Line

# Transcribe a file
swift run whisperkit-cli transcribe \
    --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" \
    --audio-path "path/to/audio.wav"

Configuration Options

Model Compute Options

Optimize performance by selecting compute units:
let computeOptions = ModelComputeOptions(
    audioEncoderCompute: .cpuAndNeuralEngine,
    textDecoderCompute: .cpuAndNeuralEngine
)

let config = WhisperKitConfig(
    model: "large-v3",
    computeOptions: computeOptions
)

let pipe = try await WhisperKit(config)

Decoding Options

Customize the transcription behavior:
var decodingOptions = DecodingOptions()
decodingOptions.task = .transcribe
decodingOptions.language = "en"
decodingOptions.temperature = 0.0
decodingOptions.wordTimestamps = true

let result = try await pipe.transcribe(
    audioPath: "audio.wav",
    decodeOptions: decodingOptions
)
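With `wordTimestamps` enabled, per-word timings should be available on each segment. This sketch assumes a `words` property of type `[WordTiming]` with `word`, `start`, and `end` fields, as in recent releases; verify against your version:

```swift
// Iterate word-level timings produced by wordTimestamps = true
// (`segment.words` and its fields are assumptions; check your WhisperKit version).
if let result {
    for segment in result.segments {
        for word in segment.words ?? [] {
            print("[\(word.start)s - \(word.end)s]: \(word.word)")
        }
    }
}
```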

Next Steps

Real-Time Streaming

Learn how to transcribe audio in real time from a microphone.

Local Server

Set up a local transcription server with API clients.