Quick Start

WhisperKit makes it easy to transcribe audio files on-device. This example shows how to get started with basic transcription.

Initialize WhisperKit

import WhisperKit

// Initialize WhisperKit with default settings
let pipe = try await WhisperKit()

WhisperKit automatically downloads the recommended model for your device on first run.

Transcribe an Audio File

// Transcribe a local audio file
let transcription = try await pipe.transcribe(audioPath: "path/to/audio.wav")?.text
print(transcription ?? "No transcription produced")

Supported audio formats: .wav, .mp3, .m4a, .flac
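If your audio is already decoded in memory, WhisperKit can also accept raw samples directly. This sketch assumes a `transcribe(audioArray:)` overload taking 16 kHz mono `Float` samples, as in recent releases; verify the exact signature against your WhisperKit version, and note that `decodeMyAudio()` is a hypothetical placeholder for your own decoding step:

```swift
import WhisperKit

// Transcribe in-memory samples instead of a file path.
// `transcribe(audioArray:)` is assumed from the current API;
// check your WhisperKit version for the exact signature.
let pipe = try await WhisperKit()
let samples: [Float] = decodeMyAudio() // hypothetical helper: 16 kHz mono samples
let text = try await pipe.transcribe(audioArray: samples)?.text
print(text ?? "No transcription produced")
```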

Selecting a Model

Using a Specific Model

// Load a specific model
let pipe = try await WhisperKit(WhisperKitConfig(model: "large-v3"))

Using Wildcards

// Use glob search to select a model
let pipe = try await WhisperKit(WhisperKitConfig(model: "distil*large-v3"))

The wildcard search must resolve to exactly one model in the source repo; otherwise an error is thrown.
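To check which model names a pattern would match before relying on a wildcard, the static model-listing helper can be used. This sketch assumes `WhisperKit.fetchAvailableModels(from:)` as found in recent releases; verify the name and signature against your version:

```swift
import WhisperKit

// List model names in the default source repo, then filter locally
// (assumed helper; check your WhisperKit version for the exact signature).
let models = try await WhisperKit.fetchAvailableModels(from: "argmaxinc/whisperkit-coreml")
for name in models where name.contains("large-v3") {
    print(name)
}
```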

Available Models

For a complete list of available models, see the HuggingFace repo.
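To choose a model programmatically rather than browsing the repo, WhisperKit exposes a per-device recommendation. This sketch assumes `WhisperKit.recommendedModels()` returning a support struct with a `default` model name, as in recent releases; verify against your version:

```swift
import WhisperKit

// Ask WhisperKit which model it recommends for this device
// (assumed API; confirm against your WhisperKit version).
let support = WhisperKit.recommendedModels()
print("Recommended model: \(support.default)")

let pipe = try await WhisperKit(WhisperKitConfig(model: support.default))
```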

Custom Model Repository

If you’ve created your own fine-tuned model using whisperkittools, you can load it by specifying your repo:
let config = WhisperKitConfig(
    model: "large-v3",
    modelRepo: "username/your-model-repo"
)
let pipe = try await WhisperKit(config)
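If the converted model is already on disk (for example, bundled with your app), it can be loaded directly from a local directory instead of a remote repo. This sketch assumes a `modelFolder` parameter on `WhisperKitConfig`; confirm the parameter name in your version:

```swift
// Load a model from a local directory rather than downloading
// (`modelFolder` is an assumption; verify against your WhisperKit version).
let config = WhisperKitConfig(
    modelFolder: "path/to/openai_whisper-large-v3"
)
let pipe = try await WhisperKit(config)
```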

Full Transcription Example

Here’s a complete example with error handling:
import WhisperKit

Task {
    do {
        // Initialize WhisperKit
        let pipe = try await WhisperKit()
        
        // Transcribe audio file
        guard let result = try await pipe.transcribe(
            audioPath: "path/to/your/audio.wav"
        ) else {
            print("Transcription returned nil")
            return
        }
        
        // Print the transcription
        print("Transcription: \(result.text)")
        
        // Access segments with timestamps
        for segment in result.segments {
            print("[\(segment.start)s - \(segment.end)s]: \(segment.text)")
        }
        
    } catch {
        print("Error: \(error)")
    }
}

Command Line Usage

You can also use the WhisperKit CLI for quick testing:
# Install via Homebrew
brew install whisperkit-cli

# Transcribe an audio file
whisperkit-cli transcribe --audio-path audio.wav

Download Models First

If using the CLI from source:
# Clone the repository
git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit

# Setup environment
make setup

# Download a specific model
make download-model MODEL=large-v3

# Or download all models
make download-models

Make sure git-lfs is installed before running download-model.

Transcribe from Command Line

# Transcribe a file
swift run whisperkit-cli transcribe \
    --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" \
    --audio-path "path/to/audio.wav"

Configuration Options

Model Compute Options

Optimize performance by selecting compute units:
let computeOptions = ModelComputeOptions(
    audioEncoderCompute: .cpuAndNeuralEngine,
    textDecoderCompute: .cpuAndNeuralEngine
)

let config = WhisperKitConfig(
    model: "large-v3",
    computeOptions: computeOptions
)

let pipe = try await WhisperKit(config)

Decoding Options

Customize the transcription behavior:
var decodingOptions = DecodingOptions()
decodingOptions.task = .transcribe
decodingOptions.language = "en"
decodingOptions.temperature = 0.0
decodingOptions.wordTimestamps = true

let result = try await pipe.transcribe(
    audioPath: "audio.wav",
    decodeOptions: decodingOptions
)
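With `wordTimestamps` enabled, per-word timings should be available on each segment. This sketch assumes a `words` property of type `[WordTiming]` with `word`, `start`, and `end` fields, as in recent releases; verify against your version:

```swift
// Iterate word-level timings produced by wordTimestamps = true
// (`segment.words` and its fields are assumptions; check your WhisperKit version).
if let result {
    for segment in result.segments {
        for word in segment.words ?? [] {
            print("[\(word.start)s - \(word.end)s]: \(word.word)")
        }
    }
}
```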

Next Steps

Real-Time Streaming

Learn how to transcribe audio in real time from a microphone.

Local Server

Set up a local transcription server with API clients.