# Quick Start

WhisperKit makes it easy to transcribe audio files on-device. This example shows how to get started with basic transcription.
## Initialize WhisperKit

```swift
import WhisperKit

// Initialize WhisperKit with default settings
let pipe = try await WhisperKit()
```

WhisperKit automatically downloads the recommended model for your device on first run.
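If you want more control over when the model is fetched and loaded, the configuration can state this explicitly. The sketch below is an assumption about the `WhisperKitConfig` initializer: the `prewarm`, `load`, and `download` parameters may differ between WhisperKit versions, so check the API reference for your release.

```swift
import WhisperKit

// Sketch (hypothetical parameters): control model download and
// loading explicitly instead of relying on the defaults.
let config = WhisperKitConfig(
    model: "base",
    prewarm: true,   // compile the model ahead of first use
    load: true,      // load the model into memory during init
    download: true   // fetch the model if it is not cached locally
)
let pipe = try await WhisperKit(config)
```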
## Transcribe an Audio File

```swift
// Transcribe a local audio file
let transcription = try await pipe.transcribe(audioPath: "path/to/audio.wav")?.text
print(transcription)
```

Supported audio formats: `.wav`, `.mp3`, `.m4a`, `.flac`
## Selecting a Model

### Using a Specific Model

```swift
// Load a specific model
let pipe = try await WhisperKit(WhisperKitConfig(model: "large-v3"))
```

### Using Wildcards

```swift
// Use glob search to select a model
let pipe = try await WhisperKit(WhisperKitConfig(model: "distil*large-v3"))
```
The model search must match exactly one model in the source repo; otherwise an error is thrown.
### Available Models

For a complete list of available models, see the Hugging Face repo.
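The model list can also be queried programmatically. The snippet below assumes WhisperKit's static `fetchAvailableModels` helper and its default repo; the method name and signature may vary between versions, so treat this as a sketch rather than a guaranteed API.

```swift
import WhisperKit

// Sketch: list model variants from the default model repo.
// Assumes the static `fetchAvailableModels()` helper exists
// in your WhisperKit version.
let modelNames = try await WhisperKit.fetchAvailableModels()
for name in modelNames {
    print(name)
}
```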
### Custom Model Repository

If you've created your own fine-tuned model using `whisperkittools`, you can load it by specifying your repo:

```swift
let config = WhisperKitConfig(
    model: "large-v3",
    modelRepo: "username/your-model-repo"
)
let pipe = try await WhisperKit(config)
```
## Full Transcription Example

Here's a complete example with error handling:

```swift
import WhisperKit

Task {
    do {
        // Initialize WhisperKit
        let pipe = try await WhisperKit()

        // Transcribe the audio file
        guard let result = try await pipe.transcribe(
            audioPath: "path/to/your/audio.wav"
        ) else {
            print("Transcription returned nil")
            return
        }

        // Print the transcription
        print("Transcription: \(result.text)")

        // Access segments with timestamps
        for segment in result.segments {
            print("[\(segment.start)s - \(segment.end)s]: \(segment.text)")
        }
    } catch {
        print("Error: \(error)")
    }
}
```
## Command Line Usage

You can also use the WhisperKit CLI for quick testing:

```shell
# Install via Homebrew
brew install whisperkit-cli

# Transcribe an audio file
whisperkit-cli transcribe --audio-path audio.wav
```
### Download Models First

If using the CLI from source:

```shell
# Clone the repository
git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit

# Setup environment
make setup

# Download a specific model
make download-model MODEL=large-v3

# Or download all models
make download-models
```

Make sure `git-lfs` is installed before running `download-model`.
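A quick way to confirm `git-lfs` is available before running the download targets (a generic shell check, not part of the WhisperKit Makefile):

```shell
# Verify git-lfs is on PATH; print an install hint if it is not
if command -v git-lfs >/dev/null 2>&1; then
  echo "git-lfs found"
else
  echo "git-lfs missing; install with: brew install git-lfs && git lfs install"
fi
```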
### Transcribe from Command Line

```shell
# Transcribe a file
swift run whisperkit-cli transcribe \
  --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" \
  --audio-path "path/to/audio.wav"
```
## Configuration Options

### Model Compute Options

Optimize performance by selecting compute units:

```swift
let computeOptions = ModelComputeOptions(
    audioEncoderCompute: .cpuAndNeuralEngine,
    textDecoderCompute: .cpuAndNeuralEngine
)
let config = WhisperKitConfig(
    model: "large-v3",
    computeOptions: computeOptions
)
let pipe = try await WhisperKit(config)
```
### Decoding Options

Customize the transcription behavior:

```swift
var decodingOptions = DecodingOptions()
decodingOptions.task = .transcribe
decodingOptions.language = "en"
decodingOptions.temperature = 0.0
decodingOptions.wordTimestamps = true

let result = try await pipe.transcribe(
    audioPath: "audio.wav",
    decodeOptions: decodingOptions
)
```
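With `wordTimestamps` enabled, per-word timing can be read from the segments of the `result` produced above. The `words` property and the `word`/`start`/`end` fields below are assumptions about WhisperKit's segment type and may differ across versions.

```swift
// Sketch: read per-word timings from the transcription result
// obtained with wordTimestamps = true. The `words` array and its
// fields are assumptions about the TranscriptionSegment type.
for segment in result.segments {
    for word in segment.words ?? [] {
        print("\(word.word): \(word.start)s - \(word.end)s")
    }
}
```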
## Next Steps

- **Real-Time Streaming**: Learn how to transcribe audio in real time from a microphone.
- **Local Server**: Set up a local transcription server with API clients.