WhisperKit provides flexible APIs for transcribing audio from files, audio sample arrays, or live input. Transcription methods return arrays of TranscriptionResult objects, each containing the recognized text and detailed metadata.
Customize transcription behavior with DecodingOptions:
var options = DecodingOptions(
    verbose: true,
    task: .transcribe,        // or .translate for English translation
    language: "en",           // Specify language or nil for auto-detect
    temperature: 0.0,         // 0.0 for greedy, >0 for sampling
    wordTimestamps: true,     // Enable word-level timestamps
    skipSpecialTokens: true   // Remove special tokens from output
)

let results = try await whisperKit.transcribe(
    audioPath: "audio.wav",
    decodeOptions: options
)
To transcribe several files in one call, pass an array of paths:
let audioPaths = [
    "audio1.wav",
    "audio2.wav",
    "audio3.wav"
]

let results = await whisperKit.transcribe(
    audioPaths: audioPaths,
    decodeOptions: options
)

// Results is [[TranscriptionResult]?] - one array per file
for (index, result) in results.enumerated() {
    if let transcriptions = result {
        print("File \(index): \(transcriptions.first?.text ?? "")")
    }
}
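Because each batch entry is optional, it can help to pair every path with its outcome so missing results are reported by file name rather than by index. The following is a minimal sketch, not WhisperKit code: `StubResult` and `summarize` are hypothetical names, the stub models only the `text` field, and treating a nil entry as "this file produced no result" is an assumption.

```swift
// Stand-in for WhisperKit's TranscriptionResult; only `text` is modelled here.
struct StubResult {
    let text: String
}

/// Pairs each input path with its outcome and separates successes from failures.
func summarize(
    paths: [String],
    results: [[StubResult]?]
) -> (texts: [String: String], failed: [String]) {
    var texts: [String: String] = [:]
    var failed: [String] = []
    // zip stops at the shorter sequence, so a length mismatch cannot crash.
    for (path, result) in zip(paths, results) {
        if let transcriptions = result {
            // Join the per-file results into a single string.
            texts[path] = transcriptions.map(\.text).joined(separator: " ")
        } else {
            // Assumption: a nil entry means no result was produced for this file.
            failed.append(path)
        }
    }
    return (texts, failed)
}

let summary = summarize(
    paths: ["audio1.wav", "audio2.wav"],
    results: [[StubResult(text: "hello world")], nil]
)
print(summary.texts)
print("Failed:", summary.failed)
```

In real use, the same loop would run over the `audioPaths` array and the value returned by the batch call.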
Each transcription returns a TranscriptionResult with:
let result: TranscriptionResult = results.first!

// Full transcribed text
print(result.text)

// Detected language
print(result.language)

// Individual segments with timestamps
for segment in result.segments {
    print("[\(segment.start)s - \(segment.end)s]: \(segment.text)")

    // Word-level timestamps (if enabled)
    if let words = segment.words {
        for word in words {
            print("  \(word.word): \(word.start)s - \(word.end)s")
        }
    }
}

// Performance timings
result.logTimings()
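Segment timestamps are often turned into subtitle-style output. The sketch below shows one way to do that with SRT-style timestamps. It is illustrative only: `StubSegment`, `srtTimestamp`, and `srt(from:)` are hypothetical names, and the stub assumes that `start` and `end` are seconds stored as `Float`.

```swift
import Foundation

// Stand-in for a transcription segment; mirrors the start, end, and text
// fields used above. Assumption: times are seconds stored as Float.
struct StubSegment {
    let start: Float
    let end: Float
    let text: String
}

/// Formats seconds as HH:MM:SS,mmm (the SRT timestamp layout).
func srtTimestamp(_ seconds: Float) -> String {
    // Work in whole milliseconds to avoid repeated floating-point division.
    let totalMs = Int((Double(seconds) * 1000).rounded())
    let ms = totalMs % 1000
    let s = (totalMs / 1000) % 60
    let m = (totalMs / 60_000) % 60
    let h = totalMs / 3_600_000
    return String(format: "%02d:%02d:%02d,%03d", h, m, s, ms)
}

/// Renders segments as numbered SRT cues separated by blank lines.
func srt(from segments: [StubSegment]) -> String {
    segments.enumerated().map { index, segment in
        "\(index + 1)\n\(srtTimestamp(segment.start)) --> \(srtTimestamp(segment.end))\n\(segment.text)\n"
    }.joined(separator: "\n")
}

let demo = [
    StubSegment(start: 0.0, end: 1.5, text: "Hello."),
    StubSegment(start: 1.5, end: 3.25, text: "Welcome back.")
]
print(srt(from: demo))
```

With real results, the same functions could be fed values mapped from `result.segments`.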
Automatically retry with higher temperature on failure:
var options = DecodingOptions(
    temperature: 0.0,
    temperatureIncrementOnFallback: 0.2,  // Increase by 0.2 each retry
    temperatureFallbackCount: 5           // Try up to 5 times
)
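To make the fallback settings concrete, this sketch lists the temperatures such a configuration would step through. It is not WhisperKit's implementation: `temperatureSchedule` is a hypothetical helper, the linear step is taken from the "Increase by 0.2 each retry" comment, and counting the initial attempt separately from the fallbacks is an assumption.

```swift
/// Builds the list of temperatures tried: the initial value plus one entry
/// per fallback. Assumption: the fallback count excludes the first attempt.
func temperatureSchedule(start: Float, increment: Float, fallbackCount: Int) -> [Float] {
    // With no increment or no fallbacks, only the starting temperature is used.
    guard increment > 0, fallbackCount > 0 else { return [start] }
    return (0...fallbackCount).map { start + increment * Float($0) }
}

let schedule = temperatureSchedule(start: 0.0, increment: 0.2, fallbackCount: 5)
print(schedule)  // Six values, rising from 0.0 to roughly 1.0
```

Starting at 0.0 keeps the first pass deterministic, and later passes add sampling randomness only when an earlier attempt is rejected.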