You can control how much audio is buffered before playback begins:
```swift
public enum PlaybackStrategy {
    case auto                   // Adaptive: measure first step, compute required buffer
    case stream                 // Immediate: start playing as soon as first frame arrives
    case buffered(TimeInterval) // Fixed: pre-buffer specified seconds
    case generateFirst          // Generate all audio first, then play
}
```
For the streaming strategies (`.auto`, `.stream`, `.buffered`), chunking is forced to sequential (`concurrentWorkerCount = 1`) so frames can be enqueued in order. `.generateFirst` respects the caller's concurrency setting.
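The strategy-to-concurrency rule above can be sketched as a small helper. Note that `effectiveWorkerCount(for:requested:)` and its `requested` parameter are illustrative names, not part of the TTSKit API:

```swift
import Foundation

// Mirrors the documented PlaybackStrategy cases.
enum PlaybackStrategy {
    case auto
    case stream
    case buffered(TimeInterval)
    case generateFirst
}

// Hypothetical sketch: streaming strategies collapse to one worker so
// frames arrive in order; .generateFirst keeps the caller's setting.
func effectiveWorkerCount(for strategy: PlaybackStrategy, requested: Int) -> Int {
    switch strategy {
    case .auto, .stream, .buffered:
        return 1          // sequential: frames must be enqueued in order
    case .generateFirst:
        return requested  // all audio is generated first, so order doesn't matter
    }
}
```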
The audio output applies automatic fade-in/fade-out at discontinuities:
- Fade-in: applied at the start of playback, the start of a chunk, or after an underrun
- Fade-out: applied at the end of playback, the end of a chunk, or before an underrun
Interior frames of contiguous playback are untouched. Fades are 256 samples (~10.7 ms at 24 kHz): imperceptible on continuous audio, but enough to eliminate clicks at gaps.
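A 256-sample ramp of this kind can be sketched as follows. The function names and the linear fade shape are assumptions for illustration; the document does not specify TTSKit's actual curve:

```swift
// Hedged sketch: in-place linear fades over the first/last 256 samples
// of a Float PCM buffer. 256 samples at 24 kHz is ~10.7 ms.
let fadeLength = 256

func applyFadeIn(_ samples: inout [Float]) {
    let n = min(fadeLength, samples.count)
    for i in 0..<n {
        // Ramp gain from 0 up toward 1 across the fade window.
        samples[i] *= Float(i) / Float(fadeLength)
    }
}

func applyFadeOut(_ samples: inout [Float]) {
    let n = min(fadeLength, samples.count)
    for i in 0..<n {
        // Ramp gain down to 0 at the final sample.
        samples[samples.count - 1 - i] *= Float(i) / Float(fadeLength)
    }
}
```

Samples outside the two 256-sample windows are left untouched, matching the "interior frames" behavior described above.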
Here’s a complete example with progress updates and playback control:
```swift
import TTSKit

Task {
    let tts = try await TTSKit()
    let startTime = Date()

    let result = try await tts.play(
        text: "The quick brown fox jumps over the lazy dog. This sentence demonstrates real-time streaming playback with adaptive buffering.",
        speaker: .ryan,
        language: .english,
        playbackStrategy: .auto
    ) { progress in
        if let stepTime = progress.stepTime {
            let buffer = tts.audioOutput.silentBufferRemaining
            print("First step: \(Int(stepTime * 1000))ms, buffering \(String(format: "%.2f", buffer))s")
        }

        let elapsed = Date().timeIntervalSince(startTime)
        let position = tts.audioOutput.currentPlaybackTime
        print("Elapsed: \(String(format: "%.2f", elapsed))s, Playing: \(String(format: "%.2f", position))s")
        return true
    }

    print("Done! Generated \(result.audioDuration)s of audio")
    print("Time to first buffer: \(result.timings.timeToFirstBuffer)s")
    print("Full pipeline: \(result.timings.fullPipeline)s")
}
```