Overview
AudioOutput handles audio export to file and real-time streaming playback via AVAudioEngine. It supports adaptive pre-buffering to prevent underruns and edge-fading to prevent audible clicks during playback.
Key Features
- Pre-buffering: Accumulates audio frames until a threshold is reached before flushing to the player, preventing underruns on slower devices
- Edge-fading: Applies fade-in/fade-out only at actual audio discontinuities (session start/end, chunk boundaries, underruns)
- Underrun detection: Uses wall-clock timing to detect when the player has drained and needs fade-in on the next frame
- File export: Supports M4A and WAV formats with optional metadata embedding
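To make the edge-fading idea concrete, here is a minimal sketch. It assumes a linear ramp, because the ramp shape is not documented here, and it uses the documented 256-sample fade length. Fade, applyFadeIn, applyFadeOut and prepareFrame are hypothetical names, not AudioOutput API. The key point is that a frame is faded only on a side that touches a discontinuity.

```swift
// Hypothetical sketch of edge-fading: a linear ramp is applied only to frames
// that border a discontinuity; contiguous frames pass through untouched.
enum Fade {
    static let length = 256  // the documented ramp length (assumed linear shape)
}

/// Linear fade-in over the first `length` samples (fewer if the frame is short).
func applyFadeIn(_ samples: inout [Float], length: Int = Fade.length) {
    let n = min(length, samples.count)
    guard n > 0 else { return }
    for i in 0..<n {
        samples[i] *= Float(i) / Float(n)  // gain rises from 0 toward 1
    }
}

/// Linear fade-out over the last `length` samples (fewer if the frame is short).
func applyFadeOut(_ samples: inout [Float], length: Int = Fade.length) {
    let n = min(length, samples.count)
    guard n > 0 else { return }
    let start = samples.count - n
    for i in 0..<n {
        samples[start + i] *= Float(n - 1 - i) / Float(n)  // gain falls to exactly 0
    }
}

/// Fades a frame only on the sides that touch a discontinuity
/// (session start/end, chunk boundary, or underrun).
func prepareFrame(_ frame: [Float], followsGap: Bool, precedesGap: Bool) -> [Float] {
    var out = frame
    if followsGap { applyFadeIn(&out) }
    if precedesGap { applyFadeOut(&out) }
    return out
}
```

Fading only at gaps matters because a ramp applied to every frame would audibly modulate contiguous speech, while a ramp at a true gap hides the step from silence to signal.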
Initialization
The initializer takes the output sample rate in Hz, which defaults to 24000 (the Qwen3 TTS rate).
Properties
Output sample rate in Hz. Read-only. Updated by TTSKit.loadModels() to match the loaded speech decoder’s actual sample rate.
The audio format used for playback and export, derived from sampleRate. Read-only.
Cumulative duration, in seconds, of real audio that has been scheduled via scheduleWithFades. Read-only. The silent sentinel buffer used for drain detection is not included.
Current playback position in seconds, based on the audio engine’s render timeline. Read-only. Returns 0 if the player is not active, no audio has been scheduled yet, or the player hasn’t started rendering. The value is clamped to scheduledAudioDuration, so the position never advances into silence gaps between chunks or past the last real audio frame.
How many seconds of audio still need to accumulate in the pre-buffer before the next chunk flushes and playback resumes. Read-only. Non-zero only while in buffering mode (bufferThresholdMet == false and a positive bufferDuration is set).
Static Properties
Number of samples for the fade-in/fade-out ramp. Value: 256.
256 samples at 24 kHz ≈ 10.7 ms: imperceptible on contiguous audio, but enough to smoothly eliminate clicks at discontinuities.
Instance Methods
Configuration
configure(sampleRate:)
Update the sample rate to match the loaded speech decoder.
- sampleRate: The new sample rate in Hz.
Call this before startPlayback().
Playback Control
startPlayback(deferEngineStart:)
Start the audio engine for streaming playback.
- deferEngineStart: When true, the audio engine is created and connected but not started. The engine starts automatically on the first enqueueAudioChunk call. This keeps the render thread from contending with model predictions during the critical time-to-first-buffer path.
After starting, call setBufferDuration(_:) to configure the pre-buffer threshold; until then, incoming frames accumulate.
Throws: TTSError if the audio engine fails to start.
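The deferred-start option can be pictured as lazy initialization: build the graph up front, start it on the first chunk. The sketch below is only an illustration of that pattern. FakeEngine and DeferredPlayback are hypothetical stand-ins, not AVAudioEngine or AudioOutput.

```swift
/// Hypothetical stand-in for the audio engine; only tracks whether it runs.
final class FakeEngine {
    private(set) var isRunning = false
    func start() throws { isRunning = true }
}

/// Sketch of deferred start: the engine exists immediately, but is only
/// started when the first chunk of audio is enqueued.
final class DeferredPlayback {
    let engine = FakeEngine()
    private(set) var enqueuedSamples = 0

    init(deferEngineStart: Bool) throws {
        // A real implementation would create and connect the graph here in both cases.
        if !deferEngineStart { try engine.start() }
    }

    func enqueue(_ chunk: [Float]) throws {
        // Start lazily, so the render thread stays idle until real audio exists.
        if !engine.isRunning { try engine.start() }
        enqueuedSamples += chunk.count
    }
}
```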
setBufferDuration(_:)
Configure the pre-buffer duration.
- seconds: Duration of audio, in seconds, to accumulate before flushing. Pass 0 for immediate streaming (fast devices).
Call after startPlayback().
- If seconds == 0: immediately flushes any pending frames and switches to direct streaming (no buffering)
- If seconds > 0: sets the threshold. If enough audio has already accumulated, flushes immediately
- Can be called multiple times (e.g., per-chunk reassessment). Any held tail frame from the previous chunk is committed with fade-out first
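The threshold rules above can be modeled as a small state machine. This is a minimal sketch, assuming frames are plain [Float] arrays and that "flushing" means handing samples to the player, represented here by the flushed array. PreBuffer and its members are hypothetical and do not reproduce AudioOutput internals. remainingSeconds mirrors the documented read-only property that reports how much audio is still needed.

```swift
/// Hypothetical pre-buffer following the documented setBufferDuration(_:) rules.
struct PreBuffer {
    let sampleRate: Double
    private(set) var pending: [Float] = []
    private(set) var flushed: [Float] = []       // stands in for audio given to the player
    private(set) var thresholdMet = false
    private var thresholdSamples: Int? = nil     // nil: no duration configured, keep accumulating

    init(sampleRate: Double) { self.sampleRate = sampleRate }

    mutating func setBufferDuration(_ seconds: Double) {
        if seconds == 0 {
            thresholdSamples = 0
            thresholdMet = true                  // direct streaming from now on
            flush()
        } else {
            thresholdSamples = Int((seconds * sampleRate).rounded())
            thresholdMet = false                 // buffering mode until the threshold is reached
            flushIfReady()                       // enough audio may already be waiting
        }
    }

    mutating func enqueue(_ chunk: [Float]) {
        if thresholdMet { flushed += chunk; return }
        pending += chunk
        flushIfReady()
    }

    /// Seconds still needed before the next flush; 0 unless buffering with a positive duration.
    var remainingSeconds: Double {
        guard !thresholdMet, let t = thresholdSamples, t > 0 else { return 0 }
        return Double(max(0, t - pending.count)) / sampleRate
    }

    private mutating func flushIfReady() {
        guard let t = thresholdSamples, pending.count >= t else { return }
        thresholdMet = true
        flush()
    }

    private mutating func flush() {
        flushed += pending
        pending.removeAll()
    }
}
```

Calling setBufferDuration(_:) again with a positive value puts the sketch back into buffering mode, which is one way to read "per-chunk reassessment".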
enqueueAudioChunk(_:)
Enqueue a chunk of mono Float32 PCM samples for playback.
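The feature list says underruns are detected with wall-clock timing. One plausible way to do that is sketched below, assuming the output keeps a running estimate of the wall-clock instant at which its scheduled audio runs out. UnderrunDetector and schedule(frameCount:now:) are hypothetical names. In real code, `now` could come from a monotonic clock such as ProcessInfo.processInfo.systemUptime; here it is passed in so the logic is easy to test.

```swift
/// Hypothetical underrun detector based on an estimated wall-clock drain time.
struct UnderrunDetector {
    let sampleRate: Double
    private var drainTime: Double? = nil  // seconds: when the queued audio is expected to run out

    init(sampleRate: Double) { self.sampleRate = sampleRate }

    /// Records a newly scheduled frame; returns true if that frame should get a fade-in.
    mutating func schedule(frameCount: Int, now: Double) -> Bool {
        let duration = Double(frameCount) / sampleRate
        if let drain = drainTime, now < drain {
            // Audio is still queued: the new frame is contiguous, so just extend the timeline.
            drainTime = drain + duration
            return false
        }
        // First frame, or the player already drained: playback resumes from silence.
        drainTime = now + duration
        return true
    }
}
```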
stopPlayback(waitForCompletion:)
Stop playback and tear down the audio engine.
- waitForCompletion: When true, waits for any remaining scheduled buffers to finish playing before tearing down the engine.
File Export
saveAudio(_:toFolder:filename:sampleRate:format:metadataProvider:)
Save mono Float32 PCM samples to a file.
- toFolder: Destination directory. Created if it doesn’t exist.
- filename: File name, with or without extension. Any extension already present in filename is stripped before writing.
- sampleRate: Sample rate in Hz.
- format: Output format. Inferred from the filename extension when nil; defaults to .m4a if no extension is found.
- metadataProvider: Optional callback supplying metadata items to embed in the file container (M4A only).
Returns: The URL of the written file.
For M4A with metadata: uses AVAssetExportSession passthrough to remux with embedded metadata atoms (no re-encode). For WAV or metadata-free M4A: writes directly.
On watchOS, .m4a automatically falls back to .wav.
Throws: TTSError if audio encoding or export fails.
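The filename and format rules above can be sketched as a pure function. This assumes the explicit format argument takes priority over the extension, and that an unrecognized extension (for example ".mp3") is treated like no extension. That last behavior is an assumption, since only the no-extension default is documented. ExportFormat, splitExtension and resolveOutputName are hypothetical, not the library's AudioFileFormat API.

```swift
/// Hypothetical mirror of the export formats (not the library's AudioFileFormat).
enum ExportFormat: String {
    case m4a, wav
    var fileExtension: String { rawValue }
}

/// Splits "name.ext" into ("name", "ext"); names like ".hidden" or "name" have no extension.
func splitExtension(_ filename: String) -> (base: String, ext: String?) {
    guard let dot = filename.lastIndex(of: "."), dot != filename.startIndex else {
        return (filename, nil)
    }
    let ext = String(filename[filename.index(after: dot)...])
    return (String(filename[..<dot]), ext.isEmpty ? nil : ext.lowercased())
}

/// Strips any extension, resolves the format, then applies the watchOS fallback.
func resolveOutputName(filename: String, format: ExportFormat?, isWatchOS: Bool) -> String {
    let (base, ext) = splitExtension(filename)
    // Explicit format wins; else infer from the extension; else default to .m4a.
    var resolved: ExportFormat = format ?? ext.flatMap(ExportFormat.init(rawValue:)) ?? .m4a
    if isWatchOS && resolved == .m4a { resolved = .wav }  // documented watchOS fallback
    return base + "." + resolved.fileExtension
}
```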
duration(of:)
Return the playback duration, in seconds, of the audio file at the given URL.
Crossfade Assembly
crossfade(_:fadeLength:)
Assemble an ordered list of audio chunks into one array, with an equal-power crossfade at each boundary.
- fadeLength: Number of overlap samples for each crossfade.
Returns: A single concatenated audio array with crossfades applied at chunk boundaries.
Uses a cos(t*pi/2) fade-out and a sin(t*pi/2) fade-in, so that cos² + sin² = 1 and energy is preserved through the overlap region. Fade curves are pre-computed once via Accelerate (vDSP_vramp + vvcosf/vvsinf) and reused at every chunk boundary.
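For readers without Accelerate, here is the same equal-power technique written with plain loops. It is a sketch, not the library's implementation. It assumes each boundary overlaps fadeLength samples, so the output is shorter than the plain concatenation, and it shrinks the overlap when a chunk is shorter than fadeLength. equalPowerCurves is a hypothetical helper name.

```swift
import Foundation

/// Equal-power fade curves of length `n`: fade-out = cos(t*pi/2), fade-in = sin(t*pi/2).
func equalPowerCurves(_ n: Int) -> (fadeOut: [Float], fadeIn: [Float]) {
    guard n > 0 else { return ([], []) }
    let step = 1 / Float(max(n - 1, 1))
    let t = (0..<n).map { Float($0) * step }  // ramp from 0 to 1
    return (t.map { cos($0 * .pi / 2) }, t.map { sin($0 * .pi / 2) })
}

/// Concatenates chunks, overlapping `fadeLength` samples at each boundary.
func crossfade(_ chunks: [[Float]], fadeLength: Int) -> [Float] {
    guard var output = chunks.first else { return [] }
    let length = max(0, fadeLength)
    let full = equalPowerCurves(length)  // computed once, reused at every boundary
    for chunk in chunks.dropFirst() {
        // The overlap cannot be longer than either side of the boundary.
        let n = min(length, output.count, chunk.count)
        let curves = n == length ? full : equalPowerCurves(n)
        let start = output.count - n
        for i in 0..<n {
            output[start + i] = output[start + i] * curves.fadeOut[i] + chunk[i] * curves.fadeIn[i]
        }
        output.append(contentsOf: chunk[n...])
    }
    return output
}
```

The design trade-off: equal-power curves keep loudness steady when the two sides are uncorrelated, as with different speech segments. For identical (fully correlated) input the overlap sum cos + sin peaks near √2 at the midpoint, which is where a linear, equal-gain crossfade would behave better.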
AudioFileFormat
Supported audio export formats.
Properties
The file extension for this format (e.g., “m4a”, “wav”).
Static Methods
resolve(_:)
Resolve the effective format for the current platform from the preferred format passed in.
Returns: The resolved format. M4A is not supported on watchOS, so it falls back to WAV with a warning.
Buffer Lifecycle
The buffer lifecycle for streaming playback follows these steps:
- startPlayback(): resets all state; frames accumulate until the buffer duration is configured
- setBufferDuration(_:): configures the threshold (call after start)
- enqueueAudioChunk(_:): pushes frames through the buffer/tail pipeline
- stopPlayback(): commits the tail with fade-out, waits, and tears down the engine
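The "tail" in this lifecycle can be read as a one-frame holdback. The newest frame is kept back until it is known whether audio continues after it, in which case it is committed unchanged, or a discontinuity follows, such as stopPlayback() or a buffer reassessment, in which case it is committed with a fade-out. This is a minimal sketch of that reading only. TailPipeline and its methods are hypothetical, the linear fade shape is an assumption, and creating a fresh pipeline stands in for the reset done by startPlayback().

```swift
/// Hypothetical tail-hold pipeline: the newest frame is held back until we know
/// whether audio continues after it (commit as-is) or stops (commit with fade-out).
struct TailPipeline {
    private(set) var committed: [[Float]] = []  // stands in for buffers handed to the player
    private var tail: [Float]? = nil
    let fadeLength: Int

    init(fadeLength: Int = 256) { self.fadeLength = fadeLength }

    /// More audio arrived, so the previous tail was contiguous: commit it unchanged.
    mutating func push(_ frame: [Float]) {
        if let held = tail { committed.append(held) }
        tail = frame
    }

    /// A discontinuity follows (stop or reassessment): fade the held tail out, then commit it.
    mutating func commitTailWithFadeOut() {
        guard var held = tail else { return }
        let n = min(fadeLength, held.count)
        let start = held.count - n
        for i in 0..<n { held[start + i] *= Float(n - 1 - i) / Float(n) }  // linear ramp to 0
        committed.append(held)
        tail = nil
    }
}
```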