Overview

GenerationOptions controls all aspects of the speech synthesis pipeline, including sampling parameters, chunking strategy, and concurrency. All fields have sensible defaults, so the zero-argument initializer works for most use cases.
public struct GenerationOptions: Codable, Sendable

Initialization

public init(
    temperature: Float = GenerationOptions.defaultTemperature,
    topK: Int = GenerationOptions.defaultTopK,
    repetitionPenalty: Float = GenerationOptions.defaultRepetitionPenalty,
    maxNewTokens: Int = GenerationOptions.defaultMaxNewTokens,
    concurrentWorkerCount: Int = 0,
    chunkingStrategy: TextChunkingStrategy? = nil,
    targetChunkSize: Int? = nil,
    minChunkSize: Int? = nil,
    instruction: String? = nil,
    forceLegacyEmbedPath: Bool = false
)
temperature
Float
default: 0.9
Sampling temperature. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.5) make it more deterministic.
topK
Int
default: 50
Top-K sampling parameter. Only the K most likely tokens are considered at each step.
repetitionPenalty
Float
default: 1.05
Repetition penalty to discourage repeating tokens. Values > 1.0 penalize repetition.
maxNewTokens
Int
default: 245
Maximum number of tokens to generate in the autoregressive loop.
concurrentWorkerCount
Int
default: 0
Number of concurrent workers for multi-chunk generation:
  • 0: all chunks run concurrently in one batch (default, fastest for non-streaming use cases)
  • 1: sequential, one chunk at a time; required for real-time play() streaming
  • N: at most N chunks run concurrently
chunkingStrategy
TextChunkingStrategy?
default: nil
How to split long text into chunks. nil (the default) resolves to .sentence. To force single-pass generation without sentence splitting, pass TextChunkingStrategy.none explicitly; a bare .none on this optional parameter resolves to Optional.none, i.e. nil.
targetChunkSize
Int?
default: nil
Target chunk size in tokens for sentence chunking. nil resolves to TextChunker.defaultTargetChunkSize at the call site.
minChunkSize
Int?
default: nil
Minimum chunk size in tokens. nil resolves to TextChunker.defaultMinChunkSize at the call site.
instruction
String?
default: nil
Optional style instruction for controlling speech characteristics (e.g., "Very happy"). Prepended as a text-only user prompt before the main TTS segment. For Qwen3, this is only supported by the 1.7B model variant.
forceLegacyEmbedPath
Bool
default: false
Force the legacy [FloatType] inference path even on macOS 15+ / iOS 18+. When false (default), the MLTensor path is taken on supported OS versions. Set to true in tests to exercise the pre-macOS-15 code path on current hardware.
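For example, a test might opt into the legacy path explicitly (a sketch; assumes the same `tts.generate` API used in the Example Usage section below):

```swift
// Sketch: exercise the pre-macOS-15 [FloatType] inference path
// even when the MLTensor path is available on the current OS.
let options = GenerationOptions(forceLegacyEmbedPath: true)
let result = try await tts.generate(
    text: "Exercise the legacy embed path.",
    voice: "ryan",
    options: options
)
```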

Properties

Sampling Parameters

temperature
Float
Sampling temperature. Default: 0.9
topK
Int
Top-K sampling parameter. Default: 50
repetitionPenalty
Float
Repetition penalty to discourage repeating tokens. Default: 1.05
maxNewTokens
Int
Maximum number of tokens to generate. Default: 245

Chunking and Concurrency

concurrentWorkerCount
Int
Number of concurrent workers for multi-chunk generation. Default: 0 (all chunks concurrently)
chunkingStrategy
TextChunkingStrategy?
How to split long text into chunks. Default: nil (resolves to .sentence)
targetChunkSize
Int?
Target chunk size in tokens for sentence chunking. Default: nil (uses TextChunker.defaultTargetChunkSize)
minChunkSize
Int?
Minimum chunk size in tokens. Default: nil (uses TextChunker.defaultMinChunkSize)

Style Control

instruction
String?
Optional style instruction for controlling speech characteristics. Default: nil. Only supported by the Qwen3 1.7B model variant.

Advanced

forceLegacyEmbedPath
Bool
Force the legacy [FloatType] inference path. Default: false

Static Properties

defaultTemperature

public static let defaultTemperature: Float = 0.9
value
Float
Default sampling temperature: 0.9

defaultTopK

public static let defaultTopK: Int = 50
value
Int
Default Top-K sampling parameter: 50

defaultRepetitionPenalty

public static let defaultRepetitionPenalty: Float = 1.05
value
Float
Default repetition penalty: 1.05

defaultMaxNewTokens

public static let defaultMaxNewTokens: Int = 245
value
Int
Default maximum number of tokens to generate: 245
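The static defaults can also be used to derive values relative to them rather than hard-coding magic numbers (a sketch; the arithmetic here is illustrative, not a recommended setting):

```swift
// Derive custom sampling values from the library's published defaults.
let options = GenerationOptions(
    temperature: GenerationOptions.defaultTemperature - 0.2,  // 0.7
    topK: GenerationOptions.defaultTopK,                       // 50
    maxNewTokens: GenerationOptions.defaultMaxNewTokens * 2    // 490
)
```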

Example Usage

Default Options

let result = try await tts.generate(
    text: "Hello, world!",
    voice: "ryan"
)

Custom Sampling

let options = GenerationOptions(
    temperature: 0.7,
    topK: 30,
    maxNewTokens: 500
)
let result = try await tts.generate(
    text: "A longer piece of text.",
    voice: "ryan",
    options: options
)

Sequential Generation (for Streaming)

let options = GenerationOptions(
    concurrentWorkerCount: 1  // Required for play() streaming
)
let result = try await tts.play(
    text: "This will stream audio chunk by chunk.",
    voice: "ryan",
    options: options,
    playbackStrategy: .auto
)

With Style Instruction (1.7B only)

let options = GenerationOptions(
    instruction: "Very happy and excited"
)
let result = try await tts.generate(
    text: "I'm so glad to meet you!",
    voice: "ryan",
    options: options
)

Disable Chunking

let options = GenerationOptions(
    chunkingStrategy: TextChunkingStrategy.none  // a bare .none would resolve to Optional.none (nil)
)
let result = try await tts.generate(
    text: "Generate this as a single chunk.",
    voice: "ryan",
    options: options
)

Custom Chunk Sizes

let options = GenerationOptions(
    targetChunkSize: 100,
    minChunkSize: 20
)
let result = try await tts.generate(
    text: "A very long piece of text that will be split into chunks...",
    voice: "ryan",
    options: options
)

Parallel Generation

let options = GenerationOptions(
    concurrentWorkerCount: 4  // Up to 4 chunks in parallel
)
let result = try await tts.generate(
    text: "Long text with multiple sentences. Each sentence becomes a chunk. They generate in parallel.",
    voice: "ryan",
    options: options
)