Speech
Speech generation is an experimental feature.
The AI SDK provides the generateSpeech function to generate speech from text using a speech model.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});
To access the generated audio:
const audioData = result.audio.uint8Array; // audio data as Uint8Array
// or
const audioBase64 = result.audio.base64; // audio data as base64 string
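One common next step is playing the audio in a browser. The sketch below wraps the base64 string in a data URL that an audio element can use as its source. The helper name toAudioDataUrl and the default media type audio/mpeg (which fits MP3 output) are assumptions for illustration and are not part of the AI SDK.

```typescript
// Hypothetical helper (not part of the AI SDK): wraps base64-encoded audio
// in a data URL. The media type must match the format the model produced;
// 'audio/mpeg' is an assumption that fits MP3 output.
function toAudioDataUrl(base64: string, mediaType: string = 'audio/mpeg'): string {
  return `data:${mediaType};base64,${base64}`;
}

// Usage with a generateSpeech result (assumed here to contain MP3 audio):
// const src = toAudioDataUrl(result.audio.base64);
// new Audio(src).play(); // browser only
```

For large clips, a Blob built from result.audio.uint8Array is likely a better fit than a data URL, since it avoids holding a long base64 string in memory.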
Settings
Voice Selection
Different models support different voices. Refer to your provider’s documentation for available voices:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'nova', // Options: alloy, echo, fable, onyx, nova, shimmer
});
Output Format
You can specify the desired output format for the audio:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  outputFormat: 'mp3', // Options: mp3, wav, opus, aac, flac, etc.
});
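When saving or serving the audio, the file extension and media type should match the requested outputFormat. A minimal sketch of such a lookup follows. The helper describeFormat is hypothetical, the table covers only the formats named in the comment above, and the opus entry assumes an Ogg container, which is worth checking against your provider.

```typescript
// Hypothetical lookup (not part of the AI SDK) from an output format name
// to a file extension and media type.
interface FormatInfo {
  extension: string;
  mediaType: string;
}

const FORMAT_INFO = new Map<string, FormatInfo>([
  ['mp3', { extension: 'mp3', mediaType: 'audio/mpeg' }],
  ['wav', { extension: 'wav', mediaType: 'audio/wav' }],
  // Assumption: Opus audio delivered in an Ogg container.
  ['opus', { extension: 'opus', mediaType: 'audio/ogg' }],
  ['aac', { extension: 'aac', mediaType: 'audio/aac' }],
  ['flac', { extension: 'flac', mediaType: 'audio/flac' }],
]);

// Returns undefined for formats this sketch does not know about.
function describeFormat(format: string): FormatInfo | undefined {
  return FORMAT_INFO.get(format.toLowerCase());
}
```

Returning undefined for unknown formats leaves the caller to decide on a fallback instead of guessing a media type.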
Speech Speed
Some models support adjusting the speed of the generated speech:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  speed: 1.25, // Speed multiplier (0.25 to 4.0)
});
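If the speed comes from user input, it may be worth clamping it before the call. The sketch below uses the 0.25 to 4.0 range from the comment above as its default bounds. The helper clampSpeed is hypothetical, and other providers may accept a different range.

```typescript
// Hypothetical helper (not part of the AI SDK): keeps a requested speed
// inside a supported range. Defaults mirror the 0.25 to 4.0 range noted above.
function clampSpeed(speed: number, min: number = 0.25, max: number = 4.0): number {
  if (Number.isNaN(speed)) {
    throw new RangeError('speed must be a number');
  }
  return Math.min(max, Math.max(min, speed));
}

// Usage: speed: clampSpeed(userRequestedSpeed)
```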
Language Setting
You can specify the language for speech generation (provider support varies):
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { lmnt } from '@ai-sdk/lmnt';
const result = await generateSpeech({
  model: lmnt.speech('aurora'),
  text: 'Hola, mundo!',
  language: 'es', // Spanish (ISO 639-1 language code)
});
Instructions
Some models accept additional instructions to guide the speech generation:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  instructions: 'Speak in a slow and steady tone',
});
Provider-Specific Settings
You can set model-specific settings with the providerOptions parameter:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  providerOptions: {
    openai: {
      // provider-specific options
    },
  },
});
Retries
The generateSpeech function accepts an optional maxRetries parameter
that you can use to set the maximum number of retries.
It defaults to 2 retries (3 attempts in total). You can set it to 0 to disable retries.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  maxRetries: 0, // Disable retries
});
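To make the arithmetic concrete, here is a minimal sketch of how a retry limit translates into attempts: one initial attempt plus up to maxRetries further attempts. It is an illustration only, not the SDK's implementation, which may add delays between attempts and retry only certain errors. The name callWithRetries is hypothetical.

```typescript
// Illustration only (not the SDK's retry logic): one initial attempt plus
// up to `maxRetries` retries, rethrowing the last error if all attempts fail.
async function callWithRetries<T>(
  operation: (attempt: number) => Promise<T>,
  maxRetries: number = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error; // remember the failure; loop again if attempts remain
    }
  }
  throw lastError;
}
```

With the default of 2, the operation runs at most three times. With 0, it runs exactly once and any error surfaces immediately.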
Abort Signals and Timeouts
generateSpeech accepts an optional abortSignal parameter of type AbortSignal
that you can use to abort the speech generation process or set a timeout.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  abortSignal: AbortSignal.timeout(10000), // Abort after 10 seconds
});
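AbortSignal.timeout covers the timeout case. When you also need to cancel on demand, for example when a user clicks a stop button, one option is to drive the signal from an AbortController. A minimal sketch, where createCancelableSignal is a hypothetical helper name and not part of the AI SDK:

```typescript
// Hypothetical helper (not part of the AI SDK): a signal that aborts either
// when cancel() is called or when the timeout elapses, whichever comes first.
function createCancelableSignal(timeoutMs: number) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  return {
    signal: controller.signal,
    cancel(): void {
      clearTimeout(timer); // avoid leaving a pending timer behind
      controller.abort();
    },
  };
}

// Usage: pass `signal` as abortSignal and call cancel() to stop early.
// const { signal, cancel } = createCancelableSignal(10000);
// const result = await generateSpeech({
//   model: openai.speech('tts-1'),
//   text: 'Hello, world!',
//   abortSignal: signal,
// });
```

Calling cancel() after the request has finished is harmless and also clears the pending timer.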
Custom Headers
generateSpeech accepts an optional headers parameter of type Record<string, string>
that you can use to add custom headers to the speech generation request.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  headers: { 'X-Custom-Header': 'custom-value' },
});
Response Information
The result returned by generateSpeech includes the generated audio, any warnings, the raw provider responses, and provider-specific metadata:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});
console.log(result.audio); // Generated audio file
console.log(result.warnings); // Any warnings from the provider
console.log(result.responses); // Raw provider responses
console.log(result.providerMetadata); // Provider-specific metadata
Speech Providers & Models
Several providers offer speech generation models:
| Provider | Model |
|---|---|
| OpenAI | tts-1 |
| OpenAI | tts-1-hd |
| ElevenLabs | Various |
| LMNT | aurora |