POST /v1/audio/speech
Converts text into natural-sounding speech. The response is a binary audio stream in the requested format. This endpoint maps to the createSpeech operation and is compatible with the OpenAI TTS API.
Request headers
The provider to route the request to (e.g. openai). Required when not using a config.
Your provider API key.
A JSON config object or config ID that defines routing, fallbacks, retries, and more.
A virtual key ID from Portkey Cloud.
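The header descriptions above can be sketched as a small helper that assembles the request headers. This is a minimal sketch, not the gateway's definitive contract: the header names `x-portkey-provider`, `x-portkey-config`, `x-portkey-virtual-key` and the `Bearer` form of `Authorization` are assumptions (the text above does not name them), and `build_headers` is a hypothetical helper. Check the names against your gateway's documentation before relying on them.

```python
import json


def build_headers(provider=None, api_key=None, config=None, virtual_key=None):
    """Assemble request headers for the speech endpoint.

    All header names below are assumptions, not taken from the reference text.
    """
    # The provider is described as required when no config is used.
    if provider is None and config is None:
        raise ValueError("provider is required when no config is supplied")

    headers = {}
    if provider is not None:
        headers["x-portkey-provider"] = provider  # assumed header name
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"  # assumed scheme
    if config is not None:
        # A config may be a JSON object or a config ID string.
        value = config if isinstance(config, str) else json.dumps(config)
        headers["x-portkey-config"] = value  # assumed header name
    if virtual_key is not None:
        headers["x-portkey-virtual-key"] = virtual_key  # assumed header name
    return headers
```

Passing a dict as `config` serializes it to JSON, while a string is sent unchanged as a config ID.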
Request body
The TTS model to use. OpenAI supports tts-1 (optimized for speed) and tts-1-hd (optimized for quality).
The text to convert to speech. Maximum 4096 characters.
The voice to use. OpenAI provides alloy, ash, coral, echo, fable, onyx, nova, and shimmer. Check your provider’s documentation for available voices.
The audio output format. Supported values: mp3, opus, aac, flac, wav, pcm.
The speed of the generated speech, between 0.25 and 4.0. Values above 1.0 speed up the audio; values below slow it down.
Response
The response body is a binary audio stream whose Content-Type header matches the requested response_format (e.g. audio/mpeg for MP3).
| Format | Content-Type |
|---|---|
| mp3 | audio/mpeg |
| opus | audio/opus |
| aac | audio/aac |
| flac | audio/flac |
| wav | audio/wav |
| pcm | audio/pcm |
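The request limits and the format table can be combined into a short client-side check. The sketch below validates a request body against the documented constraints and looks up the Content-Type to expect in the response. The body field names `model`, `input`, `voice`, and `speed` are assumptions based on the OpenAI TTS API that this endpoint is described as compatible with (only `response_format` is named in the text), and both function names are hypothetical.

```python
# Mapping taken from the Format / Content-Type table.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}

MAX_INPUT_CHARS = 4096  # documented maximum text length


def build_speech_body(model, text, voice, response_format=None, speed=None):
    """Build a JSON-serializable request body, enforcing documented limits."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"text exceeds {MAX_INPUT_CHARS} characters")

    # Field names other than response_format are assumed, not documented here.
    body = {"model": model, "input": text, "voice": voice}

    if response_format is not None:
        if response_format not in CONTENT_TYPES:
            raise ValueError(f"unsupported response_format: {response_format}")
        body["response_format"] = response_format

    if speed is not None:
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        body["speed"] = speed

    return body


def expected_content_type(response_format):
    """Return the Content-Type the response should carry for a format."""
    return CONTENT_TYPES[response_format]
```

Optional fields are omitted from the body when not supplied, so the sketch makes no claim about server-side defaults. After sending the request, comparing the response's Content-Type header with `expected_content_type(...)` is one way to confirm that the returned audio is in the requested format before writing it to disk.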