
POST /v1/audio/speech

Converts text into natural-sounding speech. The response is a binary audio stream in the requested format. This endpoint maps to the createSpeech operation and is compatible with the OpenAI TTS API.

Request headers

x-portkey-provider
string
The provider to route the request to (e.g. openai). Required when not using a config.
x-portkey-api-key
string
Your provider API key.
x-portkey-config
string
A JSON config object or config ID that defines routing, fallbacks, retries, and more.
x-portkey-virtual-key
string
A virtual key ID from Portkey Cloud.
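As a rough illustration of how these headers combine, the sketch below assembles them in Python. The helper name `build_portkey_headers` is hypothetical and not part of any Portkey SDK; it only sets the headers listed above, and it serializes a config object to JSON text because `x-portkey-config` accepts either a JSON object or a config ID.

```python
import json


def build_portkey_headers(provider=None, api_key=None, config=None, virtual_key=None):
    """Assemble the x-portkey-* request headers, skipping any that are unset.

    Hypothetical helper for illustration; it performs no validation of its own.
    """
    headers = {"Content-Type": "application/json"}
    if provider is not None:
        headers["x-portkey-provider"] = provider
    if api_key is not None:
        headers["x-portkey-api-key"] = api_key
    if config is not None:
        # A config ID is passed through as-is; a config object is sent as JSON text.
        headers["x-portkey-config"] = config if isinstance(config, str) else json.dumps(config)
    if virtual_key is not None:
        headers["x-portkey-virtual-key"] = virtual_key
    return headers
```

For a direct provider call, `build_portkey_headers(provider="openai", api_key=key)` would yield the same headers as the curl example on this page.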

Request body

model
string
required
The TTS model to use. OpenAI supports tts-1 (optimized for speed) and tts-1-hd (optimized for quality).
input
string
required
The text to convert to speech. Maximum 4096 characters.
voice
string
required
The voice to use. OpenAI provides alloy, ash, coral, echo, fable, onyx, nova, and shimmer. Check your provider’s documentation for available voices.
response_format
string
default:"mp3"
The audio output format. Supported values: mp3, opus, aac, flac, wav, pcm.
speed
number
default:1.0
The speed of the generated speech, between 0.25 and 4.0. Values above 1.0 speed up the audio; values below 1.0 slow it down.
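The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the limits exactly as documented here (three required fields, a 4096-character input cap, six output formats, and a speed range of 0.25 to 4.0). The function name `build_speech_body` is hypothetical, and the gateway or provider may apply further validation of its own.

```python
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_body(model, text, voice, response_format="mp3", speed=1.0):
    """Return the JSON-serializable request body, or raise ValueError.

    Hypothetical client-side check that mirrors the documented parameter limits.
    """
    if not model or not text or not voice:
        raise ValueError("model, input, and voice are required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Voice names are deliberately not validated, since the available voices depend on the provider.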

Response

The response body is a binary audio stream whose Content-Type header matches the requested response_format (e.g. audio/mpeg for mp3).
Format    Content-Type
mp3       audio/mpeg
opus      audio/opus
aac       audio/aac
flac      audio/flac
wav       audio/wav
pcm       audio/pcm
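A client can use this mapping to confirm that it received audio rather than, say, a JSON error body. The sketch below encodes the documented format-to-Content-Type pairs as a lookup. The helper name `content_type_matches` is hypothetical, and tolerating header parameters and mixed case is an assumption about what a server might send, not documented behavior.

```python
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}


def content_type_matches(header_value, response_format="mp3"):
    """Return True if a Content-Type header value fits the requested format.

    Any parameters after ';' and letter case are ignored before comparing.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == CONTENT_TYPES[response_format]
```

Checking the header before writing the body to disk avoids saving an error message under an audio file name.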

Code examples

curl http://localhost:8787/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: $OPENAI_API_KEY" \
  -d '{
    "model": "tts-1",
    "input": "The Portkey AI Gateway routes requests to over 250 language models.",
    "voice": "alloy"
  }' \
  --output speech.mp3
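For readers who prefer Python, the same request can be sketched with the standard library alone. This assumes the gateway address, headers, and default values from the curl example above. The function names `build_speech_request` and `save_speech` are hypothetical and do not come from a Portkey SDK.

```python
import json
import urllib.request

# Same local gateway address as the curl example.
GATEWAY_URL = "http://localhost:8787/v1/audio/speech"


def build_speech_request(api_key, text, model="tts-1", voice="alloy", provider="openai"):
    """Build (but do not send) the POST request for the speech endpoint."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "x-portkey-provider": provider,
        "x-portkey-api-key": api_key,
    }
    return urllib.request.Request(GATEWAY_URL, data=body, headers=headers, method="POST")


def save_speech(request, path="speech.mp3"):
    """Send the request and write the binary audio response to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

With a gateway listening on localhost:8787, `save_speech(build_speech_request(api_key, "Hello"))` would write the audio to speech.mp3. `urlopen` raises `urllib.error.HTTPError` on a non-2xx status, so in that case nothing is written to the file.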
