Audio translations

POST /v1/audio/translations

Translates audio from a non-English language into English text. The request body must be sent as multipart/form-data with the audio file included as a file field. This endpoint maps to the createTranslation operation and is compatible with the OpenAI Whisper translation API.

This endpoint always produces English output. To transcribe audio in its original language without translating it, use the audio transcriptions endpoint instead.

Request headers

x-portkey-provider

string

The provider to route the request to (e.g. openai). Required when not using a config.

x-portkey-api-key

string

Your provider API key.

x-portkey-config

string

A JSON config object or config ID that defines routing, fallbacks, retries, and more.

x-portkey-virtual-key

string

A virtual key ID from Portkey Cloud.
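
As a rough illustration of how these headers combine, the sketch below assembles them into a dictionary. The helper name build_portkey_headers and its arguments are hypothetical, and the only rule it enforces is the one stated above: a provider is required when no config is supplied.

```python
import json


def build_portkey_headers(provider=None, api_key=None, config=None, virtual_key=None):
    """Assemble the x-portkey-* request headers (hypothetical helper)."""
    if provider is None and config is None:
        # Per the reference, x-portkey-provider is required when not using a config.
        raise ValueError("set either provider or config")

    headers = {}
    if provider is not None:
        headers["x-portkey-provider"] = provider
    if api_key is not None:
        headers["x-portkey-api-key"] = api_key
    if config is not None:
        # A config may be a config ID (string) or a JSON object (serialized here).
        headers["x-portkey-config"] = config if isinstance(config, str) else json.dumps(config)
    if virtual_key is not None:
        headers["x-portkey-virtual-key"] = virtual_key
    return headers
```

Calling build_portkey_headers(provider="openai", api_key="sk-test") yields the same two headers used in the curl example on this page.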

Request body

This endpoint accepts multipart/form-data. The audio file must be uploaded as a file field named file.

file

required

The audio file to translate. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB.

model

string

required

The speech-to-text model to use (e.g. whisper-1).

prompt

string

Optional English text to guide the model’s output style or provide context. The prompt should be in English even if the source audio is in another language.

response_format

string

default:"json"

The format of the translation output. One of json, text, srt, verbose_json, or vtt.

temperature

number

default:"0"

Sampling temperature between 0 and 1. Higher values produce more varied output.
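
The constraints listed above can be checked on the client before uploading. The following is a minimal sketch, not part of the gateway: the function name check_translation_request is hypothetical, the file format is inferred from the file extension only, and "25 MB" is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os

SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def check_translation_request(filename, size_bytes, model, prompt=None,
                              response_format="json", temperature=0):
    """Validate the documented constraints and return the non-file form fields."""
    # Format check based on the file extension only (an approximation).
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if prompt:
        fields["prompt"] = prompt  # should be English text
    return fields
```

Validating locally avoids uploading a large file only to have the request rejected; the provider remains the authority on what it actually accepts.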

Response

The response format depends on the response_format parameter.

json (default)

text

string

The translated English text.

verbose_json

text

string

The full translated text.

language

string

The detected language of the source audio.

duration

number

The duration of the audio file in seconds.

segments

object[]

Segment-level data for the translation.

id

integer

Segment index.

start

number

Start time of the segment in seconds.

end

number

End time of the segment in seconds.

text

string

The translated text for this segment.
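
To show how the segment fields fit together, here is a small sketch that turns a verbose_json payload into timestamped lines. The sample payload is hand-written for illustration and is not real API output; it includes only the fields the function reads, and segment_lines is a hypothetical helper name.

```python
import json

# Hypothetical verbose_json payload, written by hand for illustration only.
SAMPLE = """
{
  "text": "Hello. How are you?",
  "duration": 4.2,
  "segments": [
    {"start": 0.0, "end": 1.5, "text": "Hello."},
    {"start": 1.5, "end": 4.2, "text": " How are you?"}
  ]
}
"""


def segment_lines(payload):
    """Format each segment as '[start-end] text', with times in seconds."""
    return [
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text'].strip()}"
        for seg in payload.get("segments", [])
    ]


for line in segment_lines(json.loads(SAMPLE)):
    print(line)
```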

text, srt, vtt

The response body is a plain-text or subtitle-format string rather than a JSON object.
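
Because the body is JSON for some formats and a raw string for others, client code has to branch on the response_format it requested. A minimal sketch, assuming the body has already been decoded to a string (extract_translation is a hypothetical helper name):

```python
import json


def extract_translation(body, response_format="json"):
    """Return the translated text, or the raw subtitle string for srt and vtt."""
    if response_format in ("json", "verbose_json"):
        # Both JSON formats carry the translation in the top-level "text" field.
        return json.loads(body)["text"]
    if response_format in ("text", "srt", "vtt"):
        # These formats are returned as-is: plain text or subtitle markup.
        return body
    raise ValueError(f"unknown response_format: {response_format!r}")
```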

Code examples

curl http://localhost:8787/v1/audio/translations \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: $OPENAI_API_KEY" \
  -F "model=whisper-1" \
  -F "file=@german_speech.mp3" \
  -F "response_format=json"
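
For readers who prefer Python, the same request can be sketched with the standard library alone. This mirrors the curl call above and is not an official client: encode_multipart and build_translation_request are hypothetical helper names, and the file part is sent with a generic application/octet-stream content type as an assumption.

```python
import urllib.request
import uuid


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_translation_request(file_bytes, filename, api_key,
                              base_url="http://localhost:8787"):
    """Build (but do not send) the POST request shown in the curl example."""
    body, content_type = encode_multipart(
        {"model": "whisper-1", "response_format": "json"}, filename, file_bytes
    )
    return urllib.request.Request(
        f"{base_url}/v1/audio/translations",
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "x-portkey-provider": "openai",
            "x-portkey-api-key": api_key,
        },
    )


# Sending requires a gateway listening on localhost:8787:
# with open("german_speech.mp3", "rb") as f:
#     req = build_translation_request(f.read(), "german_speech.mp3", "<your key>")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```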
