Audio transcriptions

POST /v1/audio/transcriptions

Transcribes an audio file into text. The request body must be sent as multipart/form-data with the audio file included as a form field. This endpoint maps to the createTranscription operation and is compatible with the OpenAI Whisper API.

Request headers

x-portkey-provider

string

The provider to route the request to (e.g. openai). Required when not using a config.

x-portkey-api-key

string

Your provider API key.

x-portkey-config

string

A JSON config object or config ID that defines routing, fallbacks, retries, and more.

x-portkey-virtual-key

string

A virtual key ID from Portkey Cloud.
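
As a rough illustration of how these headers combine, the sketch below assembles them into a dictionary. The helper name build_portkey_headers and its argument names are hypothetical; only the header names come from the list above. It also assumes that a config given as an object is sent as a JSON string, which this page does not spell out.

```python
import json
from typing import Optional, Union


def build_portkey_headers(
    provider: Optional[str] = None,
    api_key: Optional[str] = None,
    config: Union[str, dict, None] = None,
    virtual_key: Optional[str] = None,
) -> dict:
    """Assemble the x-portkey-* request headers (hypothetical helper)."""
    # The reference says a provider is required when no config is used.
    if provider is None and config is None:
        raise ValueError("set x-portkey-provider when not using a config")

    headers = {}
    if provider is not None:
        headers["x-portkey-provider"] = provider
    if api_key is not None:
        headers["x-portkey-api-key"] = api_key
    if config is not None:
        # Assumption: a config object is serialized to JSON; a config ID is sent as-is.
        headers["x-portkey-config"] = (
            config if isinstance(config, str) else json.dumps(config)
        )
    if virtual_key is not None:
        headers["x-portkey-virtual-key"] = virtual_key
    return headers
```

Calling build_portkey_headers(provider="openai", api_key="...") yields the same two headers used in the curl example on this page.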

Request body

This endpoint accepts multipart/form-data. The audio file must be uploaded as a file field named file.

file

required

The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB.

model

string

required

The speech-to-text model to use (e.g. whisper-1, gpt-4o-transcribe).

language

string

The language of the audio, in ISO-639-1 format (e.g. en, fr, de). Providing the language improves accuracy and latency.

prompt

string

Optional text to guide the model’s style or provide context. The prompt should match the language of the audio.

response_format

string

default:"json"

The format of the transcription output. One of json, text, srt, verbose_json, or vtt.

temperature

number

default:"0"

Sampling temperature between 0 and 1. Higher values produce more varied output; values near 0 make the transcription more focused and repeatable.

timestamp_granularities

string[]

The granularity of timestamps to include. Requires response_format to be verbose_json. Accepts word and/or segment.
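
The constraints listed above can be checked on the client before uploading. The following is a minimal sketch, not part of the gateway: the function name validate_transcription_params is hypothetical, the file-extension test is only a proxy for the real audio format, and "25 MB" is read here as 25 × 1024 × 1024 bytes, which is an assumption.

```python
import os

SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # Assumption: "25 MB" interpreted as 25 MiB.


def validate_transcription_params(
    file_name,
    file_size,
    model,
    response_format="json",
    temperature=0.0,
    timestamp_granularities=None,
):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # The extension is only a hint; the server inspects the actual file.
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {extension!r}")
    if file_size > MAX_FILE_BYTES:
        problems.append("file exceeds 25 MB")
    if not model:
        problems.append("model is required")
    if response_format not in RESPONSE_FORMATS:
        problems.append(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")

    if timestamp_granularities:
        unknown = set(timestamp_granularities) - {"word", "segment"}
        if unknown:
            problems.append(f"unknown timestamp granularity: {sorted(unknown)}")
        # Timestamps are only returned in the verbose_json format.
        if response_format != "verbose_json":
            problems.append(
                "timestamp_granularities requires response_format=verbose_json"
            )
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem in one pass.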

Response

The response format depends on the response_format parameter.

json (default)

text

string

The transcribed text.

verbose_json

text

string

The full transcribed text.

language

string

The detected language of the audio.

duration

number

The duration of the audio file in seconds.

segments

object[]

Segment-level transcription data, present when timestamp_granularities includes segment.

Properties of each segment object:

integer

Segment index.

start

number

Start time of the segment in seconds.

end

number

End time of the segment in seconds.

text

string

The transcribed text for this segment.

words

object[]

Word-level transcription data, present when timestamp_granularities includes word.

Properties of each word object:

word

string

The transcribed word.

start

number

Start time in seconds.

end

number

End time in seconds.

text, srt, vtt

The response body is a string rather than a JSON object: plain text for text, or subtitle markup for srt and vtt.
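
Because the body is JSON for some formats and a bare string for others, a client has to branch on response_format when decoding. The sketch below shows one way to do that and to read word timings from a verbose_json result. The function names and SAMPLE_BODY are hypothetical; the sample is a made-up payload shaped after the fields listed above, not real API output.

```python
import json

# Made-up payload for illustration only; it mirrors the verbose_json fields above.
SAMPLE_BODY = (
    b'{"text": "hello world", "duration": 1.2, "words": ['
    b'{"word": "hello", "start": 0.0, "end": 0.5}, '
    b'{"word": "world", "start": 0.6, "end": 1.2}]}'
)


def parse_transcription(body: bytes, response_format: str = "json"):
    """Decode a response body according to the requested response_format."""
    text = body.decode("utf-8")
    if response_format in ("json", "verbose_json"):
        return json.loads(text)
    # text, srt and vtt come back as plain strings.
    return text


def word_timings(verbose: dict):
    """Yield (word, start, end) tuples; yields nothing if `words` is absent."""
    for entry in verbose.get("words", []):
        yield entry["word"], entry["start"], entry["end"]
```

Using .get("words", []) reflects that words is present only when timestamp_granularities includes word.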

Code examples

curl http://localhost:8787/v1/audio/transcriptions \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: $OPENAI_API_KEY" \
  -F "model=whisper-1" \
  -F "file=@/path/to/audio.mp3" \
  -F "language=en" \
  -F "response_format=json"
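
For readers who prefer Python, the following is a rough equivalent of the curl command that uses only the standard library. It is a sketch under stated assumptions: the helper names build_transcription_request and transcribe_file are hypothetical, the gateway is assumed to listen on http://localhost:8787 as in the curl example, and the multipart body is encoded by hand rather than through an SDK.

```python
import os
import urllib.request
import uuid


def build_transcription_request(base_url, headers, fields, file_name, file_bytes):
    """Build (but do not send) a multipart/form-data POST for this endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as model, language and response_format.
    for name, value in fields.items():
        field = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(field.encode("utf-8"))
    # The audio goes in a part named "file", as the reference requires.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))

    all_headers = dict(headers)
    all_headers["Content-Type"] = f"multipart/form-data; boundary={boundary}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        headers=all_headers,
        method="POST",
    )


def transcribe_file(path, base_url="http://localhost:8787"):
    """Send the file at `path` and return the raw response body as text."""
    with open(path, "rb") as handle:
        audio = handle.read()
    request = build_transcription_request(
        base_url,
        {
            "x-portkey-provider": "openai",
            "x-portkey-api-key": os.environ["OPENAI_API_KEY"],
        },
        {"model": "whisper-1", "language": "en", "response_format": "json"},
        os.path.basename(path),
        audio,
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a gateway running locally, print(transcribe_file("audio.mp3")) would send the request; the file name here is a placeholder.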
