POST /v1/audio/transcriptions
Transcribes an audio file into text. The request body must be sent as multipart/form-data with the audio file included as a form field. This endpoint maps to the createTranscription operation and is compatible with the OpenAI Whisper API.
Request headers
x-portkey-provider: The provider to route the request to (e.g. openai). Required when not using a config.
Authorization: Your provider API key.
x-portkey-config: A JSON config object or config ID that defines routing, fallbacks, retries, and more.
x-portkey-virtual-key: A virtual key ID from Portkey Cloud.
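A request to this endpoint combines the headers above with the multipart body described under Request body. The sketch below assembles such a request using only the standard library; the gateway URL, the header names, and the API key and audio bytes are placeholders following Portkey conventions, not values confirmed by this page.

```python
import io
import mimetypes
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Assemble a multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode())
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    buf.write(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: {ctype}\r\n\r\n'.encode())
    buf.write(file_bytes)
    buf.write(f'\r\n--{boundary}--\r\n'.encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

headers = {
    "x-portkey-provider": "openai",              # route to OpenAI
    "Authorization": "Bearer PROVIDER_API_KEY",  # placeholder key
}
body, content_type = build_multipart(
    {"model": "whisper-1", "language": "en", "response_format": "json"},
    "file", "audio.mp3", b"\x00" * 16)           # placeholder audio bytes
headers["Content-Type"] = content_type
# The request would then be sent with any HTTP client, e.g.:
# urllib.request.urlopen(urllib.request.Request(
#     "https://api.portkey.ai/v1/audio/transcriptions",
#     data=body, headers=headers))
```

Building the body by hand makes the multipart layout explicit; in practice any HTTP client that supports multipart uploads works equally well.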
Request body
This endpoint accepts multipart/form-data. The audio file must be uploaded as a file field named file.

file (required): The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB.
model (required): The speech-to-text model to use (e.g. whisper-1, gpt-4o-transcribe).
language (optional): The language of the audio, in ISO-639-1 format (e.g. en, fr, de). Providing the language improves accuracy and latency.
prompt (optional): Text to guide the model's style or provide context. The prompt should match the language of the audio.
response_format (optional): The format of the transcription output. One of json, text, srt, verbose_json, or vtt.
temperature (optional): Sampling temperature between 0 and 1. Higher values produce more varied output. Set to 0 for deterministic transcription.
timestamp_granularities (optional): The granularity of timestamps to include. Requires response_format to be verbose_json. Accepts word and/or segment.
Response
The response format depends on the response_format parameter.
json (default)
text: The transcribed text.
verbose_json
text: The full transcribed text.
language: The detected language of the audio.
duration: The duration of the audio file in seconds.
segments: Segment-level transcription data, present when timestamp_granularities includes segment.
words: Word-level transcription data, present when timestamp_granularities includes word.
text, srt, vtt: Plain-text or subtitle format strings.
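Because json and verbose_json return JSON documents while text, srt, and vtt return plain strings, a client has to branch on the requested format. A minimal sketch of that branching follows; the sample payload is illustrative, not an actual API response.

```python
import json

def parse_transcription(raw, response_format):
    """json/verbose_json bodies are JSON documents with a "text" field;
    text, srt, and vtt responses are already plain strings."""
    if response_format in ("json", "verbose_json"):
        payload = json.loads(raw)
        return payload["text"]
    return raw  # text, srt, vtt: return the body as-is

sample = json.dumps({"text": "Hello world."})  # illustrative json-format body
print(parse_transcription(sample, "json"))     # Hello world.
```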