MonoRelay exposes two audio endpoints that mirror the OpenAI Audio API. The transcriptions endpoint converts spoken audio into text in the original language. The translations endpoint transcribes audio and translates the result into English. Both endpoints accept multipart file uploads and route through MonoRelay’s standard provider resolution, so you can target any configured provider that supports audio processing.
Supported audio formats and file size limits follow the upstream provider’s capabilities. For OpenAI’s Whisper, the accepted formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm, with a maximum file size of 25 MB.
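A client can check a file against the Whisper limits quoted above before uploading. The following is a minimal sketch, not part of MonoRelay: the helper name check_audio_upload is hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes, and other providers may impose different limits.

```python
from pathlib import Path

# Formats and size cap quoted above for OpenAI's Whisper; other providers may differ.
WHISPER_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
WHISPER_MAX_BYTES = 25 * 1024 * 1024


def check_audio_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare the lowercased extension, without its leading dot, to the allowed set.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in WHISPER_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > WHISPER_MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

The check looks only at the file extension and size, so the upstream provider may still reject a file whose contents do not match its extension.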
POST /v1/audio/transcriptions
Transcribe an audio file into text in its original language.
Authentication
Request parameters
All parameters are submitted as multipart/form-data fields.
file: The audio file to transcribe. Uploaded as a multipart file field. The accepted formats and size limits depend on the upstream provider.
model: The transcription model to use (e.g. whisper-1). Accepts aliases and model@provider syntax.
language: The language of the audio in ISO-639-1 format (e.g. "en", "zh", "fr"). Providing this improves accuracy and speed. When omitted, the model attempts to auto-detect the language.
prompt: Optional text to guide the model’s style or continue from a previous audio segment. The prompt should match the audio language.
response_format: Format of the transcript output. Accepted values: "json", "text", "srt", "verbose_json", or "vtt". Support for each format depends on the upstream provider.
temperature: Sampling temperature between 0 and 1. Higher values produce more varied output. Set to 0 for deterministic transcription.
Example
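A transcription request could be assembled as follows. This is a sketch using only the Python standard library. The base URL, the Bearer-style Authorization header, the file name, and both helper names are assumptions for illustration, not values confirmed by MonoRelay. The form field names follow the OpenAI Audio API that these endpoints mirror.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one "file" part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(head.encode() + str(value).encode() + b"\r\n")
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_head.encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(base_url, api_key, file_name, file_bytes, **fields):
    """Build (but do not send) a POST to /v1/audio/transcriptions."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        # Assumption: OpenAI-style bearer token authentication.
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_transcription_request(
    "http://localhost:8080",  # hypothetical MonoRelay address
    "YOUR_API_KEY",           # placeholder credential
    "meeting.mp3",            # hypothetical file name
    b"...audio bytes...",     # stand-in for the real file contents
    model="whisper-1",
    language="en",
    response_format="json",
    temperature=0,
)
# To send the request: urllib.request.urlopen(req).read()
```

Building the request separately from sending it lets you inspect the URL, headers, and body before any audio leaves the machine.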
When response_format is "json":
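For the JSON case, the transcript could be read as shown below. This sketch assumes MonoRelay passes the upstream body through unchanged and that it follows the OpenAI shape, a JSON object with a single text field. The helper name transcript_text and the sample body are illustrative, not captured from a real response.

```python
import json


def transcript_text(raw_body: bytes) -> str:
    """Extract the transcript from a response_format="json" response body."""
    payload = json.loads(raw_body)
    # Assumption: OpenAI-style body of the form {"text": "..."}.
    return payload["text"]


# Illustrative body only, not a recorded MonoRelay response.
sample = b'{"text": "Hello, and welcome to the meeting."}'
print(transcript_text(sample))
```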
POST /v1/audio/translations
Transcribe an audio file and translate the result into English, regardless of the original spoken language.
Authentication
Request parameters
file: The audio file to transcribe and translate. Uploaded as a multipart file field.
model: The model to use for translation (e.g. whisper-1).
Example
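A translation request needs only the two documented fields. The sketch below builds one with the Python standard library. The base URL, the Bearer-style Authorization header, the file name, and the helper name build_translation_request are assumptions for illustration.

```python
import urllib.request
import uuid


def build_translation_request(base_url, api_key, file_name, file_bytes, model="whisper-1"):
    """Build (but do not send) a POST to /v1/audio/translations."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    )
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    closing = f"\r\n--{boundary}--\r\n"
    body = model_part.encode() + file_head.encode() + file_bytes + closing.encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/translations",
        data=body,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumption: OpenAI-style bearer token authentication.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_translation_request(
    "http://localhost:8080",  # hypothetical MonoRelay address
    "YOUR_API_KEY",           # placeholder credential
    "interview.m4a",          # hypothetical file name
    b"...audio bytes...",     # stand-in for the real file contents
)
# To send the request: urllib.request.urlopen(req).read()
```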
The translations endpoint always outputs English text. If you need a transcript in the original language, use /v1/audio/transcriptions instead.