MonoRelay exposes two audio endpoints that mirror the OpenAI Audio API. The transcriptions endpoint converts spoken audio into text in the original language. The translations endpoint transcribes audio and translates the result into English. Both endpoints accept multipart file uploads and route through MonoRelay’s standard provider resolution, so you can target any configured provider that supports audio processing.

Supported audio formats and file size limits follow the upstream provider’s capabilities. For OpenAI’s Whisper, the accepted formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm, with a maximum file size of 25 MB.
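Because the limits come from the upstream provider, a client-side check before uploading can save a failed round trip. The following is a minimal sketch that assumes the OpenAI Whisper limits quoted above, with 25 MB interpreted as 25 * 1024 * 1024 bytes. The function name `check_audio_file` and the `MAX_BYTES` variable are hypothetical and not part of MonoRelay; other providers may need different values.

```shell
# Pre-flight check for an audio upload (sketch only).
# The extension list and size cap are the Whisper values quoted above;
# treat them as assumptions for any other provider.

MAX_BYTES=$((25 * 1024 * 1024))   # 25 MB, interpreted as MiB (assumption)

check_audio_file() {
  local path="$1"
  if [ ! -f "$path" ]; then
    echo "not a file: $path" >&2
    return 1
  fi

  # Lowercase the extension so "MEETING.MP3" is accepted too.
  local ext
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    flac|mp3|mp4|mpeg|mpga|m4a|ogg|wav|webm) ;;
    *) echo "unsupported extension: $ext" >&2; return 2 ;;
  esac

  # wc -c reports the byte count on both GNU and BSD userlands.
  local size
  size=$(wc -c < "$path" | tr -d '[:space:]')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 3
  fi
  return 0
}
```

A caller could chain it in front of the upload, for example `check_audio_file meeting.mp3 && curl ...`. The check only looks at the file name and size, so the provider may still reject a file whose contents do not match its extension.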

POST /v1/audio/transcriptions

Transcribe an audio file into text in its original language.

Authentication

Authorization: Bearer <your-access-token>

Request parameters

All parameters are submitted as multipart/form-data fields.
file (file, required)
The audio file to transcribe. Uploaded as a multipart file field. The accepted formats and size limits depend on the upstream provider.

model (string, required)
The transcription model to use (e.g. whisper-1). Accepts aliases and model@provider syntax.

language (string)
The language of the audio in ISO 639-1 format (e.g. "en", "zh", "fr"). Providing this improves accuracy and speed. When omitted, the model attempts to auto-detect the language.

prompt (string)
Optional text to guide the model’s style or continue from a previous audio segment. The prompt should match the audio language.

response_format (string, default: "json")
Format of the transcript output. Accepted values: "json", "text", "srt", "verbose_json", or "vtt". Support for each format depends on the upstream provider.

temperature (number, default: 0)
Sampling temperature between 0 and 1. Higher values produce more varied output. Set to 0 for the most repeatable transcription.

Example

curl https://<host>/v1/audio/transcriptions \
  -H "Authorization: Bearer <token>" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1" \
  -F "language=en" \
  -F "response_format=json"

Response when response_format is "json":
{
  "text": "Welcome to the MonoRelay demo. Today we will cover the API endpoints."
}
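
The other response_format values are requested the same way. As a hedged sketch, the function below asks for "srt" output and writes it to a file. With OpenAI-compatible providers the "srt" and "vtt" formats typically come back as plain text rather than a JSON object, but that depends on the upstream provider. The function name `transcribe_to_srt` and the environment variables `MONORELAY_HOST` and `MONORELAY_TOKEN` are hypothetical names chosen for illustration.

```shell
# Sketch: request SubRip subtitles instead of JSON and save them to a file.
# MONORELAY_HOST and MONORELAY_TOKEN are hypothetical environment variable
# names; whisper-1 is the example model used above.

transcribe_to_srt() {
  local audio="$1"
  local out="$2"

  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx);
  # -sS hides the progress meter but still prints error messages.
  curl -sS --fail \
    "https://${MONORELAY_HOST}/v1/audio/transcriptions" \
    -H "Authorization: Bearer ${MONORELAY_TOKEN}" \
    -F "file=@${audio}" \
    -F "model=whisper-1" \
    -F "response_format=srt" \
    -o "$out"
}
```

Usage would look like `transcribe_to_srt meeting.mp3 meeting.srt`, after which the subtitle file can be loaded by a video player if the provider supports the format.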

POST /v1/audio/translations

Transcribe an audio file and translate the result into English, regardless of the original spoken language.

Authentication

Authorization: Bearer <your-access-token>

Request parameters

file (file, required)
The audio file to transcribe and translate. Uploaded as a multipart file field.

model (string, required)
The model to use for translation (e.g. whisper-1).

Example

curl https://<host>/v1/audio/translations \
  -H "Authorization: Bearer <token>" \
  -F "file=@german_lecture.mp3" \
  -F "model=whisper-1"

Response:
{
  "text": "Today we will talk about the history of the relay protocol."
}

The translations endpoint always outputs English text. If you need a transcript in the original language, use /v1/audio/transcriptions instead.
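
Since the two endpoints take the same file and model fields and differ only in the output language, a client can choose between them with a single switch. The sketch below is one way to do that. The function name `audio_to_text`, its `original` and `english` mode names, and the `MONORELAY_HOST` and `MONORELAY_TOKEN` environment variables are all hypothetical and not part of MonoRelay.

```shell
# Sketch: pick the endpoint by the desired output language.
#   original -> /v1/audio/transcriptions (text in the spoken language)
#   english  -> /v1/audio/translations   (text translated into English)
# MONORELAY_HOST and MONORELAY_TOKEN are hypothetical variable names.

audio_to_text() {
  local mode="$1"
  local audio="$2"
  local model="${3:-whisper-1}"   # example model from this page
  local endpoint

  case "$mode" in
    original) endpoint="transcriptions" ;;
    english)  endpoint="translations" ;;
    *)
      echo "usage: audio_to_text original|english <file> [model]" >&2
      return 64
      ;;
  esac

  # Both endpoints take the same multipart fields: file and model.
  curl -sS --fail \
    "https://${MONORELAY_HOST}/v1/audio/${endpoint}" \
    -H "Authorization: Bearer ${MONORELAY_TOKEN}" \
    -F "file=@${audio}" \
    -F "model=${model}"
}
```

For example, `audio_to_text english german_lecture.mp3` would send the lecture to the translations endpoint, while `audio_to_text original german_lecture.mp3` would return a transcript in the spoken language.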
