POST /v1/audio — transcription and translation endpoints

MonoRelay exposes two audio endpoints that mirror the OpenAI Audio API. The transcriptions endpoint converts spoken audio into text in the original language. The translations endpoint transcribes audio and translates the result into English. Both endpoints accept multipart file uploads and route through MonoRelay’s standard provider resolution, so you can target any configured provider that supports audio processing. Supported audio formats and file size limits follow the upstream provider’s capabilities. For OpenAI’s Whisper, the accepted formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm, with a maximum file size of 25 MB.
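A client can check a file against these limits before uploading. The following is a minimal sketch that assumes the Whisper formats and size cap quoted above; `check_audio_file` and the constants are hypothetical names rather than part of MonoRelay, and other providers may accept different formats or sizes.

```python
from pathlib import Path

# Formats and size cap quoted above for OpenAI's Whisper; other providers may differ.
WHISPER_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
WHISPER_MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" is read as 25 MiB


def check_audio_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in WHISPER_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > WHISPER_MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {WHISPER_MAX_BYTES}")
    return problems
```

The check looks only at the file extension and byte count, so the upstream provider can still reject a file whose contents do not match its extension.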

POST /v1/audio/transcriptions

Transcribe an audio file into text in its original language.

POST /v1/audio/transcriptions

Authentication

Authorization: Bearer <your-access-token>

Request parameters

All parameters are submitted as multipart/form-data fields.

file (required)

The audio file to transcribe. Uploaded as a multipart file field. The accepted formats and size limits depend on the upstream provider.

model (string, required)

The transcription model to use (e.g. whisper-1). Accepts aliases and model@provider syntax.

language (string)

The language of the audio in ISO-639-1 format (e.g. "en", "zh", "fr"). Providing this improves accuracy and speed. When omitted, the model attempts to auto-detect the language.

prompt (string)

Optional text to guide the model’s style or continue from a previous audio segment. The prompt should match the audio language.

response_format (string, default: "json")

Format of the transcript output. Accepted values: "json", "text", "srt", "verbose_json", or "vtt". Support for each format depends on the upstream provider.

temperature (number, default: 0)

Sampling temperature between 0 and 1. Higher values produce more varied output. Set to 0 for deterministic transcription.
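The parameters above can be validated and encoded on the client before the request is sent. The sketch below uses only the Python standard library; `transcription_fields` and `encode_multipart` are hypothetical helper names, and the `application/octet-stream` part type is an assumption, since this page does not state which content type the file part should carry.

```python
import uuid

# Values listed for response_format above; provider support may vary.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_fields(model, language=None, prompt=None,
                         response_format="json", temperature=0.0):
    """Validate the documented parameters and return the text form fields."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = {"model": model, "response_format": response_format,
              "temperature": str(temperature)}
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return fields


def encode_multipart(fields, filename, audio_bytes):
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    # Second value is the Content-Type header the request must carry.
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

Validating locally catches an out-of-range temperature or a misspelled response_format before any audio is uploaded. The model string is passed through unchanged, so aliases and model@provider values need no special handling on the client.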

Example

curl https://<host>/v1/audio/transcriptions \
  -H "Authorization: Bearer <token>" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1" \
  -F "language=en" \
  -F "response_format=json"

Response when response_format is "json":

{
  "text": "Welcome to the MonoRelay demo. Today we will cover the API endpoints."
}
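Because the body shape depends on response_format, client code usually branches on the format it requested. The sketch below assumes that "json" and "verbose_json" bodies are JSON objects with a "text" field, as in the example above, and that "text", "srt", and "vtt" bodies are returned as plain text; `extract_transcript` is a hypothetical helper name.

```python
import json


def extract_transcript(body: bytes, response_format: str = "json") -> str:
    """Return the transcript text from a response body.

    Assumes "json" and "verbose_json" bodies are JSON objects with a "text"
    field, and that "text", "srt", and "vtt" bodies are already plain text.
    """
    decoded = body.decode("utf-8")
    if response_format in ("json", "verbose_json"):
        return json.loads(decoded)["text"]
    # Subtitle and plain-text formats are returned unchanged.
    return decoded
```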

POST /v1/audio/translations

Transcribe an audio file and translate the result into English, regardless of the original spoken language.

POST /v1/audio/translations

Authentication

Authorization: Bearer <your-access-token>

Request parameters

file (required)

The audio file to transcribe and translate. Uploaded as a multipart file field.

model (string, required)

The model to use for translation (e.g. whisper-1).

Example

curl https://<host>/v1/audio/translations \
  -H "Authorization: Bearer <token>" \
  -F "file=@german_lecture.mp3" \
  -F "model=whisper-1"

Response:

{
  "text": "Today we will talk about the history of the relay protocol."
}

The translations endpoint always outputs English text. If you need a transcript in the original language, use /v1/audio/transcriptions instead.
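For callers who are not using curl, the same translation request can be assembled with the Python standard library. This is a sketch under stated assumptions: `build_translation_request` is a hypothetical helper, the host and token are placeholders supplied by the caller, and the `application/octet-stream` part type is an assumption. The function only builds the request and does not send it.

```python
import urllib.request
import uuid


def build_translation_request(host, token, filename, audio_bytes, model="whisper-1"):
    """Build (but do not send) a POST to /v1/audio/translations."""
    boundary = uuid.uuid4().hex
    # Two parts: the "model" text field, then the "file" upload.
    body = (
        f'--{boundary}\r\n'
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode() + audio_bytes + f'\r\n--{boundary}--\r\n'.encode()
    return urllib.request.Request(
        f"https://{host}/v1/audio/translations",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; the response body can then be parsed as JSON and the English text read from its "text" field.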
