The /v1/chat/completions endpoint is the primary interface for conversational AI in MonoRelay. It accepts a list of messages and returns a model-generated reply, routing the request to the appropriate upstream provider based on your model routing configuration. The endpoint is compatible with the OpenAI Chat Completions API, so any client or SDK that targets OpenAI should work once its base URL and API key point at MonoRelay.

Method and path

POST /v1/chat/completions

Authentication

All requests must include a valid Bearer token in the Authorization header. MonoRelay validates this against your configured access key or a JWT issued after login.
Authorization: Bearer <your-access-token>
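As a minimal sketch of the header above, the following builds an authenticated request with Python's standard library. The host relay.example.com and the helper name build_chat_request are illustrative assumptions, not part of MonoRelay; substitute your own deployment URL and token.

```python
import json
import urllib.request


def build_chat_request(base_url, token, payload):
    """Build (but do not send) a POST request to /v1/chat/completions."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The access key or JWT goes in a standard Bearer header.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host and token, used only for illustration.
request = build_chat_request(
    "https://relay.example.com",
    "my-access-token",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
```

Passing the resulting object to urllib.request.urlopen would send it; that call is omitted here so the sketch runs without a live server.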

Request body

model
string
required
The model to use. Accepts a plain model name (e.g. gpt-4o), a configured alias, or model@provider syntax to target a specific provider explicitly (e.g. gpt-4o@openai).
messages
object[]
required
An ordered list of messages representing the conversation history. Each object must have a role (system, user, assistant, or tool) and a content field (string or content-part array for vision inputs).
stream
boolean
default: false
When true, the response is sent as a series of Server-Sent Events (SSE) ending with data: [DONE]. Defaults to false for a single JSON response object.
temperature
number
Sampling temperature between 0 and 2. Higher values produce more random output; lower values make it more focused. It is generally recommended to adjust either temperature or top_p, but not both.
top_p
number
Nucleus sampling probability mass. Only the tokens comprising the top top_p probability mass are considered (for example, 0.1 restricts sampling to the top 10%). It is generally recommended to adjust either top_p or temperature, but not both.
n
integer
default: 1
Number of chat completion choices to generate for each request. Every additional choice consumes output tokens, so values above 1 increase token usage.
stop
string | string[]
Up to four sequences where the model will stop generating further tokens. The stop sequence itself is not included in the output.
max_tokens
integer
The maximum number of tokens to generate. When omitted, the model’s default limit applies.
presence_penalty
number
Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared, encouraging the model to discuss new topics.
frequency_penalty
number
Number between -2.0 and 2.0. Positive values penalize tokens proportional to how often they have appeared, reducing verbatim repetition.
tools
object[]
A list of tools the model may call. Each entry follows the OpenAI function definition schema with type, function.name, function.description, and function.parameters.
tool_choice
string | object
Controls which tool (if any) the model calls. Use "none" to disable tools, "auto" to let the model decide, or {"type": "function", "function": {"name": "..."}} to force a specific function.
response_format
object
An object specifying the output format. Set {"type": "json_object"} to enable JSON mode, which constrains the model to emit valid JSON in the message content. Not all providers support this field.
seed
integer
If specified, MonoRelay passes this seed to the upstream provider to encourage deterministic sampling. Repeated requests with the same seed and parameters should return the same output, but determinism is best-effort and not guaranteed.
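To show how the parameters above fit together, here is a hedged sketch that assembles a request body in Python. The helper build_payload and the get_weather tool are hypothetical names chosen for illustration. Only fields that are explicitly set are included, so provider defaults apply to everything else.

```python
import json


def build_payload(model, messages, **options):
    """Assemble a request body, omitting any option left as None."""
    payload = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


# Hypothetical tool definition following the OpenAI function schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = build_payload(
    "gpt-4o@openai",  # model@provider syntax targets one provider
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Oslo?"},
    ],
    temperature=0.2,
    top_p=None,  # left unset, so it is dropped from the body
    max_tokens=256,
    stop=["\n\n"],
    tools=[weather_tool],
    tool_choice="auto",
    seed=42,
)
body = json.dumps(payload)
```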

Streaming

When stream: true is set, MonoRelay returns an SSE stream. Each event is a data: line containing a JSON delta object in the standard OpenAI chunk format. The stream closes with a final data: [DONE] event. To consume the stream with curl, use --no-buffer to disable output buffering:
curl --no-buffer https://<host>/v1/chat/completions \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5."}]
  }'
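On the client side, the stream can be reassembled by reading each data: line, stopping at [DONE], and concatenating the delta content. The sketch below assumes a single choice (n = 1) and the standard OpenAI chunk shape; the sample lines are hand-written for illustration and are not captured MonoRelay output.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from SSE lines until data: [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written sample events in the OpenAI chunk format.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "1, 2"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", 3"}}]}',
    "",
    "data: [DONE]",
]
text = collect_stream_text(sample_lines)
```

The function accepts any iterable of text lines, so it can be fed from a decoded HTTP response body as well as from a list.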

Tool calling

By default, MonoRelay forwards tools and tool_choice to the upstream provider unchanged. The exception is auto-downgrade: if the resolved model appears in the tool_calling.unsupported_models list in your configuration and tool_calling.auto_downgrade is enabled, MonoRelay strips the tool definitions from the request before forwarding it, which prevents upstream errors on models that do not support function calling.
Tool auto-downgrade is controlled by the tool_calling.auto_downgrade setting in config.yml. When enabled, requests to unsupported models silently omit tool definitions.
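The downgrade rule can be pictured with the following sketch. It is an illustration of the documented behavior, not MonoRelay's actual implementation: the function name is hypothetical, and removing tool_choice along with tools is an assumption made here because a tool_choice without tools is usually rejected upstream.

```python
def apply_tool_downgrade(request, unsupported_models, auto_downgrade):
    """Return the request to forward, stripping tools when the rule applies."""
    if not auto_downgrade:
        return request  # setting disabled: forward unchanged
    if request.get("model") not in unsupported_models:
        return request  # model supports tools: forward unchanged
    # Assumption: drop tool_choice together with tools.
    return {k: v for k, v in request.items() if k not in ("tools", "tool_choice")}


# Hypothetical model name and configuration values for illustration.
original = {
    "model": "legacy-model",
    "messages": [{"role": "user", "content": "Hi"}],
    "tools": [{"type": "function", "function": {"name": "get_weather"}}],
    "tool_choice": "auto",
}
forwarded = apply_tool_downgrade(original, {"legacy-model"}, auto_downgrade=True)
```

The original dictionary is left untouched, so the caller can still log what the client actually sent.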

Examples

from openai import OpenAI

client = OpenAI(
    base_url="https://<host>/v1",
    api_key="<your-access-token>",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is MonoRelay?"},
    ],
)
print(response.choices[0].message.content)

Error responses

Errors are returned as JSON with an error object. The HTTP status code is 503 for upstream and provider failures, and 401 for authentication errors.
{
  "error": {
    "message": "[openai] No available keys for provider 'openai'",
    "type": "no_keys"
  }
}
Common error types:
| Type | Description |
| --- | --- |
| no_keys | No enabled API keys are available for the resolved provider. |
| provider_disabled | The resolved provider is disabled in configuration. |
| upstream_error | The upstream provider returned a non-2xx response. |
| proxy_error | An internal network or serialization error occurred. |
| cascade_error | All models in a cascade chain failed. |
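A client can use the error type to decide how to react. The sketch below parses an error body and produces a one-line summary. The helper name summarize_error is hypothetical, and the fallback text for unknown types is an assumption, since the list of types above is described only as the common ones.

```python
import json

# Descriptions taken from the list of common error types above.
ERROR_TYPES = {
    "no_keys": "No enabled API keys are available for the resolved provider.",
    "provider_disabled": "The resolved provider is disabled in configuration.",
    "upstream_error": "The upstream provider returned a non-2xx response.",
    "proxy_error": "An internal network or serialization error occurred.",
    "cascade_error": "All models in a cascade chain failed.",
}


def summarize_error(status_code, body):
    """Turn an error response body into a one-line summary string."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return f"HTTP {status_code}: unrecognized error body"
    error_type = error.get("type", "unknown")
    message = error.get("message", "")
    hint = ERROR_TYPES.get(error_type, "Undocumented error type.")
    return f"HTTP {status_code} [{error_type}] {message} ({hint})"


sample_body = json.dumps(
    {
        "error": {
            "message": "[openai] No available keys for provider 'openai'",
            "type": "no_keys",
        }
    }
)
summary = summarize_error(503, sample_body)
```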
