TheDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/Excurs1ons/MonoRelay/llms.txt
Use this file to discover all available pages before exploring further.
/v1/chat/completions endpoint is the primary interface for conversational AI in MonoRelay. It accepts a list of messages and returns a model-generated reply, routing the request to the appropriate upstream provider based on your model routing configuration. The endpoint is fully compatible with the OpenAI Chat Completions API, so any client or SDK that targets OpenAI will work without modification.
Method and path
Authentication
All requests must include a valid Bearer token in theAuthorization header. MonoRelay validates this against your configured access key or a JWT issued after login.
Request body
The model to use. Accepts a plain model name (e.g.
gpt-4o), a configured alias, or model@provider syntax to target a specific provider explicitly (e.g. gpt-4o@openai).An ordered list of messages representing the conversation history. Each object must have a
role (system, user, assistant, or tool) and a content field (string or content-part array for vision inputs).When
true, the response is sent as a series of Server-Sent Events (SSE) ending with data: [DONE]. Defaults to false for a single JSON response object.Sampling temperature between
0 and 2. Higher values produce more random output. Cannot be used together with top_p.Nucleus sampling probability mass. Only the tokens comprising the top
top_p probability are considered. Cannot be used together with temperature.Number of chat completion choices to generate for each message. Generating more than one choice increases token consumption.
Up to four sequences where the model will stop generating further tokens. The stop sequence itself is not included in the output.
The maximum number of tokens to generate. When omitted, the model’s default limit applies.
Number between
-2.0 and 2.0. Positive values penalize tokens that have already appeared, encouraging the model to discuss new topics.Number between
-2.0 and 2.0. Positive values penalize tokens proportional to how often they have appeared, reducing verbatim repetition.A list of tools the model may call. Each entry follows the OpenAI function definition schema with
type, function.name, function.description, and function.parameters.Controls which tool (if any) the model calls. Use
"none" to disable tools, "auto" to let the model decide, or {"type": "function", "function": {"name": "..."}} to force a specific function.An object specifying the output format. Set
{"type": "json_object"} to enable JSON mode and guarantee the response is valid JSON. Not all providers support this field.If specified, MonoRelay passes this seed to the upstream provider to encourage deterministic sampling. Identical seeds with identical parameters should produce the same output, though this is best-effort.
Streaming
Whenstream: true is set, MonoRelay returns an SSE stream. Each event is a data: line containing a JSON delta object in the standard OpenAI chunk format. The stream closes with a final data: [DONE] event.
To consume the stream with curl, use --no-buffer to disable output buffering:
Tool calling
MonoRelay forwardstools and tool_choice to the upstream provider unchanged. If the resolved model appears in the tool_calling.unsupported_models list in your configuration and auto_downgrade is enabled, MonoRelay automatically strips tool definitions from the request before forwarding, preventing upstream errors on models that do not support function calling.
Tool auto-downgrade is controlled by the
tool_calling.auto_downgrade setting in config.yml. When enabled, requests to unsupported models silently omit tool definitions.Examples
Error responses
Errors are returned as JSON with anerror object. The HTTP status code is 503 for upstream and provider failures, and 401 for authentication errors.
| Type | Description |
|---|---|
no_keys | No enabled API keys are available for the resolved provider. |
provider_disabled | The resolved provider is disabled in configuration. |
upstream_error | The upstream provider returned a non-2xx response. |
proxy_error | An internal network or serialization error occurred. |
cascade_error | All models in a cascade chain failed. |