MonoRelay accepts native Anthropic API format and exposes it atDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/Excurs1ons/MonoRelay/llms.txt
Use this file to discover all available pages before exploring further.
/v1/messages — you can point the official Anthropic SDK directly at MonoRelay without changing your request format. All request fields, streaming behavior, and beta headers are forwarded transparently to the configured Anthropic-compatible provider.
Endpoints
The Anthropic-compatible endpoints share the same/v1/ base path as OpenAI-compatible ones:
| Method | Path | Description |
|---|---|---|
POST | /v1/messages | Send a chat message in Anthropic format |
POST | /v1/messages/beta | Same as above, accepts anthropic-beta header |
GET | /v1/anthropic/models | List models from the Anthropic-compatible provider |
When configuring the Anthropic SDK with
base_url, use http://localhost:8787 (without /v1/). The SDK appends /v1/messages automatically.Authentication
Pass your MonoRelay access token in either of these headers:| Header | Format |
|---|---|
Authorization | Bearer <token> |
x-api-key | <token> |
access_key or a valid user JWT issued after login.
POST /v1/messages
Send a chat message and receive an Anthropic-format response. Supports both single-turn and multi-turn conversations, streaming, vision, tool use, and extended thinking (beta).GET /v1/anthropic/models
List models available through the configured Anthropic-compatible provider. Returns a model list in Anthropic response format.POST /v1/messages/beta
Identical to the standard messages endpoint but also accepts theanthropic-beta header for features such as extended thinking, prompt caching, and computer use. Set the header in your request and MonoRelay forwards it upstream unchanged.
Request body
The model identifier to use, e.g.
claude-opus-4-5 or a MonoRelay alias. MonoRelay resolves aliases and provider routing before forwarding.Maximum number of tokens to generate. Required by the Anthropic API; has no default.
Ordered list of conversation turns. Each element must have:
role—"user"or"assistant"content— a string or an array of content blocks (text, image, tool_result, etc.)
user turn.System prompt text. Accepts either a plain string or an array of content blocks for structured system prompts (useful with prompt caching).
Sampling temperature between
0.0 and 1.0. Controls output randomness. Omit to use the model’s default.Nucleus sampling threshold. Only tokens in the top
top_p probability mass are considered. Usually set either temperature or top_p, not both.Only sample from the top
top_k tokens at each step. Not available on all models.List of strings that stop generation when encountered. The model will not include the stop sequence in the response.
Set to
true to receive a Server-Sent Events stream instead of a single JSON response. Defaults to false.List of tool definitions the model may call. Each tool requires
name, description, and a JSON Schema input_schema.Controls tool selection behavior. Use
{"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "my_tool"} to force a specific tool.Response fields
Unique message identifier, e.g.
msg_01XFDUDYJgAACTJgxXiHvVEF.Always
"message" for non-streaming responses.Always
"assistant".Array of content blocks. A standard text response contains one block of
{"type": "text", "text": "..."}. Tool use responses include {"type": "tool_use", ...} blocks. Extended thinking responses include {"type": "thinking", "thinking": "..."} blocks.The model that generated the response as resolved by MonoRelay.
Why generation stopped:
"end_turn", "max_tokens", "stop_sequence", or "tool_use".Token consumption report:
input_tokens— prompt tokens consumedoutput_tokens— completion tokens generatedcache_read_input_tokens— tokens served from prompt cachecache_creation_input_tokens— tokens written to prompt cache
Python SDK example
Point the official Anthropic Python SDK at MonoRelay by settingbase_url. No other code changes are needed.
curl example
Streaming example
message_start → content_block_start → content_block_delta (repeated) → content_block_stop → message_delta → message_stop.