

MonoRelay accepts the native Anthropic API format at /v1/messages, so you can point the official Anthropic SDK directly at MonoRelay without changing your request format. All request fields, streaming behavior, and beta headers are forwarded transparently to the configured Anthropic-compatible provider.

Endpoints

The Anthropic-compatible endpoints share the same /v1/ base path as OpenAI-compatible ones:
| Method | Path | Description |
|---|---|---|
| POST | /v1/messages | Send a chat message in Anthropic format |
| POST | /v1/messages/beta | Same as above, accepts anthropic-beta header |
| GET | /v1/anthropic/models | List models from the Anthropic-compatible provider |
When configuring the Anthropic SDK with base_url, use http://localhost:8787 (without /v1/). The SDK appends /v1/messages automatically.
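The base_url rule comes down to simple path concatenation. The sketch below uses a hypothetical helper, build_messages_url, that mimics how the client effectively builds the request URL. It is not part of the Anthropic SDK or MonoRelay; it only shows why a base URL ending in /v1/ produces a doubled version segment.

```python
def build_messages_url(base_url: str) -> str:
    """Hypothetical helper: append the messages path the way the client does.

    The client adds /v1/messages itself, so base_url should stop at host:port.
    """
    return base_url.rstrip("/") + "/v1/messages"


# Correct: base URL without /v1/
print(build_messages_url("http://localhost:8787"))      # http://localhost:8787/v1/messages
# Incorrect: including /v1/ duplicates the version segment
print(build_messages_url("http://localhost:8787/v1/"))  # http://localhost:8787/v1/v1/messages
```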

Authentication

Pass your MonoRelay access token in either of these headers:
| Header | Format |
|---|---|
| Authorization | Bearer <token> |
| x-api-key | <token> |
The token is verified against the MonoRelay server’s configured access_key or a valid user JWT issued after login.
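Only one of the two headers is needed per request. As a minimal sketch, the hypothetical helper auth_headers below returns either form for a given token. YOUR_MONORELAY_TOKEN is a placeholder, and the returned dict can be merged into whatever headers your HTTP client sends.

```python
def auth_headers(token: str, style: str = "x-api-key") -> dict:
    """Hypothetical helper: build MonoRelay auth headers in either accepted form.

    style="x-api-key" -> x-api-key: <token>
    style="bearer"    -> Authorization: Bearer <token>
    """
    if style == "x-api-key":
        return {"x-api-key": token}
    if style == "bearer":
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown auth style: {style!r}")


print(auth_headers("YOUR_MONORELAY_TOKEN"))            # {'x-api-key': 'YOUR_MONORELAY_TOKEN'}
print(auth_headers("YOUR_MONORELAY_TOKEN", "bearer"))  # {'Authorization': 'Bearer YOUR_MONORELAY_TOKEN'}
```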

POST /v1/messages

Send a chat message and receive an Anthropic-format response. Supports both single-turn and multi-turn conversations, streaming, vision, tool use, and extended thinking (beta).

GET /v1/anthropic/models

List models available through the configured Anthropic-compatible provider. Returns a model list in Anthropic response format.
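A standard-library sketch of calling this endpoint follows. It assumes the response uses Anthropic's list shape, a top-level data array whose entries carry an id field; list_models and model_ids are hypothetical helper names, and only the parsing step is exercised offline.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list:
    """Extract model IDs from an Anthropic-style list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, token: str) -> list:
    """GET /v1/anthropic/models from MonoRelay and return the model IDs."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/anthropic/models",
        headers={"x-api-key": token, "anthropic-version": "2023-06-01"},
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))


# Offline check of the parsing step against an illustrative payload.
print(model_ids({"data": [{"type": "model", "id": "claude-opus-4-5"}]}))  # ['claude-opus-4-5']
```

With a server running locally, list_models("http://localhost:8787", "YOUR_MONORELAY_TOKEN") would return the IDs your configured provider exposes.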

POST /v1/messages/beta

Identical to the standard messages endpoint but also accepts the anthropic-beta header for features such as extended thinking, prompt caching, and computer use. Set the header in your request and MonoRelay forwards it upstream unchanged.
anthropic-beta: interleaved-thinking-2025-05-14
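When calling the endpoint without the SDK, the beta header is just another request header. The sketch below builds, but does not send, such a request with Python's standard library. build_beta_request is a hypothetical name, the payload is a minimal placeholder, and whatever body fields a given beta feature requires are defined by the upstream provider rather than shown here.

```python
import json
import urllib.request


def build_beta_request(base_url: str, token: str, payload: dict, beta: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/messages/beta with an anthropic-beta header."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/messages/beta",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": token,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
            # MonoRelay forwards this header to the upstream provider unchanged.
            "anthropic-beta": beta,
        },
        method="POST",
    )


req = build_beta_request(
    "http://localhost:8787",
    "YOUR_MONORELAY_TOKEN",
    {
        "model": "claude-opus-4-5",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    "interleaved-thinking-2025-05-14",
)
# To send it against a running server: urllib.request.urlopen(req)
```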

Request body

model
string
required
The model identifier to use, e.g. claude-opus-4-5 or a MonoRelay alias. MonoRelay resolves aliases and provider routing before forwarding.
max_tokens
integer
required
Maximum number of tokens to generate. Required by the Anthropic API; has no default.
messages
array
required
Ordered list of conversation turns. Each element must have:
  • role: "user" or "assistant"
  • content: a string or an array of content blocks (text, image, tool_result, etc.)
The array must alternate roles and must begin with a user turn.
system
string or array
System prompt text. Accepts either a plain string or an array of content blocks for structured system prompts (useful with prompt caching).
temperature
number
Sampling temperature between 0.0 and 1.0. Controls output randomness. Omit to use the model’s default.
top_p
number
Nucleus sampling threshold. Only tokens in the top top_p probability mass are considered. In most cases, adjust either temperature or top_p, not both.
top_k
integer
Only sample from the top top_k tokens at each step. Not available on all models.
stop_sequences
array
List of strings that stop generation when encountered. The model will not include the stop sequence in the response.
stream
boolean
Set to true to receive a Server-Sent Events stream instead of a single JSON response. Defaults to false.
tools
array
List of tool definitions the model may call. Each tool requires a name and a JSON Schema input_schema; a description is optional but strongly recommended.
tool_choice
object
Controls tool selection behavior. Use {"type": "auto"} to let the model decide whether to call a tool, {"type": "any"} to require it to call one of the provided tools, or {"type": "tool", "name": "my_tool"} to force a specific tool.
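The following sketch assembles a request body from the fields above and checks a few of the documented constraints before anything is sent: the first turn is from the user, roles alternate, and temperature stays within 0.0 to 1.0. build_message_request and the get_weather tool are hypothetical. MonoRelay and the upstream provider still perform their own validation, so this only catches mistakes earlier, on the client side.

```python
import json
from typing import Any, Optional


def build_message_request(
    model: str,
    max_tokens: int,
    messages: list,
    system: Optional[Any] = None,
    temperature: Optional[float] = None,
    stop_sequences: Optional[list] = None,
    tools: Optional[list] = None,
    tool_choice: Optional[dict] = None,
    stream: bool = False,
) -> dict:
    """Assemble a /v1/messages body, checking the documented constraints client-side."""
    roles = [m["role"] for m in messages]
    if not roles or roles[0] != "user":
        raise ValueError("messages must begin with a user turn")
    if any(r not in ("user", "assistant") for r in roles):
        raise ValueError('each role must be "user" or "assistant"')
    if any(a == b for a, b in zip(roles, roles[1:])):
        raise ValueError("roles must alternate between user and assistant")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    optional = {
        "system": system,
        "temperature": temperature,
        "stop_sequences": stop_sequences,
        "tools": tools,
        "tool_choice": tool_choice,
    }
    # Leave unset fields out entirely so the model's defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        body["stream"] = True
    return body


# Hypothetical tool definition, for illustration only.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

body = build_message_request(
    model="claude-opus-4-5",
    max_tokens=512,
    system=[{"type": "text", "text": "You are a concise assistant."}],
    messages=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    temperature=0.2,
    tools=[weather_tool],
    tool_choice={"type": "auto"},
)
print(json.dumps(body, indent=2))
```

Leaving unset fields out of the body, rather than sending null, lets the model's defaults apply, as described for temperature above.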

Response fields

id
string
Unique message identifier, e.g. msg_01XFDUDYJgAACTJgxXiHvVEF.
type
string
Always "message" for non-streaming responses.
role
string
Always "assistant".
content
array
Array of content blocks. A standard text response contains one block of {"type": "text", "text": "..."}. Tool use responses include {"type": "tool_use", ...} blocks. Extended thinking responses include {"type": "thinking", "thinking": "..."} blocks.
model
string
The model that generated the response, as resolved by MonoRelay.
stop_reason
string
Why generation stopped: "end_turn", "max_tokens", "stop_sequence", or "tool_use".
usage
object
Token consumption report:
  • input_tokens — prompt tokens consumed
  • output_tokens — completion tokens generated
  • cache_read_input_tokens — tokens served from prompt cache
  • cache_creation_input_tokens — tokens written to prompt cache
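A non-streaming response can be reduced to its text and its total prompt size with a couple of small helpers. This sketch assumes MonoRelay relays Anthropic's usage accounting unchanged, in which input_tokens excludes cached tokens, so the full prompt size is the sum of the three input fields. text_of, total_input_tokens, and the sample response values are hypothetical.

```python
from typing import Any


def text_of(message: dict) -> str:
    """Concatenate the text blocks of a non-streaming response.

    tool_use and thinking blocks are skipped.
    """
    return "".join(b["text"] for b in message["content"] if b.get("type") == "text")


def total_input_tokens(usage: dict) -> int:
    """Sum uncached, cache-read, and cache-write prompt tokens.

    Assumes Anthropic-style accounting, where input_tokens excludes cached tokens.
    Missing or null fields count as zero.
    """
    return sum(
        usage.get(key) or 0
        for key in ("input_tokens", "cache_read_input_tokens", "cache_creation_input_tokens")
    )


# Illustrative response shape; the values are made up.
sample: Any = {
    "id": "msg_example",
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "(reasoning omitted)"},
        {"type": "text", "text": "Paris."},
    ],
    "model": "claude-opus-4-5",
    "stop_reason": "end_turn",
    "usage": {
        "input_tokens": 12,
        "output_tokens": 3,
        "cache_read_input_tokens": 100,
        "cache_creation_input_tokens": 0,
    },
}

if sample["stop_reason"] == "max_tokens":
    print("warning: output was truncated; consider raising max_tokens")
print(text_of(sample))                      # Paris.
print(total_input_tokens(sample["usage"]))  # 112
```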

Python SDK example

Point the official Anthropic Python SDK at MonoRelay by setting base_url. No other code changes are needed.
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_MONORELAY_TOKEN",
    base_url="http://localhost:8787",
)

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain prompt caching in one paragraph."}
    ],
)

print(message.content[0].text)

curl example

curl http://localhost:8787/v1/messages \
  -H "x-api-key: YOUR_MONORELAY_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Streaming example

curl http://localhost:8787/v1/messages \
  -H "x-api-key: YOUR_MONORELAY_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  --no-buffer \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count slowly from 1 to 5."}
    ]
  }'
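Streaming responses arrive as Server-Sent Events. The sequence of event types is: message_start → content_block_start → content_block_delta (repeated) → content_block_stop → message_delta → message_stop. The content_block_start, content_block_delta, and content_block_stop group repeats for each content block, and ping events may appear anywhere in the stream.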
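Without the SDK, the event stream can be consumed with a small parser. The sketch below is a minimal standard-library version that splits the stream into events on blank lines, collects the text_delta chunks, and records the stop_reason carried by message_delta. iter_sse and collect_text are hypothetical names, and the sample stream is abbreviated for illustration; real payloads contain more fields.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_type, parsed_data) pairs from raw SSE text lines."""
    event: Optional[str] = None
    data_lines: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event or "message", json.loads("\n".join(data_lines))
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # flush a final event that has no trailing blank line
        yield event or "message", json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate text_delta chunks and return (text, stop_reason)."""
    parts: list = []
    stop_reason: Optional[str] = None
    for event, data in iter_sse(lines):
        if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
        elif event == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
        elif event == "error":
            raise RuntimeError(f"stream error: {data}")
        # message_start, content_block_start/stop, ping, message_stop: nothing to collect
    return "".join(parts), stop_reason


# Abbreviated, illustrative stream; real payloads carry more fields.
SAMPLE = """\
event: message_start
data: {"type": "message_start", "message": {"id": "msg_example", "role": "assistant", "content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "1, 2, "}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "3, 4, 5"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 9}}

event: message_stop
data: {"type": "message_stop"}
"""

print(collect_text(SAMPLE.splitlines()))  # ('1, 2, 3, 4, 5', 'end_turn')
```

Against a live response, feed the parser decoded lines, e.g. collect_text(line.decode("utf-8") for line in resp), where resp is an open urllib.request.urlopen response for a request with "stream": true.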
