POST /v1/messages

MonoRelay accepts the native Anthropic API format at /v1/messages, so you can point the official Anthropic SDK directly at MonoRelay without changing your request format. Request fields, streaming behavior, and beta headers are forwarded transparently to the configured Anthropic-compatible provider.

Endpoints

The Anthropic-compatible endpoints share the same /v1/ base path as OpenAI-compatible ones:

Method	Path	Description
`POST`	`/v1/messages`	Send a chat message in Anthropic format
`POST`	`/v1/messages/beta`	Same as above, accepts `anthropic-beta` header
`GET`	`/v1/anthropic/models`	List models from the Anthropic-compatible provider

When configuring the Anthropic SDK with base_url, use http://localhost:8787 (without /v1/). The SDK appends /v1/messages automatically.
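
To illustrate why the base URL should omit /v1/, here is a minimal sketch of the path joining. The helper name messages_url is hypothetical, and this only approximates what the SDK does internally:

```python
def messages_url(base_url: str) -> str:
    """Join a MonoRelay base URL with the /v1/messages path.

    Hypothetical helper: it mimics, roughly, how a client appends the
    endpoint path to whatever base URL it was given.
    """
    # Strip trailing slashes so the result never contains "//v1/messages".
    return base_url.rstrip("/") + "/v1/messages"


# Correct: base URL without /v1/
print(messages_url("http://localhost:8787"))

# Incorrect: the path segment ends up duplicated (/v1/v1/messages)
print(messages_url("http://localhost:8787/v1/"))
```

If requests come back as not found, a duplicated /v1/ segment in the final URL is one thing worth checking.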

Authentication

Pass your MonoRelay access token in either of these headers:

Header	Format
`Authorization`	`Bearer <token>`
`x-api-key`	`<token>`

MonoRelay accepts the token if it matches the server's configured access_key or if it is a valid user JWT issued after login.
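
The two header styles can be sketched as a small helper. The function name auth_headers and its style argument are illustrative assumptions, not part of MonoRelay or the Anthropic SDK:

```python
def auth_headers(token: str, style: str = "x-api-key") -> dict[str, str]:
    """Return the authentication header for a MonoRelay request.

    Hypothetical helper covering the two header formats listed above.
    """
    if style == "x-api-key":
        # The raw token goes directly into x-api-key.
        return {"x-api-key": token}
    if style == "bearer":
        # The Authorization header carries the token with a "Bearer " prefix.
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown auth style: {style!r}")


print(auth_headers("YOUR_MONORELAY_TOKEN"))
print(auth_headers("YOUR_MONORELAY_TOKEN", style="bearer"))
```

Either header should be sufficient on its own; sending both is not required by anything stated above.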

POST /v1/messages

Send a chat message and receive an Anthropic-format response. Supports single-turn and multi-turn conversations, streaming, vision, tool use, and extended thinking (beta).

GET /v1/anthropic/models

List models available through the configured Anthropic-compatible provider. Returns a model list in Anthropic response format.

POST /v1/messages/beta

Identical to the standard messages endpoint but also accepts the anthropic-beta header for features such as extended thinking, prompt caching, and computer use. Set the header in your request and MonoRelay forwards it upstream unchanged.

anthropic-beta: interleaved-thinking-2025-05-14
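
As a rough sketch, a request to the beta endpoint could be assembled with the Python standard library as shown here. The function build_beta_request is hypothetical, the request is only constructed and not sent, and the anthropic-version value is copied from the curl examples on this page:

```python
import json
import urllib.request


def build_beta_request(token: str, body: dict, beta: str,
                       base_url: str = "http://localhost:8787") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/messages/beta.

    Hypothetical helper; the header names follow the examples on this page.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages/beta",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "x-api-key": token,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": beta,  # forwarded upstream unchanged
            "content-type": "application/json",
        },
    )


req = build_beta_request(
    token="YOUR_MONORELAY_TOKEN",
    body={
        "model": "claude-opus-4-5",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    beta="interleaved-thinking-2025-05-14",
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it, assuming a MonoRelay server is listening on localhost:8787.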

Request body

model

string

required

The model identifier to use, e.g. claude-opus-4-5 or a MonoRelay alias. MonoRelay resolves aliases and provider routing before forwarding.

max_tokens

integer

required

Maximum number of tokens to generate. Required by the Anthropic API; has no default.

messages

array

required

Ordered list of conversation turns. Each element must have:

role — "user" or "assistant"
content — a string or an array of content blocks (text, image, tool_result, etc.)

The array must alternate roles and must begin with a user turn.
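
The ordering rule above can be checked on the client before sending. This is a minimal sketch of such a check; check_messages is a hypothetical helper, not something MonoRelay or the SDK provides:

```python
def check_messages(messages: list[dict]) -> None:
    """Raise ValueError unless roles alternate user/assistant, starting with user."""
    if not messages:
        raise ValueError("messages must not be empty")
    expected = "user"
    for i, message in enumerate(messages):
        role = message.get("role")
        if role != expected:
            raise ValueError(
                f"messages[{i}] has role {role!r}, expected {expected!r}"
            )
        # Flip the expected role for the next turn.
        expected = "assistant" if expected == "user" else "user"


conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And of Italy?"},
]
check_messages(conversation)  # passes silently
```

A check like this surfaces ordering mistakes locally instead of as an error response from the upstream provider.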

system

string or array

System prompt text. Accepts either a plain string or an array of content blocks for structured system prompts (useful with prompt caching).

temperature

number

Sampling temperature between 0.0 and 1.0. Controls output randomness. Omit to use the model’s default.

top_p

number

Nucleus sampling threshold. Only tokens within the top top_p probability mass are considered. In most cases, set either temperature or top_p, not both.

top_k

integer

Only sample from the top top_k tokens at each step. Not available on all models.

stop_sequences

array

List of strings that stop generation when encountered. The model will not include the stop sequence in the response.

stream

boolean

Set to true to receive a Server-Sent Events stream instead of a single JSON response. Defaults to false.

tools

array

List of tool definitions the model may call. Each tool requires name, description, and a JSON Schema input_schema.

tool_choice

object

Controls tool selection behavior. Use {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "my_tool"} to force a specific tool.
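
To make the shape of tools and tool_choice concrete, here is a sketch of a request body that forces one tool. The get_weather tool and its schema are invented for illustration:

```python
import json

# Hypothetical tool definition: name, description, and a JSON Schema input_schema.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

body = {
    "model": "claude-opus-4-5",
    "max_tokens": 512,
    "tools": [weather_tool],
    # Force this specific tool; {"type": "auto"} or {"type": "any"} are the alternatives.
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
}

payload = json.dumps(body)
print(payload)
```

The name in tool_choice has to match the name of one of the entries in tools.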

Response fields

id

string

Unique message identifier, e.g. msg_01XFDUDYJgAACTJgxXiHvVEF.

type

string

Always "message" for non-streaming responses.

role

string

Always "assistant".

content

array

Array of content blocks. A standard text response contains one block of {"type": "text", "text": "..."}. Tool use responses include {"type": "tool_use", ...} blocks. Extended thinking responses include {"type": "thinking", "thinking": "..."} blocks.
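
Because content can mix block types, client code usually dispatches on each block's type. Here is a minimal sketch operating on a parsed JSON response; split_content is a hypothetical helper and the sample blocks are hand-written, not real output:

```python
def split_content(content: list[dict]) -> dict[str, list]:
    """Group content blocks by type: text, thinking, and tool_use."""
    result = {"text": [], "thinking": [], "tool_use": []}
    for block in content:
        kind = block.get("type")
        if kind == "text":
            result["text"].append(block["text"])
        elif kind == "thinking":
            result["thinking"].append(block["thinking"])
        elif kind == "tool_use":
            # Keep the whole block; the caller needs its id, name, and input.
            result["tool_use"].append(block)
        # Unknown block types are skipped rather than treated as errors.
    return result


sample_content = [
    {"type": "thinking", "thinking": "The user wants the weather."},
    {"type": "text", "text": "Let me check that for you."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "Paris"}},
]
parts = split_content(sample_content)
print("".join(parts["text"]))
```

This avoids assuming that content[0] is always a text block, which may not hold for tool use or extended thinking responses.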

model

string

The model that generated the response, as resolved by MonoRelay.

stop_reason

string

Why generation stopped: "end_turn", "max_tokens", "stop_sequence", or "tool_use".

usage

object

Token consumption report:

input_tokens — prompt tokens consumed
output_tokens — completion tokens generated
cache_read_input_tokens — tokens served from prompt cache
cache_creation_input_tokens — tokens written to prompt cache
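
The usage fields can be combined into simple totals. The sketch below assumes the three input counters are disjoint (as in Anthropic's own usage reporting) and that the cache fields may be missing or null when caching is not in use. Both points are assumptions about the upstream provider, and summarize_usage is a hypothetical helper:

```python
def summarize_usage(usage: dict) -> dict[str, int]:
    """Compute total input and overall token counts from a usage object."""
    # "or 0" treats both missing keys and explicit nulls as zero.
    uncached = usage.get("input_tokens") or 0
    cache_read = usage.get("cache_read_input_tokens") or 0
    cache_write = usage.get("cache_creation_input_tokens") or 0
    output = usage.get("output_tokens") or 0

    total_input = uncached + cache_read + cache_write
    return {
        "total_input_tokens": total_input,
        "output_tokens": output,
        "total_tokens": total_input + output,
    }


sample_usage = {
    "input_tokens": 12,
    "output_tokens": 40,
    "cache_read_input_tokens": 1000,
    "cache_creation_input_tokens": None,
}
print(summarize_usage(sample_usage))
```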

Python SDK example

Point the official Anthropic Python SDK at MonoRelay by setting base_url. No other code changes are needed.

import anthropic

# base_url omits /v1/; the SDK appends /v1/messages itself.
client = anthropic.Anthropic(
    api_key="YOUR_MONORELAY_TOKEN",
    base_url="http://localhost:8787",
)

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain prompt caching in one paragraph."}
    ],
)

# A plain text reply has a single text block as its first content entry.
print(message.content[0].text)

curl example

curl http://localhost:8787/v1/messages \
  -H "x-api-key: YOUR_MONORELAY_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Streaming example

curl http://localhost:8787/v1/messages \
  -H "x-api-key: YOUR_MONORELAY_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  --no-buffer \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count slowly from 1 to 5."}
    ]
  }'

Streaming responses arrive as Server-Sent Events. The sequence of event types is: message_start → content_block_start → content_block_delta (repeated) → content_block_stop → message_delta → message_stop.
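
The event sequence can be consumed by reading the data: lines and concatenating text deltas. The sketch below runs against a hand-written sample stream rather than a live server; the payload shapes follow Anthropic's streaming format as far as known, and collect_text is a hypothetical helper:

```python
import json

# Hand-written sample of the SSE wire format (illustrative, not captured output).
SAMPLE_STREAM = """\
event: message_start
data: {"type": "message_start", "message": {"role": "assistant", "content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "1, 2"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", 3"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}}

event: message_stop
data: {"type": "message_stop"}
"""


def collect_text(lines):
    """Return (full_text, event_types) from an iterable of SSE lines."""
    parts, event_types = [], []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        event_types.append(payload["type"])
        if payload["type"] == "content_block_delta":
            delta = payload["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
    return "".join(parts), event_types


text, event_types = collect_text(SAMPLE_STREAM.splitlines())
print(text)
print(event_types)
```

With a live connection, the same function could be fed the decoded lines of the HTTP response body as they arrive.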
