MonoRelay exposes two API families — an OpenAI-compatible family rooted at /v1/* and an Anthropic-compatible family at /v1/messages and /v1/anthropic/*. Both families proxy requests to upstream providers (OpenRouter, NVIDIA NIM, OpenAI, Anthropic, and others) while handling key rotation, retries, and rate-limit cooldowns transparently.
The base URL is determined at startup in this order:
1. public_host config value: if server.public_host is set in config.yml, MonoRelay constructs the base URL from that value. A bare hostname (e.g. relay.example.com) is treated as HTTPS; a value that begins with http:// or https:// is used as-is.
2. Auto-detection: if public_host is empty, MonoRelay queries api.ipify.org to discover the server’s public IP. If that fails, it falls back to the local network interface address.
The resolved base URL is returned by GET /api/info in the base_url field.
config.yml
server:
  public_host: "relay.example.com"  # leave empty for auto-detection
  port: 8787
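The resolution order described above can be sketched in a few lines of Python. This is only an illustration, not MonoRelay's actual code: the function names, the `detect_public_ip` callback (standing in for the api.ipify.org lookup), and the choice of plain HTTP plus the configured port for auto-detected addresses are all assumptions.

```python
from typing import Callable, Optional


def base_url_from_public_host(public_host: str) -> str:
    """Keep an explicit scheme as-is; treat a bare hostname as HTTPS."""
    host = public_host.strip()
    if host.startswith(("http://", "https://")):
        return host
    return f"https://{host}"


def resolve_base_url(
    public_host: str,
    port: int,
    detect_public_ip: Callable[[], Optional[str]],  # hypothetical lookup hook
    local_ip: str,
) -> str:
    """Resolve the base URL: configured host first, then auto-detection."""
    if public_host.strip():
        return base_url_from_public_host(public_host)
    # Auto-detection: public IP first, local interface address as the fallback.
    ip = detect_public_ip() or local_ip
    # Assumption: auto-detected addresses are served over HTTP on the configured port.
    return f"http://{ip}:{port}"


if __name__ == "__main__":
    print(resolve_base_url("relay.example.com", 8787, lambda: None, "192.168.1.10"))
```

Passing the lookup as a callback keeps the sketch free of network access, so the fallback path can be exercised by returning `None` from it.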
Most OpenAI-compatible endpoints return responses in the standard OpenAI envelope. For chat completions, a successful non-streaming response looks like:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22
  }
}
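As a rough illustration of consuming this envelope, the sketch below pulls the assistant text and token counts out of an already-decoded response. The helper names are hypothetical, and the sample dictionary is a trimmed copy of the example above.

```python
import json

# Trimmed copy of the example response shown above.
RESPONSE_BODY = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 10, "total_tokens": 22}
}
"""


def assistant_text(response: dict) -> str:
    """Return the content of the first choice's message."""
    return response["choices"][0]["message"]["content"]


def usage_is_consistent(response: dict) -> bool:
    """Check that prompt and completion tokens add up to the reported total."""
    usage = response["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


if __name__ == "__main__":
    response = json.loads(RESPONSE_BODY)
    print(assistant_text(response))
    print(usage_is_consistent(response))
```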
Streaming responses use the Server-Sent Events (SSE) format and terminate with data: [DONE].
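A minimal sketch of reading such a stream follows. It assumes the chunks use the usual OpenAI streaming shape (`choices[].delta.content`) and that the caller already has the response split into text lines. The function names and the sample stream are illustrative, not taken from MonoRelay.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of a streamed chat completion."""
    parts: List[str] = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


# Hand-written sample stream, assuming OpenAI-style chunk objects.
SAMPLE_STREAM = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "!"}}]}',
    "",
    "data: [DONE]",
]

if __name__ == "__main__":
    print(collect_text(SAMPLE_STREAM))
```

Stopping at the sentinel rather than at end-of-input means anything sent after `data: [DONE]` is ignored.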
MonoRelay does not impose its own rate limits. When an upstream provider returns HTTP 429, the key that triggered the response is placed into a cooldown period. The cooldown duration (in seconds) is configurable per provider via rate_limit_cooldown in config.yml. During cooldown, MonoRelay automatically selects a different key for subsequent requests. If no keys are available, a no_keys error is returned.
config.yml
providers:
  openrouter:
    rate_limit_cooldown: 60  # seconds to pause a key after a 429
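The cooldown behaviour can be pictured with a small key pool. This is a sketch under stated assumptions, not MonoRelay's implementation: the `KeyPool` and `NoKeysError` names are hypothetical, and picking the first available key in list order is an assumed selection strategy.

```python
import time
from typing import Callable, Dict, List


class NoKeysError(RuntimeError):
    """Raised when every key is cooling down (stands in for the no_keys error)."""


class KeyPool:
    """Hands out API keys and pauses a key for a fixed time after a 429."""

    def __init__(
        self,
        keys: List[str],
        cooldown_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._keys = list(keys)
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the cooldown can be tested without sleeping
        self._blocked_until: Dict[str, float] = {}

    def pick(self) -> str:
        """Return the first key that is not in cooldown (assumed strategy)."""
        now = self._clock()
        for key in self._keys:
            if self._blocked_until.get(key, float("-inf")) <= now:
                return key
        raise NoKeysError("no_keys")

    def mark_rate_limited(self, key: str) -> None:
        """Record an upstream 429 for this key and start its cooldown."""
        self._blocked_until[key] = self._clock() + self._cooldown


if __name__ == "__main__":
    pool = KeyPool(["key-a", "key-b"], cooldown_seconds=60)
    first = pool.pick()
    pool.mark_rate_limited(first)  # pretend the upstream answered 429
    print(pool.pick())             # a different key is selected
```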