API reference: all MonoRelay endpoints and base URLs

MonoRelay exposes two API families — an OpenAI-compatible family rooted at /v1/* and an Anthropic-compatible family at /v1/messages and /v1/anthropic/*. Both families proxy requests to upstream providers (OpenRouter, NVIDIA NIM, OpenAI, Anthropic, and others) while handling key rotation, retries, and rate-limit cooldowns transparently.

Base URL

The base URL is determined at startup in this order:

1. public_host config value: if server.public_host is set in config.yml, MonoRelay constructs the base URL from that value. A bare hostname (e.g. relay.example.com) is treated as HTTPS; a value that begins with http:// or https:// is used as-is.
2. Auto-detection: if public_host is empty, MonoRelay queries api.ipify.org to discover the server’s public IP. If that fails, it falls back to the local network interface address.

The resolved base URL is returned by GET /api/info in the base_url field.

config.yml

server:
  public_host: "relay.example.com"   # leave empty for auto-detection
  port: 8787
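
The public_host rule described above can be sketched in a few lines of Python. This is an illustration only, not MonoRelay's actual code: the function name base_url_from_public_host is hypothetical, and the sketch deliberately ignores the port setting and the auto-detection fallback, since the text does not say how either is folded into the final URL.

```python
def base_url_from_public_host(public_host: str) -> str:
    """Turn a server.public_host value into a base URL (hypothetical helper).

    An explicit http:// or https:// prefix is kept as-is; a bare hostname
    is treated as HTTPS.
    """
    host = public_host.strip()
    if not host:
        # An empty value means auto-detection applies, which this sketch
        # does not attempt to reproduce.
        raise ValueError("public_host is empty; auto-detection applies instead")
    if host.startswith(("http://", "https://")):
        return host
    return "https://" + host
```

In practice, reading base_url from GET /api/info is the reliable way to learn the resolved value; the sketch only mirrors the documented rule for the config setting.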

API families

Family	Base path	Compatible clients
OpenAI-compatible	`/v1/`	OpenAI Python/JS SDKs, any OpenAI-compatible client
Anthropic-compatible	`/v1/messages`, `/v1/anthropic/`	Anthropic Python/JS SDKs
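
To show how the two families differ only in path and payload shape, here is a minimal sketch that builds (but does not send) one request per family with Python's standard library. Several things are assumptions: the build_request helper is hypothetical, the Bearer Authorization header is a guess at the auth scheme (see the Authentication page for the real one), and the Anthropic model name is a placeholder.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build a JSON POST request for a MonoRelay path (hypothetical helper)."""
    url = base_url.rstrip("/") + path
    headers = {
        "Content-Type": "application/json",
        # Assumption: Bearer auth. The source does not specify the header.
        "Authorization": f"Bearer {token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


BASE_URL = "https://relay.example.com"

# OpenAI-compatible family: chat completions payload.
openai_request = build_request(
    BASE_URL,
    "/v1/chat/completions",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
    "YOUR_ACCESS_KEY",
)

# Anthropic-compatible family: the Messages format requires max_tokens.
anthropic_request = build_request(
    BASE_URL,
    "/v1/messages",
    {
        "model": "your-anthropic-model",  # placeholder, not a real model ID
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    "YOUR_ACCESS_KEY",
)
```

Passing either object to urllib.request.urlopen would send it. The official SDKs can be used instead by pointing their base URL option at the MonoRelay base URL.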

Endpoint index

Chat and completions

Method	Path	Description
`POST`	`/v1/chat/completions`	Chat completions (streaming and non-streaming)
`POST`	`/v1/completions`	Legacy text completions
`POST`	`/v1/responses`	Responses API (OpenAI Responses format)

Embeddings and moderation

Method	Path	Description
`POST`	`/v1/embeddings`	Generate text embeddings
`POST`	`/v1/moderations`	Content moderation check

Models

Method	Path	Description
`GET`	`/v1/models`	List available models (public, no auth required)

Credits

Method	Path	Description
`GET`	`/v1/credits`	Retrieve upstream credit balance

Audio

Method	Path	Description
`POST`	`/v1/audio/transcriptions`	Transcribe audio to text (Whisper-compatible)
`POST`	`/v1/audio/translations`	Translate audio to English text

Images

Method	Path	Description
`POST`	`/v1/images/generations`	Generate images from a text prompt
`POST`	`/v1/images/variations`	Create variations of an uploaded image
`POST`	`/v1/images/edits`	Edit an image with a mask and prompt

Files

Method	Path	Description
`GET`	`/v1/files`	List uploaded files
`GET`	`/v1/files/{file_id}`	Retrieve metadata for a specific file
`GET`	`/v1/files/{file_id}/content`	Download the content of a file

Fine-tuning

Method	Path	Description
`GET`	`/v1/fine_tuning/jobs`	List fine-tuning jobs
`POST`	`/v1/fine_tuning/jobs`	Create a fine-tuning job
`GET`	`/v1/fine_tuning/jobs/{job_id}`	Retrieve a fine-tuning job
`POST`	`/v1/fine_tuning/jobs/{job_id}/cancel`	Cancel a fine-tuning job

Batches

Method	Path	Description
`GET`	`/v1/batches`	List batch jobs
`POST`	`/v1/batches`	Create a batch job
`GET`	`/v1/batches/{batch_id}`	Retrieve a batch job

Assistants

Method	Path	Description
`GET`	`/v1/assistants`	List assistants
`POST`	`/v1/assistants`	Create an assistant
`GET`	`/v1/assistants/{assistant_id}`	Retrieve an assistant
`POST`	`/v1/assistants/{assistant_id}`	Update an assistant
`DELETE`	`/v1/assistants/{assistant_id}`	Delete an assistant

Threads

Method	Path	Description
`GET`	`/v1/threads`	List threads
`POST`	`/v1/threads`	Create a thread
`GET`	`/v1/threads/{thread_id}`	Retrieve a thread
`POST`	`/v1/threads/{thread_id}`	Modify a thread
`DELETE`	`/v1/threads/{thread_id}`	Delete a thread
`GET`	`/v1/threads/{thread_id}/messages`	List messages in a thread
`POST`	`/v1/threads/{thread_id}/messages`	Add a message to a thread
`GET`	`/v1/threads/{thread_id}/runs`	List runs for a thread
`POST`	`/v1/threads/{thread_id}/runs`	Create a run
`GET`	`/v1/threads/{thread_id}/runs/{run_id}`	Retrieve a run
`POST`	`/v1/threads/{thread_id}/runs/{run_id}/cancel`	Cancel a run

Anthropic-compatible

Method	Path	Description
`POST`	`/v1/messages`	Anthropic Messages API (native format)
`POST`	`/v1/messages/beta`	Anthropic Messages API beta features
`GET`	`/v1/anthropic/models`	List models via Anthropic provider

Utility

Method	Path	Description
`GET`	`/health`	Server health check and provider status
`GET`	`/api/info`	Server connection info, base URL, and system metrics

Response format

Most OpenAI-compatible endpoints return responses in the standard OpenAI envelope. For chat completions, a successful non-streaming response looks like:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22
  }
}
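
As an illustration of reading the envelope above, the following sketch pulls out the assistant text, the finish reason, and the token total. The function name summarize_chat_completion is hypothetical, and the sketch assumes a single choice, as in the sample.

```python
import json


def summarize_chat_completion(raw: str) -> dict:
    """Extract the commonly used fields from a non-streaming response body."""
    data = json.loads(raw)
    choice = data["choices"][0]  # assumes at least one choice is present
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": data["usage"]["total_tokens"],
    }
```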

Streaming responses use the Server-Sent Events (SSE) format and terminate with data: [DONE].
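
A streaming consumer can be sketched as a small line parser that stops at the data: [DONE] sentinel. Two assumptions are worth flagging: the chunk layout (choices[].delta.content) follows the usual OpenAI streaming chunk shape, which this page does not show, and each event's data is assumed to fit on a single line. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until data: [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from a streamed chat completion."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

The lines argument can be any iterable of decoded text lines, for example the lines of an HTTP response body read incrementally.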

Error format

All errors — whether authentication failures, upstream errors, or proxy errors — follow the same envelope:

{
  "error": {
    "message": "No available keys for provider 'openai'",
    "type": "no_keys"
  }
}

Common type values:

Type	Meaning
`auth_error`	Missing or invalid authentication token
`no_keys`	No API keys available for the selected provider
`provider_disabled`	The resolved provider is not enabled in config
`upstream_error`	The upstream provider returned an error
`proxy_error`	An unexpected error occurred in the proxy layer
`cascade_error`	All models in a cascade chain failed
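
Because every error shares one envelope, a client can detect errors with a single check. The sketch below is one possible way to do that; RelayError and parse_error are hypothetical names, and the fallback type "unknown" is an assumption for bodies that omit the type field.

```python
import json
from typing import Optional


class RelayError(Exception):
    """An error decoded from the {"error": {"message", "type"}} envelope."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def parse_error(body: str) -> Optional[RelayError]:
    """Return a RelayError if the body is an error envelope, else None."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON, so not an error envelope
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return None
    return RelayError(error.get("type", "unknown"), error.get("message", ""))
```

A caller might branch on error_type, for example treating no_keys as a signal to retry later and auth_error as a configuration problem.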

Rate limiting

MonoRelay does not impose its own rate limits. When an upstream provider returns HTTP 429, the key that triggered the response is placed into a cooldown period. The cooldown duration (in seconds) is configurable per provider via rate_limit_cooldown in config.yml. During cooldown, MonoRelay automatically selects a different key for subsequent requests. If no keys are available, a no_keys error is returned.

config.yml

providers:
  openrouter:
    rate_limit_cooldown: 60   # seconds to pause a key after a 429
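
The cooldown behaviour can be pictured with a small key pool. This is a conceptual sketch, not MonoRelay's implementation: the KeyPool class and its methods are hypothetical, and picking the first key that is not cooling down is an assumed selection strategy.

```python
import time
from typing import Callable, Dict, List, Optional


class KeyPool:
    """Tracks which keys are cooling down after an upstream HTTP 429."""

    def __init__(
        self,
        keys: List[str],
        cooldown_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._keys = list(keys)
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the pool can be tested without sleeping
        self._available_at: Dict[str, float] = {}

    def pick(self) -> Optional[str]:
        """Return the first key not in cooldown, or None (the no_keys case)."""
        now = self._clock()
        for key in self._keys:
            if self._available_at.get(key, float("-inf")) <= now:
                return key
        return None

    def mark_rate_limited(self, key: str) -> None:
        """Start the cooldown for a key that just triggered a 429."""
        self._available_at[key] = self._clock() + self._cooldown
```

With rate_limit_cooldown: 60, a key marked at time t becomes eligible again at t + 60 seconds; if every key is cooling down, pick returns None, which corresponds to the no_keys error.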

Explore endpoints

Authentication

How to pass your access key or JWT token to authenticate requests.

Chat completions

Send multi-turn conversations to any upstream model.

Models

List all models available through configured providers.

Embeddings

Generate vector embeddings for text input.
