MonoRelay exposes two API families: an OpenAI-compatible family rooted at /v1/* and an Anthropic-compatible family at /v1/messages and /v1/anthropic/*. Both families proxy requests to upstream providers (OpenRouter, NVIDIA NIM, OpenAI, Anthropic, and others) and handle key rotation, retries, and rate-limit cooldowns transparently.

Base URL

The base URL is determined at startup in this order:
  1. public_host config value — if server.public_host is set in config.yml, MonoRelay constructs the base URL from that value. A bare hostname (e.g. relay.example.com) is treated as HTTPS; a value that begins with http:// or https:// is used as-is.
  2. Auto-detection — if public_host is empty, MonoRelay queries api.ipify.org to discover the server’s public IP. If that fails, it falls back to the local network interface address.
The resolved base URL is returned by GET /api/info in the base_url field.
config.yml
server:
  public_host: "relay.example.com"   # leave empty for auto-detection
  port: 8787
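
The resolution order described above can be sketched as follows. This is a minimal illustration, not MonoRelay's actual code: the function names, the injected `detect_public_ip` and `local_ip` callables, and the `http://<ip>:<port>` shape used for auto-detected addresses are all assumptions made for the example.

```python
from typing import Callable


def normalize_public_host(public_host: str) -> str:
    """Apply the documented rule: explicit scheme is kept, bare host means HTTPS."""
    value = public_host.strip()
    if value.startswith(("http://", "https://")):
        return value
    return f"https://{value}"


def resolve_base_url(
    public_host: str,
    port: int,
    detect_public_ip: Callable[[], str],  # hypothetical: e.g. a call to api.ipify.org
    local_ip: Callable[[], str],          # hypothetical: local interface lookup
) -> str:
    # Step 1: a configured public_host always wins.
    if public_host and public_host.strip():
        return normalize_public_host(public_host)

    # Step 2: auto-detection, falling back to the local address on failure.
    try:
        ip = detect_public_ip()
    except Exception:
        ip = ""
    if not ip:
        ip = local_ip()

    # Assumption: the scheme and port format for detected IPs is not
    # specified in this page; http://<ip>:<port> is used for illustration.
    return f"http://{ip}:{port}"
```

Passing the lookups in as callables keeps the sketch testable without network access. To confirm what a running server actually resolved, read the base_url field from GET /api/info.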

API families

| Family | Base path | Compatible clients |
| --- | --- | --- |
| OpenAI-compatible | /v1/ | OpenAI Python/JS SDKs, any OpenAI-compatible client |
| Anthropic-compatible | /v1/messages, /v1/anthropic/ | Anthropic Python/JS SDKs |
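
To make the difference between the two families concrete, the sketch below builds one request for each using only the Python standard library. The helper names are hypothetical, and the `Authorization: Bearer` header is an assumption; see the Authentication page for the scheme your deployment expects. The request bodies follow the public OpenAI chat completions and Anthropic Messages formats.

```python
import json
from urllib.request import Request


def build_request(base_url: str, path: str, payload: dict, token: str) -> Request:
    """Build (but do not send) a JSON POST request against the relay."""
    url = base_url.rstrip("/") + path
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token auth; not confirmed by this page.
        "Authorization": f"Bearer {token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def chat_completion_request(base_url: str, token: str, model: str, prompt: str) -> Request:
    # OpenAI-compatible family: POST /v1/chat/completions
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return build_request(base_url, "/v1/chat/completions", payload, token)


def anthropic_message_request(
    base_url: str, token: str, model: str, prompt: str, max_tokens: int = 256
) -> Request:
    # Anthropic-compatible family: POST /v1/messages (max_tokens is required
    # by the Anthropic Messages format).
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return build_request(base_url, "/v1/messages", payload, token)
```

Either request can be sent with `urllib.request.urlopen(req)`. In practice, an official SDK pointed at the relay's base URL does the same work.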

Endpoint index

Chat and completions

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | Chat completions (streaming and non-streaming) |
| POST | /v1/completions | Legacy text completions |
| POST | /v1/responses | Responses API (OpenAI Responses format) |

Embeddings and moderation

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/embeddings | Generate text embeddings |
| POST | /v1/moderations | Content moderation check |

Models

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/models | List available models (public, no auth required) |

Credits

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/credits | Retrieve upstream credit balance |

Audio

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/audio/transcriptions | Transcribe audio to text (Whisper-compatible) |
| POST | /v1/audio/translations | Translate audio to English text |

Images

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/images/generations | Generate images from a text prompt |
| POST | /v1/images/variations | Create variations of an uploaded image |
| POST | /v1/images/edits | Edit an image with a mask and prompt |

Files

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/files | List uploaded files |
| GET | /v1/files/{file_id} | Retrieve metadata for a specific file |
| GET | /v1/files/{file_id}/content | Download the content of a file |

Fine-tuning

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/fine_tuning/jobs | List fine-tuning jobs |
| POST | /v1/fine_tuning/jobs | Create a fine-tuning job |
| GET | /v1/fine_tuning/jobs/{job_id} | Retrieve a fine-tuning job |
| POST | /v1/fine_tuning/jobs/{job_id}/cancel | Cancel a fine-tuning job |

Batches

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/batches | List batch jobs |
| POST | /v1/batches | Create a batch job |
| GET | /v1/batches/{batch_id} | Retrieve a batch job |

Assistants

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/assistants | List assistants |
| POST | /v1/assistants | Create an assistant |
| GET | /v1/assistants/{assistant_id} | Retrieve an assistant |
| POST | /v1/assistants/{assistant_id} | Update an assistant |
| DELETE | /v1/assistants/{assistant_id} | Delete an assistant |

Threads

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/threads | List threads |
| POST | /v1/threads | Create a thread |
| GET | /v1/threads/{thread_id} | Retrieve a thread |
| POST | /v1/threads/{thread_id} | Modify a thread |
| DELETE | /v1/threads/{thread_id} | Delete a thread |
| GET | /v1/threads/{thread_id}/messages | List messages in a thread |
| POST | /v1/threads/{thread_id}/messages | Add a message to a thread |
| GET | /v1/threads/{thread_id}/runs | List runs for a thread |
| POST | /v1/threads/{thread_id}/runs | Create a run |
| GET | /v1/threads/{thread_id}/runs/{run_id} | Retrieve a run |
| POST | /v1/threads/{thread_id}/runs/{run_id}/cancel | Cancel a run |

Anthropic-compatible

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/messages | Anthropic Messages API (native format) |
| POST | /v1/messages/beta | Anthropic Messages API beta features |
| GET | /v1/anthropic/models | List models via Anthropic provider |

Utility

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Server health check and provider status |
| GET | /api/info | Server connection info, base URL, and system metrics |

Response format

Most OpenAI-compatible endpoints return responses in the standard OpenAI envelope. For chat completions, a successful non-streaming response looks like:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22
  }
}
Streaming responses use the Server-Sent Events (SSE) format and terminate with data: [DONE].
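
A streaming client reads the SSE lines one at a time and stops at the data: [DONE] sentinel. The sketch below shows one way to do that, assuming each data: payload is a standard OpenAI chat completion chunk with the text under choices[].delta.content. The function name is hypothetical, and a production client would usually rely on an SDK's stream handling instead.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an SSE stream of chat completion chunks."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and any non-data SSE fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        # The documented terminator: nothing after it is processed.
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Assumption: OpenAI-style chunks carry text in delta.content.
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece
```

Joining the yielded fragments, for example with `"".join(iter_sse_content(stream))`, reconstructs the full assistant message.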

Error format

All errors, whether authentication failures, upstream errors, or proxy errors, follow the same envelope:
{
  "error": {
    "message": "No available keys for provider 'openai'",
    "type": "no_keys"
  }
}
Common type values:
| Type | Meaning |
| --- | --- |
| auth_error | Missing or invalid authentication token |
| no_keys | No API keys available for the selected provider |
| provider_disabled | The resolved provider is not enabled in config |
| upstream_error | The upstream provider returned an error |
| proxy_error | An unexpected error occurred in the proxy layer |
| cascade_error | All models in a cascade chain failed |
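
Because every error shares one envelope, a client can convert any failed response into a single exception type. The following is a minimal sketch. The `RelayError` class and `parse_error` function are hypothetical names, and the "unknown" fallback for bodies that do not match the envelope is a choice made for this example.

```python
import json

# The type values listed in the table above.
KNOWN_ERROR_TYPES = {
    "auth_error",
    "no_keys",
    "provider_disabled",
    "upstream_error",
    "proxy_error",
    "cascade_error",
}


class RelayError(Exception):
    """Client-side wrapper for the documented error envelope."""

    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"[{status}] {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def is_known(self) -> bool:
        return self.error_type in KNOWN_ERROR_TYPES


def parse_error(status: int, body: str) -> RelayError:
    """Turn an HTTP status and response body into a RelayError."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        # Body did not match the envelope; keep the raw text for debugging.
        return RelayError(status, "unknown", body)
    return RelayError(status, str(err.get("type", "unknown")), str(err.get("message", "")))
```

Callers can then branch on `error_type`, for example prompting for new credentials on auth_error and surfacing the upstream message on upstream_error.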

Rate limiting

MonoRelay does not impose its own rate limits. When an upstream provider returns HTTP 429, the key that triggered the response is placed into a cooldown period. The cooldown duration (in seconds) is configurable per provider via rate_limit_cooldown in config.yml. During cooldown, MonoRelay automatically selects a different key for subsequent requests. If no keys are available, a no_keys error is returned.
config.yml
providers:
  openrouter:
    rate_limit_cooldown: 60   # seconds to pause a key after a 429
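
The cooldown behaviour can be pictured as a small key pool that skips any key still inside its cooldown window. The sketch below illustrates the idea only and is not MonoRelay's implementation: the `KeyPool` class, its method names, and the round-robin selection order are assumptions.

```python
import time
from typing import Callable, Dict, List, Optional


class KeyPool:
    """Illustrative key rotation with a per-key cooldown after HTTP 429."""

    def __init__(
        self,
        keys: List[str],
        cooldown_seconds: float,  # plays the role of rate_limit_cooldown
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._keys = list(keys)
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}
        self._next = 0

    def mark_rate_limited(self, key: str) -> None:
        # Called when the upstream answers 429 for a request using this key.
        self._blocked_until[key] = self._clock() + self._cooldown

    def pick(self) -> Optional[str]:
        """Return the next usable key, or None if all are cooling down."""
        now = self._clock()
        for offset in range(len(self._keys)):
            idx = (self._next + offset) % len(self._keys)
            key = self._keys[idx]
            if self._blocked_until.get(key, 0.0) <= now:
                # Assumption: round-robin so load spreads across keys.
                self._next = (idx + 1) % len(self._keys)
                return key
        return None  # the caller would surface this as a no_keys error
```

The injectable `clock` makes the cooldown easy to exercise in tests without sleeping.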

Explore endpoints

Authentication

How to pass your access key or JWT token to authenticate requests.

Chat completions

Send multi-turn conversations to any upstream model.

Models

List all models available through configured providers.

Embeddings

Generate vector embeddings for text input.
