Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Excurs1ons/MonoRelay/llms.txt

Use this file to discover all available pages before exploring further.

MonoRelay is a self-hosted LLM API relay server built on FastAPI. It sits between your application and any number of AI providers, exposing a single OpenAI-compatible or Anthropic-compatible endpoint. Teams and developers who work across multiple providers — or who need key rotation, request logging, and access control — deploy MonoRelay to manage all of that in one place instead of wiring it into every application.

Quickstart

Install MonoRelay, configure a provider, and send your first request in minutes.

Deployment

Deploy with Docker, systemd, PM2, a background script, or direct Python.

Configuration

Configure providers, model routing, key selection strategies, and logging.

API reference

Explore the full OpenAI-compatible and Anthropic-compatible endpoint reference.

Key features

Multi-provider support

Connect OpenRouter, NVIDIA NIM, OpenAI, Anthropic, DeepSeek, Groq, and the ChatGPT web reverse proxy through a single endpoint.

Smart model routing

Define model aliases, glob-based provider mappings, complexity-based selection, and cascade fallback chains.

Key rotation

Round-robin, random, or weighted key selection with automatic cooldown on rate-limit errors.

Full streaming

Native SSE streaming for both OpenAI and Anthropic formats with no buffering overhead.

Admin dashboard

Vue 3 SPA with request stats, provider health, live logs, user management, and an in-browser config editor.

Request logging

SQLite-backed logging of every request and response with full content viewing and error inspection.

OAuth SSO

GitHub and Google OAuth login alongside local accounts, with per-user admin role assignment.

Config sync

Back up and restore your config.yml via GitHub Gist for multi-device or multi-instance setups.

Tool-call downgrade

Automatically strips tool-call parameters for models that do not support function calling.

Requirements

MonoRelay requires Python 3.11 or later. Python 3.12 is recommended and is used in the official Docker image.
Python 3.10 and earlier may fail to install MonoRelay’s dependencies. Always use Python 3.11+.
The full Python dependency set is pinned in requirements.txt:
requirements.txt
fastapi==0.115.6
uvicorn[standard]==0.34.0
httpx==0.28.1
pydantic==2.10.4
pydantic-settings==2.7.1
pyyaml==6.0.2
aiosqlite==0.20.0
watchfiles==1.0.4
python-multipart==0.0.20
pybase64==1.4.1
python-jose[cryptography]==3.3.0
authlib==1.3.1
psutil==7.2.2

Architecture overview

Every request follows the same path through MonoRelay:
  1. Authentication — MonoRelay validates the Authorization: Bearer <access_key> header against the key configured in config.yml.
  2. Model routing — The model router resolves aliases (e.g., smartanthropic/claude-sonnet-4-20250514) and applies glob-based provider mapping rules.
  3. Key selection — The key manager picks an API key for the target provider using your configured strategy (round-robin by default) and applies cooldown to any key that received a 429.
  4. Proxying — The request is forwarded to the upstream provider. Streaming responses are passed through as SSE without buffering.
  5. Logging — The complete request and response are persisted to SQLite and aggregated into in-memory statistics that power the admin dashboard.

Build docs developers (and LLMs) love