MonoRelay is a self-hosted LLM API relay server built on FastAPI. It sits between your application and any number of AI providers, exposing a single OpenAI-compatible or Anthropic-compatible endpoint. Teams and developers who work across multiple providers — or who need key rotation, request logging, and access control — deploy MonoRelay to manage all of that in one place instead of wiring it into every application.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/Excurs1ons/MonoRelay/llms.txt
Use this file to discover all available pages before exploring further.
Quickstart
Install MonoRelay, configure a provider, and send your first request in minutes.
Deployment
Deploy with Docker, systemd, PM2, a background script, or direct Python.
Configuration
Configure providers, model routing, key selection strategies, and logging.
API reference
Explore the full OpenAI-compatible and Anthropic-compatible endpoint reference.
Key features
Multi-provider support
Connect OpenRouter, NVIDIA NIM, OpenAI, Anthropic, DeepSeek, Groq, and the ChatGPT web reverse proxy through a single endpoint.
Smart model routing
Define model aliases, glob-based provider mappings, complexity-based selection, and cascade fallback chains.
Key rotation
Round-robin, random, or weighted key selection with automatic cooldown on rate-limit errors.
Full streaming
Native SSE streaming for both OpenAI and Anthropic formats with no buffering overhead.
Admin dashboard
Vue 3 SPA with request stats, provider health, live logs, user management, and an in-browser config editor.
Request logging
SQLite-backed logging of every request and response with full content viewing and error inspection.
OAuth SSO
GitHub and Google OAuth login alongside local accounts, with per-user admin role assignment.
Config sync
Back up and restore your
config.yml via GitHub Gist for multi-device or multi-instance setups.Tool-call downgrade
Automatically strips tool-call parameters for models that do not support function calling.
Requirements
MonoRelay requires Python 3.11 or later. Python 3.12 is recommended and is used in the official Docker image. The full Python dependency set is pinned inrequirements.txt:
requirements.txt
Architecture overview
Every request follows the same path through MonoRelay:- Authentication — MonoRelay validates the
Authorization: Bearer <access_key>header against the key configured inconfig.yml. - Model routing — The model router resolves aliases (e.g.,
smart→anthropic/claude-sonnet-4-20250514) and applies glob-based provider mapping rules. - Key selection — The key manager picks an API key for the target provider using your configured strategy (round-robin by default) and applies cooldown to any key that received a 429.
- Proxying — The request is forwarded to the upstream provider. Streaming responses are passed through as SSE without buffering.
- Logging — The complete request and response are persisted to SQLite and aggregated into in-memory statistics that power the admin dashboard.