MonoRelay: self-hosted LLM API relay for multiple providers

MonoRelay is a self-hosted LLM API relay server built on FastAPI. It sits between your application and any number of AI providers, exposing a single OpenAI-compatible or Anthropic-compatible endpoint. Teams and developers who work across multiple providers — or who need key rotation, request logging, and access control — deploy MonoRelay to manage all of that in one place instead of wiring it into every application.

Quickstart

Install MonoRelay, configure a provider, and send your first request in minutes.

Deployment

Deploy with Docker, systemd, PM2, a background script, or direct Python.

Configuration

Configure providers, model routing, key selection strategies, and logging.

API reference

Explore the full OpenAI-compatible and Anthropic-compatible endpoint reference.

Key features

Multi-provider support

Connect OpenRouter, NVIDIA NIM, OpenAI, Anthropic, DeepSeek, Groq, and the ChatGPT web reverse proxy through a single endpoint.

Smart model routing

Define model aliases, glob-based provider mappings, complexity-based selection, and cascade fallback chains.

Key rotation

Round-robin, random, or weighted key selection with automatic cooldown on rate-limit errors.

Full streaming

Native SSE streaming for both OpenAI and Anthropic formats with no buffering overhead.

Admin dashboard

Vue 3 SPA with request stats, provider health, live logs, user management, and an in-browser config editor.

Request logging

SQLite-backed logging of every request and response with full content viewing and error inspection.

OAuth SSO

GitHub and Google OAuth login alongside local accounts, with per-user admin role assignment.

Config sync

Back up and restore your config.yml via GitHub Gist for multi-device or multi-instance setups.

Tool-call downgrade

Automatically strips tool-call parameters for models that do not support function calling.

Requirements

MonoRelay requires Python 3.11 or later. Python 3.12 is recommended and is used in the official Docker image.

Python 3.10 and earlier may fail to install MonoRelay’s dependencies. Always use Python 3.11+.

The full Python dependency set is pinned in requirements.txt:

requirements.txt

fastapi==0.115.6
uvicorn[standard]==0.34.0
httpx==0.28.1
pydantic==2.10.4
pydantic-settings==2.7.1
pyyaml==6.0.2
aiosqlite==0.20.0
watchfiles==1.0.4
python-multipart==0.0.20
pybase64==1.4.1
python-jose[cryptography]==3.3.0
authlib==1.3.1
psutil==7.2.2

Architecture overview

Every request follows the same path through MonoRelay:

Authentication — MonoRelay validates the Authorization: Bearer <access_key> header against the key configured in config.yml.
Model routing — The model router resolves aliases (e.g., smart → anthropic/claude-sonnet-4-20250514) and applies glob-based provider mapping rules.
Key selection — The key manager picks an API key for the target provider using your configured strategy (round-robin by default) and applies cooldown to any key that received a 429.
Proxying — The request is forwarded to the upstream provider. Streaming responses are passed through as SSE without buffering.
Logging — The complete request and response are persisted to SQLite and aggregated into in-memory statistics that power the admin dashboard.

Get Started

Configuration

Admin Dashboard

Authentication

MonoRelay: self-hosted LLM API relay for multiple providers

Quickstart

Deployment

Configuration

API reference

Key features

Multi-provider support

Smart model routing

Key rotation

Full streaming

Admin dashboard

Request logging

OAuth SSO

Config sync

Tool-call downgrade

Requirements

Architecture overview

Build docs developers (and LLMs) love

Get Started

Configuration

Admin Dashboard

Authentication

Documentation Index

Quickstart

Deployment

Configuration

API reference

​Key features

Multi-provider support

Smart model routing

Key rotation

Full streaming

Admin dashboard

Request logging

OAuth SSO

Config sync

Tool-call downgrade

​Requirements

​Architecture overview

Build docs developers (and LLMs) love

Key features

Requirements

Architecture overview