Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Excurs1ons/MonoRelay/llms.txt

Use this file to discover all available pages before exploring further.

MonoRelay supports multiple API keys per provider and automatically rotates among them. When a key receives a 429 rate-limit response, it is put on a cooldown and the next available key is selected for subsequent requests. This section covers the selection strategies, per-key limits, time-windowed usage caps, and the tool-calling auto-downgrade feature.

Key selection strategies

The global strategy is set in the key_selection block and applies to all providers.
config.yml
key_selection:
  strategy: "round-robin"  # round-robin | random | weighted
StrategyBehavior
round-robinKeys are used in order, cycling back to the first after the last. Default.
randomA random available key is chosen for each request.
weightedKeys are chosen randomly, with probability proportional to each key’s weight value.
Only keys where enabled: true, quota is not exceeded, RPS limit is not reached, and cooldown has expired are considered “available”. If no key is fully available, MonoRelay falls back gracefully rather than failing the request.

Per-key fields

Each entry in a provider’s keys list accepts the following fields:
key
string
required
The raw API key string sent in the Authorization header.
label
string
default:"\"default\""
Human-readable name for this key, used in logs and the management dashboard.
weight
number
default:"1"
Relative selection weight used by the weighted strategy. Higher values increase selection probability.
enabled
boolean
default:"true"
Set to false to temporarily disable a key without removing it from the config.
quota_limit
number
default:"0"
Maximum number of requests this key may serve in its lifetime. 0 means unlimited.
rate_limit_rps
number
default:"0.0"
Maximum requests per second for this key. 0 means unlimited. Enforced using a sliding 1-second window.
expires_at
string
default:"\"\""
ISO 8601 datetime after which the key is treated as disabled (e.g., "2026-12-31T23:59:59Z"). Empty string means no expiry.

Multiple keys example

config.yml
providers:
  openrouter:
    enabled: true
    base_url: "https://openrouter.ai/api/v1"
    rate_limit_cooldown: 60
    keys:
      - key: "sk-or-v1-primary-key"
        label: "primary"
        weight: 3
        quota_limit: 0
        rate_limit_rps: 10

      - key: "sk-or-v1-backup-key"
        label: "backup"
        weight: 1
        quota_limit: 5000
        expires_at: "2026-12-31T23:59:59Z"
        usage_window_limits:
          window_5h: 1000
          window_1d: 5000
          window_7d: 20000

Usage window limits

Each key can be capped over three rolling time windows. Counters are persisted to ./data/key_usage.json so they survive restarts.
FieldWindowDescription
window_5h5 hoursMax requests in any 5-hour window. 0 = unlimited.
window_1d24 hoursMax requests in any 24-hour window. 0 = unlimited.
window_7d7 daysMax requests in any 7-day window. 0 = unlimited.
config.yml
keys:
  - key: "sk-or-v1-budget-key"
    label: "budget"
    usage_window_limits:
      window_5h: 1000
      window_1d: 5000
      window_7d: 20000
Usage windows use sliding timestamps, not calendar periods. A window of 1 000 requests per 5 hours means the key will not serve more than 1 000 requests in any contiguous 5-hour span.

Cooldown behavior

When a provider returns a 429 (Too Many Requests) response for a key, the key manager calls mark_failure on that key entry. The key is blocked from selection for rate_limit_cooldown seconds (configured per provider). During this window, MonoRelay selects the next available key.
config.yml
providers:
  openrouter:
    rate_limit_cooldown: 60   # seconds a key is blocked after a 429
After the cooldown expires, the key becomes eligible for selection again. A key’s failure count resets on the next successful response.
If all keys for a provider are simultaneously on cooldown, MonoRelay will still attempt the request using the first key in the list rather than returning an error. Configure enough keys with staggered weight values to minimize this scenario.

Tool calling auto-downgrade

Some models do not support the tools / tool_choice fields in the OpenAI request format. The tool_calling block lets you specify glob patterns for such models; MonoRelay will strip tool-related fields from the request before forwarding.
config.yml
tool_calling:
  auto_downgrade: true
  unsupported_models:
    - "meta/llama-4-maverick-*"
    - "meta/llama-3.1-*"
tool_calling.auto_downgrade
boolean
default:"true"
When true, tool parameters are automatically removed for any model matching an entry in unsupported_models.
tool_calling.unsupported_models
string[]
default:"[]"
List of fnmatch glob patterns. Any resolved model matching a pattern will have tools and tool_choice stripped from the outbound request.
Auto-downgrade applies to the final resolved model name (after alias and routing). If a model is routed via an alias, the pattern must match the target name, not the alias.

Build docs developers (and LLMs) love