Rotate API keys and configure rate limiting per provider

MonoRelay supports multiple API keys per provider and rotates among them automatically. When a key receives a 429 rate-limit response, it is placed on cooldown and the next available key is selected for subsequent requests. This section covers the selection strategies, per-key limits, time-windowed usage caps, and the tool-calling auto-downgrade feature.

Key selection strategies

The global strategy is set in the key_selection block and applies to all providers.

config.yml

key_selection:
  strategy: "round-robin"  # round-robin | random | weighted

Strategy	Behavior
`round-robin`	Keys are used in order, cycling back to the first after the last. Default.
`random`	A random available key is chosen for each request.
`weighted`	Keys are chosen randomly, with probability proportional to each key’s `weight` value.
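The three strategies in the table can be sketched in a few lines of Python. This is an illustration only, not MonoRelay's actual implementation: the Key class and the function names are hypothetical, and the sketch ignores availability filtering.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Key:
    # Hypothetical stand-in for one entry in a provider's keys list.
    label: str
    weight: float = 1.0


def round_robin(keys):
    """Return an iterator that yields keys in list order, wrapping after the last."""
    return itertools.cycle(keys)


def pick_random(keys, rng=random):
    """Pick any key with equal probability."""
    return rng.choice(keys)


def pick_weighted(keys, rng=random):
    """Pick a key with probability proportional to its weight."""
    return rng.choices(keys, weights=[k.weight for k in keys], k=1)[0]


keys = [Key("primary", weight=3), Key("backup", weight=1)]
rr = round_robin(keys)
order = [next(rr).label for _ in range(4)]  # primary, backup, primary, backup
```

With weights of 3 and 1, pick_weighted returns the first key about three times as often as the second over many calls.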

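A key is considered “available” only when it is enabled (enabled: true, and not past its expires_at), its quota is not exceeded, its RPS limit has not been reached, and any cooldown has expired. If no key is fully available, MonoRelay falls back to a key anyway rather than failing the request; for example, when every key for a provider is on cooldown, it still attempts the request with the first key in the list.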
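The availability conditions can be read as a single predicate. The sketch below is a hedged illustration: KeyState and its bookkeeping fields (used, rps_exhausted, cooldown_until) are assumptions made for the example, not names taken from MonoRelay.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class KeyState:
    enabled: bool = True
    quota_limit: int = 0          # 0 = unlimited (as in the config)
    used: int = 0                 # hypothetical lifetime request counter
    rps_exhausted: bool = False   # hypothetical flag set by an RPS limiter
    cooldown_until: float = 0.0   # hypothetical epoch seconds
    expires_at: str = ""          # ISO 8601; empty = no expiry


def is_available(k, now=None):
    """Return True only if every availability condition holds."""
    now = time.time() if now is None else now
    if not k.enabled:
        return False
    if k.quota_limit and k.used >= k.quota_limit:
        return False
    if k.rps_exhausted:
        return False
    if now < k.cooldown_until:
        return False
    if k.expires_at:
        # Accept a trailing "Z" by rewriting it as an explicit UTC offset.
        expiry = datetime.fromisoformat(k.expires_at.replace("Z", "+00:00"))
        if now >= expiry.timestamp():
            return False
    return True
```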

Per-key fields

Each entry in a provider’s keys list accepts the following fields:

Field	Type	Default	Description
`key`	string	required	The raw API key string sent in the Authorization header.
`label`	string	`"default"`	Human-readable name for this key, used in logs and the management dashboard.
`weight`	number	`1`	Relative selection weight used by the weighted strategy. Higher values increase selection probability.
`enabled`	boolean	`true`	Set to false to temporarily disable a key without removing it from the config.
`quota_limit`	number	`0`	Maximum number of requests this key may serve in its lifetime. `0` means unlimited.
`rate_limit_rps`	number	`0.0`	Maximum requests per second for this key. `0` means unlimited. Enforced using a sliding 1-second window.
`expires_at`	string	`""`	ISO 8601 datetime after which the key is treated as disabled (e.g., "2026-12-31T23:59:59Z"). Empty string means no expiry.
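To make the "sliding 1-second window" behind rate_limit_rps concrete, here is a minimal sketch of one common way to implement such a limiter. The class and method names are hypothetical, and the sketch treats the limit as a whole number of requests per window; how MonoRelay handles fractional values is not covered here.

```python
import time
from collections import deque


class SlidingRpsLimiter:
    """Hypothetical limiter: at most `rate_limit_rps` requests per sliding second."""

    def __init__(self, rate_limit_rps):
        self.limit = rate_limit_rps   # 0 means unlimited
        self.stamps = deque()         # timestamps of accepted requests, oldest first

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        if self.limit <= 0:
            return True
        # Drop timestamps that have left the 1-second window.
        while self.stamps and now - self.stamps[0] >= 1.0:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True
```

A key whose limiter returns False would simply be skipped for that request, and selection would move on to the next available key.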

Multiple keys example

config.yml

providers:
  openrouter:
    enabled: true
    base_url: "https://openrouter.ai/api/v1"
    rate_limit_cooldown: 60
    keys:
      - key: "sk-or-v1-primary-key"
        label: "primary"
        weight: 3
        quota_limit: 0
        rate_limit_rps: 10

      - key: "sk-or-v1-backup-key"
        label: "backup"
        weight: 1
        quota_limit: 5000
        expires_at: "2026-12-31T23:59:59Z"
        usage_window_limits:
          window_5h: 1000
          window_1d: 5000
          window_7d: 20000
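As a worked example of the weight values in this config, the snippet below computes the selection probabilities they imply, assuming the global strategy is set to weighted and both keys are available. Under round-robin or random, weight has no effect on selection.

```python
# Weights taken from the example config above.
weights = {"primary": 3, "backup": 1}

total = sum(weights.values())
probabilities = {label: w / total for label, w in weights.items()}
# primary is chosen with probability 3/4, backup with probability 1/4
```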

Usage window limits

Each key can be capped over three rolling time windows. Counters are persisted to ./data/key_usage.json so they survive restarts.

Field	Window	Description
`window_5h`	5 hours	Max requests in any 5-hour window. `0` = unlimited.
`window_1d`	24 hours	Max requests in any 24-hour window. `0` = unlimited.
`window_7d`	7 days	Max requests in any 7-day window. `0` = unlimited.

config.yml

keys:
  - key: "sk-or-v1-budget-key"
    label: "budget"
    usage_window_limits:
      window_5h: 1000
      window_1d: 5000
      window_7d: 20000

Usage windows are sliding and are not aligned to calendar periods. A window_5h limit of 1000 means the key will not serve more than 1000 requests in any contiguous 5-hour span.
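One way to picture sliding windows is a list of request timestamps checked against each configured span. The sketch below is an assumption-laden illustration: UsageWindows and allow are hypothetical names, timestamps are kept in memory only, and persistence to ./data/key_usage.json is left out.

```python
import time
from collections import deque

# Window names from the config, mapped to their length in seconds.
WINDOW_SECONDS = {
    "window_5h": 5 * 3600,
    "window_1d": 24 * 3600,
    "window_7d": 7 * 24 * 3600,
}


class UsageWindows:
    def __init__(self, limits):
        self.limits = limits      # e.g. {"window_5h": 1000}; 0 or missing = unlimited
        self.stamps = deque()     # accepted request timestamps, oldest first

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Timestamps older than the longest window can never count again.
        longest = max(WINDOW_SECONDS.values())
        while self.stamps and now - self.stamps[0] >= longest:
            self.stamps.popleft()
        for name, span in WINDOW_SECONDS.items():
            limit = self.limits.get(name, 0)
            in_window = sum(1 for t in self.stamps if now - t < span)
            if limit and in_window >= limit:
                return False
        self.stamps.append(now)
        return True
```

Because the check looks back from the current moment, a burst late in one calendar day still counts against requests made early the next day.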

Cooldown behavior

When a provider returns a 429 (Too Many Requests) response for a key, the key manager calls mark_failure on that key entry. The key is blocked from selection for rate_limit_cooldown seconds (configured per provider). During this window, MonoRelay selects the next available key.

config.yml

providers:
  openrouter:
    rate_limit_cooldown: 60   # seconds a key is blocked after a 429

After the cooldown expires, the key becomes eligible for selection again. A key’s failure count resets on the next successful response.

If all keys for a provider are on cooldown at the same time, MonoRelay still attempts the request using the first key in the list rather than returning an error. To make this scenario less likely, configure enough keys and spread load across them, for example with the weighted strategy and staggered weight values.
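The cooldown and fallback rules described in this section can be sketched as follows. Only the name mark_failure comes from the text above; its signature, the KeyEntry class, mark_success, and select_key are assumptions for illustration, and the selection here simply takes the first key that is off cooldown rather than applying a configured strategy.

```python
import time


class KeyEntry:
    def __init__(self, label):
        self.label = label
        self.failure_count = 0
        self.cooldown_until = 0.0   # epoch seconds; 0 = not on cooldown

    def mark_failure(self, cooldown_seconds, now=None):
        """Record a 429 and block the key for rate_limit_cooldown seconds."""
        now = time.time() if now is None else now
        self.failure_count += 1
        self.cooldown_until = now + cooldown_seconds

    def mark_success(self):
        """Reset the failure count after a successful response."""
        self.failure_count = 0


def select_key(keys, now=None):
    """Return the first key that is off cooldown, else fall back to keys[0]."""
    now = time.time() if now is None else now
    for key in keys:
        if now >= key.cooldown_until:
            return key
    return keys[0]  # every key is on cooldown: still attempt with the first key
```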

Tool calling auto-downgrade

Some models do not support the tools / tool_choice fields in the OpenAI request format. The tool_calling block lets you specify glob patterns for such models; MonoRelay will strip tool-related fields from the request before forwarding.

config.yml

tool_calling:
  auto_downgrade: true
  unsupported_models:
    - "meta/llama-4-maverick-*"
    - "meta/llama-3.1-*"

Field	Type	Default	Description
`tool_calling.auto_downgrade`	boolean	`true`	When true, tool parameters are automatically removed for any model matching an entry in unsupported_models.
`tool_calling.unsupported_models`	string[]	`[]`	List of fnmatch glob patterns. Any resolved model matching a pattern will have tools and tool_choice stripped from the outbound request.

Auto-downgrade applies to the final resolved model name, after alias resolution and routing. If a model is reached via an alias, the pattern must match the target model name, not the alias.
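The matching and stripping step can be sketched with Python's standard fnmatch module. This is a minimal illustration, not MonoRelay's code: the downgrade function and the model names in the usage lines are hypothetical, and case-sensitive matching via fnmatchcase is an assumption.

```python
from fnmatch import fnmatchcase

# Patterns from the example config above.
UNSUPPORTED = ["meta/llama-4-maverick-*", "meta/llama-3.1-*"]


def downgrade(payload, resolved_model, patterns=UNSUPPORTED, auto_downgrade=True):
    """Return the payload without tool fields if the resolved model matches a pattern."""
    if not auto_downgrade:
        return payload
    if not any(fnmatchcase(resolved_model, p) for p in patterns):
        return payload
    # Build a copy so the caller's original request is left untouched.
    return {k: v for k, v in payload.items() if k not in ("tools", "tool_choice")}


request = {
    "model": "my-alias",                      # hypothetical alias
    "messages": [],
    "tools": [{"type": "function"}],
    "tool_choice": "auto",
}
# Matching uses the resolved target name, not the alias in the request.
stripped = downgrade(request, "meta/llama-3.1-8b-instruct")
```

Note that in fnmatch patterns the * wildcard also matches the / character, so a pattern such as "meta/*" would cover every model under that prefix.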
