MonoRelay supports multiple API keys per provider and automatically rotates among them. When a key receives a 429 rate-limit response, it is put on a cooldown and the next available key is selected for subsequent requests. This section covers the selection strategies, per-key limits, time-windowed usage caps, and the tool-calling auto-downgrade feature.
Key selection strategies
The global strategy is set in the key_selection block and applies to all providers.
config.yml
| Strategy | Behavior |
|---|---|
| round-robin | Keys are used in order, cycling back to the first after the last. Default. |
| random | A random available key is chosen for each request. |
| weighted | Keys are chosen randomly, with probability proportional to each key’s weight value. |
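
The three strategies in the table can be illustrated with a small Python sketch. It is not MonoRelay's code: `make_selector`, the `label` entry, and the sample keys are hypothetical, and only the strategy names and the `weight` value come from the table. Availability filtering is left out to keep the selection logic visible.

```python
import itertools
import random


def make_selector(keys, strategy="round-robin", rng=None):
    """Return a zero-argument function that picks the next key.

    `keys` is a list of dicts. Only the "weight" entry is read here,
    and only by the weighted strategy. All names are illustrative.
    """
    rng = rng or random.Random()
    if strategy == "round-robin":
        cycle = itertools.cycle(keys)       # wraps to the first key after the last
        return lambda: next(cycle)
    if strategy == "random":
        return lambda: rng.choice(keys)     # uniform over the given keys
    if strategy == "weighted":
        weights = [k.get("weight", 1) for k in keys]
        # random.choices draws with probability proportional to each weight
        return lambda: rng.choices(keys, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")


keys = [
    {"label": "primary", "weight": 3},
    {"label": "backup", "weight": 1},
]
pick = make_selector(keys, "round-robin")
order = [pick()["label"] for _ in range(3)]   # primary, backup, primary
```

With weights 3 and 1, the weighted strategy in this sketch would be expected to choose the first key about three times as often as the second over many requests.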
Only keys where enabled: true, quota is not exceeded, RPS limit is not reached, and cooldown has expired are considered “available”. If no key is fully available, MonoRelay falls back gracefully rather than failing the request.
Per-key fields
Each entry in a provider’s keys list accepts the following fields:
The raw API key string sent in the Authorization header.
Human-readable name for this key, used in logs and the management dashboard.
Relative selection weight used by the weighted strategy. Higher values increase selection probability.
Set to false to temporarily disable a key without removing it from the config.
Maximum number of requests this key may serve in its lifetime. 0 means unlimited.
Maximum requests per second for this key. 0 means unlimited. Enforced using a sliding 1-second window.
ISO 8601 datetime after which the key is treated as disabled (e.g., "2026-12-31T23:59:59Z"). Empty string means no expiry.
Multiple keys example
config.yml
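
As a rough Python illustration of how a pool of several keys might be checked against the availability rules and per-key limits described above, consider the sketch below. It is written under assumptions: `KeyState` and its attribute names (`max_requests`, `max_rps`, `expires_at`, `served`, `recent`) are hypothetical stand-ins, not MonoRelay's actual field names, and cooldown handling is omitted.

```python
from collections import deque
from datetime import datetime, timezone


class KeyState:
    """One key entry plus runtime counters. Attribute names are
    hypothetical, not MonoRelay's real config fields."""

    def __init__(self, label, enabled=True, max_requests=0, max_rps=0, expires_at=""):
        self.label = label
        self.enabled = enabled
        self.max_requests = max_requests    # lifetime cap, 0 = unlimited
        self.max_rps = max_rps              # per-second cap, 0 = unlimited
        self.expires_at = expires_at        # ISO 8601 string, "" = no expiry
        self.served = 0                     # lifetime request count
        self.recent = deque()               # request times within the last second

    def is_available(self, now):
        """`now` is a POSIX timestamp in seconds."""
        if not self.enabled:
            return False
        if self.expires_at:
            # Older Python versions reject a trailing "Z" in fromisoformat
            expiry = datetime.fromisoformat(self.expires_at.replace("Z", "+00:00"))
            if now >= expiry.timestamp():
                return False
        if self.max_requests and self.served >= self.max_requests:
            return False
        # Sliding 1-second window: forget requests older than one second
        while self.recent and now - self.recent[0] >= 1.0:
            self.recent.popleft()
        if self.max_rps and len(self.recent) >= self.max_rps:
            return False
        return True

    def record(self, now):
        """Count one served request at time `now`."""
        self.served += 1
        self.recent.append(now)


now = datetime(2026, 6, 1, tzinfo=timezone.utc).timestamp()
pool = [
    KeyState("primary", max_rps=2),
    KeyState("paused", enabled=False),
    KeyState("expired", expires_at="2026-01-01T00:00:00Z"),
    KeyState("trial", max_requests=1),
]
pool[3].record(now - 60)    # the trial key has used its single lifetime request
available = [k.label for k in pool if k.is_available(now)]   # only "primary"
```

A selection strategy would then be applied to the keys that pass this check rather than to the full list.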
Usage window limits
Each key can be capped over three rolling time windows. Counters are persisted to ./data/key_usage.json so they survive restarts.
| Field | Window | Description |
|---|---|---|
| window_5h | 5 hours | Max requests in any 5-hour window. 0 = unlimited. |
| window_1d | 24 hours | Max requests in any 24-hour window. 0 = unlimited. |
| window_7d | 7 days | Max requests in any 7-day window. 0 = unlimited. |
config.yml
Cooldown behavior
When a provider returns a 429 (Too Many Requests) response for a key, the key manager calls mark_failure on that key entry. The key is blocked from selection for rate_limit_cooldown seconds (configured per provider). During this window, MonoRelay selects the next available key.
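
The cooldown flow can be sketched in Python as follows. Only the names mark_failure and rate_limit_cooldown come from the text. The `CooldownKey` class, its `cooldown_until` attribute, and `next_available` are assumptions made for illustration, and the actual signature of mark_failure may differ.

```python
import time


class CooldownKey:
    """Minimal stand-in for a key entry. Attribute names are assumptions."""

    def __init__(self, label):
        self.label = label
        self.cooldown_until = 0.0   # time before which the key is blocked

    def mark_failure(self, rate_limit_cooldown, now=None):
        """Block this key for rate_limit_cooldown seconds, starting at `now`."""
        now = time.monotonic() if now is None else now
        self.cooldown_until = now + rate_limit_cooldown

    def in_cooldown(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self.cooldown_until


def next_available(keys, now=None):
    """Return the first key not in cooldown, or None if all are blocked."""
    for key in keys:
        if not key.in_cooldown(now):
            return key
    return None


a, b = CooldownKey("a"), CooldownKey("b")
a.mark_failure(rate_limit_cooldown=60, now=100.0)   # provider answered 429 at t=100
during = next_available([a, b], now=130.0).label    # "b": a is blocked until t=160
after = next_available([a, b], now=160.0).label     # "a": the cooldown has expired
```

The sketch passes explicit `now` values so the timeline is easy to follow. In live use the default `time.monotonic()` clock would avoid problems with wall-clock adjustments.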
config.yml
Tool calling auto-downgrade
Some models do not support the tools / tool_choice fields in the OpenAI request format. The tool_calling block lets you specify glob patterns for such models; MonoRelay will strip tool-related fields from the request before forwarding.
config.yml
When true, tool parameters are automatically removed for any model matching an entry in unsupported_models.
List of fnmatch glob patterns. Any resolved model matching a pattern will have tools and tool_choice stripped from the outbound request.
Auto-downgrade applies to the final resolved model name (after alias and routing). If a model is routed via an alias, the pattern must match the target name, not the alias.
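
A minimal Python sketch of the stripping step follows, to make the alias rule concrete. The function name, the `enabled` parameter, and the model names are hypothetical. The sketch uses `fnmatch.fnmatchcase` so that matching is case-sensitive on every platform, whereas `fnmatch.fnmatch` normalizes case on some operating systems. Which of the two MonoRelay uses is not stated here.

```python
from fnmatch import fnmatchcase


def strip_tools_if_unsupported(payload, resolved_model, unsupported_models, enabled=True):
    """Return a copy of payload without tools/tool_choice when the resolved
    model matches any glob pattern; otherwise return payload unchanged."""
    if not enabled:
        return payload
    if not any(fnmatchcase(resolved_model, pattern) for pattern in unsupported_models):
        return payload
    return {k: v for k, v in payload.items() if k not in ("tools", "tool_choice")}


request = {
    "model": "fast",
    "messages": [],
    "tools": [{"type": "function"}],
    "tool_choice": "auto",
}
patterns = ["legacy-*"]
# The alias "fast" is assumed to resolve to "legacy-small"; both names are made up
by_target = strip_tools_if_unsupported(request, "legacy-small", patterns)
by_alias = strip_tools_if_unsupported(request, "fast", patterns)
```

Here `by_target` no longer carries the tool fields, while `by_alias` is returned untouched because the pattern does not match the alias name. This mirrors the rule that patterns must match the resolved target.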