Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Excurs1ons/MonoRelay/llms.txt

Use this file to discover all available pages before exploring further.

MonoRelay’s model router translates the model name in each incoming request into a concrete (model, provider) pair before the request is forwarded upstream. The translation pipeline runs in a fixed order: alias expansion first, then provider mapping, then model overrides, then complexity scoring, and finally provider auto-detection. Understanding this order helps you predict which upstream endpoint will handle a given request.

Routing resolution order

When a request arrives with a model name, the router applies each step in sequence and stops as soon as a provider is determined:
  1. Provider suffix — if the model string contains @ (e.g., gpt-4o@openrouter), the suffix is used directly and remaining steps are skipped.
  2. Alias — the model name is looked up in aliases. The result may itself be an alias; resolution loops until no further alias is found.
  3. Provider mapping — the resolved name is matched against provider_mapping fnmatch patterns. The first match determines the provider.
  4. Model overrides — if still unresolved, model_overrides fnmatch patterns can rename the model string.
  5. Complexity routing — if enabled, message content is scored and routed to simple, moderate, or complex model targets.
  6. Provider auto-detection — the router scans each enabled provider’s models.include list for an exact or normalized match. The first provider whose include list is empty (meaning it accepts any model) also acts as a catch-all.

Provider suffix syntax

Append @providername to any model string to force a specific provider, bypassing alias and mapping rules.
gpt-4o@openrouter
claude-3-5-sonnet@anthropic
llama-3.1-70b@nvidia
Provider names are matched case-insensitively and with separators stripped, so OpenRouter, openrouter, and open_router all resolve to the same provider.

Model aliases

Aliases let you expose short, stable names to clients while keeping the actual model target flexible. Values can themselves be aliases, enabling chains.
config.yml
model_routing:
  aliases:
    "fast":         "openai/gpt-4o-mini"
    "balanced":     "openai/gpt-4o"
    "smart":        "anthropic/claude-sonnet-4-20250514"
    "nvidia-fast":  "meta/llama-4-maverick-17b-128e-instruct"
    "nvidia-smart": "nvidia/llama-3.1-nemotron-ultra-253b-v1"
A client requesting model: "fast" will have it resolved to openai/gpt-4o-mini before any further routing step runs.
Aliases are matched with separator-normalized comparison, so fast, Fast, and FAST all match the same alias entry.

Provider mapping

provider_mapping maps fnmatch glob patterns to a provider name. This is the primary mechanism for directing model families to the right upstream without listing every model explicitly.
config.yml
model_routing:
  provider_mapping:
    "gpt-*":      "openrouter"
    "o1*":        "openrouter"
    "o3*":        "openrouter"
    "claude-*":   "openrouter"
    "gemini-*":   "openrouter"
    "llama-*":    "nvidia"
    "mistral-*":  "openrouter"
    "deepseek-*": "openrouter"
    "meta/*":     "nvidia"
    "nvidia/*":   "nvidia"
Patterns are matched case-insensitively in definition order. The first matching pattern wins. If the mapped provider is disabled, the mapping is skipped and matching continues.
fnmatch patterns support * (any sequence of characters) and ? (single character). Patterns like "meta/*" match the literal slash in model names such as meta/llama-4-scout.

Model overrides

model_overrides renames the model string before it reaches the upstream, also using fnmatch patterns. Use this to canonicalize model names or silently redirect deprecated aliases.
config.yml
model_routing:
  model_overrides:
    "gpt-4-turbo": "gpt-4o"
    "claude-3-opus*": "anthropic/claude-opus-4-20250514"

Complexity routing

When enabled, MonoRelay scores the content of the request messages and routes to one of three model tiers. This lets a single endpoint automatically use a cheaper model for simple requests and a stronger one for difficult tasks.
config.yml
model_routing:
  complexity:
    enabled: false
    simple:   "openai/gpt-4o-mini"
    moderate: "openai/gpt-4o"
    complex:  "anthropic/claude-sonnet-4-20250514"
The scorer counts keywords from three categories — reasoning terms, code terms, and simple/greeting terms — and computes a score in the range [-1.0, 1.0]. Routing thresholds:
ScoreTarget
< 0simple
0 to < 0.35moderate
≥ 0.35complex
A token count exceeding 50 000 (approximately 200 000 characters) always routes to complex.

Cascade fallback

Cascade tries each model in a prioritized list, moving to the next entry if the current one fails. This provides resilience when upstream services are degraded.
config.yml
model_routing:
  cascade:
    enabled: false
    max_retries: 2
    models:
      - "openai/gpt-4o-mini"
      - "openai/gpt-4o"
      - "anthropic/claude-sonnet-4-20250514"
Each model in the list is resolved through the standard alias and provider-mapping pipeline before the request is attempted.
Cascade routing increases end-to-end latency for failed attempts. Set max_retries conservatively and use providers with low retry cost (e.g., ones with generous free tiers) near the top of the list.

Payload transformation

Transformation rules can inject or override request body parameters for requests matching specific model patterns.
config.yml
model_routing:
  payload_transformation:
    enabled: true
    rules:
      # Inject metadata for all gpt-4o variants
      - models: ["gpt-4o-*"]
        inject_params:
          metadata:
            source: "monorelay"
        override_params:
          temperature: 0.7

      # Force extended thinking for Claude models
      - models: ["claude-*"]
        override_params:
          thinking:
            type: "enabled"
            budget_tokens: 1024
  • inject_params — adds keys only if they are not already present in the request body.
  • override_params — always sets the key, replacing any existing value. Supports dot-notation for nested keys (e.g., "generationConfig.thinkingBudget").

Global params

global_params applies a default set of parameters and an optional system prompt to every request.
config.yml
model_routing:
  global_params:
    enabled: true
    mode: "default"   # "default" (fill missing) or "override" (always replace)
    params:
      temperature: 0.8
      max_tokens: 4096
    system_prompt: "You are a helpful assistant."
ModeBehavior
defaultApplies params only when the key is absent in the request; prepends system_prompt to an existing system message
overrideAlways sets params and replaces any existing system message

Build docs developers (and LLMs) love