The Model Router lets you configure rules that automatically direct each incoming chat message to the most appropriate LLM provider and model, without the user lifting a finger. Instead of locking a workspace to a single model, you define a prioritized ruleset: simple factual queries go to a fast, cheap model, while complex multi-step reasoning escalates to GPT-4o or Claude. On each new message the router evaluates the rules in priority order, uses the first one that matches, and then sticks with that model for a configurable cooldown window so follow-up messages in the same conversation stay consistent.
How It Works
Every time a chat message arrives in a router-enabled workspace, the ModelRouterService runs a two-phase evaluation:
- Calculated rules: deterministic conditions evaluated locally in milliseconds (no LLM call required). These check properties of the current conversation context such as prompt content, token count, message count, current hour, or whether the message contains an image attachment.
- LLM rules: a natural-language description of when the rule should fire. AnythingLLM sends a lightweight classification request to the fallback model asking which rule (if any) best matches the user’s prompt. Matching results are cached for the full sticky window and no-match results for 30 seconds, so repeated messages in the same conversation do not trigger a fresh classification call every time.
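The caching behavior for LLM rules can be pictured as a small TTL cache with two lifetimes. This is a minimal sketch, not AnythingLLM’s implementation: the class name ClassificationCache, the string cache key, and the NO_MATCH sentinel are hypothetical. It assumes, as the LLM Rules section states, that a matching classification is kept for the sticky window (300 seconds by default) and a no-match result for 30 seconds.

```python
import time
from typing import Callable, Optional

NO_MATCH = object()  # hypothetical sentinel: "classified, but no rule applied"
NO_MATCH_TTL_SECONDS = 30  # no-match results are cached for 30 seconds


class ClassificationCache:
    """Hypothetical cache for LLM-rule classification results.

    Matches are kept for the sticky window; no-matches for 30 seconds.
    """

    def __init__(self, sticky_seconds: int = 300,
                 clock: Callable[[], float] = time.monotonic):
        self.sticky_seconds = sticky_seconds
        self.clock = clock
        self._entries: dict[str, tuple[object, float]] = {}  # key -> (result, expires_at)

    def put(self, key: str, rule_id: Optional[str]) -> None:
        """Store a classification; rule_id=None means no rule matched."""
        if rule_id is None:
            self._entries[key] = (NO_MATCH, self.clock() + NO_MATCH_TTL_SECONDS)
        else:
            self._entries[key] = (rule_id, self.clock() + self.sticky_seconds)

    def get(self, key: str) -> object:
        """Return a cached rule id, NO_MATCH, or None if the fallback model must be asked again."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        result, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop it so the next message reclassifies
            return None
        return result
```

One plausible reason for the shorter no-match lifetime is that it avoids reclassifying every message while still letting an LLM rule fire soon after the conversation shifts topic.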
Sticky Routing
Once a rule fires, the resolved model “sticks” for a configurable cooldown period (default: 300 seconds / 5 minutes). Every follow-up message in the same user–workspace–thread context reuses the same model without re-running rules. The cooldown timer resets on each new message, so an active conversation keeps the same model “hot.” When the window expires and no new rule matches, the router returns to the fallback model.
The Model Router is a multi-user / Docker feature. You must have at least one LLM provider configured in System Settings. Rules can route to any provider/model combination, including different models on the same provider.
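The sticky behavior amounts to a sliding-window store: each lookup inside the cooldown returns the remembered model and pushes the expiry forward. This is a hypothetical sketch; StickyRoutes, the (user, workspace, thread) tuple key, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Optional

StickyKey = tuple[str, str, str]  # hypothetical key: (user_id, workspace_id, thread_id)


class StickyRoutes:
    """In-memory sticky-route store whose TTL is refreshed on every message."""

    def __init__(self, cooldown_seconds: int = 300,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._routes: dict[StickyKey, tuple[str, float]] = {}  # key -> (model, expires_at)

    def remember(self, key: StickyKey, model: str) -> None:
        """Record the model chosen by a matching rule and start the cooldown."""
        self._routes[key] = (model, self.clock() + self.cooldown)

    def lookup(self, key: StickyKey) -> Optional[str]:
        """Return the sticky model and reset its TTL, or None if the window expired."""
        entry = self._routes.get(key)
        if entry is None:
            return None
        model, expires_at = entry
        if self.clock() >= expires_at:
            del self._routes[key]  # window expired: caller re-evaluates rules / falls back
            return None
        # An active conversation keeps its model "hot": each hit extends the window.
        self._routes[key] = (model, self.clock() + self.cooldown)
        return model
```

Injecting the clock keeps the expiry logic testable without sleeping.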
Use Cases
Cost Optimization
Route quick factual or greeting messages to a cheap, fast model (e.g., gpt-4o-mini or gemini-2.0-flash-lite) and reserve expensive frontier models for complex reasoning tasks.
Capability Matching
Automatically escalate to a vision-capable model when an image attachment is present, or switch to a code-specialized model when the prompt contains programming keywords.
Time-Based Routing
Send after-hours traffic to a local Ollama model to avoid cloud API costs during low-priority windows, then route back to a cloud model during business hours. Rules use the server’s local clock when evaluating currentHour.
Long-Context Handling
Detect when conversation token counts exceed a threshold and automatically switch to a model with a larger context window before the conversation gets truncated.
Setting Up a Model Router
Open the Model Router settings
Navigate to System Settings → Model Router in the AnythingLLM admin panel. Click New Router to open the creation form.
Name your router and set the fallback model
Give the router a descriptive name (up to 255 characters). Select the fallback provider and fallback model that will be used whenever no rule matches. This is also the model used for LLM-type rule classification.
Optionally set a cooldown (seconds) between 0 and 3600. This controls how long a matched route stays “sticky” before the router re-evaluates rules. The default is 300 seconds (5 minutes).
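The limits in this step translate into a few simple checks. The sketch below is hypothetical: validate_router_settings and RouterSettings are illustrative names, and rejecting an empty name is an assumption. It applies only the documented constraints: a name of up to 255 characters, a required fallback provider and model, and a cooldown between 0 and 3600 seconds that defaults to 300.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_COOLDOWN_SECONDS = 300  # documented default: 5 minutes


@dataclass
class RouterSettings:
    name: str
    fallback_provider: str
    fallback_model: str
    cooldown_seconds: int = DEFAULT_COOLDOWN_SECONDS


def validate_router_settings(name: str, fallback_provider: str, fallback_model: str,
                             cooldown_seconds: Optional[int] = None) -> RouterSettings:
    """Apply the documented limits and fill in the default cooldown."""
    if not name or len(name) > 255:  # non-empty is an assumption; 255 is documented
        raise ValueError("name must be 1-255 characters")
    if not fallback_provider or not fallback_model:
        raise ValueError("a fallback provider and model are required")
    if cooldown_seconds is None:
        cooldown_seconds = DEFAULT_COOLDOWN_SECONDS
    if not 0 <= cooldown_seconds <= 3600:
        raise ValueError("cooldown must be between 0 and 3600 seconds")
    return RouterSettings(name, fallback_provider, fallback_model, cooldown_seconds)
```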
Add routing rules
Click Add Rule and configure each rule (see Rule Types below). Rules are evaluated in ascending priority order—lower numbers run first.
Rule Types
The router supports two rule types, controlled by the type field: calculated and llm.
Calculated Rules
Calculated rules evaluate deterministic conditions against the live conversation context. They run instantly on every message at no extra cost. A rule can have multiple conditions combined with AND logic, so every condition must hold for the rule to match.
Available context properties:
| Property | Type | Description |
|---|---|---|
| promptContent | string | The full text of the user’s current message |
| conversationTokenCount | number | Estimated token count for the full context (system prompt + history + current message) |
| conversationMessageCount | number | Total number of messages in the conversation so far (including the current one) |
| currentHour | number | The server’s current local hour (0–23) |
| hasImageAttachment | string | "true" if the message includes an image attachment, otherwise "false" |
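To make the property table concrete, here is a sketch of assembling the context that calculated rules would evaluate. It is illustrative only: build_routing_context, the attachment dicts with a mime key, and the roughly 4-characters-per-token estimate are assumptions, not AnythingLLM’s actual token counting. Note that hasImageAttachment is the string "true" or "false", as the table specifies.

```python
from datetime import datetime
from typing import Optional


def estimate_tokens(text: str) -> int:
    # Assumption: a rough ~4-characters-per-token heuristic; the real router
    # may count tokens differently.
    return max(1, len(text) // 4) if text else 0


def build_routing_context(system_prompt: str, history: list[str], prompt: str,
                          attachments: list[dict],
                          now: Optional[datetime] = None) -> dict:
    """Assemble the context properties that calculated rules are evaluated against."""
    now = now or datetime.now()  # naive datetime.now() reads the server's local clock
    has_image = any(str(a.get("mime", "")).startswith("image/") for a in attachments)
    return {
        "promptContent": prompt,
        # system prompt + history + current message
        "conversationTokenCount": sum(estimate_tokens(t) for t in [system_prompt, *history, prompt]),
        "conversationMessageCount": len(history) + 1,  # includes the current message
        "currentHour": now.hour,  # 0-23
        "hasImageAttachment": "true" if has_image else "false",  # a string, not a bool
    }
```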
Available comparators:
| Comparator | Applies to | Description |
|---|---|---|
| contains | string | The string value contains any of the comma-separated keywords (case-insensitive) |
| matches | string | The string value matches a regex pattern (e.g., /code\|function\|bug/i) |
| eq / neq | string, number | Equals / not equals |
| gt / gte | number | Greater than / greater than or equal |
| lt / lte | number | Less than / less than or equal |
| between | number | Value is between two comma-separated numbers (inclusive) |
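The comparator semantics in the table above can be sketched as one dispatch function plus an AND over a rule’s conditions. This is a hypothetical implementation: compare, rule_matches, and the {"property", "comparator", "value"} condition shape are illustrative names, rule values are assumed to arrive as strings from the rule form, and only the i regex flag is honored.

```python
import operator
import re
from typing import Union

Value = Union[str, int, float]
NUMERIC_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def _regex(pattern: str) -> "re.Pattern":
    """Accept '/body/flags' (e.g. '/code|function|bug/i') or a bare pattern."""
    m = re.fullmatch(r"/(.*)/([a-z]*)", pattern, re.DOTALL)
    if not m:
        return re.compile(pattern)
    body, flags = m.groups()
    return re.compile(body, re.IGNORECASE if "i" in flags else 0)  # only 'i' is honored


def compare(actual: Value, comparator: str, expected: str) -> bool:
    if comparator == "contains":  # any comma-separated keyword, case-insensitive
        keywords = [k.strip().lower() for k in expected.split(",") if k.strip()]
        return any(k in str(actual).lower() for k in keywords)
    if comparator == "matches":
        return _regex(expected).search(str(actual)) is not None
    if comparator in ("eq", "neq"):
        try:  # compare numerically when both sides parse as numbers
            equal = float(actual) == float(expected)
        except ValueError:  # otherwise compare as strings, e.g. "true" == "true"
            equal = str(actual) == expected
        return equal if comparator == "eq" else not equal
    if comparator == "between":
        low, high = (float(x) for x in expected.split(","))
        return low <= float(actual) <= high  # inclusive on both ends
    if comparator in NUMERIC_OPS:
        return NUMERIC_OPS[comparator](float(actual), float(expected))
    raise ValueError(f"unknown comparator: {comparator}")


def rule_matches(conditions: list[dict], context: dict) -> bool:
    """A calculated rule matches only if every condition holds (AND logic)."""
    return all(compare(context[c["property"]], c["comparator"], c["value"])
               for c in conditions)
```

For example, a long-context rule could be the single illustrative condition {"property": "conversationTokenCount", "comparator": "gt", "value": "100000"}, where the threshold is an arbitrary choice.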
LLM Rules
LLM rules use your fallback model to classify the user’s prompt against a plain-English description. Write a natural-language description of the kinds of messages that should trigger the rule; the model decides at runtime whether a message fits. Classification results are cached for the full sticky window (matching results) or for a 30-second cooldown (no-match results) to avoid redundant API calls.
Example: Route research questions to a web-browsing-capable model
Rule Priority and Evaluation Order
Rules are evaluated in ascending priority order. The first rule that matches wins, and the remaining rules are skipped. Assign lower priority numbers to the most specific or highest-importance rules.
Fallback Model
The fallback provider and model are used in two scenarios:
- No rule matched and the sticky window has expired (or was never set).
- LLM rules need to classify the prompt: the classification itself always uses the fallback model.
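The priority ordering and the fallback reduce to a short loop. In this hypothetical sketch, Rule and resolve_route are illustrative names, and each rule’s matches predicate stands in for either a calculated check or an LLM classification; the sticky window is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    # Hypothetical shape: a priority plus a predicate over the routing context.
    name: str
    priority: int
    provider: str
    model: str
    matches: Callable[[dict], bool]


def resolve_route(rules: list[Rule], context: dict,
                  fallback: tuple[str, str]) -> tuple[str, str, Optional[str]]:
    """Return (provider, model, matched rule name or None for the fallback)."""
    for rule in sorted(rules, key=lambda r: r.priority):  # lower number runs first
        if rule.matches(context):
            return rule.provider, rule.model, rule.name  # first match wins
    provider, model = fallback  # no rule matched
    return provider, model, None
```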
Routing Metadata in Chat
When the router switches models mid-conversation, the AnythingLLM UI displays a small routing notification in the chat so users know which model responded. Notifications fire when:
- A rule matches for the first time in a conversation.
- The model changes from one turn to the next.
- The conversation falls back from a rule-matched model to the fallback model.
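The three notification triggers can be expressed as a small state machine over consecutive turns. This is a hypothetical sketch: ConversationRoutingState, routing_notice, and the message strings are illustrative, and a rule of None is assumed to mean the fallback model answered that turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationRoutingState:
    last_model: Optional[str] = None  # model used on the previous turn
    last_rule: Optional[str] = None   # rule that picked it (None = fallback model)
    any_rule_matched: bool = False    # has any rule fired in this conversation?


def routing_notice(state: ConversationRoutingState, model: str,
                   rule: Optional[str]) -> Optional[str]:
    """Decide whether this turn shows a routing notification; updates state in place."""
    notice = None
    if rule is not None and not state.any_rule_matched:
        notice = f"Routed to {model} by rule '{rule}'"  # first rule match
    elif state.last_rule is not None and rule is None:
        notice = f"Returned to fallback model {model}"  # rule-matched -> fallback
    elif state.last_model is not None and model != state.last_model:
        notice = f"Switched to {model}"  # model changed between turns
    state.last_model, state.last_rule = model, rule
    state.any_rule_matched = state.any_rule_matched or rule is not None
    return notice
```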