The system supports two interchangeable LLM backends: OpenRouter (cloud, recommended for production) and Ollama (local, free, no API key required). Both implement the sameDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/gcapella0/agente-inteligente-expedientes/llms.txt
Use this file to discover all available pages before exploring further.
BaseLlmProvider interface, so ClassifierAgent works identically regardless of which one is active.
LLM configuration is persisted in MongoDB (sistema_config collection, document _id: "llm_config"). This means you can switch providers at runtime via PUT /config/llm without restarting the service — the factory reads MongoDB first and falls back to .env only if no database config exists.
OpenRouter
OpenRouter is the recommended provider for production. It routes your requests to hosted models with no local GPU required, and the system automatically rotates through fallback models if rate limits are hit.
Setup
- Create an account at openrouter.ai and generate an API key.
- Set the following variables in your
.env:
Automatic model rotation
OpenRouterProvider builds a deduplicated model list at startup: the primary model (OPENROUTER_MODEL) first, followed by each entry in OPENROUTER_FALLBACK_MODELS. When a classification call is made:
- It tries the primary model first, up to 3 retries for transient errors (connection timeouts, non-JSON responses).
- If a
RateLimitErroris received, it immediately rotates to the next model in the list — no retry on the rate-limited model. - The cycle repeats through all models. If all are exhausted, the agent returns a structured error result with
"valido": falserather than crashing the pipeline.
.env:
Testing connectivity
Use the built-in health endpoint to verify your API key and model are reachable before processing documents:OpenRouterProvider.health_check(), which lists available models from the API and returns:
Ollama
Ollama is ideal for local development: no API key, no cost, and no data leaves your machine. Performance depends entirely on your hardware — a modern CPU can classify documents in 30–60 seconds with the right model.
Setup
- Install Ollama: ollama.com/download
- Pull the model you want to use:
- Set the following variables in your
.env:
Model recommendations
Choose your Ollama model based on available RAM and acceptable classification latency:| Model | RAM approx. | Estimated time (i5 CPU) | Quality |
|---|---|---|---|
phi3:mini | ~2.2 GB | 30–60 s | Good |
qwen2.5:0.5b | ~0.8 GB | 10–20 s | Basic |
gemma4:e4b | ~4B params quantized | Recommended for server | Best quality |
How OllamaProvider works
OllamaProvider calls Ollama’s POST /api/chat endpoint (not /api/generate) with stream: false. This allows sending the system prompt and user message as separate roles, which improves classification accuracy and reduces context length.
Key behaviour:
- OCR text is truncated to 1500 characters before sending to Ollama. This keeps CPU inference fast while preserving the most relevant content from the document.
- The context window is limited to 2048 tokens (
num_ctx). - If the response is not valid JSON, the provider retries once before returning an error result.
- On a timeout, the log message recommends switching to a smaller model.
Runtime switching (no restart required)
Thecreate_llm_provider() factory in src/services/llm/llm_factory.py implements a MongoDB-first lookup:
- It queries
sistema_config.find_one({"_id": "llm_config"}). - If the document exists, it overrides
config.LLM_PROVIDER,config.OLLAMA_MODEL, etc. in memory. - It then instantiates the appropriate provider class.
- If MongoDB is unavailable, it silently falls back to the values loaded from
.env.
Test before saving
You can test a provider configuration without persisting it by passing optionalprovider and host fields to the probe endpoint:
health_check() against the specified provider and returns a status without modifying the stored config.
Docker + Ollama: If the service is running inside Docker and Ollama is running on the host machine, you must explicitly set
OLLAMA_BASE_URL=http://host.docker.internal:11434 in your .env file — that is the variable OllamaProvider reads from src/config.py. Note that docker-compose.yml includes OLLAMA_HOST: http://host.docker.internal:11434 in its environment block, but OLLAMA_HOST is not read by the application code and has no effect on routing. You must update OLLAMA_BASE_URL in .env (or use PUT /config/llm at runtime) to point the provider at the host machine.