NISIRA Assistant’s RAG pipeline supports three LLM providers (Google Gemini, OpenRouter, and Groq) and lets you switch between them with a single environment variable. All provider configuration lives in backend/rag_system/config.py under the RAG_CONFIG['generation'] dictionary, with every sensitive value read from environment variables at startup. The generation stage wraps retrieved context into a structured academic-assistant prompt before calling the configured provider.
Provider Selection
Set the LLM_PROVIDER environment variable to activate one of the three backends:
| Value | Provider | Notes |
|---|---|---|
| openrouter | OpenRouter | Default. Aggregates hundreds of models behind a single API. |
| google | Google Gemini | Direct Google AI API; free tier available. |
| groq | Groq | Ultra-low-latency inference for supported open models. |
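The selection rule in the table above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and of the key variable names only GOOGLE_API_KEY appears on this page (OPENROUTER_API_KEY and GROQ_API_KEY are assumed placeholders). Rejecting unknown values is also an assumption of the sketch.

```python
import os

# Provider values from the table above; "openrouter" is the documented default.
SUPPORTED_PROVIDERS = ("openrouter", "google", "groq")

# Only GOOGLE_API_KEY is named in this page. The other two variable names
# are hypothetical placeholders used for illustration.
API_KEY_VARS = {
    "openrouter": "OPENROUTER_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def resolve_provider(env=None):
    """Return the active provider name, falling back to the default."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openrouter")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    return provider


def require_api_key(provider, env=None):
    """Return the key for the active provider, or fail at query time."""
    env = os.environ if env is None else env
    var_name = API_KEY_VARS[provider]
    key = env.get(var_name)
    if not key:
        # Keys for inactive providers are never inspected, so they may be unset.
        raise RuntimeError(f"{var_name} must be set when LLM_PROVIDER={provider}")
    return key
```

Calling `require_api_key(resolve_provider())` only at query time, rather than at startup, would match the behaviour described here: a missing key does not block startup but fails the first query.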
Only the provider matching LLM_PROVIDER needs its API key set. The other keys can be omitted without affecting startup, but the system will raise a runtime error if a query is attempted while the active provider’s key is missing.
System Prompt and Generation Parameters
Regardless of which provider is active, every RAG query uses the same generation settings defined in RAG_CONFIG['generation']:
| Parameter | Value | Description |
|---|---|---|
| temperature | 0.4 | Controls response creativity. 0.4 balances factual accuracy with natural language flow for an academic context. |
| max_response_tokens | 1500 | Maximum tokens in a single RAG answer. Keeps responses concise while allowing thorough explanations. |
| System prompt persona | Academic assistant | Responds in Spanish, uses a friendly-but-professional tone, cites sources inline, and declines to answer when no relevant context is found. |
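A minimal sketch of how these settings and the prompt template's {context} and {question} variables might fit together. The template wording below is a hypothetical stand-in (the real system prompt is not reproduced on this page), and `build_prompt` is an illustrative helper name, not a function from the codebase.

```python
# Values taken from the table above.
GENERATION_SETTINGS = {
    "temperature": 0.4,
    "max_response_tokens": 1500,
}

# Hypothetical template text that paraphrases the persona described here.
PROMPT_TEMPLATE = (
    "You are an academic assistant. Always answer in Spanish, in a friendly "
    "but professional tone, and cite the retrieved documents inline. "
    "If the context is not relevant, say that you have no information.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def build_prompt(chunks, question):
    """Fill the template with retrieved chunks and the user's question."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["[doc1] RAG combines retrieval with generation."],
    "What is RAG?",
)
```

The resulting string would be sent to whichever provider is active, together with the temperature and token cap from `GENERATION_SETTINGS`.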
The system prompt defines an academic assistant persona. It instructs the model to always respond in Spanish, develop ideas conversationally (like a professor explaining to a student), include inline citations from retrieved documents, and state honestly when it has no relevant information. The prompt template receives {context} (retrieved chunks) and {question} (the user query) as runtime variables.
Provider Configuration
- Google Gemini
- OpenRouter
- Groq
Google Gemini is accessed via the Google AI API. On the free tier, requests are capped at 15 per minute; plan accordingly for concurrent users.
Required variables
| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | (none) | Required. Your Google AI Studio API key. |
| LLM_MODEL_GEMINI | gemini-2.0-flash-exp | Gemini model name. Also used by the Gemini embedding model. |
| LLM_MAX_TOKENS | 8192 | Maximum tokens the model may generate at the API level. RAG responses are further capped at 1500 by max_response_tokens. |
The model-level temperature for Gemini is set to None in API_CONFIG['gemini'], which means it defers to the API default. The effective generation temperature (0.4) is applied at the RAG prompt layer, not the model config layer, to prevent accidental divergence.
Switching Providers at Runtime
Because all provider config is driven by environment variables, switching providers requires only a change to .env (and a server restart in development, or a re-deploy in production). The restart is needed because the RAG_CONFIG['generation']['provider'] key is read once at import time from os.getenv("LLM_PROVIDER", "openrouter").
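The import-time behaviour can be illustrated with a small sketch. The dictionary below is a simplified, assumed shape for RAG_CONFIG (only the provider key and the two generation values are taken from this page), and `active_provider` is a hypothetical helper.

```python
import os

# Evaluated once, when the module is first imported.
RAG_CONFIG = {
    "generation": {
        "provider": os.getenv("LLM_PROVIDER", "openrouter"),
        "temperature": 0.4,
        "max_response_tokens": 1500,
    }
}


def active_provider():
    """Return the provider captured when the config was built."""
    return RAG_CONFIG["generation"]["provider"]


snapshot = active_provider()

# Changing the environment afterwards does not update the dictionary;
# the process has to be restarted for the new value to be picked up.
os.environ["LLM_PROVIDER"] = "google" if snapshot != "google" else "groq"
unchanged = active_provider() == snapshot
```

Under this assumption, editing .env while the server is running has no effect until the module is imported again, which is why the restart or re-deploy is required.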