NISIRA Assistant’s RAG pipeline supports three LLM providers — Google Gemini, OpenRouter, and Groq — and lets you switch between them with a single environment variable. All provider configuration lives in backend/rag_system/config.py under the RAG_CONFIG['generation'] dictionary, with every sensitive value read from environment variables at startup. The generation stage wraps retrieved context into a structured academic-assistant prompt before calling the configured provider.

Provider Selection

Set the LLM_PROVIDER environment variable to activate one of the three backends:
| Value | Provider | Notes |
| --- | --- | --- |
| openrouter | OpenRouter | Default. Aggregates hundreds of models behind a single API. |
| google | Google Gemini | Direct Google AI API; free tier available. |
| groq | Groq | Ultra-low-latency inference for supported open models. |

LLM_PROVIDER=openrouter   # or: google | groq
Only the provider matching LLM_PROVIDER needs its API key set. The other keys can be omitted without affecting startup, but the system will raise a runtime error if a query is attempted while the active provider’s key is missing.
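
The key check described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper name `require_api_key` is hypothetical, and `OPENROUTER_API_KEY` is an assumed variable name (only `GOOGLE_API_KEY` and `GROQ_API_KEY` appear on this page).

```python
import os

# Map each LLM_PROVIDER value to the environment variable holding its key.
# OPENROUTER_API_KEY is an assumed name; the other two appear in this guide.
PROVIDER_KEY_VARS = {
    "openrouter": "OPENROUTER_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def require_api_key(env=None):
    """Return (provider, key) for the active provider, or raise at query time."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openrouter").strip().lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEY_VARS[provider]
    key = env.get(key_var)
    if not key:
        # Only the active provider's key is checked; the others may be unset.
        raise RuntimeError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    return provider, key
```

Calling a helper like this at the start of each query, rather than at startup, matches the behavior described here: unused providers never block the server from starting.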

System Prompt and Generation Parameters

Regardless of which provider is active, every RAG query uses the same generation settings defined in RAG_CONFIG['generation']:
| Parameter | Value | Description |
| --- | --- | --- |
| temperature | 0.4 | Controls response creativity. 0.4 balances factual accuracy with natural language flow for an academic context. |
| max_response_tokens | 1500 | Maximum tokens in a single RAG answer. Keeps responses concise while allowing thorough explanations. |
| System prompt persona | Academic assistant | Responds in Spanish, uses a friendly-but-professional tone, cites sources inline, and declines to answer when no relevant context is found. |

The system prompt defines an academic assistant persona. It instructs the model to always respond in Spanish, develop ideas conversationally (like a professor explaining to a student), include inline citations from retrieved documents, and state honestly when it has no relevant information. The prompt template receives {context} (the retrieved chunks) and {question} (the user query) as runtime variables.
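
To make the two runtime variables concrete, here is a small sketch of how a template with {context} and {question} placeholders might be filled. The template wording, the `build_prompt` helper, and the `[n]` chunk numbering are all illustrative assumptions; the real prompt text is defined in RAG_CONFIG['generation'] and is not reproduced here.

```python
# Hypothetical template text; only the {context} and {question}
# placeholders are taken from this guide.
PROMPT_TEMPLATE = (
    "You are an academic assistant. Always answer in Spanish, cite the "
    "sources you use, and say so when the context is not relevant.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)

# Generation settings documented in the table above.
GENERATION_PARAMS = {"temperature": 0.4, "max_response_tokens": 1500}


def build_prompt(chunks, question):
    """Number the retrieved chunks and fill both runtime variables."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Numbering the chunks is one simple way to give the model something to cite inline; the actual citation format used by the project may differ.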

Provider Configuration

Google Gemini is accessed via the Google AI API. On the free tier, requests are capped at 15 per minute; plan accordingly for concurrent users.

Required variables
| Variable | Default | Description |
| --- | --- | --- |
| GOOGLE_API_KEY | (none) | Required. Your Google AI Studio API key. |
| LLM_MODEL_GEMINI | gemini-2.0-flash-exp | Gemini model name. Also used by the Gemini embedding model. |
| LLM_MAX_TOKENS | 8192 | Maximum tokens the model may generate at the API level. RAG responses are further capped at 1500 by max_response_tokens. |

The model-level temperature for Gemini is set to None in API_CONFIG['gemini'], which means it defers to the API default. The effective generation temperature (0.4) is applied at the RAG prompt layer, not the model config layer, to prevent accidental divergence.
# backend/.env
LLM_PROVIDER=google
GOOGLE_API_KEY=your-google-ai-studio-key
LLM_MODEL_GEMINI=gemini-2.0-flash-exp
LLM_MAX_TOKENS=8192
The free tier of the Google AI API enforces a 15 requests/minute rate limit. If you expect higher traffic, upgrade to a paid tier or add request-queuing logic in your deployment.
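
One possible shape for the request-queuing logic mentioned above is a sliding-window limiter that blocks callers once 15 calls have been made within the last 60 seconds. This is a sketch only: the class name `SlidingWindowLimiter` is hypothetical, it is not part of NISIRA Assistant, and it is neither thread-safe nor shared across processes.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of `period` seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is permitted, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

A deployment would call `acquire()` immediately before each Gemini request. With several worker processes, each one would need its own share of the budget or a shared store, which this sketch does not cover.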

Switching Providers at Runtime

Because all provider config is driven by environment variables, switching providers requires only a change to .env (and a server restart in development, or a re-deploy in production):
# Switch from OpenRouter to Groq
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your-groq-key
LLM_MODEL_GROQ=llama-3.3-70b-versatile
No code changes are needed. The RAG_CONFIG['generation']['provider'] key is read once at import time from os.getenv("LLM_PROVIDER", "openrouter").
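
The import-time read is why a restart is required. The sketch below assumes a simplified shape for the dictionary (only the keys named in this guide are shown; the real config.py contains more) and demonstrates that changing the environment after import has no effect on the stored value.

```python
import os

# Assumed, simplified shape of RAG_CONFIG. The provider is resolved once,
# when this module is imported.
RAG_CONFIG = {
    "generation": {
        "provider": os.getenv("LLM_PROVIDER", "openrouter"),
        "temperature": 0.4,
        "max_response_tokens": 1500,
    }
}

provider_at_import = RAG_CONFIG["generation"]["provider"]

# Changing the variable in the running process does not update the dict,
# so the new provider only takes effect after a restart or re-deploy.
os.environ["LLM_PROVIDER"] = "groq"
provider_after_change = RAG_CONFIG["generation"]["provider"]
```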
