NISIRA Assistant’s retrieval-augmented generation pipeline is configured through a single source of truth: backend/rag_system/config.py. This file defines four top-level configuration dictionaries (RAG_CONFIG, DOCUMENT_PROCESSING_CONFIG, API_CONFIG, and CHROMA_CONFIG), along with VECTOR_STORE_CONFIG for the storage backend. Most sensitive or deployment-specific values are read from environment variables; the rest are tuned constants that you edit directly in config.py. This page documents every setting and explains how each one affects retrieval quality, indexing behavior, and generation output.

Retrieval Settings

All retrieval tuning lives under RAG_CONFIG['retrieval']. These values control how many document chunks are fetched per query, how they are scored, and how the final context window is assembled.
| Setting | Value | Description |
| --- | --- | --- |
| top_k | 15 | Maximum number of candidate chunks retrieved from the vector store per query. The pipeline may return fewer chunks after diversity filtering and minimum-score enforcement. |
| similarity_threshold | 0.005 | Minimum cosine similarity score for a chunk to pass the initial vector-search filter. Set intentionally low to maximise recall; the re-ranking and min_score_threshold steps handle precision downstream. |
| rerank | True | When True, retrieved chunks are re-scored after the initial vector search using a combined semantic + lexical signal before being passed to the LLM. |
| max_context_length | 12000 | Hard cap (in characters) on the total context injected into the LLM prompt. Prevents token-limit overflows on models with smaller context windows. |
| diversity_threshold | 0.4 | Controls source diversity. Chunks from the same document that are too similar to an already-selected chunk are penalised, encouraging the context window to draw from multiple sources. |
| max_per_source | 3 | Maximum number of chunks from any single source document included in a single response. Works in conjunction with diversity_threshold to avoid over-indexing on one file. |
| min_score_threshold | 0.05 | Post-rerank minimum score for a chunk to be cited as a source in the final answer. Chunks below this threshold are dropped from the source list even if they appear in the context. |
| semantic_weight | 0.6 | Weight given to vector (semantic) similarity in the hybrid scoring formula. |
| lexical_weight | 0.4 | Weight given to keyword (lexical / BM25-style) similarity in the hybrid scoring formula. |
The hybrid scoring formula combines semantic and lexical signals: final_score = (semantic_weight × semantic_score) + (lexical_weight × lexical_score). Adjusting the weights shifts the balance between conceptual relevance and exact-keyword matching. The defaults (60 / 40) work well for academic documents that mix prose explanations with technical terminology.
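The weighted sum and the per-source and context-length caps can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the candidate dictionary shape, and the greedy selection order are assumptions, and the diversity_threshold penalty and min_score_threshold filter are deliberately left out.

```python
from collections import defaultdict

# Values copied from the retrieval settings documented above.
RETRIEVAL = {
    "semantic_weight": 0.6,
    "lexical_weight": 0.4,
    "max_per_source": 3,
    "max_context_length": 12000,
}


def hybrid_score(semantic_score, lexical_score, cfg=RETRIEVAL):
    """Weighted sum of the semantic and lexical scores."""
    return (cfg["semantic_weight"] * semantic_score
            + cfg["lexical_weight"] * lexical_score)


def select_chunks(candidates, cfg=RETRIEVAL):
    """Hypothetical selection pass: re-rank, then apply the two caps.

    Each candidate is a dict with 'source', 'text', 'semantic' and 'lexical'
    keys (an assumed shape, not the project's data model).
    """
    scored = [
        dict(c, score=hybrid_score(c["semantic"], c["lexical"], cfg))
        for c in candidates
    ]
    scored.sort(key=lambda c: c["score"], reverse=True)

    selected = []
    per_source = defaultdict(int)
    used_chars = 0
    for chunk in scored:
        # Skip chunks from a source that already reached max_per_source.
        if per_source[chunk["source"]] >= cfg["max_per_source"]:
            continue
        # Skip chunks that would push the context past the character cap.
        if used_chars + len(chunk["text"]) > cfg["max_context_length"]:
            continue
        selected.append(chunk)
        per_source[chunk["source"]] += 1
        used_chars += len(chunk["text"])
    return selected
```

With the default weights, a chunk with a semantic score of 0.8 and a lexical score of 0.5 receives 0.6 × 0.8 + 0.4 × 0.5 = 0.68.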

Generation Settings

Generation parameters live under RAG_CONFIG['generation'] and apply to all three LLM providers.
| Setting | Value | Description |
| --- | --- | --- |
| temperature | 0.4 | Sampling temperature for the LLM. Lower values produce more deterministic, factual answers; higher values increase creativity. 0.4 is chosen to balance natural language flow with academic accuracy. |
| max_response_tokens | 1500 | Token budget for each generated answer. Long enough for thorough explanations; short enough to keep responses focused and within rate-limit budgets. |
| provider | env LLM_PROVIDER | Active LLM backend, resolved from the LLM_PROVIDER environment variable (default: openrouter). |
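As a rough sketch of how these values might be assembled, the snippet below builds a generation dictionary with the two constants and resolves the provider from the environment. The function name build_generation_config and the injectable env argument are assumptions made for illustration; the real layout of RAG_CONFIG['generation'] in config.py may differ.

```python
import os


def build_generation_config(env=None):
    """Return a generation config dict (hypothetical helper, not project code)."""
    env = os.environ if env is None else env
    return {
        "temperature": 0.4,
        "max_response_tokens": 1500,
        # Falls back to the documented default when LLM_PROVIDER is unset.
        "provider": env.get("LLM_PROVIDER", "openrouter"),
    }


if __name__ == "__main__":
    print(build_generation_config())
```

Passing a plain dictionary as env keeps the lookup easy to test without touching the real process environment.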
The system prompt instructs the model to act as a friendly academic assistant that:
  • Always responds in Spanish, translating English source material naturally.
  • Develops ideas conversationally, explaining why and what for, not just what.
  • Includes inline citations in the form: Según el documento, "…" (source.pdf), that is, "According to the document, "…" (source.pdf)".
  • Refuses to answer when no relevant context is found in the retrieved chunks.
  • Structures answers with a brief introduction, developed main points, supporting citations, and a synthesis where applicable.

Document Chunking

Document chunking settings are defined in DOCUMENT_PROCESSING_CONFIG['chunk_config']. Each supported file format has its own chunk size, overlap, and minimum chunk size, tuned to that format’s typical structure.
PDFs often contain dense academic content with complex layouts. Shorter chunks help isolate individual concepts.
| Parameter | Value | Description |
| --- | --- | --- |
| chunk_size | 1300 | Target chunk length in characters. |
| chunk_overlap | 260 | Characters shared between adjacent chunks (20% of chunk size). Prevents context loss at chunk boundaries. |
| min_chunk_size | 180 | Chunks smaller than this are discarded to avoid noisy micro-fragments. |
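To make the interaction between these three parameters concrete, here is a minimal character-based sliding-window chunker. It is a sketch under stated assumptions: the function chunk_text is hypothetical, and it ignores the structure-preserving and cleaning steps that the real processor applies.

```python
def chunk_text(text, chunk_size=1300, chunk_overlap=260, min_chunk_size=180):
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Each new chunk starts chunk_size - chunk_overlap characters after the
    # previous one: 1300 - 260 = 1040 with the PDF values.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # Drop fragments that are too short to carry useful context.
        if len(piece) >= min_chunk_size:
            chunks.append(piece)
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these values, the last 260 characters of one chunk are repeated as the first 260 characters of the next, so a sentence that straddles a boundary appears whole in at least one of them as long as it is shorter than the overlap.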
Supported file formats: .pdf, .txt, .docx, .doc, .pptx, .xlsx
Additional processing flags (all True by default):
| Flag | Description |
| --- | --- |
| extract_metadata | Extract and store document metadata (title, author, creation date) alongside chunks. |
| preserve_structure | Attempt to maintain heading hierarchy and section boundaries during chunking. |
| clean_text | Strip artefacts (ligatures, excess whitespace, OCR noise) from extracted text before chunking. |

Embedding Settings

NISIRA Assistant uses a two-tier embedding strategy: Google’s text-embedding-004 as the primary production model and all-mpnet-base-v2 (local HuggingFace) as the fallback.
| Setting | Env Var | Default | Description |
| --- | --- | --- | --- |
| Primary model | GEMINI_EMBEDDING_MODEL | models/text-embedding-004 | Google’s production embedding model. Used when GOOGLE_API_KEY is set. High-quality 768-dimensional embeddings with no local GPU required. |
| Fallback model | EMBEDDING_MODEL | sentence-transformers/all-mpnet-base-v2 | Local HuggingFace model. Runs on CPU by default; produces 768-dimensional embeddings compatible with the same vector store schema. |
| Inference device | EMBEDDING_DEVICE | cpu | Device for local embedding inference. Set to cuda if a GPU is available to speed up document indexing. |
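Both the primary and fallback embedding models produce 768-dimensional vectors, so switching between them requires no change to the vector store schema. Because the two models map text into different embedding spaces, vectors produced by one are not comparable with vectors produced by the other; re-index existing documents after a switch so that queries and stored chunks are embedded by the same model.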
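The two-tier choice described above could be resolved along these lines. This is an assumed sketch: resolve_embedding_model and the shape of the returned dictionary are hypothetical, and only the environment variable names and defaults come from the table.

```python
import os


def resolve_embedding_model(env=None):
    """Pick the primary or fallback embedding model (hypothetical helper)."""
    env = os.environ if env is None else env

    # Primary tier: Google embeddings, used only when an API key is present.
    if env.get("GOOGLE_API_KEY"):
        return {
            "provider": "google",
            "model": env.get("GEMINI_EMBEDDING_MODEL", "models/text-embedding-004"),
            "dimension": 768,
        }

    # Fallback tier: local HuggingFace model.
    return {
        "provider": "huggingface",
        "model": env.get("EMBEDDING_MODEL", "sentence-transformers/all-mpnet-base-v2"),
        "device": env.get("EMBEDDING_DEVICE", "cpu"),
        "dimension": 768,
    }
```

Recording the chosen model name alongside the index is one way to detect when stored vectors were produced by a different model than the one currently active.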
Additional HuggingFace embedding options (configurable via env vars):
| Variable | Default | Description |
| --- | --- | --- |
| EMBEDDING_MAX_SEQ_LENGTH | 512 | Maximum token sequence length passed to the local embedding model. |
| EMBEDDING_NORMALIZE | true | When true, embedding vectors are L2-normalised, so that the inner product of two vectors equals their cosine similarity. |
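The effect of L2 normalisation can be checked with a few lines of plain Python. The helper names below are illustrative and not part of the project; the point is only that, once vectors have unit length, their dot product and their cosine similarity coincide.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    print(cosine_similarity(a, b))
    print(dot(l2_normalize(a), l2_normalize(b)))
```

Both printed values agree up to floating-point rounding, which is why a store that ranks by inner product behaves like a cosine ranking when the embeddings are normalised.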

Vector Store

The vector store backend is selected via the VECTOR_STORE_BACKEND environment variable. Both backends store the same 768-dimensional embedding vectors and are drop-in alternatives from the application’s perspective.
| Setting | Env Var | Default | Description |
| --- | --- | --- | --- |
| Backend | VECTOR_STORE_BACKEND | postgres | postgres uses pgvector on the existing PostgreSQL database. chroma runs a local ChromaDB instance. |
| Database URL | DATABASE_URL | | PostgreSQL connection URI used by both Django ORM and the pgvector store. Required when VECTOR_STORE_BACKEND=postgres. |
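A small validation helper illustrates the rule that DATABASE_URL is only required for the postgres backend. The function resolve_vector_store and its return shape are assumptions for illustration; the ChromaDB values are the ones documented in the ChromaDB section of this page.

```python
import os

SUPPORTED_BACKENDS = {"postgres", "chroma"}


def resolve_vector_store(env=None):
    """Validate and return vector store settings (hypothetical helper)."""
    env = os.environ if env is None else env
    backend = env.get("VECTOR_STORE_BACKEND", "postgres").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported VECTOR_STORE_BACKEND: {backend!r}")

    config = {"backend": backend, "embedding_dimension": 768}
    if backend == "postgres":
        url = env.get("DATABASE_URL")
        if not url:
            raise ValueError(
                "DATABASE_URL is required when VECTOR_STORE_BACKEND=postgres"
            )
        config["database_url"] = url
    else:
        # Local ChromaDB needs no connection string.
        config["persist_directory"] = "chroma_db/"
        config["collection_name"] = "rag_documents"
    return config
```

Failing early on a missing DATABASE_URL surfaces a misconfigured deployment at startup rather than on the first query.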

PostgreSQL / pgvector (Production)

The postgres backend stores vectors inside the same PostgreSQL database as the rest of the application data, eliminating a separate infrastructure dependency. It requires the pgvector extension to be enabled on the database.
VECTOR_STORE_BACKEND=postgres
DATABASE_URL=postgresql://user:password@host:5432/dbname

ChromaDB (Development)

chroma runs an embedded ChromaDB instance that persists data to a local directory. No additional service or extension is required, making it ideal for local development and quick testing.
| ChromaDB Setting | Value | Description |
| --- | --- | --- |
| persist_directory | chroma_db/ | Local directory where ChromaDB persists its data files (relative to backend/). |
| collection_name | rag_documents | Name of the ChromaDB collection that holds all indexed document chunks. |
| distance_function | cosine | Distance metric used for nearest-neighbour search. Must match the normalisation applied during embedding. |
| embedding_dimension | 768 | Vector dimension; must match the output of the active embedding model. |
VECTOR_STORE_BACKEND=chroma
# No DATABASE_URL needed for local ChromaDB
The development example file (backend/.env.local.example) sets VECTOR_STORE_BACKEND=chroma by default. Switch to postgres before deploying to production so that your vector data is co-located with the rest of your application database and benefits from automated backups.
