NISIRA Assistant is organized as a three-tier system: a React single-page application that users interact with directly, a Django REST API that handles authentication, persistence, and orchestration, and a self-contained RAG sub-package that owns every aspect of document processing and intelligent retrieval. Each tier is independently deployable. The frontend reaches the API only over HTTP, and the API invokes the RAG sub-package through Python function calls, so there is no shared in-memory state and no direct coupling between the frontend and the RAG internals.
The following numbered sequence traces a single user chat message from the browser to the delivered response:
1. Authentication: The React frontend calls POST /api/auth/token/ with username and password. The server returns a JWT access token (valid for 24 hours) and a refresh token (valid for 7 days). Both are stored in localStorage, and the interceptor in frontend/src/services/api.js attaches the access token automatically as Authorization: Bearer <token>.
2. Message submission: The user types a question and the frontend sends a POST request to /api/chat/ with { "content": "...", "conversation_id": "<slug or null>" }. If conversation_id is null, a new Conversation record (with a random URL-safe slug) is created.
3. Persistence: The Django view persists the incoming user message to the Message model and logs query metadata to QueryMetrics for later analysis.
4. Query reformulation: RAGPipeline.query() checks the recent conversation history. If the question is short, referential (“And how do you configure it?”), or contains demonstrative pronouns without a clear topic, the pipeline calls the LLM with a lightweight reformulation prompt to produce a self-contained search query before retrieval.
5. Embedding: EmbeddingManager encodes the (possibly reformulated) query into a 768-dimensional vector using sentence-transformers/all-mpnet-base-v2 running on CPU.
6. Hybrid search: The pipeline runs complementary retrieval strategies concurrently and merges their results. The fourth strategy, expansion search, runs only as a fallback:
   - Semantic search (weight 0.6): cosine similarity against all stored chunk embeddings.
   - Lexical search (weight 0.4): keyword frequency scoring, either via PostgreSQL full-text search (search_lexical) or a fallback ChromaDB scan.
   - Metadata search: filename and document-field keyword matching, with a high boost for exact source-name matches.
   - Expansion search: term-expansion fallback that kicks in when fewer than top_k results have been found.
7. Topic filtering: If the query contains recognized identifiers (ISO codes, law numbers, NTP standards, COBIT versions, etc.), chunks from unrelated documents are removed before the context is assembled.
8. Context assembly: The top-k chunks (3 for citation queries, 5 for general queries) are formatted as Fuente: <filename>\n<content> blocks and joined into a single context string.
9. LLM generation: The assembled context, conversation history, and original question are injected into the system prompt and sent to the configured LLM. When metrics collection is active, the pipeline uses streaming to capture time-to-first-token (TTFT) in milliseconds.
10. Response persistence: The bot’s answer and the source list (file_name, page, chunk_id, similarity_score, preview) are stored as a Message record with sources_json.
11. Frontend rendering: The React chat component renders the response as Markdown, displays source badges, and enables deep-linking to the exact page in the original document through an inline PDF viewer powered by the /api/documents/<slug>/ binary serving endpoint.
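The weighted merge in the hybrid search step and the context assembly step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names `merge_hybrid` and `build_context`, the dictionary shapes, the treatment of a missing score as 0, and the blank-line separator between blocks are all assumptions. Only the weights (0.6 and 0.4), the top-k values (3 and 5), and the `Fuente:` block format come from the description above. Metadata and expansion search are left out for brevity.

```python
from collections import defaultdict

SEMANTIC_WEIGHT = 0.6  # weights taken from the hybrid search description
LEXICAL_WEIGHT = 0.4


def merge_hybrid(semantic, lexical):
    """Combine per-chunk scores from two strategies into one ranked list.

    Both arguments map chunk_id -> score in [0, 1]. A chunk found by only
    one strategy contributes 0 for the other (an assumption of this sketch).
    """
    combined = defaultdict(float)
    for chunk_id, score in semantic.items():
        combined[chunk_id] += SEMANTIC_WEIGHT * score
    for chunk_id, score in lexical.items():
        combined[chunk_id] += LEXICAL_WEIGHT * score
    # Highest combined score first; ties broken by chunk_id for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


def build_context(ranked, chunks, is_citation_query):
    """Format the top-k ranked chunks as 'Fuente:' blocks in one string."""
    top_k = 3 if is_citation_query else 5
    blocks = []
    for chunk_id, _score in ranked[:top_k]:
        chunk = chunks[chunk_id]
        blocks.append(f"Fuente: {chunk['file_name']}\n{chunk['content']}")
    # Separator between blocks is hypothetical; the source only says "joined".
    return "\n\n".join(blocks)
```

A chunk that scores moderately in both strategies can outrank one that scores highly in only one of them, which is the practical effect of merging rather than picking a single strategy.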
api/
Django application containing all REST views, serializers, models, URL routing, and the admin panel views. It is registered under /api/ in the root URLconf.
rag_system/
Self-contained RAG sub-package. It never imports from api/; the API layer calls it as a library. Its own config.py is the single source of truth for all tunable parameters.
The rag_system/ package is organized into focused sub-modules, each with a single responsibility:
| Sub-package / File | Role |
| --- | --- |
| rag_engine/pipeline.py | RAGPipeline class: orchestrates the full retrieve→generate flow, hybrid search strategies, query reformulation, topic filtering, and LLM streaming. |
| document_processing/pdf_processor.py | PDF parsing with PyPDF2 + pdfplumber; extracts text and page_number metadata per chunk. |
| document_processing/text_processor.py | DOCX, PPTX, XLSX, and TXT parsing with format-specific chunking strategies. |
| embeddings/embedding_manager.py | Wraps sentence-transformers/all-mpnet-base-v2; provides single and batch embedding creation with an in-process cache. |
| vector_store/chroma_manager.py | ChromaDB client: creates and manages the rag_documents collection with cosine distance; used by default in local development. |
| vector_store/postgres_store.py | PostgreSQL pgvector client: provides search_similar, search_lexical, search_by_metadata, and get_all_documents for production deployments. |
| drive_sync/drive_manager.py | Google Drive polling: authenticates via service account or OAuth2, lists files in the configured folder, downloads changed files, and triggers ingestion. |
| config.py | Central configuration dictionary for all chunking parameters, embedding settings, LLM provider details, vector store backend selection, and retrieval weights. |
The pipeline selects its vector backend at startup by reading the VECTOR_STORE_BACKEND environment variable and the presence of DATABASE_URL:
VECTOR_STORE_BACKEND=postgres → PostgresVectorStore (production)
VECTOR_STORE_BACKEND=chroma → ChromaManager (local dev)
postgres + no DATABASE_URL → falls back to ChromaDB automatically
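The three selection rules can be expressed as a small function. This is a sketch under assumptions: the function name `select_vector_backend` is hypothetical, and treating an unset or unrecognized `VECTOR_STORE_BACKEND` as `chroma` is a guess based on ChromaDB being the local default, not something the rules state.

```python
import os


def select_vector_backend(env=None):
    """Return 'postgres' or 'chroma' following the three rules above."""
    env = os.environ if env is None else env
    # Assumption: an unset variable behaves like 'chroma'.
    requested = env.get("VECTOR_STORE_BACKEND", "chroma").strip().lower()
    if requested == "postgres" and env.get("DATABASE_URL"):
        return "postgres"
    # Covers 'chroma', an unset variable, and postgres without DATABASE_URL.
    return "chroma"
```

Passing the environment as a plain mapping keeps the rule easy to check in isolation, for example `select_vector_backend({"VECTOR_STORE_BACKEND": "postgres"})` exercises the fallback path without touching the real process environment.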
ChromaDB stores embeddings in a local directory (backend/chroma_db/) mounted as a Docker volume. It requires no additional infrastructure and is the default for docker-compose.yml.
PostgreSQL pgvector stores embeddings as vector(768) columns alongside full-text-search indexes. This backend enables native search_lexical queries using PostgreSQL’s tsvector/tsquery types, eliminating the need for the Python-side fallback lexical scan. It is the default for docker-compose.production.yml.
The RAGPipeline accesses both backends through a unified interface (search_similar, add_documents, list_all_documents, reset_collection), so the rest of the pipeline code is backend-agnostic.
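One way to picture the unified interface is as a structural type that any backend can satisfy. The sketch below is illustrative only: the four method names come from the text, but the `VectorStore` protocol, every parameter and return type, and the toy `InMemoryStore` are assumptions and do not describe the real ChromaManager or PostgresVectorStore signatures.

```python
from typing import Any, Protocol


class VectorStore(Protocol):
    """Hypothetical structural type for the four shared methods."""

    def search_similar(self, query_embedding: list[float], top_k: int) -> list[dict[str, Any]]: ...
    def add_documents(self, documents: list[dict[str, Any]]) -> None: ...
    def list_all_documents(self) -> list[dict[str, Any]]: ...
    def reset_collection(self) -> None: ...


class InMemoryStore:
    """Toy backend, used only to show that callers depend on the interface."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def add_documents(self, documents):
        self._docs.extend(documents)

    def list_all_documents(self):
        return list(self._docs)

    def reset_collection(self):
        self._docs.clear()

    def search_similar(self, query_embedding, top_k):
        # Ranks by dot product for brevity; the real backends use cosine similarity.
        def score(doc):
            return sum(q * d for q, d in zip(query_embedding, doc["embedding"]))

        return sorted(self._docs, key=score, reverse=True)[:top_k]


def count_documents(store: VectorStore) -> int:
    """Caller code that works with any object providing the four methods."""
    return len(store.list_all_documents())
```

Because `count_documents` only relies on the method names, swapping one backend for another would not require changes on the calling side, which is the property the paragraph above describes.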
The active provider is selected by the LLM_PROVIDER environment variable. Changing it requires only an .env update and a container restart.
| Provider key | Default model | Required env var | Notes |
| --- | --- | --- | --- |
| google | gemini-2.0-flash-exp | GOOGLE_API_KEY | Uses langchain-google-genai; supports streaming for TTFT measurement. |
| openrouter | google/gemma-2-9b-it | OPENROUTER_API_KEY | Uses langchain-openai pointed at https://openrouter.ai/api/v1. Model overridable via LLM_MODEL_OPENROUTER. |
| groq | llama-3.3-70b-versatile | GROQ_API_KEY | Uses langchain-groq; fast inference for latency-sensitive deployments. Model overridable via LLM_MODEL_GROQ. |
All providers share the same generation settings: temperature 0.4, max response tokens 1,500, and a structured Spanish-language system prompt that instructs the LLM to answer naturally, cite sources inline, and acknowledge when information is not available.
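The provider selection described above amounts to a lookup keyed by `LLM_PROVIDER`. The following sketch copies the default models, key variables, and override variables from the provider listing. The dictionary layout, the function name `resolve_llm_settings`, the `max_tokens` key name, and the choice to raise `ValueError` on a missing key are assumptions made for illustration.

```python
import os

# Defaults copied from the provider listing; the layout itself is illustrative.
PROVIDERS = {
    "google": {
        "model": "gemini-2.0-flash-exp",
        "key_var": "GOOGLE_API_KEY",
        "override_var": None,
    },
    "openrouter": {
        "model": "google/gemma-2-9b-it",
        "key_var": "OPENROUTER_API_KEY",
        "override_var": "LLM_MODEL_OPENROUTER",
    },
    "groq": {
        "model": "llama-3.3-70b-versatile",
        "key_var": "GROQ_API_KEY",
        "override_var": "LLM_MODEL_GROQ",
    },
}

# Shared generation settings stated in the text (key names are hypothetical).
GENERATION_SETTINGS = {"temperature": 0.4, "max_tokens": 1500}


def resolve_llm_settings(env=None):
    """Pick provider, model, and shared settings from environment values."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    spec = PROVIDERS[provider]
    if not env.get(spec["key_var"]):
        raise ValueError(f"{spec['key_var']} must be set for provider {provider!r}")
    model = spec["model"]
    if spec["override_var"]:
        # An empty override falls back to the default model.
        model = env.get(spec["override_var"]) or model
    return {"provider": provider, "model": model, **GENERATION_SETTINGS}
```

Failing early on a missing API key surfaces a misconfigured .env at container start rather than on the first chat request.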
NISIRA Assistant uses Django’s ORM and dj-database-url to support two database backends without any code changes:
| Environment | Engine | Compose file | Vector store |
| --- | --- | --- | --- |
| Local development | MySQL 8.0 | docker-compose.yml | ChromaDB |
| Production | PostgreSQL + pgvector | docker-compose.production.yml | pgvector |
core/settings.py reads DATABASE_URL (or the individual DB_* variables) at startup and configures Django’s DATABASES dict accordingly. The psycopg2-binary and mysql-connector-python drivers are both present in requirements.txt so either engine works without reinstalling dependencies.
NISIRA Assistant uses JSON Web Tokens via djangorestframework-simplejwt:
Login — POST /api/auth/login/ (custom view) or POST /api/auth/token/ (standard simplejwt view). Returns { "access": "...", "refresh": "..." }.
Token storage — The frontend stores both tokens in localStorage and reads the user payload from the access token to determine role.
Authenticated requests — Every protected API call includes Authorization: Bearer <access_token>. Django middleware validates the signature and expiry.
Token refresh — When the access token expires, the frontend calls POST /api/auth/refresh/ with the refresh token to receive a new access token.
Role-based routing — The React ProtectedRoute component reads localStorage for the token and checks whether user.username === 'admin'. Admin users can access /admin and /admin/:tabId; regular users are redirected to /chat.
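The token-handling steps above (reading the payload, detecting expiry, choosing a route) can be illustrated in Python, even though the real logic lives in the React frontend. This is a sketch under assumptions: the helper names are hypothetical, and the presence of a `username` claim in the access token is a guess, since the text does not say which claims the token carries. The decoding does not verify the signature, which is acceptable for client-side routing only because the server validates every request.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the claims of a JWT without verifying its signature."""
    payload_segment = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(claims, now=None):
    """True when the standard 'exp' claim (seconds since epoch) has passed."""
    now = time.time() if now is None else now
    return claims["exp"] <= now


def landing_route(claims):
    """Mirror the role check: admin goes to /admin, everyone else to /chat."""
    # 'username' as a token claim is an assumption of this sketch.
    return "/admin" if claims.get("username") == "admin" else "/chat"
```

When `is_expired` returns true, the client would call the refresh endpoint before retrying the original request.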
The React Admin Panel communicates with a dedicated set of admin-only endpoints under /api/admin/. These require a valid admin JWT token and are not accessible to regular users.
| Endpoint group | Endpoints |
| --- | --- |
| Google Drive | GET /api/admin/drive/files/, POST /api/admin/drive/upload/, DELETE /api/admin/drive/delete/<file_id>/, POST /api/admin/drive/sync/, GET /api/admin/drive/sync/progress/ |
| Embeddings | GET /api/admin/embeddings/status/, POST /api/admin/embeddings/generate/, POST /api/admin/embeddings/verify/, POST /api/admin/embeddings/clear/, GET /api/admin/embeddings/progress/, GET /api/admin/embeddings/processed/, DELETE /api/admin/embeddings/delete/<file_name>/ |
| Metrics | GET /api/admin/metrics/, GET /api/admin/metrics/queries/, GET /api/admin/metrics/queries/<query_id>/, GET /api/admin/metrics/ratings/ |
| Pipeline | GET /api/admin/pipeline/status/ |
RAG Pipeline: Detailed documentation of hybrid search, adaptive top-k, topic filtering, and query reformulation.
LLM Providers: How to configure Gemini, OpenRouter, and Groq, including model overrides and temperature tuning.
Docker Deployment: Production setup with PostgreSQL, pgvector, Nginx reverse proxy, and SSL via Certbot.
Environment Variables: Full reference for every backend and frontend environment variable.