The RAG (Retrieval-Augmented Generation) System API exposes low-level control over the NISIRA pipeline: checking readiness, triggering document synchronization from Google Drive, and running queries directly against the vector store with fine-grained parameter control. These endpoints are distinct from POST /api/chat/: they bypass conversation persistence and are better suited for testing, evaluation scripts, and the admin panel. Unless noted, all endpoints require a valid JWT Bearer token.
GET /api/rag/status/
Returns the current readiness state of the RAG pipeline by instantiating a RAGPipeline object and calling is_ready() on each of its components. Requires JWT authentication.
Response
true if the RAG Python modules were successfully imported at server start. false means the dependencies are missing and a 503 is returned instead.
Component-level readiness map.
ISO 8601 timestamp of when the status was checked.
Example
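A minimal sketch of how a client might prepare the status call, using only the Python standard library. The base URL `http://localhost:8000` and the helper name `build_status_request` are assumptions for illustration. Only the path and the Bearer-token requirement come from the description above.

```python
import urllib.request

# Assumption: a local development server; replace with your deployment's host.
BASE_URL = "http://localhost:8000"


def build_status_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /api/rag/status/ request."""
    return urllib.request.Request(
        url=f"{base_url}/api/rag/status/",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send it: urllib.request.urlopen(build_status_request("<your-jwt>"), timeout=10)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so the 503 returned when the RAG modules are missing would surface as an exception rather than as a JSON body.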
POST /api/rag/initialize/
Initializes the RAGPipeline instance and verifies that all components are operational. This is typically called once after deployment to warm up the pipeline before the first user query. Requires JWT authentication.
Response
Human-readable confirmation, e.g., "Sistema RAG inicializado correctamente" ("RAG system initialized successfully").
Example
POST /api/rag/sync/
Triggers a document synchronization pass via pipeline.sync_and_process_documents(). This downloads new or changed files from Google Drive (or local storage), chunks them, generates embeddings, and upserts the vectors into the configured store. Accepts an optional force_reprocess flag to re-embed files that are already indexed. Requires JWT authentication.
Request Body
When force_reprocess is true, re-processes and re-indexes all documents even if they have already been embedded (identified by MD5 hash). Use this after updating the embedding model or chunking strategy.
Response
Confirmation string on success.
Sync results object returned by the pipeline.
Example
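As a hedged illustration, the sync call could be prepared as below. The `force_reprocess` key is the flag named in the endpoint description. The base URL and the helper name `build_sync_request` are assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_sync_request(
    token: str,
    force_reprocess: bool = False,
    base_url: str = "http://localhost:8000",  # assumption: local dev server
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/rag/sync/ request with a JSON body."""
    body = json.dumps({"force_reprocess": force_reprocess}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/rag/sync/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing `force_reprocess=True` would typically be reserved for the cases described above, such as a change of embedding model, since it re-embeds every document.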
POST /api/rag/query/
Runs a direct RAG query outside of any conversation context. Useful for evaluation scripts, batch testing, and the admin panel. The top_k parameter can be set explicitly to override the adaptive calculation used by POST /api/chat/.
This endpoint has AllowAny permission in the source code, so a JWT token is not enforced at the Django permission layer. Supplying the Authorization header is still best practice in production environments.
Request Body
The question to answer. Must be a non-empty string.
Number of document chunks to retrieve (top_k). Overrides the adaptive calculation. Clamped to the range [3, 15]. Omit to use calculate_adaptive_top_k() automatically.
When include_generation is true, passes the retrieved context to the LLM and returns a natural-language answer. When false, returns only the retrieved chunks without generating a response.
Response
Echo of the submitted question.
LLM-generated answer (only present when include_generation is true).
Array of source citation objects (same structure as POST /api/chat/ sources).
Number of document chunks that were retrieved from the vector store.
Whether the LLM generation step was executed.
ISO 8601 timestamp of the query.
Example
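The sketch below shows one way an evaluation script might assemble the request body. The `top_k` and `include_generation` keys are named in the description above. The `question` key is an assumption, since the field name for the question text is not shown here. The client-side clamp mirrors the documented [3, 15] range and is not the server's own code.

```python
from typing import Any, Dict, Optional

TOP_K_MIN, TOP_K_MAX = 3, 15  # documented clamp range


def clamp_top_k(top_k: int) -> int:
    """Clamp top_k into the documented [3, 15] range."""
    return max(TOP_K_MIN, min(TOP_K_MAX, top_k))


def build_query_payload(
    question: str,
    top_k: Optional[int] = None,
    include_generation: bool = True,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for POST /api/rag/query/."""
    # Assumption: whitespace-only input is treated as empty.
    if not question.strip():
        raise ValueError("question must be a non-empty string")
    # Assumption: "question" is the request key for the question text.
    payload: Dict[str, Any] = {
        "question": question,
        "include_generation": include_generation,
    }
    # Omitting top_k leaves the choice to the server's adaptive calculation.
    if top_k is not None:
        payload["top_k"] = clamp_top_k(top_k)
    return payload
```

Setting `include_generation=False` is a convenient way to evaluate retrieval quality in isolation, because only the retrieved chunks come back.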
POST /api/rag/chat/
An enhanced chat endpoint that integrates the full RAG pipeline with automatic conversation history support. Unlike POST /api/chat/, this endpoint retrieves the 6 most recent messages from the conversation and passes them as context to the LLM, producing richer grounded answers. It auto-creates a new conversation if conversation_id is omitted, and falls back gracefully to the basic chat handler if the RAG modules are unavailable. Returns 201 Created on success. Requires JWT authentication.
For most frontend integrations, POST /api/rag/chat/ is the recommended endpoint. It provides richer metrics tracking (via MetricsTracker) than POST /api/chat/, which uses a simpler response generation path.
Request Body
The user’s message. Must be non-empty.
Slug or legacy numeric ID of an existing conversation. If omitted, a new conversation is created automatically with the message content as the title (truncated to 50 characters).
Set to false to bypass the RAG pipeline and use the basic keyword-based response generator. Automatically set to false for short greetings (30 characters or fewer) to avoid unnecessary retrieval.
Conversation history is built automatically by the backend: the 6 most recent messages from the conversation are retrieved from the database and passed to the LLM as context. There is no history request body parameter.
Response
Slug of the conversation (new or existing).
The saved user message record:
{id, content, timestamp}.The saved assistant message record:
{id, content, timestamp, rating, rating_issue_tag}.The assistant’s answer text (duplicated from
assistant_message.content for frontend compatibility).Whether the RAG pipeline was actually invoked for this response.
Source citations for the response. Empty array when rag_used is false.
Performance metrics summary from MetricsTracker: includes latency breakdown, top-k used, and RAGAS scores when available.
Example
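Two of the behaviours described for this endpoint, the 50-character auto-generated title and the 6-message history window, can be sketched as plain functions. This is an illustrative approximation under stated assumptions, not the backend's actual code. The helper names `derive_title` and `recent_history` are hypothetical, and messages are assumed to be held in a list ordered oldest to newest.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

TITLE_MAX_CHARS = 50   # documented title truncation length
HISTORY_WINDOW = 6     # documented number of recent messages passed to the LLM


def derive_title(message: str, max_chars: int = TITLE_MAX_CHARS) -> str:
    """Hypothetical helper: use the message text as the title, cut to max_chars."""
    return message[:max_chars]


def recent_history(messages: Sequence[T], limit: int = HISTORY_WINDOW) -> List[T]:
    """Hypothetical helper: keep the `limit` most recent items.

    Assumes `messages` is ordered oldest to newest.
    """
    if limit <= 0:
        return []
    return list(messages[-limit:])
```

Because the window is applied server-side, a client only needs to send the new message and, optionally, the conversation_id.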
GET /api/rag/system-status/
Returns a detailed diagnostic snapshot of the RAG pipeline by calling pipeline.get_system_status(). This is a more verbose version of GET /api/rag/status/ that exposes internal component metadata.
This endpoint has AllowAny permission, so a JWT token is not enforced at the Django permission layer. Supplying the Authorization header is still best practice in production environments.
Response
Whether the RAG modules are importable.
Detailed component status object returned directly by pipeline.get_system_status(). Contents vary by deployment configuration but typically include vector store connection details, embedding model info, and document counts.
ISO 8601 timestamp of the status check.