RAG System API: Pipeline Control and Direct Querying

The RAG (Retrieval-Augmented Generation) System API exposes low-level control over the NISIRA pipeline: checking readiness, triggering document synchronization from Google Drive, and running queries directly against the vector store with fine-grained parameter control. The status, initialize, sync, and query endpoints are distinct from POST /api/chat/: they bypass conversation persistence and are better suited for testing, evaluation scripts, and the admin panel. POST /api/rag/chat/ is the exception, since it saves each message to a conversation. Unless noted, all endpoints require a valid JWT Bearer token.

GET /api/rag/status/

Returns the current readiness state of the RAG pipeline by instantiating a RAGPipeline object and calling is_ready() on each of its components. Requires JWT authentication.

Response

rag_available

boolean

true if the RAG Python modules were successfully imported at server start. If the dependencies are missing, the value is false and the endpoint responds with HTTP 503.

status

object

Component-level readiness map.

Fields of the status object:

modules_available

boolean

Whether the rag_system package is importable.

components

object

Dictionary of component names to boolean readiness flags (e.g., {"vector_store": true, "embeddings": true}).

version

string

RAG pipeline version string (e.g., "1.0.0").

timestamp

string

ISO 8601 timestamp of when the status was checked. This is a top-level field, not part of the status object.

Example

curl -X GET https://your-domain.com/api/rag/status/ \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

{
  "rag_available": true,
  "status": {
    "modules_available": true,
    "components": {
      "vector_store": true,
      "embeddings": true,
      "document_processor": true
    },
    "version": "1.0.0"
  },
  "timestamp": "2024-11-10T14:00:00.000Z"
}
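A client can reduce this payload to a single readiness decision. The following is a minimal Python sketch, not part of the API: the helper name pipeline_is_ready is hypothetical, the sample dictionary copies the response shown above, and treating an empty component map as "not ready" is an assumption of this sketch.

```python
def pipeline_is_ready(payload: dict) -> bool:
    """Return True only if the RAG modules loaded and every component reports ready."""
    if not payload.get("rag_available", False):
        return False
    components = payload.get("status", {}).get("components", {})
    # Assumption: an empty or missing component map counts as not ready.
    return bool(components) and all(components.values())


# Sample payload copied from the example response above.
sample = {
    "rag_available": True,
    "status": {
        "modules_available": True,
        "components": {
            "vector_store": True,
            "embeddings": True,
            "document_processor": True,
        },
        "version": "1.0.0",
    },
    "timestamp": "2024-11-10T14:00:00.000Z",
}

print(pipeline_is_ready(sample))  # True
```

A deployment script could poll the endpoint and call a check like this before routing traffic to the service.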

POST /api/rag/initialize/

Initializes the RAGPipeline instance and verifies that all components are operational. This is typically called once after deployment to warm up the pipeline before the first user query. Requires JWT authentication.

Response

message

string

Human-readable confirmation, e.g., "Sistema RAG inicializado correctamente" ("RAG system initialized successfully").

result

object

Fields of the result object:

success

boolean

true if initialization completed without errors.

components

object

Per-component readiness map returned by pipeline.is_ready().

message

string

Short status string from the pipeline.

Example

curl -X POST https://your-domain.com/api/rag/initialize/ \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

{
  "message": "Sistema RAG inicializado correctamente",
  "result": {
    "success": true,
    "components": {
      "vector_store": true,
      "embeddings": true,
      "document_processor": true
    },
    "message": "Sistema RAG inicializado"
  }
}
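The post-deployment warm-up described above could be scripted roughly as follows. This is a sketch using only the Python standard library; build_request, send_json, and warm_up are hypothetical helper names, BASE_URL is the placeholder host from the curl examples, and the response shape is taken from the example above.

```python
import json
import urllib.request

BASE_URL = "https://your-domain.com"  # placeholder host used in the examples


def build_request(path, token, method="GET", body=None):
    """Build an authenticated request without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send_json(request):
    """Send the request and decode the JSON body (performs a real network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def warm_up(token, send=send_json):
    """Call the initialize endpoint and report which components are not ready."""
    payload = send(build_request("/api/rag/initialize/", token, method="POST"))
    result = payload.get("result", {})
    not_ready = sorted(name for name, ok in result.get("components", {}).items() if not ok)
    # Ready only if the pipeline reported success and no component flag is false.
    return bool(result.get("success")) and not not_ready, not_ready
```

The send parameter is injectable so the logic can be exercised without a live server. Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so a real script would likely wrap the call in a try/except.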

POST /api/rag/sync/

Triggers a document synchronization pass via pipeline.sync_and_process_documents(). This downloads new or changed files from Google Drive (or local storage), chunks them, generates embeddings, and upserts the vectors into the configured store. Accepts an optional force_reprocess flag to re-embed files that are already indexed. Requires JWT authentication.

Request Body

force_reprocess

boolean

default: false

When true, re-processes and re-indexes all documents even if they have already been embedded (identified by MD5 hash). Use this after updating the embedding model or chunking strategy.

Response

message

string

Confirmation string on success.

result

object

Sync results object returned by the pipeline.

Fields of the result object:

success

boolean

true if sync completed without fatal errors.

downloaded

integer

Number of new files downloaded from Drive.

processed

integer

Number of documents newly embedded.

skipped

integer

Number of files skipped because they were already indexed.

errors

integer

Number of files that failed processing.

Example

curl -X POST https://your-domain.com/api/rag/sync/ \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  -H "Content-Type: application/json" \
  -d '{"force_reprocess": false}'

{
  "message": "Documentos sincronizados y procesados correctamente",
  "result": {
    "success": true,
    "downloaded": 3,
    "processed": 3,
    "skipped": 47,
    "errors": 0
  }
}
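The relationship between force_reprocess and the skipped count can be illustrated with a small sketch. This is not the pipeline's actual implementation: the function names md5_of and plan_sync are hypothetical, and the only detail taken from the description above is that already-indexed files are identified by MD5 hash.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def plan_sync(files: dict[str, bytes], indexed_hashes: set[str], force_reprocess: bool = False):
    """Split files into those to (re)process and those to skip.

    files maps a filename to its raw bytes; indexed_hashes holds the MD5
    digests of documents that are already embedded.
    """
    to_process, skipped = [], []
    for name, data in files.items():
        if not force_reprocess and md5_of(data) in indexed_hashes:
            skipped.append(name)  # unchanged content, already indexed
        else:
            to_process.append(name)  # new, changed, or forced
    return to_process, skipped


files = {"a.pdf": b"alpha", "b.pdf": b"beta"}
indexed = {md5_of(b"alpha")}
print(plan_sync(files, indexed))                        # (['b.pdf'], ['a.pdf'])
print(plan_sync(files, indexed, force_reprocess=True))  # (['a.pdf', 'b.pdf'], [])
```

Under this model, a changed file produces a new digest and is processed again even without force_reprocess, while force_reprocess is only needed when the content is unchanged but the embedding model or chunking strategy is not.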

POST /api/rag/query/

Runs a direct RAG query outside of any conversation context. Useful for evaluation scripts, batch testing, and the admin panel. The top_k parameter can be set explicitly to override the adaptive calculation used by POST /api/chat/.

This endpoint has AllowAny permission in the source code — a JWT token is not enforced at the Django permission layer. However, best practice is to always supply the Authorization header in production environments.

Request Body

question

string

required

The question to answer. Must be a non-empty string.

top_k

integer

Number of document chunks to retrieve. Overrides the adaptive calculation. Clamped to the range [3, 15]. Omit to use calculate_adaptive_top_k() automatically.

include_generation

boolean

default: true

When true, passes the retrieved context to the LLM and returns a natural-language answer. When false, returns only the retrieved chunks without generating a response.

Response

question

string

Echo of the submitted question.

answer

string

LLM-generated answer (only present when include_generation is true).

sources

array

Array of source citation objects (same structure as POST /api/chat/ sources).

relevant_documents_count

integer

Number of document chunks that were retrieved from the vector store.

generation_used

boolean

Whether the LLM generation step was executed.

timestamp

string

ISO 8601 timestamp of the query.

Example

curl -X POST https://your-domain.com/api/rag/query/ \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  -H "Content-Type: application/json" \
  -d '{
    "question": "¿Qué documentos se necesitan para un permiso de funcionamiento?",
    "top_k": 8,
    "include_generation": true
  }'

{
  "question": "¿Qué documentos se necesitan para un permiso de funcionamiento?",
  "answer": "Para obtener un permiso de funcionamiento necesita presentar...",
  "sources": [
    {
      "title": "Manual de Trámites Municipales",
      "filename": "manual_tramites.pdf",
      "page_number": 22,
      "score": 0.93,
      "document_id": "doc_1a2b3c"
    }
  ],
  "relevant_documents_count": 8,
  "generation_used": true,
  "timestamp": "2024-11-10T16:00:00.000Z"
}
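An evaluation script may want to build request bodies for this endpoint programmatically. The sketch below mirrors the documented rules on the client side: top_k is clamped to [3, 15] and omitted entirely when the adaptive calculation should apply. The helper names are hypothetical, and rejecting whitespace-only questions is an assumption of this sketch, not documented server behavior.

```python
TOP_K_MIN, TOP_K_MAX = 3, 15  # documented clamp range for top_k


def clamp_top_k(top_k: int) -> int:
    """Clamp top_k to the documented range [3, 15]."""
    return max(TOP_K_MIN, min(TOP_K_MAX, top_k))


def build_query_payload(question: str, top_k=None, include_generation: bool = True) -> dict:
    """Build a JSON-serializable body for POST /api/rag/query/."""
    if not question or not question.strip():
        raise ValueError("question must be a non-empty string")
    payload = {"question": question, "include_generation": include_generation}
    if top_k is not None:
        # Omitting top_k lets the server use calculate_adaptive_top_k().
        payload["top_k"] = clamp_top_k(top_k)
    return payload


print(clamp_top_k(1))   # 3
print(clamp_top_k(20))  # 15
print(clamp_top_k(8))   # 8
```

Setting include_generation to False in such a payload requests retrieval only, which is useful for measuring retrieval quality separately from answer quality.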

POST /api/rag/chat/

An enhanced chat endpoint that integrates the full RAG pipeline with automatic conversation history support. Unlike POST /api/chat/, this endpoint retrieves the 6 most recent messages from the conversation and passes them as context to the LLM, producing richer grounded answers. It auto-creates a new conversation if conversation_id is omitted, and falls back gracefully to the basic chat handler if the RAG modules are unavailable. Returns 201 Created on success. Requires JWT authentication.

For most frontend integrations, POST /api/rag/chat/ is the recommended endpoint. It provides richer metrics tracking (via MetricsTracker) compared to POST /api/chat/, which uses a simpler response generation path.

Request Body

content

string

required

The user’s message. Must be non-empty.

conversation_id

string

Slug or legacy numeric ID of an existing conversation. If omitted, a new conversation is created automatically with the message content as the title (truncated to 50 characters).

use_rag

boolean

default: true

Set to false to bypass the RAG pipeline and use the basic keyword-based response generator. The backend also sets this to false automatically for short greetings (30 characters or fewer) to avoid unnecessary retrieval.

Conversation history is built automatically by the backend — the 6 most recent messages from the conversation are retrieved from the database and passed to the LLM as context. There is no history request body parameter.
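The three automatic behaviors described in this section (history window, title truncation, greeting shortcut) can be modeled as follows. This is an illustrative sketch only, not the backend's code: all function names are hypothetical, the greeting word list is invented for the example, and how the backend actually detects a greeting is not specified here. Messages are assumed to be ordered oldest first.

```python
HISTORY_LIMIT = 6         # most recent messages passed to the LLM
TITLE_MAX_CHARS = 50      # auto-created conversation title length
GREETING_MAX_CHARS = 30   # length threshold for the greeting shortcut
GREETING_WORDS = {"hola", "hello", "hi", "buenas"}  # hypothetical list


def recent_history(messages: list, limit: int = HISTORY_LIMIT) -> list:
    """Return the last `limit` messages (input ordered oldest first)."""
    return messages[-limit:]


def conversation_title(content: str) -> str:
    """Title for an auto-created conversation: the content cut to 50 characters."""
    return content[:TITLE_MAX_CHARS]


def effective_use_rag(content: str, use_rag: bool = True) -> bool:
    """Apply the greeting shortcut: short greetings skip retrieval."""
    text = content.strip()
    words = text.lower().split()
    first_word = words[0].strip("¡!¿?.,") if words else ""
    is_short_greeting = len(text) <= GREETING_MAX_CHARS and first_word in GREETING_WORDS
    return use_rag and not is_short_greeting


print(recent_history(list(range(10))))  # [4, 5, 6, 7, 8, 9]
print(effective_use_rag("¡Hola!"))      # False
```

Because the shortcut is applied server-side, the rag_used field in the response is the reliable indicator of whether retrieval actually ran.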

Response

conversation_id

string

Slug of the conversation (new or existing).

user_message

object

The saved user message record: {id, content, timestamp}.

assistant_message

object

The saved assistant message record: {id, content, timestamp, rating, rating_issue_tag}.

response

string

The assistant’s answer text (duplicated from assistant_message.content for frontend compatibility).

rag_used

boolean

Whether the RAG pipeline was actually invoked for this response.

sources

array

Source citations for the response. Empty array when rag_used is false.

metrics

object

Performance metrics summary from MetricsTracker: includes latency breakdown, top-k used, and RAGAS scores when available.

Example

curl -X POST https://your-domain.com/api/rag/chat/ \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "¿Cuál es el plazo para renovar una licencia de actividad económica?",
    "conversation_id": "aB3dEfGhIjK"
  }'

GET /api/rag/system-status/

Returns a detailed diagnostic snapshot of the RAG pipeline by calling pipeline.get_system_status(). This is a more verbose version of GET /api/rag/status/ that exposes internal component metadata.

This endpoint has AllowAny permission — a JWT token is not enforced at the Django permission layer. Best practice is to supply the Authorization header in production environments.

Response

rag_available

boolean

Whether the RAG modules are importable.

system_status

object

Detailed component status object returned directly by pipeline.get_system_status(). Contents vary by deployment configuration but typically include vector store connection details, embedding model info, and document counts.

timestamp

string

ISO 8601 timestamp of the status check.

Example

curl -X GET https://your-domain.com/api/rag/system-status/ \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

{
  "rag_available": true,
  "system_status": {
    "vector_store": {
      "backend": "postgres",
      "total_documents": 1847,
      "is_ready": true
    },
    "embeddings": {
      "model": "text-embedding-3-small",
      "dimension": 1536,
      "is_ready": true
    }
  },
  "timestamp": "2024-11-10T16:30:00.000Z"
}
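A monitoring script could scan this payload for components that are not ready. The sketch below assumes each entry of system_status is an object with an is_ready flag, as in the example above; since the contents vary by deployment, entries without that shape are ignored. The helper name components_not_ready is hypothetical.

```python
def components_not_ready(system_status: dict) -> list[str]:
    """Names of components whose is_ready flag is present but not true."""
    return sorted(
        name
        for name, info in system_status.items()
        # Skip entries that are not objects or carry no is_ready flag.
        if isinstance(info, dict) and "is_ready" in info and info["is_ready"] is not True
    )


# system_status object copied from the example response above.
sample_status = {
    "vector_store": {"backend": "postgres", "total_documents": 1847, "is_ready": True},
    "embeddings": {"model": "text-embedding-3-small", "dimension": 1536, "is_ready": True},
}

print(components_not_ready(sample_status))  # []
```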
