AI chat on academic dossiers using RAG and LLM

The AI chat panel is embedded at the bottom of every expediente.html page under the heading “Asistente IA del expediente”. It lets you ask free-form questions in Spanish about the dossier you are viewing — completeness, missing documents, academic credentials, validation status, and more — without leaving the page.

How it works

User sends a question

The question is POSTed to POST /expedientes/{cedula}/chat with a JSON body { "pregunta": "..." } and a Authorization: Bearer <token> header. Maximum question length is 500 characters. An AbortController is created so the in-flight request can be cancelled by clicking Parar.

Backend loads dossier context from MongoDB

The router fetches the docente document and up to 8 documents (with full OCR data) from MongoDB for the given cédula. Both queries run against MongoService — no vector database or external retrieval is involved; this is direct structured lookup.

Context is assembled

A plain-text context block is built from the MongoDB data, capped at 6 000 characters total. Each document contributes its campos_extraidos key/value pairs plus up to 500 characters of raw texto_completo OCR. If the total exceeds the limit, OCR excerpts are halved recursively until the context fits.For curriculo_vitae documents, if OCR text is present the backend parses it with regex to extract structured education entries (degree title, specialization, year range, institution) and renders them as a formatted list. The plain raw-text excerpt (Texto:) is only appended for non-CV documents that have no campos_extraidos.

LLM is called

The assembled context is wrapped in a user prompt and sent to the configured LLM provider (OpenRouter or Ollama) via LlmService.chat() with:

Temperature: 0.2 (deterministic, fact-focused)
Max tokens: 300 (short, concise answers)
System prompt: instructs the model to answer only from the provided context, stay under 3 sentences or a short bullet list, respond in Spanish, and avoid legal or administrative opinions.

The Ollama timeout defaults to 120 seconds (configurable via the OLLAMA_TIMEOUT_SECONDS environment variable). OpenRouter uses the OpenAI SDK default timeout.

Response is returned and logged

The response JSON includes respuesta, modelo, and latencia_ms. The UI appends both the question and the answer to the scrollable chat history. The exchange is also written to the auditoria MongoDB collection for audit trail purposes.

Context structure

The following is the exact format of the context block the LLM receives as part of the user prompt:

=== DOCENTE ===
Nombre: <nombres> <apellidos>
Cédula: <cedula>
Email: <email_personal>
Teléfono: <telefono_principal>
Nacionalidad: <nacionalidad>

=== VINCULACIÓN INSTITUCIONAL ===
Departamento: <departamento>
Sede: <sede>
Cargo: <cargo_actual>
Categoría: <categoria>
Dedicación: <dedicacion>
Tipo de contratación: <tipo_contratacion>

=== COMPLETITUD ===
Porcentaje: <n>%
Documentos faltantes: <tipo1>, <tipo2>, ...

=== FORMACIÓN ACADÉMICA ===
- <titulo> — <institucion> (<ano_graduacion>)

=== DOCUMENTOS (N) ===
[1] tipo (validación: estado)
    Campo Extraido: valor
    Otro Campo: valor

[2] tipo_sin_campos (validación: pendiente)
    Texto: extracto OCR (máx. 500 chars, solo si no hay campos_extraidos)

[3] curriculo_vitae (validación: aprobado)
    Formación académica detallada:
      - Licenciatura en Matemáticas, Mención Análisis : (2001–2006) — Universidad X

Using the chat panel

Open a dossier detail page

Navigate to /ui/expediente.html?cedula=XXXXXXXX for any docente. The page must fully load (docente data and document list) before the chat panel is ready.

Locate the chat panel

Scroll to the bottom of the page. The “Asistente IA del expediente” section is below the documents table. A hint line shows example questions: “¿Cuántos documentos tiene?” · “¿Qué le falta para ser apto?” · “¿En qué se especializa según su CV?”

Type and send your question

Type a question in Spanish in the input field (max 500 characters). Press Enter or click Enviar. The input is disabled and a “El asistente está pensando…” indicator appears while the request is in flight.

Cancel an in-flight request (optional)

Click the Parar button (visible while waiting) to call AbortController.abort(). The chat history will show (Pregunta cancelada) as the assistant turn.

Read the response

The assistant’s answer appears in the chat history beneath your question. The response card also shows the LLM latency in seconds (e.g., 2.3s). The chat log auto-scrolls to the latest message.

Example questions

These questions work well given the structured context the LLM receives:

"¿Cuáles son los documentos faltantes en este expediente?"
"¿En qué departamento trabaja el docente?"
"¿Cuál es el porcentaje de completitud?"
"¿Qué títulos académicos tiene según el CV?"
"¿Está aprobado el certificado de notas de pregrado?"
"¿Cuál es la dedicación y categoría del docente?"
"¿Cuántos documentos están pendientes de validación?"

Audit logging

Every chat query is persisted to the auditoria MongoDB collection with tipo_evento: "chat_expediente_consulta". Top-level fields: timestamp, usuario (JWT sub claim), docente_cedula, and resultado ("exitoso"). Performance and model details are nested under a detalles sub-document: pregunta_chars, respuesta_chars, modelo, provider, latencia_ms, and documentos_en_contexto. This provides a full audit trail of all AI interactions for compliance purposes.

Model recommendations

For best response quality with Ollama on CPU, use gemma4:e4b or phi3:mini. These models produce coherent Spanish answers on constrained hardware within the default 120-second timeout. Avoid mistral on CPU — it is significantly slower and may time out on low-end hardware. For OpenRouter, any instruction-tuned model (e.g., mistral-7b-instruct, llama-3-8b-instruct) works well given the low max_tokens limit of 300.

The system prompt (SYSTEM_PROMPT_CHAT in src/prompts/chat_expediente.py) explicitly instructs the model to answer only from the provided context and say “No tengo esa información en el expediente.” if a data point is absent. This prevents hallucination of document fields or personal data not present in MongoDB.

Introducción

Arquitectura

Configuración

Interfaz Web

How it works

Context structure

Using the chat panel

Example questions

Audit logging

Model recommendations

Build docs developers (and LLMs) love

Introducción

Arquitectura

Configuración

Interfaz Web

Documentation Index

​How it works

​Context structure

​Using the chat panel

​Example questions

​Audit logging

​Model recommendations

Build docs developers (and LLMs) love

How it works

Context structure

Using the chat panel

Example questions

Audit logging

Model recommendations