AI Chatbot: RAG-Powered Health Information Assistant

The Chatbot module gives the general public a natural-language interface to the Jurisdicción Sanitaria’s knowledge base. Instead of navigating menus or reading long documents, a user can type a plain question in Spanish and receive a concise, sourced answer drawn from official CMS content. No custom model is trained; the system combines PostgreSQL’s pgvector extension for semantic search with an external large language model (LLM) API for response generation. As a result, the chatbot’s knowledge tracks the CMS: once an editor publishes or updates an article and its embeddings are regenerated, the new content is available to the chatbot with no manual retraining step.

Architecture: Retrieval Augmented Generation (RAG)

RAG grounds an LLM’s output in a specific, controlled knowledge base. The chatbot never answers from the model’s training data alone; every response is built from content that exists in the CMS at the time of the query.

User Query
    ↓
Embedding Generation (user query → vector)
    ↓
Semantic Search (pgvector cosine similarity)
    ↓
Context Retrieval (top-N relevant content chunks)
    ↓
LLM Prompt Construction (system prompt + context + user query)
    ↓
LLM Response Generation (external LLM API)
    ↓
Cited Response returned to User

Because all knowledge comes from the CMS, the chatbot can only answer questions about topics the Jurisdicción has published content on. If no sufficiently similar content exists, the chatbot returns a graceful “no information available” message rather than hallucinating an answer.
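The flow above can be sketched as a single function with the three external steps injected as dependencies. This is a minimal illustration, not the module's actual code: the names `RagDeps`, `answerQuestion`, and `FALLBACK_MESSAGE`, the prompt wording, and the default values (taken from the example environment file) are all assumptions.

```typescript
// Hypothetical shapes; none of these names come from the real codebase.
interface RetrievedChunk {
  contentId: string;
  chunkText: string;
  similarity: number; // 1 - cosine distance
}

interface RagDeps {
  embed: (text: string) => Promise<number[]>;
  search: (queryVector: number[], limit: number) => Promise<RetrievedChunk[]>;
  generate: (systemPrompt: string, userPrompt: string) => Promise<string>;
}

interface RagAnswer {
  answer: string;
  sources: string[]; // contentIds of the chunks that were used
}

// Placeholder wording for the "no information available" reply.
const FALLBACK_MESSAGE = "No information available on this topic.";

async function answerQuestion(
  question: string,
  deps: RagDeps,
  threshold = 0.75,
  maxChunks = 5,
): Promise<RagAnswer> {
  // 1. Embedding generation: user query -> vector.
  const queryVector = await deps.embed(question);

  // 2. Semantic search, then keep only chunks above the threshold.
  const candidates = await deps.search(queryVector, maxChunks);
  const context = candidates.filter((c) => c.similarity > threshold);

  // 3. Nothing relevant enough: return the fallback without calling the LLM.
  if (context.length === 0) {
    return { answer: FALLBACK_MESSAGE, sources: [] };
  }

  // 4. Prompt construction: system prompt + numbered context + user query.
  const systemPrompt = "Answer only from the provided context.";
  const userPrompt =
    context.map((c, i) => `[${i + 1}] ${c.chunkText}`).join("\n") +
    `\n\nQuestion: ${question}`;

  // 5. Response generation through the external LLM API.
  const answer = await deps.generate(systemPrompt, userPrompt);
  return { answer, sources: context.map((c) => c.contentId) };
}
```

Injecting `embed`, `search`, and `generate` keeps the control flow testable with stubs and makes the fallback path explicit: the LLM is never called when no chunk passes the threshold.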

How Embeddings Work

Embeddings are the bridge between human language and the vector similarity search that powers the chatbot. Every time a piece of content is published or updated in the CMS, the following indexing pipeline runs automatically:

Text extraction and chunking

The content body (rich text from the Tiptap editor) is stripped of HTML markup and split into overlapping chunks of approximately 500 tokens. Overlapping ensures that sentences spanning a chunk boundary are not lost.

Embedding generation

Each chunk is sent to the embedding model (e.g. OpenAI text-embedding-3-small). The model returns a 1,536-dimension floating-point vector that encodes the semantic meaning of the text.

Vector storage via pgvector

The vector is stored in the content_embeddings table alongside the source contentId and chunk index. PostgreSQL’s pgvector extension provides the vector column type and the <=> cosine-distance operator used at query time.

Lifecycle management

When a content item is updated, its embeddings are deleted and regenerated from the new body. When a content item is archived or soft-deleted, its embeddings are removed from the search index so the chatbot cannot cite retired content.
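The text extraction and chunking step can be sketched as a sliding window. This is a rough illustration under stated assumptions: whitespace-separated words stand in for model tokens, the HTML stripping is a naive regular expression, and the overlap of 50 is a made-up default (the source gives only the chunk size of about 500 tokens). The function name `chunkText` is hypothetical.

```typescript
// Split text into overlapping windows of roughly `chunkSize` words.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }

  // Naive markup removal: replace anything that looks like a tag with a space.
  const plain = text.replace(/<[^>]*>/g, " ");
  const words = plain.split(/\s+/).filter((w) => w.length > 0);

  const chunks: string[] = [];
  const step = chunkSize - overlap; // each window starts `step` words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

With `chunkSize = 4` and `overlap = 1`, the ten words `a` through `j` produce `a b c d`, `d e f g`, and `g h i j`: the last word of each chunk is repeated as the first word of the next, so a sentence that straddles a boundary appears whole in at least one chunk when the overlap is large enough.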

ContentEmbedding Schema

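model ContentEmbedding {
  id          String   @id @default(uuid())
  contentId   String   @map("content_id")
  chunkIndex  Int      @map("chunk_index")
  chunkText   String   @map("chunk_text")
  // pgvector column; Prisma has no native vector type, so it is declared as Unsupported
  embedding   Unsupported("vector(1536)")
  createdAt   DateTime @default(now()) @map("created_at")
  updatedAt   DateTime @updatedAt @map("updated_at")

  // Map the model to the snake_case table and columns used by the SQL on this page
  @@map("content_embeddings")
}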
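Because the `embedding` column is declared as `Unsupported`, reads and writes of that column typically go through raw SQL, where the vector is passed as a text literal such as `[0.1,0.2,0.3]` and cast with `::vector`. The helper below is a small sketch of that serialization with a dimension check; `toVectorLiteral` and `EMBEDDING_DIMENSIONS` are illustrative names, not part of the documented codebase.

```typescript
// Must match the vector(1536) column in the schema.
const EMBEDDING_DIMENSIONS = 1536;

// Serialize a numeric array into pgvector's text form, e.g. "[0.5,-1,2]".
function toVectorLiteral(
  values: number[],
  dimensions: number = EMBEDDING_DIMENSIONS,
): string {
  if (values.length !== dimensions) {
    throw new RangeError(
      `expected ${dimensions} dimensions, got ${values.length}`,
    );
  }
  for (const v of values) {
    // NaN and Infinity are rejected before they reach the database.
    if (!Number.isFinite(v)) {
      throw new RangeError("embedding contains a non-finite value");
    }
  }
  return `[${values.join(",")}]`;
}
```

Checking the length before the query runs turns a model or configuration mismatch (for example, switching to an embedding model with a different output size) into an immediate, readable error.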

Semantic Similarity Search

At query time, the user’s question is itself converted to a vector and compared against all stored embeddings using cosine distance. The pgvector <=> operator returns the chunks whose meaning is closest to the question.

-- Semantic similarity search: find the 5 most relevant chunks
SELECT
  content_id,
  chunk_text,
  1 - (embedding <=> $1::vector) AS similarity
FROM content_embeddings
ORDER BY embedding <=> $1::vector
LIMIT 5;

Only chunks whose similarity score exceeds CHATBOT_SIMILARITY_THRESHOLD are included in the LLM prompt context. Chunks below the threshold are discarded, and if no chunks pass the threshold, the chatbot returns its fallback message.
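To make the score concrete: the `<=>` operator returns cosine distance, and the query converts it to similarity as `1 - distance`. The sketch below computes cosine similarity directly and then applies the threshold gate to a set of already-scored rows. The function and type names (`cosineSimilarity`, `selectContext`, `ScoredChunk`) are hypothetical, and in the real system the scoring is done by pgvector rather than in application code.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new RangeError("cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

interface ScoredChunk {
  contentId: string;
  chunkText: string;
  similarity: number;
}

// Keep chunks strictly above the threshold, best first, capped at maxChunks.
function selectContext(
  rows: ScoredChunk[],
  threshold: number,
  maxChunks: number,
): ScoredChunk[] {
  return rows
    .filter((row) => row.similarity > threshold)
    .sort((left, right) => right.similarity - left.similarity)
    .slice(0, maxChunks);
}
```

An empty result from `selectContext` is the signal for the fallback message: no chunk was similar enough to be used as context.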

Chatbot API

Ask a Question

POST /chatbot/query
Content-Type: application/json

{
  "question": "¿Cómo puedo prevenir el dengue?",
  "language": "es"
}

The language field currently supports "es" (Spanish). The system prompt instructs the LLM to respond in the requested language.

Response:

{
  "answer": "Para prevenir el dengue, elimine los criaderos de agua estancada en su hogar: vacíe cubetas, floreros y llantas que acumulen agua. Use repelente de insectos con DEET y duerma bajo mosquitero. Consulte a su médico ante los primeros síntomas.",
  "sources": [
    {
      "contentId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "title": "Guía de Prevención del Dengue",
      "url": "/content/guia-prevencion-dengue",
      "similarity": 0.94
    },
    {
      "contentId": "f9e8d7c6-b5a4-3210-fedc-ba0987654321",
      "title": "Campaña Patio Limpio 2024",
      "url": "/content/campana-patio-limpio-2024",
      "similarity": 0.87
    }
  ]
}

Every response includes a sources array so users can read the full official articles behind the answer.
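Search results are scored per chunk, while the `sources` array lists content items. The source does not say how one is reduced to the other; the sketch below assumes each content item is listed once, at the similarity of its best-matching chunk. `ChunkHit`, `Source`, and `buildSources` are illustrative names.

```typescript
interface ChunkHit {
  contentId: string;
  title: string;
  url: string;
  similarity: number;
}

interface Source {
  contentId: string;
  title: string;
  url: string;
  similarity: number;
}

// Collapse chunk-level hits into one entry per content item (best score wins).
function buildSources(hits: ChunkHit[]): Source[] {
  const best = new Map<string, Source>();
  for (const hit of hits) {
    const current = best.get(hit.contentId);
    if (current === undefined || hit.similarity > current.similarity) {
      best.set(hit.contentId, {
        contentId: hit.contentId,
        title: hit.title,
        url: hit.url,
        similarity: hit.similarity,
      });
    }
  }
  // Highest similarity first, matching the ordering of the example response.
  return Array.from(best.values()).sort((a, b) => b.similarity - a.similarity);
}
```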

Force Re-index All Content

Administrators can trigger a full re-indexing of all published content — for example, after changing the chunking strategy or switching embedding models.

POST /chatbot/reindex
Authorization: Bearer <admin-token>

Re-indexing deletes all existing embeddings and regenerates them from scratch. The process is resource-intensive, and because the index is incomplete until regeneration finishes, the chatbot may miss relevant content or return its fallback message while the job runs. Schedule it during off-peak hours and monitor embedding API costs.

Environment Configuration

# OpenAI (or compatible embedding + chat provider)
OPENAI_API_KEY=sk-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_CHAT_MODEL=gpt-4o

# RAG tuning
CHATBOT_MAX_CONTEXT_CHUNKS=5
CHATBOT_SIMILARITY_THRESHOLD=0.75

Variable	Purpose
`OPENAI_API_KEY`	API key for the embedding and chat completion provider
`OPENAI_EMBEDDING_MODEL`	Model used to generate content and query vectors
`OPENAI_CHAT_MODEL`	Model used to generate the final natural-language response
`CHATBOT_MAX_CONTEXT_CHUNKS`	Maximum number of retrieved chunks passed to the LLM prompt (default `5`)
`CHATBOT_SIMILARITY_THRESHOLD`	Minimum cosine similarity score for a chunk to be included in context (0–1)

Increase CHATBOT_SIMILARITY_THRESHOLD to make the chatbot more conservative (fewer, higher-quality citations). Lower it to increase recall at the risk of including loosely relevant context. A value between 0.70 and 0.80 works well for health content in Spanish.
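Since a mistyped threshold silently changes what the chatbot will cite, it can help to validate the two tuning variables at startup. The loader below is a sketch only: `loadChatbotConfig` and `ChatbotConfig` are hypothetical names, the default of 5 comes from the table above, and treating the threshold as required is an assumption (the source states no default for it).

```typescript
interface ChatbotConfig {
  maxContextChunks: number;
  similarityThreshold: number;
}

// Read and validate the RAG tuning variables from an environment-like object.
function loadChatbotConfig(
  env: Record<string, string | undefined>,
): ChatbotConfig {
  const maxContextChunks = Number(env["CHATBOT_MAX_CONTEXT_CHUNKS"] ?? "5");
  if (!Number.isInteger(maxContextChunks) || maxContextChunks < 1) {
    throw new Error("CHATBOT_MAX_CONTEXT_CHUNKS must be a positive integer");
  }

  const rawThreshold = env["CHATBOT_SIMILARITY_THRESHOLD"];
  if (rawThreshold === undefined || rawThreshold.trim() === "") {
    throw new Error("CHATBOT_SIMILARITY_THRESHOLD is required");
  }
  const similarityThreshold = Number(rawThreshold);
  if (
    !Number.isFinite(similarityThreshold) ||
    similarityThreshold < 0 ||
    similarityThreshold > 1
  ) {
    throw new Error("CHATBOT_SIMILARITY_THRESHOLD must be between 0 and 1");
  }

  return { maxContextChunks, similarityThreshold };
}
```

In a Node.js service this would typically be called once at boot with `process.env`, so that an out-of-range value fails fast instead of surfacing later as unexpectedly empty or noisy answers.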

Knowledge Base Maintenance

The chatbot’s knowledge is always derived from published CMS content. Publishing a new article, disease guide, or FAQ entry automatically makes it searchable by the chatbot — no manual retraining or admin action required.

The following events in the CMS trigger automatic embedding updates:

CMS Event	Embedding Action
Content published	Chunks generated and embeddings created
Content body updated	Existing embeddings deleted and regenerated
Content archived	Embeddings deleted from `content_embeddings` table
Content hard-deleted	Embeddings cascade-deleted via foreign key

This event-driven approach means the embedding index is always consistent with the live CMS state without requiring periodic batch jobs.
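The event table above can be expressed as a small decision function that an event handler might consult before touching the index. The event names, the `EmbeddingPlan` shape, and `planEmbeddingWork` are assumptions made for illustration; the actual CMS event names are not given in the source.

```typescript
// Hypothetical event names mirroring the rows of the table above.
type CmsEvent = "published" | "bodyUpdated" | "archived" | "hardDeleted";

interface EmbeddingPlan {
  deleteExisting: boolean; // remove this item's rows from content_embeddings
  generate: boolean; // chunk the body and create fresh embeddings
}

function planEmbeddingWork(event: CmsEvent): EmbeddingPlan {
  switch (event) {
    case "published":
      return { deleteExisting: false, generate: true };
    case "bodyUpdated":
      return { deleteExisting: true, generate: true };
    case "archived":
      return { deleteExisting: true, generate: false };
    case "hardDeleted":
      // Nothing to do in application code: the foreign-key cascade
      // removes the embedding rows together with the content row.
      return { deleteExisting: false, generate: false };
  }
}
```

Typing the event as a union means the compiler flags the `switch` as incomplete if a new event kind is added without a matching embedding action.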

Safety and Accuracy

Health information carries a higher-than-average responsibility for accuracy. The chatbot incorporates several safeguards:

Context-only answers

The LLM system prompt explicitly instructs the model to answer only from the provided context chunks. If the answer cannot be found in the context, the model must say so — it must not draw on its training data.
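One way such a context-only instruction might be assembled is shown below. The prompt wording is invented for illustration and is not the production system prompt; `ChatMessage` and `buildMessages` are hypothetical names, and the system/user message layout follows the common chat-completion convention.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build a system message that carries the rules and the retrieved context,
// followed by the user's question as a separate message.
function buildMessages(
  contextChunks: string[],
  question: string,
  language: string,
): ChatMessage[] {
  // Number the passages so the rules can refer to them unambiguously.
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n\n");

  const system = [
    "You are a health information assistant.",
    "Answer ONLY from the numbered context passages below.",
    "If the context does not contain the answer, say that no information is available.",
    "Do not use knowledge from outside the context.",
    `Respond in the language with code "${language}".`,
    "",
    "Context:",
    context,
  ].join("\n");

  return [
    { role: "system", content: system },
    { role: "user", content: question },
  ];
}
```

Keeping the question in its own user message, separate from the rules and context, makes it harder for text typed by the user to be mistaken for instructions.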

Similarity threshold gate

If no retrieved chunk exceeds CHATBOT_SIMILARITY_THRESHOLD, the chatbot returns a standard “no information available” message and directs the user to call the Jurisdicción’s helpline.

Source citations

Every response includes the source content items with their similarity scores. Users can follow the link to read the full official document and verify the answer.

Medical disclaimer

A disclaimer is appended to every response, reminding the user that the chatbot is an information assistant, not a medical professional, and directing them to consult a health provider for personal medical decisions.

The chatbot is an information assistant, not a medical advisor. All responses must include a disclaimer directing users to consult qualified health professionals before making any medical decisions. Never remove or suppress this disclaimer in production.

Related Modules

CMS Overview

All chatbot knowledge originates from CMS content.

Content Types

Understand the content types indexed for semantic search.

Timeline

Timeline events are also indexed and citable by the chatbot.
