Salud IA Bot’s public health module gives every Telegram user access to Colombia’s official SIVIGILA epidemiological dataset through plain-language questions. Instead of requiring SQL or spreadsheet skills, users simply ask what they want to know and the bot’s NLP engine routes the question to the right analytic method, returning structured answers with percentages, emoji context, and actionable conclusions, all in Spanish.
The public health module loads SIVIGILA data from a local XML file at startup using xml2js. No live internet calls are made to SIVIGILA; all data is parsed from the bundled XML file and held in memory for the lifetime of the process.
What SIVIGILA is and why it matters
SIVIGILA (Sistema Nacional de Vigilancia en Salud Pública) is Colombia’s national public health surveillance system, managed by the Instituto Nacional de Salud (INS). It aggregates mandatory disease event reports from every health provider across the country, covering hundreds of notifiable conditions, from dengue and tuberculosis to occupational accidents and perinatal mortality. The dataset loaded into Salud IA Bot (Eventos_de_Interés_en_Salud_Pública_20260514.xml) contains aggregate national case counts broken down by:
- Urban/rural zone (urbano/rural)
- Six life-cycle age groups: primera infancia (0-4), infancia (5-9), adolescencia (10-14), juventud (15-19), adulto joven (20-49), adulto mayor (50+)
- Sex: femenino / masculino
- Total case count per notifiable event (total_de_eventos)
How procesarPregunta routes intents
SaludPublicaService.procesarPregunta(texto) is the primary entry point for free-text queries. It normalizes the input, scans a synonym dictionary of approximately 35 entries, and falls back to ambiguous partial-match search when no synonym matches. From there, BotUpdate delegates to specific analytic methods depending on detected intent:
Text normalization
Input is lowercased, diacritics are stripped via Unicode NFD decomposition, punctuation is removed, and whitespace is collapsed. This makes queries like "DENGUE", "dengüe", and "dengue" resolve to the same key.
Synonym resolution
A static map translates common shorthand to the canonical SIVIGILA event name. For example: dengue -> DENGUE, mordeduras -> AGRESIONES POR ANIMALES POTENCIALMENTE TRANSMISORES DE RABIA, vih -> VIH/SIDA - MORTALIDAD POR SIDA, chikungunya -> CHIKUNGUYA, drogas -> CONSUMO DE SPA.
Ambiguous search fallback
If no synonym matches, buscarEventosAmbigua(nombre) performs partial matching across all event names. The first match is used to retrieve the full event record.
Example Telegram queries
Flexible search engine
The search engine avoids naive substring matching. A dedicated normalizeText() function strips accents, punctuation, and casing before any comparison. When an exact synonym match fails, buscarEventosAmbigua(nombre) performs partial matching across all event names. If that still returns nothing, buscarPorSimilitud(query, threshold=0.6) applies a Levenshtein-distance similarity score, allowing queries like "dgngue" or "chikunguña" to still resolve correctly.
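The three-stage lookup described above (synonym, partial match, similarity) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only the names normalizeText, buscarEventosAmbigua, and buscarPorSimilitud come from the text, while the function bodies, the EVENT_NAMES list, the resolveEvent helper, the whole-query synonym lookup, and the similarity formula (1 minus edit distance divided by the longer length) are assumptions.

```typescript
// Illustrative sketch only: bodies, sample data and resolveEvent are assumptions.
const EVENT_NAMES: string[] = [
  "DENGUE",
  "CHIKUNGUYA",
  "CONSUMO DE SPA",
  "VIH/SIDA - MORTALIDAD POR SIDA",
  "AGRESIONES POR ANIMALES POTENCIALMENTE TRANSMISORES DE RABIA",
];

// Subset of the synonym map quoted in the text; keys are already normalized.
const SYNONYMS = new Map<string, string>([
  ["dengue", "DENGUE"],
  ["mordeduras", "AGRESIONES POR ANIMALES POTENCIALMENTE TRANSMISORES DE RABIA"],
  ["vih", "VIH/SIDA - MORTALIDAD POR SIDA"],
  ["chikungunya", "CHIKUNGUYA"],
  ["drogas", "CONSUMO DE SPA"],
]);

function normalizeText(input: string): string {
  return input
    .toLowerCase()
    .normalize("NFD")                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .replace(/[^a-z0-9\s]/g, "")      // remove punctuation
    .replace(/\s+/g, " ")             // collapse whitespace
    .trim();
}

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed scoring: 1.0 for identical strings, 0.0 for entirely different ones.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

function buscarEventosAmbigua(nombre: string): string[] {
  const q = normalizeText(nombre);
  if (q === "") return [];
  return EVENT_NAMES.filter((name) => normalizeText(name).includes(q));
}

function buscarPorSimilitud(query: string, threshold = 0.6): string[] {
  const q = normalizeText(query);
  return EVENT_NAMES
    .map((name) => ({ name, score: similarity(q, normalizeText(name)) }))
    .filter((c) => c.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .map((c) => c.name);
}

// Hypothetical helper chaining the three stages in the order the text gives.
function resolveEvent(texto: string): string | undefined {
  const key = normalizeText(texto);
  return SYNONYMS.get(key) ?? buscarEventosAmbigua(key)[0] ?? buscarPorSimilitud(key)[0];
}
```

In this sketch, resolveEvent("dgngue") misses the synonym map and the partial match, then resolves through the similarity stage because it is one edit away from "dengue".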
NLP-based demographic queries with CYCLE_KEYWORDS
Demographic queries map natural-language age descriptors to the corresponding SIVIGILA age-group fields. The CYCLE_KEYWORDS constant (defined in constants/keywords.ts) drives this mapping:
| CYCLE_KEYWORDS entry | Life-cycle key | Underlying field(s) |
|---|---|---|
| ['ninos', 'nino', 'nena'] | niños | primera_infancia, infancia, de_5_a_9 |
| ['adolescente', 'adolescentes'] | adolescentes | adolescencia |
| ['jovenes', 'joven'] | jovenes | juventud, adulto_j_ven |
| ['adultos', 'adulto'] | adultos | adulto_j_ven |
| ['mayores', 'mayor'] | mayores | adulto_mayor |
Once a keyword matches, eventoPrincipalPorGrupoEtario(grupo) is called with the matching field name, returning the single event with the highest case count for that age group.
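A rough sketch of how such a keyword table might drive the lookup is shown below. The real shape of CYCLE_KEYWORDS in constants/keywords.ts is not given here, so the CycleEntry structure, the detectLifeCycle and topEventForFields helpers, and the sample records (placeholder events with invented counts) are all assumptions. Summing across several fields when a key maps to more than one is also an assumption.

```typescript
// Assumed shape; the actual constant may be structured differently.
interface CycleEntry {
  keywords: string[];
  key: string;
  fields: string[];
}

const CYCLE_KEYWORDS: CycleEntry[] = [
  { keywords: ["ninos", "nino", "nena"], key: "niños", fields: ["primera_infancia", "infancia", "de_5_a_9"] },
  { keywords: ["adolescente", "adolescentes"], key: "adolescentes", fields: ["adolescencia"] },
  { keywords: ["jovenes", "joven"], key: "jovenes", fields: ["juventud", "adulto_j_ven"] },
  { keywords: ["adultos", "adulto"], key: "adultos", fields: ["adulto_j_ven"] },
  { keywords: ["mayores", "mayor"], key: "mayores", fields: ["adulto_mayor"] },
];

// Same normalization steps the text describes: lowercase, strip accents and punctuation.
function normalize(input: string): string {
  return input
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: returns the first table entry with a keyword among the query tokens.
function detectLifeCycle(texto: string): CycleEntry | undefined {
  const tokens = new Set(normalize(texto).split(" "));
  return CYCLE_KEYWORDS.find((entry) => entry.keywords.some((k) => tokens.has(k)));
}

interface EventRecord {
  nombre: string;
  counts: Record<string, number>;
}

// Hypothetical stand-in for eventoPrincipalPorGrupoEtario: sums the given
// fields per event and returns the event with the largest sum.
function topEventForFields(records: EventRecord[], fields: string[]): EventRecord | undefined {
  let best: EventRecord | undefined;
  let bestTotal = -1;
  for (const record of records) {
    const total = fields.reduce((sum, f) => sum + (record.counts[f] ?? 0), 0);
    if (total > bestTotal) {
      best = record;
      bestTotal = total;
    }
  }
  return best;
}

// Placeholder records with invented numbers, NOT real SIVIGILA figures.
const SAMPLE: EventRecord[] = [
  { nombre: "EVENTO A", counts: { primera_infancia: 30, infancia: 10, adolescencia: 5 } },
  { nombre: "EVENTO B", counts: { primera_infancia: 5, infancia: 5, adolescencia: 40 } },
];
```

With these placeholder records, a question mentioning "niños" selects the fields for the niños key, and topEventForFields(SAMPLE, fields) returns EVENTO A.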
RAG bypass — zero hallucination for structured data
When SaludPublicaService returns a valid result, BotUpdate sends that response directly to the user without ever calling the LLaMA 3.1 model. The LLM is invoked only as a fallback for open-ended questions where no structured data path exists. This architecture guarantees that numerical statistics (case counts, percentages, rankings) are always sourced from the actual SIVIGILA dataset and are never generated or embellished by the AI.
BYPASS_MARKERS (defined in constants/keywords.ts) is a list of sentinel strings — such as '--- ANÁLISIS', '--- RANKING', and '--- DISTRIBUCIÓN' — that BotUpdate checks against pre-formatted response strings. When a response begins with one of these markers, the bot sends it directly to the user, bypassing the LLM entirely.
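The marker check could look something like the sketch below. Only the BYPASS_MARKERS name and the three example markers come from the text; the list is presumably longer in the real file, and the shouldBypassLlm and routeResponse helpers, along with the synchronous askLlm callback, are hypothetical simplifications.

```typescript
// The three markers quoted in the text; the real list may contain more.
const BYPASS_MARKERS: string[] = ["--- ANÁLISIS", "--- RANKING", "--- DISTRIBUCIÓN"];

// True when a pre-formatted response begins with one of the sentinel strings.
function shouldBypassLlm(response: string | null | undefined): boolean {
  if (!response) return false;
  return BYPASS_MARKERS.some((marker) => response.startsWith(marker));
}

// Hypothetical router: structured answers go out untouched, anything else
// falls through to the LLM callback (kept synchronous here for brevity).
function routeResponse(structured: string | null, askLlm: () => string): string {
  if (structured !== null && shouldBypassLlm(structured)) {
    return structured;
  }
  return askLlm();
}
```

The point of the design is that the fallback callback is never invoked for a marked response, so the model has no opportunity to alter the numbers.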
Top events ranking
topEventos(n) and bottomEventos(n) return sorted slices of the event list by total case count.
Comparative analysis
compararEventos(A, B) returns both events side by side with the difference in total cases and which event is higher.
Gender distribution
proporcionSexoGlobal() and eventosMayorBrechaSexo(n) surface gender imbalances across all or individual events.
Urban / rural split
eventoMasRural() and eventoMasUrbano() identify events with the highest proportion of cases in each zone type.
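To make these analytic methods concrete, here is a minimal sketch of four of them. It borrows the method names and the field names (total_de_eventos, urbano, rural, femenino, masculino) from the text, but everything else is assumed: the Evento interface, the return shapes, and the choice to pass the event list as a parameter (the service methods described above presumably read it from memory instead).

```typescript
// Assumed record shape, using the field names mentioned in the text.
interface Evento {
  nombre: string;
  total_de_eventos: number;
  urbano: number;
  rural: number;
  femenino: number;
  masculino: number;
}

// Highest totals first; copies the array so the input is not mutated.
function topEventos(eventos: Evento[], n: number): Evento[] {
  return [...eventos].sort((a, b) => b.total_de_eventos - a.total_de_eventos).slice(0, n);
}

// Lowest totals first.
function bottomEventos(eventos: Evento[], n: number): Evento[] {
  return [...eventos].sort((a, b) => a.total_de_eventos - b.total_de_eventos).slice(0, n);
}

// Side-by-side comparison; `mayor` is null when the totals are tied.
function compararEventos(a: Evento, b: Evento) {
  const diferencia = Math.abs(a.total_de_eventos - b.total_de_eventos);
  let mayor: string | null = null;
  if (a.total_de_eventos > b.total_de_eventos) mayor = a.nombre;
  else if (b.total_de_eventos > a.total_de_eventos) mayor = b.nombre;
  return { a, b, diferencia, mayor };
}

// Event with the highest rural share of its zoned cases.
function eventoMasRural(eventos: Evento[]): Evento | undefined {
  let best: Evento | undefined;
  let bestShare = -1;
  for (const e of eventos) {
    const zoned = e.urbano + e.rural;
    if (zoned === 0) continue; // no zone data, cannot compute a share
    const share = e.rural / zoned;
    if (share > bestShare) {
      best = e;
      bestShare = share;
    }
  }
  return best;
}

// Overall female/male split as percentages of all sexed cases.
function proporcionSexoGlobal(eventos: Evento[]): { femenino: number; masculino: number } {
  const f = eventos.reduce((sum, e) => sum + e.femenino, 0);
  const m = eventos.reduce((sum, e) => sum + e.masculino, 0);
  const total = f + m;
  if (total === 0) return { femenino: 0, masculino: 0 };
  return { femenino: (f / total) * 100, masculino: (m / total) * 100 };
}
```

Ranking by rural share rather than by raw rural count keeps a large, mostly urban event from outranking a small, mostly rural one, which matches the "highest proportion" wording above.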