Salud IA Bot separates data ingestion from data consumption into two distinct phases. XML files from SIVIGILA, the Ministerio de Salud, regional provider registries, and the PAI vaccination programme are parsed once on a developer’s machine and stored in a portable SQLite database. In production the application only opens that pre-built database: no XML parsing, no in-memory trees, no parsing delay at startup. This design keeps RAM consumption low and delivers sub-3-second responses even on memory-constrained hosting tiers such as Render’s free tier.

Two-Phase Approach

1. Migration Phase (one-time, local)

Run the seed and import scripts from the scripts/ directory. Each script reads one or more XML files with fast-xml-parser or xml2js, maps the parsed records to TypeORM entities, and bulk-saves them to data/salud-ia-bot.db in chunks of 100 rows. This phase runs on the developer’s machine before deployment.

```bash
npm run seed:antioquia      # Parses Prestadores_de_Salud_Departamento_de_Antioquia.xml
npm run seed:vaccination    # Parses three PAI XML files
npm run import:data         # Imports all remaining datasets
```

2. Production Phase (read-only)

The NestJS application starts, TypeORM opens the SQLite file with synchronize: false, and all service queries run as standard TypeORM find, findOne, and createQueryBuilder calls against the pre-populated tables. No XML is loaded into memory at runtime.
The data/salud-ia-bot.db file should be committed or transferred to your deployment environment. On Render and similar platforms, mount a persistent disk at the project root to preserve the database across deploys.

XML Data Sources

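Most XML files are loaded by a migration script and queried through a corresponding runtime service. Several files share scripts/import-data.ts, and the air-quality file lists only a service. The table below lists all source files verified against the data/ directory: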
| XML File | Source | Content | Migration Script + Service |
| --- | --- | --- | --- |
| `Eventos_de_Interés_en_Salud_Pública_20260514.xml` | SIVIGILA | Transmissible disease events (dengue, zika, malaria, tuberculosis, etc.) | `scripts/import-data.ts` + `HealthDataService` |
| `Salud_Mental.xml` | Ministerio Salud | CIE-10 mental health diagnoses and care records | `scripts/import-data.ts` + `MentalHealthService` |
| `Salud_sexual_-_preguntas.xml` | Internal | Sexual and reproductive health Q&A | `scripts/import-data.ts` + `SexualHealthService` |
| `Prestadores_de_Salud_Departamento_de_Antioquia.xml` | Regions | Antioquia health providers | `scripts/seed-antioquia.ts` + `AntioquiaHealthService` |
| `Centros_de_salud_Yopal._.xml` | Regions | Yopal health centres with GPS coordinates | `scripts/import-data.ts` + `YopalHealthService` |
| `SERVICIOS_OFERTADOS_RED_DE_SALUD_DEL_CENTRO_ESE_POR_SEDE_CALI.xml` | Regions | Cali services by sede and complexity level | `scripts/import-data.ts` + `CaliHealthService` |
| `servicios_salud_boyaca.xml` | Regions | Boyaca provider catalogue | `scripts/import-data.ts` + `BoyacaHealthService` |
| `Coberturas_administrativas_de_vacunación_por_departamento_20260528.xml` | PAI | Departmental vaccination coverage | `scripts/seed-vaccination.ts` + `VaccinationService` |
| `Cobertura_de_Vacunación_PAI_en_el_Valle_del_Cauca.xml` | PAI | Valle del Cauca PAI coverage | `scripts/seed-vaccination.ts` + `VaccinationService` |
| `DATOS_DE_VACUNACIÓN_EN_NIÑOS_Y_NIÑAS.xml` | PAI | Children’s vaccination data | `scripts/seed-vaccination.ts` + `VaccinationService` |
| `Calidad_del_Aire_en_Colombia_(Promedio_Anual)_20260528.xml` | External API | Annual average air quality indicators by municipality | `AirQualityService` |

TypeORM Configuration

The database module configures TypeORM to use the better-sqlite3 driver. The synchronize: false flag is critical — it ensures the schema is never auto-modified at startup and that the tables seeded by the migration scripts remain intact:
```typescript
// database.module.ts
TypeOrmModule.forRoot({
  type: 'better-sqlite3',
  database: process.cwd() + '/data/salud-ia-bot.db',
  entities: entities,
  synchronize: false, // schema managed by seed/migration scripts
  logging: false,
});
```
The entities array is imported from src/entities/index.ts and includes all eight entity classes registered in DataModule:
```typescript
// data.module.ts: TypeOrmModule.forFeature registration
TypeOrmModule.forFeature([
  BoyacaProvider,
  AntioquiaProvider,
  CaliProvider,
  YopalProvider,
  Vaccination,
  MentalHealth,
  SexualHealth,
  HealthEvent,
])
```

Seed Script Pattern

All seed and import scripts follow the same three-step pattern: parse the XML with fast-xml-parser (or xml2js for complex nested structures), map each row through a typed mapper function, then bulk-save to SQLite using TypeORM’s chunked save:
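```typescript
import * as fs from 'fs';
import * as path from 'path';
import { XMLParser } from 'fast-xml-parser';

const parser = new XMLParser();
const xmlContent = fs.readFileSync(path.join(__dirname, '../../data/source.xml'), 'utf-8');
const data = parser.parse(xmlContent);

// Element names differ per dataset; <Eventos>/<Evento> is the SIVIGILA layout.
const parsed = data.Eventos?.Evento ?? [];
const rows = Array.isArray(parsed) ? parsed : [parsed]; // a lone child parses as an object, not an array

const entities = rows.map(mapper); // typed mapper function per dataset
await repo.save(entities, { chunk: 100 }); // repo: TypeORM repository for the target entity
```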
The chunk: 100 option splits large inserts into batches of 100 rows, which keeps each insert below SQLite’s parameter-binding limit on datasets with thousands of records.

A sample SIVIGILA record, showing a subset of the fields, looks like this:

```xml
<Eventos>
  <Evento>
    <nombre_del_evento>Dengue</nombre_del_evento>
    <total_de_eventos>15420</total_de_eventos>
    <femenino>8200</femenino>
    <masculino>7220</masculino>
    <urbano>9800</urbano>
    <rural>5620</rural>
    <fecha_notificaci_n>2024-01-15</fecha_notificaci_n>
  </Evento>
</Eventos>
```
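
The batching behind chunk: 100 can be illustrated without a database. The sketch below is not TypeORM's internal implementation; `chunkRows` is a hypothetical helper that only shows how a row list would be split into groups of at most 100 before each group is written.

```typescript
// Hypothetical helper: splits rows into consecutive batches of at most `size` items.
// It illustrates the idea behind `repo.save(entities, { chunk: 100 })`;
// it is not TypeORM's own code.
function chunkRows<T>(rows: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < rows.length; start += size) {
    batches.push(rows.slice(start, start + size));
  }
  return batches;
}

// 250 placeholder rows become batches of 100, 100, and 50.
const placeholderRows = Array.from({ length: 250 }, (_, i) => ({ id: i + 1 }));
const batchSizes = chunkRows(placeholderRows, 100).map((batch) => batch.length);
console.log(batchSizes); // [ 100, 100, 50 ]
```

Smaller batches mean more round trips to SQLite but fewer bound parameters per statement, which is the trade-off the chunk option makes.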

Data Models

The eight TypeORM entities cover five conceptual domains:

HealthEvent

Maps SIVIGILA transmissible disease records. Fields include event name, total cases, female/male split, urban/rural split, age groups (infant through elderly), and notification date. Queried by HealthDataService and SaludPublicaService.
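
A typed mapper for this entity might look like the sketch below. The tag names come from the SIVIGILA sample record; the `HealthEventRow` property names are assumptions rather than the project's actual entity columns, and the age-group fields are left out because their tag names are not shown on this page.

```typescript
// One parsed <Evento> element. Values may arrive as numbers or strings
// depending on parser settings, so both are accepted.
type RawEvento = Record<string, string | number | undefined>;

// Hypothetical target shape; the real HealthEvent entity may name columns differently.
interface HealthEventRow {
  eventName: string;
  totalCases: number;
  female: number;
  male: number;
  urban: number;
  rural: number;
  notificationDate: string | null;
}

// Converts a raw value to a non-negative integer, falling back to 0 when missing or invalid.
function toCount(value: string | number | undefined): number {
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 ? Math.trunc(n) : 0;
}

// Maps one raw record to the hypothetical row shape.
function mapEvento(raw: RawEvento): HealthEventRow {
  const date = raw['fecha_notificaci_n'];
  return {
    eventName: String(raw['nombre_del_evento'] ?? '').trim(),
    totalCases: toCount(raw['total_de_eventos']),
    female: toCount(raw['femenino']),
    male: toCount(raw['masculino']),
    urban: toCount(raw['urbano']),
    rural: toCount(raw['rural']),
    notificationDate: date === undefined || date === '' ? null : String(date),
  };
}

// Values copied from the sample <Evento> record on this page.
const sampleEvento: RawEvento = {
  nombre_del_evento: 'Dengue',
  total_de_eventos: 15420,
  femenino: 8200,
  masculino: 7220,
  urbano: 9800,
  rural: 5620,
  fecha_notificaci_n: '2024-01-15',
};
const mappedSample = mapEvento(sampleEvento);
console.log(mappedSample.eventName, mappedSample.totalCases); // Dengue 15420
```

Keeping the mapper free of database calls makes it easy to check against a handful of records before running a full import.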

MentalHealth (Diagnosis)

Stores CIE-10 mental health diagnosis entries from Salud_Mental.xml. Fields include diagnosis code, diagnosis name, total cases, and demographic breakdowns. Queried by MentalHealthService.

SexualHealth (QA)

A question-and-answer store from Salud_sexual_-_preguntas.xml. Each row holds a question string and a pre-written respuesta text. SexualHealthService runs keyword search across question fields.
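
Keyword search over the question column could work along the lines of the sketch below. This is an in-memory illustration with made-up placeholder rows, not the service's actual query; `searchQuestions`, `normalize`, and the `QaRow` shape are hypothetical names.

```typescript
// Hypothetical row shape: a question string plus its pre-written answer.
interface QaRow {
  question: string;
  respuesta: string;
}

// Lowercases and strips diacritics so that "métodos" matches "metodos".
function normalize(text: string): string {
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase();
}

// Ranks rows by how many distinct query keywords appear in the question text.
// Words of one or two characters are ignored as likely stop words.
function searchQuestions(rows: readonly QaRow[], query: string): QaRow[] {
  const keywords = Array.from(
    new Set(normalize(query).split(/\s+/).filter((word) => word.length > 2)),
  );
  if (keywords.length === 0) return [];
  return rows
    .map((row) => {
      const haystack = normalize(row.question);
      const score = keywords.filter((keyword) => haystack.includes(keyword)).length;
      return { row, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((hit) => hit.row);
}

// Placeholder rows for illustration only; not taken from the dataset.
const demoRows: QaRow[] = [
  { question: 'Pregunta sobre métodos de planificación', respuesta: 'Respuesta A' },
  { question: 'Pregunta sobre controles prenatales', respuesta: 'Respuesta B' },
];
const hits = searchQuestions(demoRows, 'metodos planificacion');
console.log(hits.map((hit) => hit.respuesta)); // [ 'Respuesta A' ]
```

Normalizing accents on both sides matters for Spanish input, since users often type questions without diacritics.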

Provider entities

Four separate entities — AntioquiaProvider, BoyacaProvider, CaliProvider, YopalProvider — reflect the different schemas of each regional dataset. YopalProvider includes latitud and longitud columns to support the Haversine geosearch.
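
The Haversine geosearch mentioned above can be sketched as follows. The formula is the standard one; the `GeoRow` shape, the `nearestCentres` helper, and the sample coordinates are hypothetical and only latitud and longitud come from the entity description.

```typescript
const EARTH_RADIUS_KM = 6371; // mean Earth radius in kilometres

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two latitude/longitude pairs, in kilometres.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const sinHalfLat = Math.sin(toRadians(lat2 - lat1) / 2);
  const sinHalfLon = Math.sin(toRadians(lon2 - lon1) / 2);
  const a =
    sinHalfLat * sinHalfLat +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * sinHalfLon * sinHalfLon;
  // Math.min guards against rounding pushing the square root slightly above 1.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Hypothetical row shape; only latitud and longitud are named in the text.
interface GeoRow {
  nombre: string;
  latitud: number;
  longitud: number;
}

// Returns the `limit` rows closest to the given position, nearest first.
function nearestCentres(rows: readonly GeoRow[], lat: number, lon: number, limit = 3) {
  return rows
    .map((row) => ({ row, distanceKm: haversineKm(lat, lon, row.latitud, row.longitud) }))
    .sort((a, b) => a.distanceKm - b.distanceKm)
    .slice(0, limit);
}

// Placeholder coordinates for illustration only; not real Yopal centres.
const placeholderCentres: GeoRow[] = [
  { nombre: 'Centre A', latitud: 0, longitud: 2 },
  { nombre: 'Centre B', latitud: 0, longitud: 1 },
  { nombre: 'Centre C', latitud: 1, longitud: 0.5 },
];
const closest = nearestCentres(placeholderCentres, 0, 0, 2);
console.log(closest.map((entry) => entry.row.nombre)); // [ 'Centre B', 'Centre C' ]
```

For a dataset the size of a single city's health centres, computing the distance for every row and sorting is usually fast enough without a spatial index.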

Vaccination

Stores PAI departmental and municipal coverage records from all three vaccination XML files. Fields include department name, vaccine type, and coverage percentage. VaccinationService exposes getAllDepartament() and per-vaccine queries consumed by MlPredictionService for the composite risk score.

Benefits of the SQLite Approach

Zero parse overhead

No XML is loaded at application startup. TypeORM opens the SQLite file in milliseconds, and queries are resolved by SQLite against on-disk tables rather than by traversing a full in-memory XML tree.

Reduced RAM usage

The large Antioquia dataset (thousands of providers) and the vaccination data stay on disk in SQLite instead of being held in memory as parsed XML trees. Services such as AntioquiaHealthService and VaccinationService use TypeORM repository queries instead of loading arrays into memory.

Faster response times

DataModule and BotModule use the NestJS CacheModule, so frequently queried results are served from memory. DatasetBuilderService additionally maintains a 24-hour in-process cache for the data tensors fed to the ML prediction models.
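
A 24-hour in-process cache of this kind can be sketched as a small time-to-live wrapper. The class below is an illustration under stated assumptions, not the code of DatasetBuilderService; `TtlCache` and its injectable clock are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical single-value cache: rebuilds the value once it is older than ttlMs.
class TtlCache<T> {
  private value: T | undefined = undefined;
  private expiresAt = 0;
  private filled = false;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so that expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, calling `build` only when the cache is empty or expired.
  get(build: () => T): T {
    if (this.filled && this.now() < this.expiresAt) {
      return this.value as T;
    }
    const fresh = build();
    this.value = fresh;
    this.filled = true;
    this.expiresAt = this.now() + this.ttlMs;
    return fresh;
  }
}

// Usage: the builder runs once; a second call within 24 hours reuses the result.
let buildCount = 0;
const datasetCache = new TtlCache<number[]>(DAY_MS);
const first = datasetCache.get(() => { buildCount += 1; return [1, 2, 3]; });
const second = datasetCache.get(() => { buildCount += 1; return [4, 5, 6]; });
console.log(buildCount, first === second); // 1 true
```

If the builder is asynchronous, T can be a Promise, in which case concurrent callers share the same pending result.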
The migration scripts are standalone TypeScript files (not NestJS modules) and must be run with ts-node, outside the NestJS application lifecycle. Each script creates its own TypeORM DataSource, runs the import, and exits. None of them run when npm run start:dev or npm run start:prod is executed.