Seed Public Health XML Data into SQLite - Salud IA Bot

Salud IA Bot stores all public health data in a local SQLite database (data/salud-ia-bot.db) rather than parsing XML files at runtime. Seeding is the one-time (or periodic) process that reads the raw XML files from the data/ directory, maps them to TypeORM entities, and writes the results into SQLite. Once the database is populated, the application never touches the XML files again, which keeps production start-up fast and memory-efficient.

Seeding Workflow Overview

The seeding pipeline works as follows:

XML source files are placed in the data/ directory.
TypeScript seed scripts in scripts/ use fast-xml-parser or xml2js to parse each file.
Parsed rows are mapped to TypeORM entity classes and bulk-inserted into data/salud-ia-bot.db in chunks of 100 records.
The resulting SQLite file is the only artefact the running application needs.

# One-time seeding order (first-time setup)
npm run import:data       # All main datasets
npm run seed:antioquia    # Antioquia health providers
npm run seed:vaccination  # PAI vaccination coverage
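To make the chunked insert in the pipeline concrete, here is a minimal sketch of how a record list splits into batches of 100. TypeORM performs this batching internally when save receives a chunk option, so the seed scripts do not need a helper like this; chunkRecords and the sample records are hypothetical and exist only for illustration.

```typescript
// Hypothetical helper: splits a list into consecutive batches of at most `size` items.
// TypeORM does the equivalent internally for save(entities, { chunk: 100 }).
function chunkRecords<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// 250 placeholder records split into batches of 100, 100, and 50.
const sampleRecords = Array.from({ length: 250 }, (_, i) => ({ id: i + 1 }));
const batches = chunkRecords(sampleRecords, 100);
const batchSizes = batches.map((batch) => batch.length);
```

Writing in bounded batches keeps each INSERT statement small, which matters because SQLite limits the number of bound parameters per statement.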

Available Seed Commands

Three npm scripts cover the full dataset:

import:data

Imports all main datasets in parallel: SIVIGILA public health events, mental health (CIE-10), sexual health Q&A, PAI vaccination coverage, and regional providers for Boyacá, Cali, Antioquia, and Yopal. Runs scripts/import-data.ts via ts-node.

seed:antioquia

Seeds the Antioquia health provider dataset from Prestadores_de_Salud_Departamento_de_Antioquia.xml into the antioquia_provider table. Runs scripts/seed-antioquia.ts via ts-node.

seed:vaccination

Seeds PAI vaccination coverage from three XML files: national departmental coverage, Valle del Cauca coverage, and children’s vaccination data. Runs scripts/seed-vaccination.ts via ts-node.

Dataset Reference

The table below maps each seed command to its XML source files and the database tables it creates or populates:

Script	XML Source Files	Tables Created/Populated
`import:data`	`Eventos_de_Interés_en_Salud_Pública_20260514.xml`	`health_event`
`import:data`	`Salud_Mental.xml`	`mental_health`
`import:data`	`Salud_sexual_-_preguntas.xml`	`sexual_health`
`import:data`	`Centros_de_salud_Yopal._.xml`	`yopal_provider`
`import:data`	`SERVICIOS_OFERTADOS_RED_DE_SALUD_DEL_CENTRO_ESE_POR_SEDE_CALI.xml`	`cali_provider`
`import:data`	`servicios_salud_boyaca.xml`	`boyaca_provider`
`import:data`	`Prestadores_de_Salud_Departamento_de_Antioquia.xml`	`antioquia_provider`
`seed:antioquia`	`Prestadores_de_Salud_Departamento_de_Antioquia.xml`	`antioquia_provider`
`seed:vaccination`	`Coberturas_administrativas_de_vacunación_por_departamento_20260528.xml`	`vaccination`
`seed:vaccination`	`Cobertura_de_Vacunación_PAI_en_el_Valle_del_Cauca.xml`	`vaccination`
`seed:vaccination`	`DATOS_DE_VACUNACIÓN_EN_NIÑOS_Y_NIÑAS.xml`	`vaccination`

XML File Locations

All XML source files must be placed inside the data/ directory at the project root before running any seed script. The scripts resolve file paths relative to process.cwd():

data/
├── salud-ia-bot.db                                              ← generated output
├── Eventos_de_Interés_en_Salud_Pública_20260514.xml
├── Salud_Mental.xml
├── Salud_sexual_-_preguntas.xml
├── Prestadores_de_Salud_Departamento_de_Antioquia.xml
├── Centros_de_salud_Yopal._.xml
├── SERVICIOS_OFERTADOS_RED_DE_SALUD_DEL_CENTRO_ESE_POR_SEDE_CALI.xml
├── servicios_salud_boyaca.xml
├── Coberturas_administrativas_de_vacunación_por_departamento_20260528.xml
├── Cobertura_de_Vacunación_PAI_en_el_Valle_del_Cauca.xml
└── DATOS_DE_VACUNACIÓN_EN_NIÑOS_Y_NIÑAS.xml

If a file is missing when a script runs, the importer logs ⏭️ Archivo no encontrado, saltando. and skips that dataset without failing. This allows partial imports when only some XML files are available.
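The skip-on-missing behaviour can be sketched as follows. This is an illustration under stated assumptions, not the project's code: resolveSeedFile is a hypothetical name, and only the log message and the data/ location relative to process.cwd() come from the description above.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: returns the absolute path of a seed file inside data/,
// or null when the file is absent so the caller can skip that dataset.
function resolveSeedFile(fileName: string, baseDir: string = process.cwd()): string | null {
  const fullPath = join(baseDir, "data", fileName);
  if (!existsSync(fullPath)) {
    // Same message the importer is described as logging.
    console.log("⏭️ Archivo no encontrado, saltando.");
    return null;
  }
  return fullPath;
}

// A name that should not exist in any data/ directory: the helper returns null
// instead of throwing, so the remaining datasets can still be imported.
const missing = resolveSeedFile("archivo_que_no_existe_12345.xml");
```

Because paths are resolved against process.cwd(), a script started from a directory other than the project root would report every file as missing.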

Recommended Seeding Order

Place all XML files in data/

Copy every XML file listed in the table above into the data/ directory. Ensure filenames are byte-for-byte identical to those listed (accented characters included).
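One way byte-for-byte equality can fail without any visible difference is Unicode normalisation: some file systems and archive tools store accented characters in decomposed form (NFD), while the names above may be written in composed form (NFC). The sketch below shows a possible pre-flight check. The source does not say the seed scripts perform it, and findNormalizationMismatches is a hypothetical helper.

```typescript
// Hypothetical check: reports expected filenames that are present only in a
// different Unicode normalisation form (same visible text, different bytes).
function findNormalizationMismatches(
  expected: readonly string[],
  actual: readonly string[],
): string[] {
  const exactNames = new Set(actual);
  const nfcNames = new Set(actual.map((name) => name.normalize("NFC")));
  return expected.filter(
    (name) => !exactNames.has(name) && nfcNames.has(name.normalize("NFC")),
  );
}

const expectedNames = [
  "Salud_Mental.xml",
  "DATOS_DE_VACUNACIÓN_EN_NIÑOS_Y_NIÑAS.xml".normalize("NFC"),
];

// Simulated directory listing where the accented name arrived in decomposed form.
const listedNames = [
  "Salud_Mental.xml",
  "DATOS_DE_VACUNACIÓN_EN_NIÑOS_Y_NIÑAS.xml".normalize("NFD"),
];

// Contains only the accented filename: it looks identical but differs in bytes.
const mismatched = findNormalizationMismatches(expectedNames, listedNames);
```

In a real check, the second argument would come from a directory listing of data/, and any reported file would need renaming to the exact expected form.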

Run import:data

Import all main datasets:

npm run import:data

This script opens a TypeORM DataSource with synchronize: true to create the schema on first run, then imports Boyacá, Antioquia, Cali, Yopal, vaccination, mental health, sexual health, and SIVIGILA health events in parallel. On completion it prints a summary:

📊 RESUMEN DE IMPORTACIÓN
Boyacá:          N registros
Antioquia:       N registros
Cali:            N registros
Yopal:           N registros
Vacunación:      N registros
Salud Mental:    N registros
Salud Sexual:    N registros
Eventos Salud:   N registros
TOTAL:           N registros
⏱️  Tiempo:       X.XX segundos

Run seed:antioquia

Seed the Antioquia provider dataset:

npm run seed:antioquia

This script uses xml2js with explicitArray: false to parse Prestadores_de_Salud_Departamento_de_Antioquia.xml and upserts records into the antioquia_provider table via repo.save(entities, { chunk: 100 }).

Run seed:vaccination

Seed PAI vaccination coverage data:

npm run seed:vaccination

This script processes all three vaccination XML files. Each file uses a dedicated row mapper to normalise field names to the shared Vaccination entity (fields: coddepto, departamento, a_o, biol_gico, cobertura_de_vacunaci_n, and up to six indicator columns for children’s data).
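A dedicated row mapper of the kind described above might look like the following sketch. Only the target field names come from this page. The raw keys (cod_dpto, depto, anio, biologico, cobertura), the field types, and the helper names are assumptions made for illustration; the real XML columns and entity definition may differ.

```typescript
// Subset of the shared shape, using the field names listed above.
// The types are assumptions; the real entity may declare them differently.
interface VaccinationRow {
  coddepto: string | null;
  departamento: string | null;
  a_o: string | null;
  biol_gico: string | null;
  cobertura_de_vacunaci_n: number | null;
}

type RawRow = Record<string, unknown>;

// Returns a trimmed string, or null for missing or blank values.
function toText(value: unknown): string | null {
  if (value === undefined || value === null) return null;
  const text = String(value).trim();
  return text === "" ? null : text;
}

// Parses a number, accepting a decimal comma; returns null when not numeric.
function toNumber(value: unknown): number | null {
  const text = toText(value);
  if (text === null) return null;
  const parsed = Number(text.replace(",", "."));
  return Number.isFinite(parsed) ? parsed : null;
}

// Hypothetical mapper for one source file; the raw keys are invented.
function mapRegionalRow(raw: RawRow): VaccinationRow {
  return {
    coddepto: toText(raw["cod_dpto"]),
    departamento: toText(raw["depto"]),
    a_o: toText(raw["anio"]),
    biol_gico: toText(raw["biologico"]),
    cobertura_de_vacunaci_n: toNumber(raw["cobertura"]),
  };
}

const mapped = mapRegionalRow({
  cod_dpto: 76,
  depto: " Valle del Cauca ",
  anio: "2023",
  biologico: "BCG",
  cobertura: "95,4",
});
```

Keeping one mapper per file isolates each source's column quirks, so the shared entity and the insert step stay identical across all three files.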

Verify by starting the development server

Start the app and test a query against the bot:

npm run start:dev

Send a Telegram message such as "¿Cuántos casos de dengue hay en Cali?" and confirm that SIVIGILA statistics are returned. If the bot responds with live data, the seeding was successful.

How the Seed Scripts Work

The seed scripts are standalone TypeScript files executed by ts-node. Here is the core pattern used by all importers (from scripts/import-data.ts):

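import { XMLParser } from 'fast-xml-parser';

// xmlContent, repo, HealthEvent, and ensureArray are defined elsewhere in the script.
const parser = new XMLParser();
const data = parser.parse(xmlContent);
// A single <row> parses to an object, several to an array; normalise to an array.
const rows = ensureArray(data?.response?.rows?.row);

// Map each parsed row to an entity: missing or empty values become null,
// and a missing or non-numeric event count becomes 0.
const entities = rows.map((row) => {
  const entity = new HealthEvent();
  entity.departamento = row.departamento || null;
  entity.nombre_del_evento = row.nombre_del_evento || null;
  entity.total_de_eventos = Number(row.total_de_eventos) || 0;
  // ... additional field mappings
  return entity;
});

// Persist the entities in batches of 100 records.
await repo.save(entities, { chunk: 100 });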

The ensureArray helper handles both single-record XML responses (where row is an object) and multi-record responses (where row is an array), ensuring consistent mapping in both cases.
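The page does not show the helper itself, so the following is one possible implementation consistent with that description rather than the project's actual code. The Row type and sample values are hypothetical.

```typescript
// Possible implementation of ensureArray; the project's actual helper may differ.
// Missing values yield an empty array, a single object is wrapped, arrays pass through.
function ensureArray<T>(value: T | T[] | null | undefined): T[] {
  if (value === null || value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical row shape used only for this example.
type Row = { departamento: string };

const none = ensureArray<Row>(undefined);
const single = ensureArray<Row>({ departamento: "Valle" });
const many = ensureArray<Row>([{ departamento: "Valle" }, { departamento: "Boyacá" }]);
```

Here none is empty, single holds one row, and many holds two, so the rows.map call in the importer runs unchanged whether the XML contains zero, one, or many row elements.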

Production Notes

After seeding locally, transfer data/salud-ia-bot.db to your production environment. The application does not re-parse XML at startup; services query SQLite directly using TypeORM repositories and better-sqlite3.

# Example: copy the database to a remote server via scp
scp data/salud-ia-bot.db user@your-server:/app/data/salud-ia-bot.db

For Render or Railway, copy the file to the mounted persistent disk before the first deploy (see Deployment).

Seed scripts use ts-node to run TypeScript directly. If you see module resolution errors such as Cannot find module '@bot/...' or Cannot find module '@shared/...', ensure tsconfig.json path aliases (@bot/* -> src/bot/*, @shared/* -> src/shared/*) are correctly configured and that tsconfig-paths is available. Install it with npm install --save-dev tsconfig-paths and invoke the script as ts-node -r tsconfig-paths/register scripts/import-data.ts if aliases are needed at script level.
