Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/sdarionicolas-boop/AgroIA-RAG/llms.txt

Use this file to discover all available pages before exploring further.

AgroIA uses PostgreSQL with the pgvector extension to store agronomic reports alongside their 768-dimensional semantic embeddings, enabling cosine-similarity search at query time. The schema is defined in 01_migrate_schema.sql and is fully idempotent — it can be run against a fresh database for initial setup or against an existing one to apply incremental migrations without data loss.

Running the migration

1

Start the PostgreSQL container

docker run -d \
  --name postgres-agri \
  -e POSTGRES_USER=agri_user \
  -e POSTGRES_PASSWORD=your_password \
  -e POSTGRES_DB=agri_db \
  -p 5432:5432 \
  pgvector/pgvector:pg16
Use the official pgvector/pgvector image so the vector extension is pre-installed.
2

Apply the schema

Linux / macOS
psql -U agri_user -d agri_db -f 01_migrate_schema.sql
Windows (Docker exec)
Get-Content 01_migrate_schema.sql | docker exec -i postgres-agri psql -U agri_user -d agri_db
3

Verify the tables exist

The final SELECT in the migration script prints row counts for both tables:
 tabla            | registros
------------------+-----------
 informes_lotes   |         0
 lote_historial   |         0
The migration is idempotent. All CREATE TABLE, CREATE INDEX, and ALTER TABLE statements use IF NOT EXISTS, and the duplicate-cleanup DELETE is safe to re-run on an empty table. You can apply it repeatedly without side effects.

Table: informes_lotes

One row per lot. Holds the latest consolidated report, the full technical content used by the RAG engine, and the vector embedding for semantic search. Unique constraint: UNIQUE (lote_id) — each lot has exactly one current report. Subsequent ingestions perform an ON CONFLICT (lote_id) DO UPDATE.
ColumnTypeNotes
idSERIALPrimary key
lote_idVARCHAR(100)Business identifier for the lot
fechaDATEReport date
ndvi_promedioDOUBLE PRECISIONAverage NDVI for the critical month
gdd_acumuladosDOUBLE PRECISIONAccumulated growing degree days
score_totalINTEGERAgroIA Score 0–100
cv_espacialDOUBLE PRECISIONSpatial coefficient of variation (NDVI)
zona_activaBOOLEANWhether zone-C delineation is active
puntos_zona_cINTEGERNumber of points classified as zone C
cultivoVARCHAR(50)Crop type (e.g. maiz, soja)
superficie_haDOUBLE PRECISIONLot area in hectares
contenido_tecnicoTEXTFull agronomic report text; source for RAG retrieval
metadataJSONBArbitrary key-value data from the pipeline
embeddingvector(768)Semantic embedding generated by nomic-embed-text
created_atTIMESTAMPRow creation time
updated_atTIMESTAMPLast upsert time
CREATE TABLE IF NOT EXISTS informes_lotes (
    id               SERIAL PRIMARY KEY,
    lote_id          VARCHAR(100) NOT NULL,
    fecha            DATE,
    ndvi_promedio    DOUBLE PRECISION,
    gdd_acumulados   DOUBLE PRECISION,
    score_total      INTEGER,
    cv_espacial      DOUBLE PRECISION,
    zona_activa      BOOLEAN DEFAULT FALSE,
    puntos_zona_c    INTEGER DEFAULT 0,
    cultivo          VARCHAR(50),
    superficie_ha    DOUBLE PRECISION,
    contenido_tecnico TEXT,
    metadata         JSONB,
    embedding        vector(768),
    created_at       TIMESTAMP DEFAULT now(),
    updated_at       TIMESTAMP DEFAULT now()
);

Table: lote_historial

Time series table. One row per lot per campaign year. Populated from the historial_anos array in the ingestion payload. Used by the Streamlit dashboard for historical charts and by the RAG engine for temporal context. Unique constraint: UNIQUE (lote_id, anio) — one record per lot per year. Subsequent ingestions perform an ON CONFLICT (lote_id, anio) DO UPDATE.
ColumnTypeNotes
idSERIALPrimary key
lote_idVARCHAR(100)Foreign reference to informes_lotes.lote_id
anioINTEGERCampaign year (ASCII column name — see note below)
cultivoVARCHAR(50)Crop type for that campaign
ndvi_criticoDOUBLE PRECISIONNDVI at the critical phenological stage
horas_calorDOUBLE PRECISIONAccumulated heat hours (NASA POWER)
score_totalINTEGERAgroIA Score for that year
score_vigorDOUBLE PRECISIONVigor sub-score (40% weight)
score_estabilidadDOUBLE PRECISIONStability sub-score (30% weight)
score_limpiezaDOUBLE PRECISIONCleanliness sub-score (20% weight)
score_climaDOUBLE PRECISIONClimate sub-score (10% weight)
valido_para_scoreBOOLEANWhether the row is used in score calculations
superficie_haDOUBLE PRECISIONField area in hectares for this campaign
cv_espacialDOUBLE PRECISIONSpatial coefficient of variation for the campaign
zonificacion_activaBOOLEANWhether zone delineation was active
puntos_zona_cINTEGERZone-C point count for that year
CREATE TABLE IF NOT EXISTS lote_historial (
    id                  SERIAL PRIMARY KEY,
    lote_id             VARCHAR(100) NOT NULL,
    anio                INTEGER      NOT NULL,
    cultivo             VARCHAR(50),
    ndvi_critico        DOUBLE PRECISION,
    horas_calor         DOUBLE PRECISION,
    score_total         INTEGER,
    score_vigor         DOUBLE PRECISION,
    score_estabilidad   DOUBLE PRECISION,
    score_limpieza      DOUBLE PRECISION,
    score_clima         DOUBLE PRECISION,
    valido_para_score   BOOLEAN DEFAULT TRUE,
    superficie_ha       DOUBLE PRECISION,
    cv_espacial         DOUBLE PRECISION,
    zonificacion_activa BOOLEAN DEFAULT FALSE,
    puntos_zona_c       INTEGER DEFAULT 0,
    created_at          TIMESTAMP DEFAULT now(),
    updated_at          TIMESTAMP DEFAULT now(),
    CONSTRAINT unique_lote_anio UNIQUE (lote_id, anio)
);
The year column is named anio, not año. All JSON payload keys and SQL column names must use ASCII characters only. Using accented characters (e.g. historial_años, año) will cause a key mismatch and silent data loss. Always use historial_anos in payloads and anio in queries.

Indexes

Both tables have indexes on the columns most frequently filtered or ordered by the Streamlit dashboard and the RAG query layer:
IndexTableColumn(s)
idx_lotes_lote_idinformes_loteslote_id
idx_lotes_cultivoinformes_lotescultivo
idx_lotes_scoreinformes_lotesscore_total
idx_historial_lotelote_historiallote_id
idx_historial_aniolote_historialanio
idx_historial_cultivolote_historialcultivo
idx_historial_scorelote_historialscore_total

Advanced topics

If you switch to a model that produces a different vector size, you must drop and recreate the embedding column with the new dimension, then re-ingest all lots to regenerate embeddings. The current schema is fixed at vector(768) for nomic-embed-text.

Build docs developers (and LLMs) love