RAG module: consult the agronomic AI agent

The src/rag/core.py module is the query layer of AgroIA. It connects to PostgreSQL (via psycopg2), generates embeddings with the nomic-embed-text model through Ollama, and produces natural-language answers with gemma3:4b. Five public functions and the BASE_PROMPT constant are exported; import them directly rather than duplicating the logic elsewhere in the codebase.

Both the embedding model (nomic-embed-text) and the generation model (gemma3:4b) must be pulled into a running Ollama instance before you call any function in this module. Confirm that both appear in the output of ollama list.

BASE_PROMPT

The system prompt passed to the LLM on every call to consultar_agente. Import it to maintain consistency when building custom wrappers.

from src.rag.core import BASE_PROMPT

print(BASE_PROMPT)

The current value is:

Eres un asistente agronómico experto del sistema AgroIA.
Responde basándote estrictamente en el contexto recuperado.
ADAPTA LA EXTENSIÓN: Si la pregunta es simple, responde de forma corta y directa. Si es compleja o pide análisis, sé profundo.
No te limites a repetir números; analiza la relación entre ellos si es relevante.
Si preguntan por el Score, desglosa los componentes (Vigor, Estabilidad, Limpieza, Clima) solo si es necesario para responder la duda.
CITAS: Usa [ID-X] para referirte al informe actual y [Campaña YYYY] en lugar de [HIST-YYYY] para que sea más natural para el usuario.
Si no hay información suficiente, indicá 'Dato no disponible'.

The prompt instructs the LLM to cite the consolidated report as [ID-X] and historical campaigns as [Campaña YYYY], which makes responses easier to trace back to specific database records.
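Because the two citation styles are fixed by the prompt, they can be pulled out of an answer with regular expressions. The following is a minimal sketch, not part of src/rag/core.py: extract_citations is a hypothetical helper, and the sample answer is invented for illustration.

```python
import re

# Patterns for the two citation styles requested by BASE_PROMPT.
ID_PATTERN = re.compile(r"\[ID-(\d+)\]")
CAMPAIGN_PATTERN = re.compile(r"\[Campaña (\d{4})\]")


def extract_citations(answer: str) -> dict:
    """Hypothetical helper: collect the report IDs and campaign years cited in an answer."""
    report_ids = sorted({int(value) for value in ID_PATTERN.findall(answer)})
    years = sorted({int(value) for value in CAMPAIGN_PATTERN.findall(answer)})
    return {"report_ids": report_ids, "campaign_years": years}


# Invented sample answer, only to exercise the patterns.
sample = "El Score actual es 74 [ID-42]; en [Campaña 2024] fue similar y en [Campaña 2022] menor."
citations = extract_citations(sample)
print(citations)
```

The LLM is not guaranteed to follow the citation format on every call, so an empty result does not by itself mean the answer is ungrounded.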

consultar_agente

Full RAG pipeline: retrieves context from the database and generates a natural-language response from the LLM.

from src.rag.core import consultar_agente

response = consultar_agente(
    lote_id="TAYPE_LOTE_001",
    pregunta="¿Cómo ha evolucionado el NDVI en los últimos 3 años?",
    top_k=3,
)
print(response)

Parameters

lote_id

string

required

The unique lot identifier as stored in the informes_lotes table. Use listar_lotes() to enumerate all available IDs.

pregunta

string

required

The question to answer, in free-form text. The question is embedded and used for vector similarity search against informes_lotes.

top_k

number

default: 3

Controls context depth. The consolidated report always occupies one slot; the remaining top_k - 1 slots are filled with the most recent entries from lote_historial. Increase this value to give the LLM more historical campaigns.

Return value

response

string

The LLM-generated answer as a plain string. If the lot is not found in the database, returns a warning string prefixed with ⚠️. On internal errors, returns a string prefixed with ❌.
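Since failures are reported through the returned string rather than through exceptions, callers may want to branch on the prefix. A minimal sketch, assuming only the two prefixes described above; classify_response and its three labels are hypothetical, and the sample messages are invented.

```python
def classify_response(response: str) -> str:
    """Hypothetical helper: map an answer string to 'not_found', 'error', or 'ok'."""
    text = response.lstrip()
    # U+26A0 is the warning sign; the emoji form appends a variation selector,
    # so matching on the first code point covers both spellings.
    if text.startswith("\u26a0"):
        return "not_found"
    # U+274C is the cross mark used for internal errors.
    if text.startswith("\u274c"):
        return "error"
    return "ok"


# Invented messages; the real wording comes from consultar_agente.
print(classify_response("\u26a0\ufe0f Lote no encontrado"))
print(classify_response("\u274c Error interno"))
print(classify_response("El Score es 74 [ID-42]."))
```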

Generation with gemma3:4b typically takes 14–71 seconds on CPU. For latency-sensitive applications, consider caching responses or pre-fetching context with fetch_context.
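One way to cache responses is an in-memory LRU cache keyed on the call arguments. The sketch below is an assumption about how a caller might do this, not something the module provides: make_cached_agent is a hypothetical wrapper, and fake_agent stands in for consultar_agente so the example runs without a database or Ollama.

```python
from functools import lru_cache
from typing import Callable


def make_cached_agent(agent: Callable[..., str], maxsize: int = 128) -> Callable[..., str]:
    """Hypothetical wrapper: reuse the answer for identical (lote_id, pregunta, top_k) calls."""

    @lru_cache(maxsize=maxsize)
    def cached(lote_id: str, pregunta: str, top_k: int = 3) -> str:
        return agent(lote_id=lote_id, pregunta=pregunta, top_k=top_k)

    return cached


# Stand-in for consultar_agente; it records each real invocation.
calls = []


def fake_agent(lote_id: str, pregunta: str, top_k: int = 3) -> str:
    calls.append((lote_id, pregunta, top_k))
    return f"respuesta para {lote_id}"


cached_agent = make_cached_agent(fake_agent)
first = cached_agent("TAYPE_LOTE_001", "¿Cuál es el score?", 3)
second = cached_agent("TAYPE_LOTE_001", "¿Cuál es el score?", 3)
print(first == second, len(calls))
```

A cache like this also stores warning and error strings and never notices when the lot is reprocessed, so in practice it would need an invalidation step (for example calling cached_agent.cache_clear() after new reports are ingested).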

fetch_context

Returns the raw context string that would be passed to the LLM, without performing any generation. Use this to inspect retrieval quality or to build custom prompts.

from src.rag.core import fetch_context

context = fetch_context(
    lote_id="TAYPE_LOTE_001",
    pregunta="¿Cuál es el score de vigor?",
    top_k=4,
)
print(context)
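To build a custom prompt, the returned context can be combined with BASE_PROMPT and the question. This is a minimal sketch under stated assumptions: build_prompt is a hypothetical helper, and the CONTEXTO/PREGUNTA labels are illustrative, since the exact layout consultar_agente uses internally is not documented here. Short literal strings replace the real prompt and context so the example runs on its own.

```python
def build_prompt(base_prompt: str, context: str, question: str) -> str:
    """Hypothetical helper: join system prompt, retrieved context, and question."""
    return (
        f"{base_prompt.strip()}\n\n"
        f"CONTEXTO:\n{context.strip()}\n\n"
        f"PREGUNTA: {question.strip()}"
    )


# In real use, pass BASE_PROMPT and the output of fetch_context(...) instead.
prompt = build_prompt(
    base_prompt="Eres un asistente agronómico experto del sistema AgroIA.",
    context="[ID-42] INFORME CONSOLIDADO | Lote: TAYPE_LOTE_001",
    question="¿Cuál es el score de vigor?",
)
print(prompt)
```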

Parameters

lote_id

string

required

Lot identifier to scope the retrieval. Only rows matching this lote_id are considered.

pregunta

string

required

Query used to generate the embedding for vector similarity search against informes_lotes.

top_k

number

default: 3

Total number of context fragments. One fragment is always the consolidated report; top_k - 1 fragments come from lote_historial, ordered by year descending.
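The slot rule for top_k can be illustrated in isolation. The sketch below only mirrors the behavior described above (one consolidated slot plus the top_k - 1 most recent campaigns); select_fragments is a hypothetical function, it is not the module's implementation, and the rejection of top_k below 1 is an assumption.

```python
def select_fragments(consolidated: str, history: list, top_k: int = 3) -> list:
    """Hypothetical illustration: one consolidated slot plus top_k - 1 recent campaigns."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")  # assumed guard, not documented
    # Newest campaigns first, matching "ordered by year descending".
    newest_first = sorted(history, key=lambda row: row["anio"], reverse=True)
    labels = [f"[Campaña {row['anio']}]" for row in newest_first[: top_k - 1]]
    return [consolidated] + labels


# Invented history rows holding only the year.
history = [{"anio": year} for year in (2021, 2022, 2023, 2024)]
fragments = select_fragments("[ID-42]", history, top_k=3)
print(fragments)
```

With top_k=1 the result holds only the consolidated report, so no historical campaign reaches the LLM.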

Return value

context

string

A multi-paragraph string ready to embed in a prompt. Each fragment is separated by a blank line. If the lot does not exist, returns the string "No hay datos registrados para el lote '<lote_id>'".

The consolidated report fragment follows this format:

[ID-42] INFORME CONSOLIDADO | Lote: TAYPE_LOTE_001 | Cultivo: maiz | Sup: 48.3 ha | Fecha: 2025-03-15
Score: 74/100 | NDVI: 0.712 | Estrés: 18.4h
<contenido_tecnico from the database>

Each historical fragment follows this format:

[Campaña 2024] Registro histórico del año 2024 | Cultivo: maiz | Válido
  NDVI crítico: 0.712 | Estrés: 18.4h | Score: 74/100
  Vigor: 31.6 | Estabilidad: 22.4 | Limpieza: 15.0 | Clima: 5.2
  Variabilidad espacial: Zonificado (12 pts Zona C)
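Given these two formats, a context string can be split back into its fragments for inspection. A minimal sketch, assuming fragments are separated by blank lines and that anything not starting with a [Campaña YYYY] header belongs to the consolidated report (contenido_tecnico may itself contain blank lines). split_context is a hypothetical helper, and the 2023 line in the sample is invented.

```python
import re

CAMPAIGN_HEADER = re.compile(r"\[Campaña (\d{4})\]")


def split_context(context: str) -> dict:
    """Hypothetical helper: separate the consolidated report from historical campaigns."""
    consolidated_parts = []
    campaigns = {}
    for fragment in re.split(r"\n\s*\n", context.strip()):
        fragment = fragment.strip()
        if not fragment:
            continue
        match = CAMPAIGN_HEADER.match(fragment)
        if match:
            campaigns[int(match.group(1))] = fragment
        else:
            # Assumed to be part of the consolidated report.
            consolidated_parts.append(fragment)
    return {"consolidated": "\n\n".join(consolidated_parts), "campaigns": campaigns}


sample_context = (
    "[ID-42] INFORME CONSOLIDADO | Lote: TAYPE_LOTE_001 | Cultivo: maiz\n"
    "Score: 74/100 | NDVI: 0.712 | Estrés: 18.4h\n"
    "\n"
    "[Campaña 2024] Registro histórico del año 2024 | Cultivo: maiz | Válido\n"
    "  NDVI crítico: 0.712 | Estrés: 18.4h | Score: 74/100\n"
    "\n"
    "[Campaña 2023] Registro histórico del año 2023 | Cultivo: maiz | Válido\n"
)
parsed = split_context(sample_context)
print(sorted(parsed["campaigns"]))
```

Note that the "No hay datos registrados" message would end up under consolidated, so check for it before parsing.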

listar_lotes

Returns all lot identifiers registered in informes_lotes, sorted alphabetically.

from src.rag.core import listar_lotes

lots = listar_lotes()
print(f"Total lots: {len(lots)}")
for lote_id in lots:
    print(lote_id)

Return value

lotes

string[]

A list of unique lote_id strings from informes_lotes, ordered by lote_id ascending. Returns an empty list if the table has no rows.

get_historial_lote_raw

Returns the full temporal series from lote_historial as a list of dicts, ordered by year ascending.

from src.rag.core import get_historial_lote_raw

history = get_historial_lote_raw("TAYPE_LOTE_001")
for campaign in history:
    print(campaign["anio"], campaign["score_total"])

Parameters

lote_id

string

required

Lot identifier. Returns an empty list if no rows exist for this ID.

Return value

A list[dict] where each dict represents one campaign year. All numeric fields default to 0 or 0.0 if the database value is NULL.

anio

number

Campaign year (e.g. 2024).

cultivo

string

Crop key for this campaign. Defaults to "N/D" if null.

ndvi_critico

number

NDVI value at the critical month used to compute the score.

horas_calor

number

Accumulated heat-stress hours from NASA POWER for this year.

score_total

number

Overall AgroIA Score (0–100, integer) for this campaign.

score_vigor

number

Vigor component (0–40).

score_estabilidad

number

Stability component (0–30).

score_limpieza

number

Cleanliness component (0–20).

score_clima

number

Climate component (0–10).

valido_para_score

boolean

True when this year’s NDVI passed validation and was included in score calculation. Defaults to True if null.

zonificacion_activa

boolean

True when spatial zoning (A/B/C) was applied for this campaign.

puntos_zona_c

number

Number of K-Means centroids classified as Zone C (low productivity) in this campaign.
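The fields above are enough to compute a simple trend over valid campaigns. The sketch below is illustrative only: summarize_history is a hypothetical helper, the 2022 and 2023 rows are invented, and the consistency check assumes score_total is approximately the sum of the four components, which matches their ranges (40 + 30 + 20 + 10) but is not stated explicitly on this page.

```python
def summarize_history(history: list) -> dict:
    """Hypothetical helper: score change across valid campaigns plus a component check."""
    valid = [row for row in history if row["valido_para_score"]]
    if not valid:
        return {"campaigns": 0, "score_change": 0, "mismatched_years": []}
    mismatched = []
    for row in valid:
        parts = (
            row["score_vigor"] + row["score_estabilidad"]
            + row["score_limpieza"] + row["score_clima"]
        )
        # Assumption: the total is the rounded sum, so allow a tolerance of 1 point.
        if abs(parts - row["score_total"]) > 1.0:
            mismatched.append(row["anio"])
    # The list is ordered by year ascending, so first and last are oldest and newest.
    change = valid[-1]["score_total"] - valid[0]["score_total"]
    return {"campaigns": len(valid), "score_change": change, "mismatched_years": mismatched}


# Sample rows: 2024 reuses the values shown in the fetch_context example; the rest are invented.
history = [
    {"anio": 2022, "score_total": 0, "score_vigor": 0.0, "score_estabilidad": 0.0,
     "score_limpieza": 0.0, "score_clima": 0.0, "valido_para_score": False},
    {"anio": 2023, "score_total": 68, "score_vigor": 28.0, "score_estabilidad": 20.0,
     "score_limpieza": 15.0, "score_clima": 5.0, "valido_para_score": True},
    {"anio": 2024, "score_total": 74, "score_vigor": 31.6, "score_estabilidad": 22.4,
     "score_limpieza": 15.0, "score_clima": 5.2, "valido_para_score": True},
]
summary = summarize_history(history)
print(summary)
```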

get_datos_lote_raw

Returns the consolidated lot record from informes_lotes as a dict, or None if the lot does not exist.

from src.rag.core import get_datos_lote_raw

data = get_datos_lote_raw("TAYPE_LOTE_001")
if data:
    print(data["score_total"], data["cultivo"], data["superficie_ha"])

Parameters

lote_id

string

required

Lot identifier to look up in informes_lotes.

Return value

None when the lot is not found. Otherwise a dict with the following keys:

number

Primary key of the row in informes_lotes.

fecha

string | date

Analysis date as stored in the database.

ndvi

number

NDVI value stored in ndvi_promedio. Defaults to 0 if null.

gdd

number

Heat-stress hours stored in gdd_acumulados. Defaults to 0 if null.

score_total

number

Overall Score (0–100). Falls back to metadata.score_total when the column is null.

number

Spatial coefficient of variation. Falls back to metadata.cv_espacial when null.

zona_activa

boolean

Whether zone segmentation was active. Falls back to metadata.zonificacion_activa.

puntos_zona_c

number

Number of Zone C points. Falls back to metadata.puntos_zona_c.

cultivo

string

Crop key. Falls back to metadata.cultivo, then "N/D".

superficie_ha

number

Field area in hectares. Falls back to metadata.superficie_ha.

contenido

string

The full contenido_tecnico text from the database, as generated by construir_payload_v2.

score_desglose

object

Breakdown of the score components from metadata.score_desglose.


vigor

number

Vigor component (0–40).

estabilidad

number

Stability component (0–30).

limpieza

number

Cleanliness component (0–20).

clima

number

Climate component (0–10).
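Several keys above follow the same fallback order: the table column first, then the matching metadata entry, then a default. The sketch below only illustrates that documented order; resolve_field is a hypothetical helper, not the code in src/rag/core.py, and treating only None as "null" is an assumption.

```python
from typing import Any, Optional


def resolve_field(column_value: Any, metadata: Optional[dict], key: str, default: Any = None) -> Any:
    """Hypothetical illustration of the column -> metadata -> default fallback."""
    if column_value is not None:
        return column_value
    if metadata and metadata.get(key) is not None:
        return metadata[key]
    return default


# Column is null, metadata has the value.
print(resolve_field(None, {"cultivo": "maiz"}, "cultivo", "N/D"))
# Column and metadata are both missing, so the default applies.
print(resolve_field(None, {}, "cultivo", "N/D"))
# A column value of 0 is a real value, not null, so it wins over metadata.
print(resolve_field(0, {"puntos_zona_c": 12}, "puntos_zona_c", 0))
```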

Usage example

from src.rag.core import consultar_agente, listar_lotes

# Discover available lots
lots = listar_lotes()
print("Available lots:", lots[:5])

# Query the agent about a specific lot
answer = consultar_agente(
    lote_id=lots[0],
    pregunta="¿Cuál es la tendencia del Score en los últimos años?",
    top_k=4,
)
print(answer)

Related pages

RAG engine concepts

Architecture of the retrieval-augmented generation system.

Score AgroIA concepts

Explanation of the four score components and how they are calculated.
