The src/rag/core.py module is the query layer of AgroIA. It connects to PostgreSQL (via psycopg2), generates embeddings with the nomic-embed-text model through Ollama, and produces natural-language answers with gemma3:4b. Five public functions and the BASE_PROMPT constant are exported; import them directly rather than duplicating the logic elsewhere in the codebase.
Both the embedding model (nomic-embed-text) and the generation model (gemma3:4b) must be pulled and available in a running Ollama instance before you call any function in this module. To verify, check that both appear in the output of ollama list.
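The same check can be scripted, for example as a startup guard. This is a minimal sketch that assumes Ollama's default local endpoint (http://localhost:11434) and its GET /api/tags route, which lists locally available models. The helpers `modelos_faltantes` and `listar_modelos_ollama` are hypothetical and are not part of src.rag.core.

```python
import json
import urllib.request

# Models required by src/rag/core.py, as documented above
REQUIRED_MODELS = ("nomic-embed-text", "gemma3:4b")


def _normalizar(nombre: str) -> str:
    # Ollama reports untagged models with an implicit ":latest" tag
    return nombre if ":" in nombre else f"{nombre}:latest"


def modelos_faltantes(tags: dict, requeridos=REQUIRED_MODELS) -> list[str]:
    """Hypothetical helper: required models missing from an /api/tags response body."""
    disponibles = {_normalizar(m["name"]) for m in tags.get("models", [])}
    return [m for m in requeridos if _normalizar(m) not in disponibles]


def listar_modelos_ollama(base_url: str = "http://localhost:11434") -> dict:
    # Assumption: default Ollama host and port; adjust if your server runs elsewhere
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

At startup, call `modelos_faltantes(listar_modelos_ollama())` and abort if the returned list is non-empty. Parsing is kept separate from the HTTP call so the check can be tested without a running server.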

BASE_PROMPT

The system prompt passed to the LLM on every call to consultar_agente. Import it to maintain consistency when building custom wrappers.
```python
from src.rag.core import BASE_PROMPT

print(BASE_PROMPT)
```
The current value is:
```text
Eres un asistente agronómico experto del sistema AgroIA.
Responde basándote estrictamente en el contexto recuperado.
ADAPTA LA EXTENSIÓN: Si la pregunta es simple, responde de forma corta y directa. Si es compleja o pide análisis, sé profundo.
No te limites a repetir números; analiza la relación entre ellos si es relevante.
Si preguntan por el Score, desglosa los componentes (Vigor, Estabilidad, Limpieza, Clima) solo si es necesario para responder la duda.
CITAS: Usa [ID-X] para referirte al informe actual y [Campaña YYYY] en lugar de [HIST-YYYY] para que sea más natural para el usuario.
Si no hay información suficiente, indicá 'Dato no disponible'.
```
The prompt instructs the LLM to cite the consolidated report as [ID-X] and historical campaigns as [Campaña YYYY], which makes responses easier to trace back to specific database records.
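Because the prompt requests these two citation styles, a response can be post-processed to recover which records it cites. `extraer_citas` below is a hypothetical helper, not part of src.rag.core. It assumes that X in [ID-X] is the numeric informes_lotes id (as in the [ID-42] sample shown under fetch_context) and that the model follows the requested format exactly.

```python
import re

# Patterns for the two citation styles requested by BASE_PROMPT
_ID_RE = re.compile(r"\[ID-(\d+)\]")
_CAMPANA_RE = re.compile(r"\[Campaña (\d{4})\]")


def extraer_citas(respuesta: str) -> dict[str, list[int]]:
    """Hypothetical helper: distinct report ids and campaign years cited, in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving order
    ids = list(dict.fromkeys(int(m) for m in _ID_RE.findall(respuesta)))
    campanas = list(dict.fromkeys(int(m) for m in _CAMPANA_RE.findall(respuesta)))
    return {"ids": ids, "campanas": campanas}
```

The extracted ids can then be matched against the id key returned by get_datos_lote_raw to audit an answer.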

consultar_agente

Full RAG pipeline: retrieves context from the database and generates a natural-language response from the LLM.
```python
from src.rag.core import consultar_agente

response = consultar_agente(
    lote_id="TAYPE_LOTE_001",
    pregunta="¿Cómo ha evolucionado el NDVI en los últimos 3 años?",
    top_k=3,
)
print(response)
```

Parameters

lote_id (string, required): The unique lot identifier as stored in the informes_lotes table. Use listar_lotes() to enumerate all available IDs.
pregunta (string, required): The question to answer, in free-form text. The question is embedded and used for vector similarity search against informes_lotes.
top_k (number, default: 3): Controls context depth. The consolidated report always occupies one slot; the remaining top_k - 1 slots are filled with the most recent entries from lote_historial. Increase this value to give the LLM more historical campaigns.

Return value

response (string): The LLM-generated answer as a plain string. If the lot is not found in the database, returns a warning string prefixed with ⚠️. On internal errors, returns an error-message string instead of raising an exception.
Generation with gemma3:4b typically takes 14–71 seconds on CPU. For latency-sensitive applications, consider caching responses or pre-fetching context with fetch_context.
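One way to follow the caching advice is an in-process memo keyed on the call arguments. This is a sketch, not part of src.rag.core: `RespuestaCache` is hypothetical, it receives the answering function as a parameter (pass consultar_agente in practice), and it skips responses that start with ⚠️ so a lot ingested later is not stuck behind a cached "not found" warning. It has no expiry, so cached answers can go stale when the underlying data changes.

```python
from typing import Callable


class RespuestaCache:
    """Hypothetical helper: memoize answers per (lote_id, pregunta, top_k)."""

    def __init__(self, responder: Callable[..., str]):
        self._responder = responder  # e.g. consultar_agente
        self._cache: dict[tuple[str, str, int], str] = {}

    def consultar(self, lote_id: str, pregunta: str, top_k: int = 3) -> str:
        # Leading/trailing whitespace in the question does not create a new entry
        clave = (lote_id, pregunta.strip(), top_k)
        if clave in self._cache:
            return self._cache[clave]
        respuesta = self._responder(lote_id=lote_id, pregunta=pregunta, top_k=top_k)
        # "\u26a0" is the warning sign; matching it alone also covers the emoji variant
        if not respuesta.startswith("\u26a0"):
            self._cache[clave] = respuesta
        return respuesta
```

Usage: `cache = RespuestaCache(consultar_agente)`, then call `cache.consultar(lote_id, pregunta)` wherever consultar_agente would be called directly.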

fetch_context

Returns the raw context string that would be passed to the LLM, without performing any generation. Use this to inspect retrieval quality or to build custom prompts.
```python
from src.rag.core import fetch_context

context = fetch_context(
    lote_id="TAYPE_LOTE_001",
    pregunta="¿Cuál es el score de vigor?",
    top_k=4,
)
print(context)
```
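Because fetch_context returns the context string without generating anything, it can be combined with BASE_PROMPT to build a custom prompt. The section layout below (system text, then context, then question) is an assumption for illustration; the exact template consultar_agente uses internally is not documented here. `construir_prompt` is hypothetical and takes plain strings so it can be tested without a database.

```python
def construir_prompt(base_prompt: str, contexto: str, pregunta: str) -> str:
    """Hypothetical helper: assemble system text, retrieved context, and question."""
    secciones = [
        base_prompt.strip(),
        "CONTEXTO:\n" + contexto.strip(),
        "PREGUNTA:\n" + pregunta.strip(),
    ]
    # Blank lines keep the sections visually separate for the model
    return "\n\n".join(secciones)
```

In practice: `construir_prompt(BASE_PROMPT, fetch_context(lote_id, pregunta), pregunta)`, then send the result to gemma3:4b through your own Ollama client.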

Parameters

lote_id (string, required): Lot identifier to scope the retrieval. Only rows matching this lote_id are considered.
pregunta (string, required): Query used to generate the embedding for vector similarity search against informes_lotes.
top_k (number, default: 3): Total number of context fragments. One fragment is always the consolidated report; the other top_k - 1 come from lote_historial, ordered by year descending.
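The slot arithmetic can be illustrated in plain Python. This sketch mirrors the documented behavior rather than the module's actual SQL. `seleccionar_historial` is hypothetical and receives campaign dicts shaped like those returned by get_historial_lote_raw.

```python
def seleccionar_historial(historial: list[dict], top_k: int = 3) -> list[dict]:
    """Hypothetical helper: campaigns that fill the top_k - 1 historical slots, newest first."""
    slots = max(top_k - 1, 0)  # one slot is always reserved for the consolidated report
    ordenado = sorted(historial, key=lambda c: c["anio"], reverse=True)
    return ordenado[:slots]
```

With four campaigns stored (2021 to 2024), top_k=3 yields the consolidated report plus 2024 and 2023, while any top_k of 5 or more includes all four campaigns.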

Return value

context (string): A multi-paragraph string ready to embed in a prompt, with fragments separated by blank lines. If the lot does not exist, returns the string "No hay datos registrados para el lote '<lote_id>'".
The consolidated report fragment follows this format:
```text
[ID-42] INFORME CONSOLIDADO | Lote: TAYPE_LOTE_001 | Cultivo: maiz | Sup: 48.3 ha | Fecha: 2025-03-15
Score: 74/100 | NDVI: 0.712 | Estrés: 18.4h
<contenido_tecnico from the database>
```
Each historical fragment follows this format:
```text
[Campaña 2024] Registro histórico del año 2024 | Cultivo: maiz | Válido
  NDVI crítico: 0.712 | Estrés: 18.4h | Score: 74/100
  Vigor: 31.6 | Estabilidad: 22.4 | Limpieza: 15.0 | Clima: 5.2
  Variabilidad espacial: Zonificado (12 pts Zona C)
```
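To inspect retrieval quality, the context string can be split back into fragments keyed by their leading tag. This is a minimal sketch assuming the formats shown above: a fragment boundary is a blank line followed by a line that opens with "[". If contenido_tecnico itself contains such a line, the split will over-segment. `dividir_fragmentos` is hypothetical.

```python
import re

# A fragment boundary: a blank line followed by a line that opens with "["
_LIMITE = re.compile(r"\n\s*\n(?=\[)")
_ETIQUETA = re.compile(r"^\[([^\]]+)\]")


def dividir_fragmentos(contexto: str) -> dict[str, str]:
    """Hypothetical helper: map each fragment's tag (e.g. 'ID-42', 'Campaña 2024') to its text."""
    fragmentos = {}
    for bloque in _LIMITE.split(contexto.strip()):
        m = _ETIQUETA.match(bloque)
        # Untagged text (e.g. the 'No hay datos registrados...' message) is stored under '?'
        fragmentos[m.group(1) if m else "?"] = bloque
    return fragmentos
```

Checking which tags are present (for example, whether the expected campaign years appear for a given top_k) is a quick way to confirm that retrieval returned what the LLM needs.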

listar_lotes

Returns all lot identifiers registered in informes_lotes, sorted alphabetically.
```python
from src.rag.core import listar_lotes

lots = listar_lotes()
print(f"Total lots: {len(lots)}")
for lote_id in lots:
    print(lote_id)
```

Return value

lotes (string[]): A list of unique lote_id strings from informes_lotes, sorted by lote_id in ascending order. Returns an empty list if the table has no rows.

get_historial_lote_raw

Returns the full temporal series from lote_historial as a list of dicts, ordered by year ascending.
```python
from src.rag.core import get_historial_lote_raw

history = get_historial_lote_raw("TAYPE_LOTE_001")
for campaign in history:
    print(campaign["anio"], campaign["score_total"])
```

Parameters

lote_id (string, required): Lot identifier. Returns an empty list if no rows exist for this ID.

Return value

A list[dict] where each dict represents one campaign year. All numeric fields default to 0 or 0.0 if the database value is NULL.
anio (number): Campaign year (e.g. 2024).
cultivo (string): Crop key for this campaign. Defaults to "N/D" if null.
ndvi_critico (number): NDVI value at the critical month, which is the value used to compute the score.
horas_calor (number): Accumulated heat-stress hours for this year, from NASA POWER.
score_total (number): Overall AgroIA Score for this campaign (integer, 0–100).
score_vigor (number): Vigor component (0–40).
score_estabilidad (number): Stability component (0–30).
score_limpieza (number): Cleanliness component (0–20).
score_clima (number): Climate component (0–10).
valido_para_score (boolean): True when this year's NDVI passed validation and was included in the score calculation. Defaults to True if null.
zonificacion_activa (boolean): True when spatial zoning (A/B/C) was applied for this campaign.
puntos_zona_c (number): Number of K-Means centroids classified as Zone C (low productivity) in this campaign.
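The component ranges add up to the 0–100 scale (40 + 30 + 20 + 10 = 100), and in the sample fragment under fetch_context, 31.6 + 22.4 + 15.0 + 5.2 ≈ 74. The sketch below assumes, without confirmation from the source, that score_total is the rounded sum of the four components. It flags campaigns where either that check or a range check fails. `validar_campana` is hypothetical.

```python
# Documented maximum for each score component
RANGOS = {
    "score_vigor": 40,
    "score_estabilidad": 30,
    "score_limpieza": 20,
    "score_clima": 10,
}


def validar_campana(c: dict, tolerancia: float = 0.5) -> list[str]:
    """Hypothetical helper: problems found in one get_historial_lote_raw-style dict."""
    problemas = []
    for campo, maximo in RANGOS.items():
        valor = c.get(campo, 0)
        if not 0 <= valor <= maximo:
            problemas.append(f"{campo}={valor} outside 0-{maximo}")
    suma = sum(c.get(campo, 0) for campo in RANGOS)
    # Assumption: score_total is the component sum rounded to an integer
    if abs(suma - c.get("score_total", 0)) > tolerancia:
        problemas.append(f"components sum to {suma:.1f}, score_total is {c.get('score_total')}")
    return problemas
```

Running it over every campaign returned by get_historial_lote_raw is a cheap way to catch ingestion errors before they reach the LLM's context.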

get_datos_lote_raw

Returns the consolidated lot record from informes_lotes as a dict, or None if the lot does not exist.
```python
from src.rag.core import get_datos_lote_raw

data = get_datos_lote_raw("TAYPE_LOTE_001")
if data:
    print(data["score_total"], data["cultivo"], data["superficie_ha"])
```

Parameters

lote_id (string, required): Lot identifier to look up in informes_lotes.

Return value

None when the lot is not found. Otherwise a dict with the following keys:
id (number): Primary key of the row in informes_lotes.
fecha (string | date): Analysis date as stored in the database.
ndvi (number): NDVI value from the ndvi_promedio column. Defaults to 0 if null.
gdd (number): Heat-stress hours from the gdd_acumulados column. Defaults to 0 if null.
score_total (number): Overall Score (0–100). Falls back to metadata.score_total when the column is null.
cv (number): Spatial coefficient of variation. Falls back to metadata.cv_espacial when the column is null.
zona_activa (boolean): Whether zone segmentation was active. Falls back to metadata.zonificacion_activa.
puntos_zona_c (number): Number of Zone C points. Falls back to metadata.puntos_zona_c.
cultivo (string): Crop key. Falls back to metadata.cultivo, then to "N/D".
superficie_ha (number): Field area in hectares. Falls back to metadata.superficie_ha.
contenido (string): The full contenido_tecnico text from the database, as generated by construir_payload_v2.
score_desglose (object): Breakdown of the score components, taken from metadata.score_desglose.
meta (object): The full metadata JSONB object, parsed into a Python dict.
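The fallback rules above follow one pattern: prefer the column, then the matching key in the metadata JSONB, then a default. This is a minimal sketch of that resolution order. `resolver` is hypothetical, and the real module may implement the fallback differently.

```python
from typing import Any, Optional


def resolver(columna: Any, meta: Optional[dict], clave: str, default: Any = None) -> Any:
    """Hypothetical helper: column value unless NULL, else meta[clave] unless missing/NULL, else default."""
    if columna is not None:
        return columna
    valor = (meta or {}).get(clave)
    return default if valor is None else valor
```

For example, cultivo would resolve as `resolver(row_cultivo, meta, "cultivo", "N/D")`, where `row_cultivo` stands for the raw column value. Checking `is not None` rather than truthiness matters here: a legitimate 0 in score_total or puntos_zona_c must not be replaced by the metadata value.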

Usage example

```python
from src.rag.core import consultar_agente, listar_lotes

# Discover available lots
lots = listar_lotes()
if not lots:
    raise SystemExit("No lots registered in informes_lotes")
print("Available lots:", lots[:5])

# Query the agent about the first lot
answer = consultar_agente(
    lote_id=lots[0],
    pregunta="¿Cuál es la tendencia del Score en los últimos años?",
    top_k=4,
)
print(answer)
```
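The LLM's answer about the Score trend can be cross-checked numerically against get_historial_lote_raw. This sketch computes an ordinary least-squares slope over the campaigns flagged valido_para_score. `pendiente_score` is hypothetical, and the choice of a linear fit is an assumption, not how AgroIA itself evaluates trends.

```python
from typing import Optional


def pendiente_score(historial: list[dict]) -> Optional[float]:
    """Hypothetical helper: least-squares slope of score_total per year over valid campaigns."""
    puntos = [(c["anio"], c["score_total"]) for c in historial
              if c.get("valido_para_score", True)]
    if len(puntos) < 2:
        return None  # a trend needs at least two campaigns
    n = len(puntos)
    media_x = sum(x for x, _ in puntos) / n
    media_y = sum(y for _, y in puntos) / n
    sxy = sum((x - media_x) * (y - media_y) for x, y in puntos)
    sxx = sum((x - media_x) ** 2 for x, _ in puntos)
    # sxx is 0 only if every valid point shares the same year
    return sxy / sxx if sxx else None
```

In practice: `pendiente_score(get_historial_lote_raw(lots[0]))`. A positive value means the Score has risen by roughly that many points per campaign, which should agree with the trend the agent describes.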
