The src.pipeline module is the programmatic entry point for all AgroIA field analysis. It coordinates satellite imagery retrieval from Google Earth Engine, climate data from NASA POWER, score calculation, PDF and HTML report generation, and optional ingestion into the RAG vector store — all in a single function call. Import it directly in scripts, notebooks, or any automation layer that needs to trigger analysis without using the CLI.
GEE must be authenticated before calling any pipeline function. Set GEE_PROJECT_ID in your .env file and run earthengine authenticate once on the host machine.
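A script can fail early with a clear message when the project ID is missing, rather than partway through a pipeline run. The following is a minimal sketch: require_gee_project_id is a hypothetical helper, not part of src.pipeline, and it assumes the values from .env have already been loaded into the process environment.

```python
import os


def require_gee_project_id(env=None):
    """Return GEE_PROJECT_ID, or raise before any pipeline function is called.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    check can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    project_id = (env.get("GEE_PROJECT_ID") or "").strip()
    if not project_id:
        raise RuntimeError(
            "GEE_PROJECT_ID is not set; add it to .env and run "
            "'earthengine authenticate' once on this machine."
        )
    return project_id
```

Calling this helper at the top of a script, before run_full_analysis or run_batch_from_geojson, only verifies the environment variable; it does not check that earthengine authenticate has been run.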

run_full_analysis

Runs the complete AgroIA pipeline on a single shapefile or GeoJSON. If the file contains multiple polygons, only the one with the largest area is analysed.
from src.pipeline import run_full_analysis

result = run_full_analysis(
    shp_path="data/my_field.shp",
    cultivo="maiz",
    years=[2020, 2021, 2022, 2023, 2024, 2025],
    lote_id="CAMPO_SUR_01",
    push_to_rag=True,
)
print(result["score"]["total"])  # e.g. 74

Parameters

shp_path
string
required
Path to the input shapefile (.shp) or GeoJSON file. A leading @ character is stripped automatically. The path is normalised with os.path.normpath.
cultivo
string
default:"maiz"
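Crop key. Must be one of the keys in CONFIG: "maiz", "soja", or "trigo". The CLI also accepts "girasol", but it has no CONFIG entry and raises a KeyError in the pipeline (see Supported crops). Controls the critical NDVI month, heat-stress threshold, and NDVI plausibility range used throughout the analysis.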
years
number[]
default:"last 6 calendar years"
List of integer years to include in the historical series, e.g. [2020, 2021, 2022, 2023, 2024, 2025]. When omitted, defaults to the six years ending in the current calendar year.
lote_id
string
Unique identifier for the lot. When omitted, the base name of shp_path (without extension) is used.
push_to_rag
boolean
default:"true"
When True, the analysis payload is sent to the local FastAPI ingestion endpoint (POST /ingesta) after the PDF and map are generated. Set to False to produce reports without touching the database.
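The defaulting rules described above can be sketched in plain Python. This is an illustration of the documented behaviour, not the module's code: the three function names are hypothetical, and stripping exactly one leading @ is an assumption.

```python
import os
from datetime import date


def normalise_shp_path(shp_path):
    """Strip a leading '@' (assumed: only one) and normalise the path."""
    if shp_path.startswith("@"):
        shp_path = shp_path[1:]
    return os.path.normpath(shp_path)


def default_lote_id(shp_path):
    """Base name of the input file without its extension."""
    return os.path.splitext(os.path.basename(shp_path))[0]


def default_years(today=None, n_years=6):
    """The n_years calendar years ending in the current calendar year."""
    today = today or date.today()
    return list(range(today.year - n_years + 1, today.year + 1))
```

With today set to any date in 2025, default_years returns the same list used in the example call above, 2020 through 2025.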

Return value

Returns None if no valid satellite data could be retrieved for any of the requested years. Otherwise returns a dict with the following keys:
cultivo
string
The crop key used for the analysis (e.g. "maiz").
hectareas
number
Area of the polygon in hectares, calculated in the dynamic UTM projection of the centroid.
centroide
tuple[float, float]
(latitude, longitude) of the polygon centroid in WGS-84 decimal degrees.
score
object
Score for the most recent processed year.
ndvi_historico
object
Dict mapping year (int) to validated NDVI value (float) for years that passed the plausibility filter.
ndvi_critico_actual
number
NDVI value for the critical month of the most recent processed year.
ndvi_hist_bruto
object
Raw NDVI values (before validation exclusion) keyed by year.
anos_excluidos
number[]
List of years dropped because no valid NDVI could be retrieved, even with the fallback window strategy.
horas_calor_hist
object
Accumulated heat-stress hours per year, keyed by year.
horas_calor_actual
number
Heat-stress hours for the most recent processed year.
cv
number
Coefficient of spatial variation from GEE for the most recent year, used to decide whether to produce an A/B/C zone map.
es_variable
boolean
True when the spatial CV exceeds 0.05 and zone segmentation was applied.
lote_gdf
GeoDataFrame
The validated input geometry as a GeoPandas GeoDataFrame in EPSG:4326.
zonas_gdf
GeoDataFrame | None
K-Means zone segmentation result (zones A, B, C) or None if the lot is spatially homogeneous.
conf
dict
The full CONFIG entry for the chosen crop.
epsg_utm
number
Dynamic UTM EPSG code derived from the centroid longitude.
log
string[]
Processing log messages generated during the pipeline run.
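Because the function can return None, callers should check for that before indexing into the result. The sketch below shows one way to turn the documented keys into a one-line summary; describe_result is a hypothetical helper, and the "total" key of score is taken from the usage example earlier on this page.

```python
def describe_result(result):
    """Summarise a run_full_analysis result, tolerating the None case."""
    if result is None:
        return "No valid satellite data for any requested year."
    line = (
        f"{result['cultivo']}: score {result['score']['total']}, "
        f"{result['hectareas']:.1f} ha"
    )
    excluded = result.get("anos_excluidos") or []
    if excluded:
        line += ", excluded years: " + ", ".join(str(y) for y in sorted(excluded))
    # zonas_gdf is None when the lot is spatially homogeneous.
    if result.get("zonas_gdf") is not None:
        line += ", zoned A/B/C"
    return line
```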

run_batch_from_geojson

Processes every polygon in a GeoJSON file produced by the SAM delineation tool. GEE is initialised only once for the entire batch, making this far more efficient than calling run_full_analysis in a loop.
Use the limit parameter during development to validate the pipeline on a small subset before committing to processing hundreds of polygons.
from src.pipeline import run_batch_from_geojson

summary = run_batch_from_geojson(
    geojson_path="Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson",
    cultivo_default="maiz",
    push_to_rag=True,
    limit=5,           # process only the first 5 polygons for testing
)

for item in summary:
    print(item["lote_id"], item["status"], item.get("score"))

To process the whole file, omit limit:
from src.pipeline import run_batch_from_geojson

# Process the entire first SAM run (268 polygons, maiz)
summary = run_batch_from_geojson(
    geojson_path="Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson",
    cultivo_default="maiz",
    years=[2020, 2021, 2022, 2023, 2024, 2025],
    push_to_rag=True,
)
ok = [r for r in summary if r["status"] == "OK"]
print(f"{len(ok)}/{len(summary)} polygons processed successfully")

Parameters

geojson_path
string
required
Path to the GeoJSON file. Raises FileNotFoundError if the file does not exist. The CRS is re-projected to EPSG:4326 automatically if needed.
cultivo_default
string
default:"maiz"
Fallback crop when a polygon feature does not carry a cultivo property, or when the value is not a recognised key in CONFIG.
years
number[]
default:"last 6 calendar years"
Years to analyse across all polygons. All features in the batch use the same year list.
push_to_rag
boolean
default:"true"
Ingest each successfully analysed lot into the RAG vector store when True.
limit
number
Process only the first N polygons. Useful for testing pipeline health before a full run. When omitted, all polygons are processed.
id_prefix
string
default:"POLIGONO"
Prefix for the auto-generated lote_id when a feature does not have an id property. The suffix is a zero-padded index: POLIGONO_001, POLIGONO_002, etc.
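The per-feature fallbacks for lote_id and cultivo can be sketched as follows. Both helpers are hypothetical, and two details are assumptions rather than documented behaviour: that the index is 1-based and counts position in the file, and that the set of recognised crops is the three keys listed under Supported crops.

```python
# Keys documented as present in CONFIG (assumption: no others exist).
VALID_CROPS = {"maiz", "soja", "trigo"}


def resolve_lote_id(properties, index, id_prefix="POLIGONO"):
    """Use the feature's id property, else PREFIX_NNN with a zero-padded index."""
    raw = properties.get("id")
    if raw not in (None, ""):
        return str(raw)
    return f"{id_prefix}_{index:03d}"


def resolve_cultivo(properties, cultivo_default="maiz"):
    """Use the feature's cultivo property only if it is a recognised crop key."""
    cultivo = properties.get("cultivo")
    if isinstance(cultivo, str) and cultivo in VALID_CROPS:
        return cultivo
    return cultivo_default
```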

Return value

A list[dict] with one entry per polygon in the batch (up to limit):
lote_id
string
Lot identifier derived from the feature’s id property or auto-generated from id_prefix.
status
string
"OK" on success. Otherwise one of "GEOMETRIA_INVALIDA", "SIN_DATOS_SATELITALES", or "ERROR: <message>".
score
number | None
Overall AgroIA Score (0–100) for the most recent processed year. None when status is not "OK".
hectareas
number
Area of the polygon in hectares. Present only when status is "OK".
MultiPolygon features are reduced to their largest component before analysis. If a geometry is invalid or not a Polygon, the lot is skipped with status "GEOMETRIA_INVALIDA".
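After a large batch, it helps to count outcomes by status and list the lots that need attention. The sketch below works on the documented summary structure; tally_statuses and failed_lots are hypothetical helpers, and grouping every "ERROR: <message>" status under a single "ERROR" key is a choice made here for readability.

```python
from collections import Counter


def tally_statuses(summary):
    """Count batch results per status, grouping all 'ERROR: ...' messages."""
    counts = Counter()
    for item in summary:
        status = item["status"]
        key = "ERROR" if status.startswith("ERROR:") else status
        counts[key] += 1
    return dict(counts)


def failed_lots(summary):
    """Return (lote_id, status) pairs for every lot that did not finish OK."""
    return [
        (item["lote_id"], item["status"])
        for item in summary
        if item["status"] != "OK"
    ]
```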

Internal pipeline steps

The following describes what happens inside _analyze_one_polygon, which both public functions delegate to after validating their inputs:
1. Climate data: NASA POWER

get_nasa_climate_safe() fetches daily temperature data for the centroid coordinates and accumulates heat-stress hours for each requested year using the sinusoidal formula parameterised by the crop’s tbase and umbral_calor.

2. Satellite NDVI: GEE Sentinel-2 SR

get_gee_ndvi_validado() retrieves the median NDVI for the critical month from the Sentinel-2 Surface Reflectance collection. If the value is null or below the crop’s ndvi_min, get_gee_ndvi_ventana() retries with a ±2 month window (max_delta=2). Years with no valid fallback are added to anos_excluidos.

3. Score and zoning

calcular_score() computes Vigor, Stability, Cleanliness, and Climate components. calcular_cv_gee() measures spatial heterogeneity; when CV > 0.05, zonificar_lote_gee() segments the polygon into three management zones (A, B, C) via K-Means.

4. Reports

build_report() generates a PDF in src/outputs/AgroIA_<lote_id>.pdf. generar_mapa_offline() generates an interactive HTML map in outputs/Mapa_<lote_id>.html.

5. RAG ingestion

When push_to_rag=True, construir_payload_v2() assembles the structured payload and enviar_al_rag() posts it to POST /ingesta with Bearer token authentication.
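The NDVI fallback in the satellite NDVI step can be sketched without any GEE calls. This is an illustration under stated assumptions, not the pipeline's implementation: fetch_median is a hypothetical callable that takes a list of (year, month) pairs and returns a median NDVI or None, and the window is assumed to wrap across year boundaries (for a January critical month it reaches back to the previous November).

```python
def window_months(year, month, max_delta=2):
    """(year, month) pairs from month - max_delta to month + max_delta."""
    pairs = []
    for delta in range(-max_delta, max_delta + 1):
        index = year * 12 + (month - 1) + delta  # months since year 0
        pairs.append((index // 12, index % 12 + 1))
    return pairs


def is_valid(ndvi, ndvi_min):
    """A value is usable when it exists and reaches the crop's ndvi_min."""
    return ndvi is not None and ndvi >= ndvi_min


def collect_ndvi(fetch_median, years, critical_month, ndvi_min, max_delta=2):
    """Return (valid NDVI by year, excluded years) using the fallback window."""
    valid, excluded = {}, []
    for year in years:
        # First attempt: the critical month only.
        ndvi = fetch_median([(year, critical_month)])
        if not is_valid(ndvi, ndvi_min):
            # Fallback: retry over the widened window around the critical month.
            ndvi = fetch_median(window_months(year, critical_month, max_delta))
        if is_valid(ndvi, ndvi_min):
            valid[year] = ndvi
        else:
            excluded.append(year)
    return valid, excluded
```

The two return values correspond to ndvi_historico and anos_excluidos in the result dict.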

Supported crops

The following crop keys are valid for the cultivo parameter. All configuration values come from CONFIG in src/pipeline/agro_math.py.
| Key | Critical month | umbral_clima (max heat-stress hours) | NDVI range |
| --- | --- | --- | --- |
| maiz | January (1) | 40 | 0.25 – 0.92 |
| soja | February (2) | 35 | 0.25 – 0.90 |
| trigo | October (10) | 30 | 0.20 – 0.88 |
girasol is accepted as a crop name by the CLI (start.py --pipeline), but does not have an entry in CONFIG in src/pipeline/agro_math.py. Passing girasol as the crop type will raise a KeyError in the pipeline. Use only maiz, soja, or trigo until a girasol config entry is added.
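Until a girasol entry exists, callers may prefer to validate the crop key up front and get a descriptive error instead of a bare KeyError. The sketch below uses a stand-in dict built from the table values; CONFIG_SKETCH and get_crop_conf are hypothetical, and the field names critical_month and ndvi_max are assumptions (only umbral_clima and ndvi_min are named on this page), so the real CONFIG in src/pipeline/agro_math.py may be shaped differently.

```python
# Stand-in for CONFIG, built from the documented table values.
# Field names critical_month and ndvi_max are assumed for illustration.
CONFIG_SKETCH = {
    "maiz": {"critical_month": 1, "umbral_clima": 40, "ndvi_min": 0.25, "ndvi_max": 0.92},
    "soja": {"critical_month": 2, "umbral_clima": 35, "ndvi_min": 0.25, "ndvi_max": 0.90},
    "trigo": {"critical_month": 10, "umbral_clima": 30, "ndvi_min": 0.20, "ndvi_max": 0.88},
}


def get_crop_conf(cultivo, config=CONFIG_SKETCH):
    """Return the crop's config entry, or raise a descriptive ValueError."""
    try:
        return config[cultivo]
    except KeyError:
        valid = ", ".join(sorted(config))
        raise ValueError(
            f"Unsupported crop {cultivo!r}; valid keys: {valid}"
        ) from None
```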

agro_math module

Score formula, CONFIG dict, and NDVI validation utilities.

Batch processing guide

Step-by-step walkthrough for processing large GeoJSON files.
