The src.pipeline module is the programmatic entry point for all AgroIA field analysis. It coordinates satellite imagery retrieval from Google Earth Engine, climate data from NASA POWER, score calculation, PDF and HTML report generation, and optional ingestion into the RAG vector store, all in a single function call. Import it directly in scripts, notebooks, or any automation layer that needs to trigger analysis without using the CLI.
GEE must be authenticated before calling any pipeline function. Set GEE_PROJECT_ID in your .env file and run earthengine authenticate once on the host machine.
run_full_analysis
Runs the complete AgroIA pipeline on a single shapefile or GeoJSON. If the file contains multiple polygons, only the one with the largest area is analysed.
Parameters
Path to the input shapefile (.shp) or GeoJSON file. A leading @ character is stripped automatically. The path is normalised with os.path.normpath.
Crop key. Must be one of the keys in CONFIG: "maiz", "soja", or "trigo". Controls the critical NDVI month, heat-stress threshold, and NDVI plausibility range used throughout the analysis. "girasol" is not a valid value here: it has no CONFIG entry and raises a KeyError.
List of integer years to include in the historical series, e.g. [2020, 2021, 2022, 2023, 2024, 2025]. When omitted, defaults to the six years ending in the current calendar year.
Unique identifier for the lot. When omitted, the base name of shp_path (without extension) is used.
When True, the analysis payload is sent to the local FastAPI ingestion endpoint (POST /ingesta) after the PDF and map are generated. Set to False to produce reports without touching the database.
Return value
Returns None if no valid satellite data could be retrieved for any of the requested years. Otherwise returns a dict with the following keys:
The crop key used for the analysis (e.g. "maiz").
Area of the polygon in hectares, calculated in the dynamic UTM projection of the centroid.
(latitude, longitude) of the polygon centroid in WGS-84 decimal degrees.
Score for the most recent processed year.
Dict mapping year (int) to validated NDVI value (float) for years that passed the plausibility filter.
NDVI value for the critical month of the most recent processed year.
Raw NDVI values (before validation exclusion) keyed by year.
List of years dropped because no valid NDVI could be retrieved, even with the fallback window strategy.
Accumulated heat-stress hours per year, keyed by year.
Heat-stress hours for the most recent processed year.
Coefficient of spatial variation from GEE for the most recent year, used to decide whether to produce an A/B/C zone map.
True when the spatial CV exceeds 0.05 and zone segmentation was applied.
The validated input geometry as a GeoPandas GeoDataFrame in EPSG:4326.
K-Means zone segmentation result (zones A, B, C) or None if the lot is spatially homogeneous.
The full CONFIG entry for the chosen crop.
Dynamic UTM EPSG code derived from the centroid longitude.
Processing log messages generated during the pipeline run.
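The input defaults and the UTM code described for run_full_analysis can be sketched in a few lines. This is a minimal illustration, not the pipeline's own code: normalise_inputs and utm_epsg are hypothetical helper names, the today argument exists only to make the example reproducible, and choosing the 326xx/327xx EPSG family from the centroid latitude is an assumption (the page only states that the code is derived from the centroid longitude).

```python
import math
import os
from datetime import date


def normalise_inputs(shp_path, years=None, lote_id=None, today=None):
    """Hypothetical helper: apply the documented defaults to the raw arguments."""
    # Strip a single leading "@", then normalise the path.
    if shp_path.startswith("@"):
        shp_path = shp_path[1:]
    path = os.path.normpath(shp_path)

    # Default series: the six years ending in the current calendar year.
    if years is None:
        current = (today or date.today()).year
        years = list(range(current - 5, current + 1))

    # Default lot ID: base name of the path without its extension.
    if lote_id is None:
        lote_id = os.path.splitext(os.path.basename(path))[0]

    return path, years, lote_id


def utm_epsg(lat, lon):
    """Hypothetical helper: EPSG code of the WGS-84 UTM zone containing a point."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # lon == 180 would otherwise give zone 61
    # Assumption: 326xx in the northern hemisphere, 327xx in the southern.
    return (32600 if lat >= 0 else 32700) + zone


path, years, lote_id = normalise_inputs("@campos/lote_norte.shp", today=date(2025, 3, 1))
print(years)                   # [2020, 2021, 2022, 2023, 2024, 2025]
print(lote_id)                 # lote_norte
print(utm_epsg(-34.6, -60.5))  # 32720 (UTM zone 20 South)
```

Computing area in a UTM projection local to the centroid keeps the hectare figure in metric units with low distortion, which a geographic CRS such as EPSG:4326 cannot provide directly.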
run_batch_from_geojson
Processes every polygon in a GeoJSON file produced by the SAM delineation tool. GEE is initialised only once for the entire batch, making this far more efficient than calling run_full_analysis in a loop.
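Before each polygon is analysed, the batch has to settle on a lot ID and a crop key for it. The sketch below illustrates that bookkeeping under stated assumptions: plan_batch and VALID_CROPS are hypothetical names, the id is looked up first on the feature and then in its properties, and the zero-padded index is taken to be the feature's 1-based position in the file. None of these details are confirmed by the pipeline source.

```python
# Crop keys this page lists as present in CONFIG.
VALID_CROPS = ("maiz", "soja", "trigo")


def plan_batch(features, default_crop, id_prefix, limit=None):
    """Hypothetical helper: resolve a lot ID and crop key per GeoJSON feature."""
    # Process only the first `limit` features when a limit is given.
    selected = features if limit is None else features[:limit]
    plan = []
    for index, feature in enumerate(selected, start=1):
        props = feature.get("properties") or {}

        # Prefer the feature's own id; otherwise build PREFIX_001, PREFIX_002, ...
        raw_id = feature.get("id", props.get("id"))
        lote_id = str(raw_id) if raw_id is not None else f"{id_prefix}_{index:03d}"

        # Fall back to the default crop when the property is missing or unknown.
        crop = props.get("cultivo")
        if crop not in VALID_CROPS:
            crop = default_crop

        plan.append({"lote_id": lote_id, "cultivo": crop})
    return plan


features = [
    {"type": "Feature", "properties": {"id": "LOTE_A", "cultivo": "soja"}},
    {"type": "Feature", "properties": {"cultivo": "girasol"}},
    {"type": "Feature", "properties": {}},
]
for entry in plan_batch(features, default_crop="maiz", id_prefix="POLIGONO", limit=2):
    print(entry)
```

With limit=2 the third feature is never visited, which is what makes a small limit a cheap health check before a full run.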
Parameters
Path to the GeoJSON file. Raises FileNotFoundError if the file does not exist. Geometries are re-projected to EPSG:4326 automatically if needed.
Fallback crop when a polygon feature does not carry a cultivo property, or when the value is not a recognised key in CONFIG.
Years to analyse across all polygons. All features in the batch use the same year list.
When True, each successfully analysed lot is ingested into the RAG vector store.
Process only the first N polygons. Useful for testing pipeline health before a full run. When omitted, all polygons are processed.
Prefix for the auto-generated lote_id when a feature does not have an id property. The suffix is a zero-padded index: POLIGONO_001, POLIGONO_002, etc.
Return value
A list[dict] with one entry per polygon in the batch (up to limit):
Lot identifier derived from the feature’s id property or auto-generated from id_prefix.
"OK" on success. Otherwise one of "GEOMETRIA_INVALIDA", "SIN_DATOS_SATELITALES", or "ERROR: <message>".
Overall AgroIA Score (0–100) for the most recent processed year. null when status is not "OK".
Area of the polygon in hectares. Present only when status is "OK".
Internal pipeline steps
The following describes what happens inside _analyze_one_polygon, which both public functions delegate to after validating their inputs:
Climate data: NASA POWER
get_nasa_climate_safe() fetches daily temperature data for the centroid coordinates and accumulates heat-stress hours for each requested year using the sinusoidal formula parameterised by the crop’s tbase and umbral_calor.
Satellite NDVI: GEE Sentinel-2 SR
get_gee_ndvi_validado() retrieves the median NDVI for the critical month from the Sentinel-2 Surface Reflectance collection. If the value is null or below the crop’s ndvi_min, get_gee_ndvi_ventana() retries with a ±2 month window (max_delta=2). Years with no valid fallback are added to anos_excluidos.
Score and zoning
calcular_score() computes Vigor, Stability, Cleanliness, and Climate components. calcular_cv_gee() measures spatial heterogeneity; when CV > 0.05, zonificar_lote_gee() segments the polygon into three management zones (A, B, C) via K-Means.
Reports
build_report() generates a PDF in src/outputs/AgroIA_<lote_id>.pdf. generar_mapa_offline() generates an interactive HTML map in outputs/Mapa_<lote_id>.html.
Supported crops
The following crop keys are valid for the cultivo parameter. All configuration values come from CONFIG in src/pipeline/agro_math.py.
| Key | Critical month | umbral_clima (max heat-stress hours) | NDVI range |
|---|---|---|---|
| maiz | January (1) | 40 | 0.25 – 0.92 |
| soja | February (2) | 35 | 0.25 – 0.90 |
| trigo | October (10) | 30 | 0.20 – 0.88 |
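For scripts that want to fail early on an unsupported crop key, the table can be mirrored as a small lookup. This is only a sketch: CropSettings, CROP_TABLE, get_crop_settings, and ndvi_is_plausible are hypothetical names, the field layout does not claim to match the real CONFIG dict in agro_math.py, and treating the NDVI range as inclusive at both ends is an assumption.

```python
from typing import NamedTuple


class CropSettings(NamedTuple):
    critical_month: int   # calendar month of the critical NDVI reading
    umbral_clima: int     # maximum heat-stress hours
    ndvi_min: float       # lower bound of the plausible NDVI range
    ndvi_max: float       # upper bound (hypothetical field name)


# Values copied from the table; the structure itself is illustrative.
CROP_TABLE = {
    "maiz": CropSettings(1, 40, 0.25, 0.92),
    "soja": CropSettings(2, 35, 0.25, 0.90),
    "trigo": CropSettings(10, 30, 0.20, 0.88),
}


def get_crop_settings(cultivo):
    """Return the settings for a crop key, with a readable error for unknown keys."""
    try:
        return CROP_TABLE[cultivo]
    except KeyError:
        valid = ", ".join(sorted(CROP_TABLE))
        raise ValueError(f"Unsupported crop {cultivo!r}; expected one of: {valid}") from None


def ndvi_is_plausible(cultivo, ndvi):
    """Check an NDVI value against the crop's range (bounds assumed inclusive)."""
    settings = get_crop_settings(cultivo)
    return settings.ndvi_min <= ndvi <= settings.ndvi_max


print(get_crop_settings("soja").critical_month)  # 2
print(ndvi_is_plausible("trigo", 0.15))          # False
```

Validating the key before any GEE or NASA POWER request avoids spending network calls on a run that would fail later with a KeyError.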
girasol is accepted as a crop name by the CLI (start.py --pipeline), but does not have an entry in CONFIG in src/pipeline/agro_math.py. Passing girasol as the crop type will raise a KeyError in the pipeline. Use only maiz, soja, or trigo until a girasol config entry is added.
Related pages
agro_math module
Score formula, CONFIG dict, and NDVI validation utilities.
Batch processing guide
Step-by-step walkthrough for processing large GeoJSON files.