Pipeline module: run_full_analysis and batch API

The src.pipeline module is the programmatic entry point for all AgroIA field analysis. It coordinates satellite imagery retrieval from Google Earth Engine, climate data from NASA POWER, score calculation, PDF and HTML report generation, and optional ingestion into the RAG vector store — all in a single function call. Import it directly in scripts, notebooks, or any automation layer that needs to trigger analysis without using the CLI.

GEE must be authenticated before calling any pipeline function. Set GEE_PROJECT_ID in your .env file and run earthengine authenticate once on the host machine.

run_full_analysis

Runs the complete AgroIA pipeline on a single shapefile or GeoJSON. If the file contains multiple polygons, only the one with the largest area is analysed.

from src.pipeline import run_full_analysis

result = run_full_analysis(
    shp_path="data/my_field.shp",
    cultivo="maiz",
    years=[2020, 2021, 2022, 2023, 2024, 2025],
    lote_id="CAMPO_SUR_01",
    push_to_rag=True,
)
print(result["score"]["total"])  # e.g. 74

Parameters

shp_path

string

required

Path to the input shapefile (.shp) or GeoJSON file. A leading @ character is stripped automatically. The path is normalised with os.path.normpath.

cultivo

string

default:"maiz"

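Crop key. Must be one of the keys in CONFIG: "maiz", "soja", or "trigo". "girasol" is accepted by the CLI but has no CONFIG entry, so passing it raises a KeyError (see Supported crops). Controls the critical NDVI month, heat-stress threshold, and NDVI plausibility range used throughout the analysis.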

years

number[]

default:"last 6 calendar years"

List of integer years to include in the historical series, e.g. [2020, 2021, 2022, 2023, 2024, 2025]. When omitted, defaults to the six years ending in the current calendar year.

lote_id

string

Unique identifier for the lot. When omitted, the base name of shp_path (without extension) is used.

push_to_rag

boolean

default:"true"

When True, the analysis payload is sent to the local FastAPI ingestion endpoint (POST /ingesta) after the PDF and map are generated. Set to False to produce reports without touching the database.
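
The defaulting rules described for shp_path, years, and lote_id can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name resolve_inputs, its today argument, and the order of the steps are assumptions made for the example.

```python
import os
from datetime import date
from typing import List, Optional, Tuple


def resolve_inputs(
    shp_path: str,
    years: Optional[List[int]] = None,
    lote_id: Optional[str] = None,
    today: Optional[date] = None,  # hypothetical hook so the default is testable
) -> Tuple[str, List[int], str]:
    """Hypothetical helper mirroring the documented parameter defaults."""
    # Strip a single leading "@", then normalise the path.
    if shp_path.startswith("@"):
        shp_path = shp_path[1:]
    shp_path = os.path.normpath(shp_path)

    # Default years: the six calendar years ending in the current one.
    if years is None:
        current = (today or date.today()).year
        years = list(range(current - 5, current + 1))

    # Default lote_id: base name of the file without its extension.
    if lote_id is None:
        lote_id = os.path.splitext(os.path.basename(shp_path))[0]

    return shp_path, years, lote_id


path, years, lote_id = resolve_inputs("@data/my_field.shp", today=date(2025, 3, 1))
print(path, years, lote_id)
```

With a reference date in 2025 the default year list matches the explicit list used in the example call above.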

Return value

Returns None if no valid satellite data could be retrieved for any of the requested years. Otherwise returns a dict with the following keys:

cultivo

string

The crop key used for the analysis (e.g. "maiz").

hectareas

number

Area of the polygon in hectares, calculated in the dynamic UTM projection of the centroid.

centroide

tuple[float, float]

(latitude, longitude) of the polygon centroid in WGS-84 decimal degrees.

score

object

Score for the most recent processed year.


total

number

Overall AgroIA Score from 0 to 100 (integer).

vigor

number

Vigor component (0–40). NDVI normalised against 0.9.

estabilidad

number

Stability component (0–30). Inverse of the coefficient of variation of the historical NDVI series.

limpieza

number

Cleanliness component (0–20). Penalises outlier years detected by Isolation Forest.

clima

number

Climate component (0–10). Based on accumulated heat-stress hours from NASA POWER.

ndvi_historico

object

Dict mapping year (int) to validated NDVI value (float) for years that passed the plausibility filter.

ndvi_critico_actual

number

NDVI value for the critical month of the most recent processed year.

ndvi_hist_bruto

object

Raw NDVI values (before validation exclusion) keyed by year.

anos_excluidos

number[]

List of years dropped because no valid NDVI could be retrieved, even with the fallback window strategy.

horas_calor_hist

object

Accumulated heat-stress hours per year, keyed by year.

horas_calor_actual

number

Heat-stress hours for the most recent processed year.

number

Coefficient of spatial variation from GEE for the most recent year, used to decide whether to produce an A/B/C zone map.

es_variable

boolean

True when the spatial CV exceeds 0.05 and zone segmentation was applied.

lote_gdf

GeoDataFrame

The validated input geometry as a GeoPandas GeoDataFrame in EPSG:4326.

zonas_gdf

GeoDataFrame | None

K-Means zone segmentation result (zones A, B, C) or None if the lot is spatially homogeneous.

conf

dict

The full CONFIG entry for the chosen crop.

epsg_utm

number

Dynamic UTM EPSG code derived from the centroid longitude.

log

string[]

Processing log messages generated during the pipeline run.
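
A short sketch of how a caller might consume the returned dict. The helper names summarise and utm_epsg are hypothetical, and the sample values are illustrative only. The EPSG calculation is the standard WGS-84 UTM rule (zone from longitude, hemisphere from latitude); whether the pipeline derives epsg_utm exactly this way is an assumption. That the four components add up to total is also an assumption, so the sketch reports the sum rather than relying on it.

```python
import math
from typing import Optional


def utm_epsg(lat: float, lon: float) -> int:
    """Standard WGS-84 UTM EPSG code (ignores the Norway/Svalbard exceptions)."""
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    return (32600 if lat >= 0 else 32700) + zone


def summarise(result: Optional[dict]) -> str:
    """Hypothetical helper: one-line summary of a run_full_analysis result."""
    if result is None:
        # Documented behaviour: None means no valid satellite data for any year.
        return "no valid satellite data for the requested years"
    score = result["score"]
    parts = [score[k] for k in ("vigor", "estabilidad", "limpieza", "clima")]
    return (
        f"{result['cultivo']}: {result['hectareas']:.1f} ha, "
        f"score {score['total']} (components sum to {sum(parts)})"
    )


# Illustrative values only, shaped like the documented return dict.
sample = {
    "cultivo": "maiz",
    "hectareas": 52.3,
    "centroide": (-33.5, -61.2),  # (latitude, longitude)
    "epsg_utm": 32720,
    "score": {"total": 74, "vigor": 30, "estabilidad": 22, "limpieza": 14, "clima": 8},
}

print(summarise(sample))
print(summarise(None))
print(utm_epsg(*sample["centroide"]))
```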

run_batch_from_geojson

Processes every polygon in a GeoJSON file produced by the SAM delineation tool. GEE is initialised only once for the entire batch, making this far more efficient than calling run_full_analysis in a loop.

Use the limit parameter during development to validate the pipeline on a small subset before committing to processing hundreds of polygons.

from src.pipeline import run_batch_from_geojson

summary = run_batch_from_geojson(
    geojson_path="Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson",
    cultivo_default="maiz",
    push_to_rag=True,
    limit=5,           # process only the first 5 polygons for testing
)

for item in summary:
    print(item["lote_id"], item["status"], item.get("score"))

from src.pipeline import run_batch_from_geojson

# Process the entire first SAM run (268 polygons, maiz)
summary = run_batch_from_geojson(
    geojson_path="Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson",
    cultivo_default="maiz",
    years=[2020, 2021, 2022, 2023, 2024, 2025],
    push_to_rag=True,
)
ok = [r for r in summary if r["status"] == "OK"]
print(f"{len(ok)}/{len(summary)} polygons processed successfully")

Parameters

geojson_path

string

required

Path to the GeoJSON file. Raises FileNotFoundError if the file does not exist. The CRS is re-projected to EPSG:4326 automatically if needed.

cultivo_default

string

default:"maiz"

Fallback crop when a polygon feature does not carry a cultivo property, or when the value is not a recognised key in CONFIG.

years

number[]

default:"last 6 calendar years"

Years to analyse across all polygons. All features in the batch use the same year list.

push_to_rag

boolean

default:"true"

Ingest each successfully analysed lot into the RAG vector store when True.

limit

number

Process only the first N polygons. Useful for testing pipeline health before a full run. When omitted, all polygons are processed.

id_prefix

string

default:"POLIGONO"

Prefix for the auto-generated lote_id when a feature does not have an id property. The suffix is a zero-padded index: POLIGONO_001, POLIGONO_002, etc.
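
The per-feature fallbacks for cultivo_default and id_prefix can be sketched as below. This is an illustration under assumptions: the helper name resolve_feature is hypothetical, the id is assumed to live in the feature's properties, and the index is assumed to start at 1 and be padded to three digits, as the POLIGONO_001 example suggests.

```python
from typing import Tuple

# Crop keys the documentation lists as present in CONFIG.
VALID_CROPS = ("maiz", "soja", "trigo")


def resolve_feature(
    properties: dict,
    index: int,
    cultivo_default: str = "maiz",
    id_prefix: str = "POLIGONO",
) -> Tuple[str, str]:
    """Hypothetical helper: pick lote_id and crop for one GeoJSON feature."""
    # Fall back to the default crop when the property is missing or unrecognised.
    crop = properties.get("cultivo")
    if crop not in VALID_CROPS:
        crop = cultivo_default

    # Use the feature's own id when present, else PREFIX_NNN (assumed 1-based).
    feature_id = properties.get("id")
    if feature_id is not None:
        lote_id = str(feature_id)
    else:
        lote_id = f"{id_prefix}_{index:03d}"
    return lote_id, crop


print(resolve_feature({}, 1))
print(resolve_feature({"id": "L7", "cultivo": "soja"}, 2))
print(resolve_feature({"cultivo": "girasol"}, 12, cultivo_default="trigo"))
```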

Return value

A list[dict] with one entry per polygon in the batch (up to limit):

lote_id

string

Lot identifier derived from the feature’s id property or auto-generated from id_prefix.

status

string

"OK" on success. Otherwise one of "GEOMETRIA_INVALIDA", "SIN_DATOS_SATELITALES", or "ERROR: <message>".

score

number | null

Overall AgroIA Score (0–100) for the most recent processed year. null when status is not "OK".

hectareas

number

Area of the polygon in hectares. Present only when status is "OK".

MultiPolygon features are reduced to their largest component before analysis. If a geometry is invalid or not a Polygon, the lot is skipped with status "GEOMETRIA_INVALIDA".
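
After a large batch it is often useful to see how many lots fell into each status. The sketch below tallies a summary list shaped like the documented return value. The helper name tally_statuses and the sample entries are illustrative, not part of the module.

```python
from collections import Counter
from typing import Dict, List


def tally_statuses(summary: List[dict]) -> Dict[str, int]:
    """Count lots per status, grouping every "ERROR: <message>" under "ERROR"."""
    counts = Counter(
        "ERROR" if item["status"].startswith("ERROR") else item["status"]
        for item in summary
    )
    return dict(counts)


# Illustrative entries only, shaped like the documented summary items.
summary = [
    {"lote_id": "POLIGONO_001", "status": "OK", "score": 74, "hectareas": 52.3},
    {"lote_id": "POLIGONO_002", "status": "GEOMETRIA_INVALIDA", "score": None},
    {"lote_id": "POLIGONO_003", "status": "SIN_DATOS_SATELITALES", "score": None},
    {"lote_id": "POLIGONO_004", "status": "ERROR: timeout", "score": None},
    {"lote_id": "POLIGONO_005", "status": "OK", "score": 61, "hectareas": 18.0},
]

print(tally_statuses(summary))
```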

Internal pipeline steps

The following describes what happens inside _analyze_one_polygon, which both public functions delegate to after validating their inputs:

Climate data — NASA POWER

get_nasa_climate_safe() fetches daily temperature data for the centroid coordinates and accumulates heat-stress hours for each requested year using the sinusoidal formula parameterised by the crop’s tbase and umbral_calor.
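
The exact formula used by get_nasa_climate_safe() is not reproduced on this page. As a rough illustration of the general idea, the sketch below approximates one day's temperature curve as a sine wave between the daily minimum and maximum and counts the hours above a heat threshold. Everything here is an assumption: the function name, the 15:00 peak hour, and the fact that only umbral_calor is used (the role of tbase is not described here, so it is left out).

```python
import math


def heat_stress_hours(tmin: float, tmax: float, umbral_calor: float) -> int:
    """Hours in one day above umbral_calor under an assumed sinusoidal curve."""
    mean_t = (tmin + tmax) / 2.0
    amplitude = (tmax - tmin) / 2.0
    hours = 0
    for h in range(24):
        # Assumed diurnal shape: minimum at 03:00, maximum at 15:00.
        temp = mean_t + amplitude * math.sin(2.0 * math.pi * (h - 9) / 24.0)
        if temp > umbral_calor:
            hours += 1
    return hours


def yearly_heat_stress(days, umbral_calor: float) -> int:
    """Accumulate daily counts over an iterable of (tmin, tmax) pairs."""
    return sum(heat_stress_hours(tmin, tmax, umbral_calor) for tmin, tmax in days)


# Illustrative temperatures in degrees Celsius, not real NASA POWER data.
print(yearly_heat_stress([(18.0, 38.0), (20.0, 30.0)], umbral_calor=30.0))
```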

Satellite NDVI — GEE Sentinel-2 SR

get_gee_ndvi_validado() retrieves the median NDVI for the critical month from the Sentinel-2 Surface Reflectance collection. If the value is null or below the crop’s ndvi_min, get_gee_ndvi_ventana() retries with a ±2 month window (max_delta=2). Years with no valid fallback are added to anos_excluidos.
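
The fallback rule can be illustrated without touching GEE. The sketch below shows when a fallback would be needed and which calendar months a ±2 window around the critical month covers, wrapping across the year boundary. The function names are hypothetical, and whether the real code queries the window as one date range or month by month is not stated here.

```python
from typing import List, Optional


def needs_fallback(ndvi: Optional[float], ndvi_min: float) -> bool:
    """True when the critical-month value is missing or below the crop minimum."""
    return ndvi is None or ndvi < ndvi_min


def window_months(critical_month: int, max_delta: int = 2) -> List[int]:
    """Calendar months (1-12) within +/- max_delta of the critical month."""
    return [
        (critical_month - 1 + delta) % 12 + 1
        for delta in range(-max_delta, max_delta + 1)
    ]


print(needs_fallback(None, 0.25))
print(window_months(1))   # maiz: critical month January
print(window_months(10))  # trigo: critical month October
```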

Score and zoning

calcular_score() computes Vigor, Stability, Cleanliness, and Climate components. calcular_cv_gee() measures spatial heterogeneity; when CV > 0.05, zonificar_lote_gee() segments the polygon into three management zones (A, B, C) via K-Means.
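
The zoning decision reduces to a coefficient of variation compared against 0.05. A minimal sketch, assuming the CV is the population standard deviation divided by the mean of per-pixel NDVI values; the real calculation runs inside GEE and may differ in detail, and the function names here are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence

CV_THRESHOLD = 0.05  # threshold stated in the documentation


def spatial_cv(ndvi_values: Sequence[float]) -> float:
    """Coefficient of variation (population std / mean) of pixel NDVI values."""
    m = mean(ndvi_values)
    if m == 0:
        raise ValueError("mean NDVI is zero; CV is undefined")
    return pstdev(ndvi_values) / m


def should_zone(ndvi_values: Sequence[float]) -> bool:
    """True when the lot is heterogeneous enough to warrant A/B/C zoning."""
    return spatial_cv(ndvi_values) > CV_THRESHOLD


# Illustrative pixel values only.
print(should_zone([0.80, 0.81, 0.79, 0.80]))  # nearly uniform lot
print(should_zone([0.40, 0.80, 0.60, 0.75]))  # visibly heterogeneous lot
```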

Reports

build_report() generates a PDF in src/outputs/AgroIA_<lote_id>.pdf. generar_mapa_offline() generates an interactive HTML map in outputs/Mapa_<lote_id>.html.

RAG ingestion

When push_to_rag=True, construir_payload_v2() assembles the structured payload and enviar_al_rag() posts it to POST /ingesta with Bearer token authentication.
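
For readers who want to reproduce the ingestion call outside the pipeline, the sketch below builds (but does not send) an equivalent request with the standard library. The base URL, token value, and payload fields are placeholders chosen for illustration; only the POST /ingesta route and the Bearer scheme come from this page.

```python
import json
import urllib.request


def build_ingest_request(payload: dict, base_url: str, token: str) -> urllib.request.Request:
    """Build a POST /ingesta request carrying a Bearer token (not sent here)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ingesta",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: host, port, token, and payload shape are assumptions.
req = build_ingest_request(
    payload={"lote_id": "CAMPO_SUR_01"},
    base_url="http://localhost:8000",
    token="example-token",
)
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req), with the API running locally.
```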

Supported crops

The following crop keys are valid for the cultivo parameter. All configuration values come from CONFIG in src/pipeline/agro_math.py.

Key	Critical month	`umbral_clima` (max heat-stress hours)	NDVI range
`maiz`	January (1)	40	0.25 – 0.92
`soja`	February (2)	35	0.25 – 0.90
`trigo`	October (10)	30	0.20 – 0.88

girasol is accepted as a crop name by the CLI (start.py --pipeline), but does not have an entry in CONFIG in src/pipeline/agro_math.py. Passing girasol as the crop type will raise a KeyError in the pipeline. Use only maiz, soja, or trigo until a girasol config entry is added.
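
Until girasol is configured, callers can guard against the KeyError with a small pre-flight check. A minimal sketch: the helper name require_supported_crop is hypothetical, and the set of keys is copied from the table above rather than read from CONFIG.

```python
# Keys copied from the Supported crops table; in real code, prefer reading
# them from CONFIG so the check cannot drift out of date.
SUPPORTED_CROPS = frozenset({"maiz", "soja", "trigo"})


def require_supported_crop(cultivo: str) -> str:
    """Return cultivo unchanged, or raise a clear error before the pipeline runs."""
    if cultivo not in SUPPORTED_CROPS:
        valid = ", ".join(sorted(SUPPORTED_CROPS))
        raise ValueError(f"Unsupported crop {cultivo!r}; expected one of: {valid}")
    return cultivo


print(require_supported_crop("soja"))
try:
    require_supported_crop("girasol")
except ValueError as exc:
    print(exc)
```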

Related pages

agro_math module

Score formula, CONFIG dict, and NDVI validation utilities.

Batch processing guide

Step-by-step walkthrough for processing large GeoJSON files.
