The AgroIA pipeline is the core analysis engine of the system. It takes a single shapefile or GeoJSON representing a field boundary and produces a complete agronomic report: a scored assessment of crop health backed by six years of Sentinel-2 NDVI data, NASA POWER climate records, and a spatial zonification of the field into A/B/C risk zones. The entire process runs locally and ends with automatic ingestion of results into the RAG knowledge base so the data becomes immediately queryable from the dashboard or Telegram bot.

Prerequisites

Before running the pipeline, confirm the following are in place:
  • Google Earth Engine authenticated — run earthengine authenticate and set GEE_PROJECT_ID in config/.env.
  • Services running — the FastAPI ingestion API must be up so the pipeline can push results to the RAG. Start it with python start.py --api or verify with python start.py --check.
  • Shapefile ready — your .shp file (with its companion .dbf, .shx, and .prj files in the same directory) or a .geojson file containing the field polygon.
Run python start.py --check before your first pipeline execution. It verifies PostgreSQL, Ollama, GEE credentials, and all Python dependencies in one pass.

Command syntax

```bash
python start.py --pipeline <ruta.shp> [cultivo]
```
| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `<ruta.shp>` | Yes | — | Path to the shapefile or GeoJSON of the field |
| `[cultivo]` | No | `maiz` | Crop type: `maiz`, `soja`, `trigo`, or `girasol` |
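The argument handling above can be sketched with `argparse`. This is an illustrative stand-in for how such a CLI might parse the positional pair, not the actual `start.py` implementation:

```python
import argparse

# Hypothetical sketch: start.py's real parser may be structured differently.
parser = argparse.ArgumentParser(prog="start.py")
parser.add_argument("--pipeline", nargs="+",
                    help="field shapefile/GeoJSON path, optionally followed by a crop")

args = parser.parse_args(["--pipeline", "mi_lote.shp", "soja"])
ruta = args.pipeline[0]
cultivo = args.pipeline[1] if len(args.pipeline) > 1 else "maiz"  # crop defaults to maiz
```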

Example

```bash
python start.py --pipeline Poligonizacion/1ER\ CORRIDA/poligonos_definitivos.shp maiz
```

Step-by-step walkthrough

When you run --pipeline, the launcher injects src/ into the Python path and calls run_full_analysis() in src/pipeline/__init__.py. The following steps execute sequentially:
1. GEE authentication — init_gee()

Authenticates with Google Earth Engine using the GEE_PROJECT_ID set in config/.env. This step initialises the ee client that all subsequent satellite queries depend on. If credentials are missing or expired, the pipeline stops here with an authentication error.
2. Shapefile validation — validar_shapefile()

Reads the file with GeoPandas and validates geometry type, CRS, and spatial coherence. It reprojects the polygon to a dynamic UTM zone calculated from the centroid’s longitude, which is required for accurate area calculations in hectares.
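The dynamic UTM zone selection can be sketched in pure Python. The zone arithmetic follows the standard UTM convention; the function name is hypothetical, and `validar_shapefile()` may implement it differently:

```python
import math

def utm_epsg_from_centroid(lon: float, lat: float) -> int:
    """Pick the EPSG code of the UTM zone containing a centroid (illustrative)."""
    zone = int(math.floor((lon + 180) / 6)) + 1     # 6-degree zones, numbered 1-60
    return (32600 if lat >= 0 else 32700) + zone    # EPSG 326xx = north, 327xx = south

# A field near Cordoba, Argentina falls in UTM zone 20 south:
print(utm_epsg_from_centroid(-63.5, -31.4))  # → 32720
```

The polygon would then be reprojected with something like `gdf.to_crs(epsg=utm_epsg_from_centroid(lon, lat))` before computing areas in hectares.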
3. Climate data — get_nasa_climate_safe()

Fetches daily temperature records from the NASA POWER API for the field’s centroid coordinates, covering the last six years. From these records it computes accumulated heat hours using a sinusoidal formula calibrated per crop type. This feeds the Clima (10%) component of the AgroIA Score.
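The page does not give the exact sinusoidal formula, so the sketch below uses a common form: each day's temperature is modelled as a sine wave between the daily minimum and maximum, and the hours above the crop's base threshold are accumulated. Treat it as illustrative, not the actual `get_nasa_climate_safe()` internals:

```python
import math

def heat_hours_day(tmin: float, tmax: float, base: float, steps: int = 24) -> float:
    """Hours in one day with modelled temperature above `base` (illustrative)."""
    mean, amp = (tmin + tmax) / 2, (tmax - tmin) / 2
    hours = 0.0
    for h in range(steps):
        temp = mean + amp * math.sin(2 * math.pi * h / steps)  # sinusoidal daily cycle
        if temp > base:
            hours += 24 / steps
    return hours

# Summing heat_hours_day over six years of daily records, with a per-crop
# base threshold, gives the accumulated heat hours fed into the Clima component.
```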
4. NDVI extraction — get_gee_ndvi_validado() + fallback window

Queries Sentinel-2 SR imagery on Google Earth Engine for the critical phenological month of each crop (e.g. flowering for maize). If cloud cover or data gaps prevent a valid reading for a given year, get_gee_ndvi_ventana() expands the search to a ±2-month window. Years with no usable data are excluded and logged.
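The search-window fallback can be sketched as follows. Here `fetch_ndvi` is a hypothetical stand-in for the GEE query, and the status strings mirror the per-year log output mentioned under Troubleshooting; the real `get_gee_ndvi_validado()` / `get_gee_ndvi_ventana()` implementations differ:

```python
def ndvi_with_fallback(fetch_ndvi, year, critical_month, window=2):
    """Try the critical month first, then widen to a ±`window`-month search (sketch)."""
    value = fetch_ndvi(year, critical_month)
    if value is not None:
        return value, "ok"
    for offset in range(1, window + 1):
        for month in (critical_month - offset, critical_month + offset):
            if 1 <= month <= 12:
                value = fetch_ndvi(year, month)
                if value is not None:
                    return value, f"Ventana mes {month}"
    return None, "EXCLUIDO"  # year dropped from the series and logged
```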
5. Score and zonification — calcular_score()

Computes the composite AgroIA Score (0–100) across four weighted components:
  • Vigor (40%) — normalised mean NDVI for the critical month
  • Estabilidad (30%) — inverse of the historical NDVI coefficient of variation
  • Limpieza (20%) — Isolation Forest outlier penalisation (contamination = 0.2)
  • Clima (10%) — accumulated heat hours against crop threshold
If the spatial coefficient of variation exceeds 0.05, K-Means clustering divides the field into Zone A (high potential), Zone B (medium), and Zone C (critical) pixels.
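The weighting can be written out directly. The components are assumed to be pre-normalised to a 0–100 scale, and the function name is illustrative rather than the actual `calcular_score()` signature:

```python
def agroia_score(vigor: float, estabilidad: float, limpieza: float, clima: float) -> float:
    """Composite AgroIA Score from four pre-normalised (0-100) components."""
    return 0.40 * vigor + 0.30 * estabilidad + 0.20 * limpieza + 0.10 * clima

print(agroia_score(80, 70, 90, 60))  # → 77.0
```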
6. PDF report — build_report()

Generates a multi-page PDF report containing the score summary, historical NDVI series, score component breakdown, and zone map. The file is saved to src/outputs/AgroIA_<lote_id>.pdf.
7. Interactive map — generar_mapa_offline()

Produces a self-contained HTML file with a Folium interactive map showing the field boundary and, when zonification is active, coloured A/B/C pixel overlays. Saved to outputs/Mapa_<lote_id>.html.
8. RAG ingestion — enviar_al_rag()

Constructs the v2 payload (ASCII keys only) and POSTs it to the local FastAPI endpoint at http://localhost:8000/ingesta. This step embeds the report with nomic-embed-text via Ollama and stores it in pgvector, making the field immediately available for natural language queries.
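A minimal sketch of the payload construction, using only the standard library. The field names inside the payload are invented for illustration; only the ASCII-keys requirement and the /ingesta endpoint come from this page, and the actual v2 schema may differ:

```python
import json
from urllib import request

# Hypothetical payload fields -- the real v2 schema may differ.
payload = {
    "lote_id": "mi_lote",
    "cultivo": "maiz",
    "score": 77.0,
    "resumen": "NDVI medio 0.72 en mes critico",
}
assert all(key.isascii() for key in payload)  # v2 payloads use ASCII keys only

body = json.dumps(payload).encode("utf-8")
req = request.Request("http://localhost:8000/ingesta", data=body,
                      headers={"Content-Type": "application/json"})
# request.urlopen(req)  # uncomment with the FastAPI service running
```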

Output files

After a successful run, two files are written:
| File | Location | Contents |
| --- | --- | --- |
| `AgroIA_<lote_id>.pdf` | `src/outputs/` | Full agronomic report with score, NDVI charts, components, and zone map |
| `Mapa_<lote_id>.html` | `outputs/` | Self-contained interactive HTML map (open in any browser) |
The lote_id defaults to the shapefile’s basename without extension. If you run the pipeline on mi_lote.shp, the outputs are AgroIA_mi_lote.pdf and Mapa_mi_lote.html.
If the file contains multiple polygons, run_full_analysis() selects the one with the largest area. Use the batch processing guide if you need to analyse every polygon individually.
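The naming convention can be checked with `pathlib`; the stem of the input path becomes the lote_id:

```python
from pathlib import Path

lote_id = Path("Poligonizacion/mi_lote.shp").stem  # basename without extension
print(f"AgroIA_{lote_id}.pdf")   # → AgroIA_mi_lote.pdf
print(f"Mapa_{lote_id}.html")    # → Mapa_mi_lote.html
```

For multi-polygon files, the largest-area selection corresponds to something like `gdf.loc[gdf.geometry.area.idxmax()]` in GeoPandas (illustrative, and only meaningful in a projected CRS).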

Skip prerequisite checks

If you need to run the pipeline without going through the prerequisite validation (for example, in a scripted environment where you know the services are up):
```bash
python start.py --skip-checks --pipeline mi_lote.shp soja
```
--skip-checks bypasses the PostgreSQL, Ollama, and token validation checks. If any service is actually down, the pipeline will still fail, but with a less descriptive error message.

Troubleshooting

**GEE authentication fails.** Run earthengine authenticate in your terminal to refresh credentials. Also verify that GEE_PROJECT_ID is set correctly in config/.env and that the project has the Earth Engine API enabled in Google Cloud Console.

**All years are excluded from the NDVI series.** This usually means the field is covered by persistent cloud or the polygon is in an unsupported region. Check that the shapefile CRS is valid and the geometry is not corrupt. You can inspect which years were excluded in the pipeline’s console output — each year logs its status (ok, Ventana mes X, or EXCLUIDO).

**Shapefile has no CRS.** The pipeline requires the shapefile to have a defined CRS. If validar_shapefile() raises an error, open the file in QGIS or run geopandas.read_file('mi_lote.shp').crs to inspect the coordinate system. Reproject to EPSG:4326 before passing it to the pipeline.

**RAG ingestion fails.** The enviar_al_rag() step requires the FastAPI service to be running on port 8000. Start it separately with python start.py --api and confirm http://localhost:8000/health returns 200 before re-running the pipeline.

**Charts fail to render in headless environments.** This can happen when matplotlib cannot find a display backend. Ensure the Agg backend is set by adding import matplotlib; matplotlib.use('Agg') at the top of any custom runner script, or run via python start.py, which handles this automatically.
