The AgroIA pipeline is the core analysis engine of the system. It takes a single shapefile or GeoJSON representing a field boundary and produces a complete agronomic report: a scored assessment of crop health backed by six years of Sentinel-2 NDVI data, NASA POWER climate records, and a spatial zonification of the field into A/B/C risk zones. The entire process runs locally and ends with automatic ingestion of results into the RAG knowledge base so the data becomes immediately queryable from the dashboard or Telegram bot.

Prerequisites

Before running the pipeline, confirm the following are in place:
  • Google Earth Engine authenticated — run earthengine authenticate and set GEE_PROJECT_ID in config/.env.
  • Services running — the FastAPI ingestion API must be up so the pipeline can push results to the RAG. Start it with python start.py --api or verify with python start.py --check.
  • Shapefile ready — your .shp file (with its companion .dbf, .shx, and .prj files in the same directory) or a .geojson file containing the field polygon.
Run python start.py --check before your first pipeline execution. It verifies PostgreSQL, Ollama, GEE credentials, and all Python dependencies in one pass.

Command syntax

```bash
python start.py --pipeline <ruta.shp> [cultivo]
```
| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `<ruta.shp>` | Yes | — | Path to the shapefile or GeoJSON of the field |
| `[cultivo]` | No | `maiz` | Crop type: `maiz`, `soja`, `trigo`, or `girasol` |
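The argument handling above can be sketched with `argparse`. This is an illustrative stand-in for how such a CLI might parse the positional pair, not the actual `start.py` implementation:

```python
import argparse

# Hypothetical sketch: start.py's real parser may be structured differently.
parser = argparse.ArgumentParser(prog="start.py")
parser.add_argument("--pipeline", nargs="+",
                    help="field shapefile/GeoJSON path, optionally followed by a crop")

args = parser.parse_args(["--pipeline", "mi_lote.shp", "soja"])
ruta = args.pipeline[0]
cultivo = args.pipeline[1] if len(args.pipeline) > 1 else "maiz"  # crop defaults to maiz
```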

Example

```bash
python start.py --pipeline Poligonizacion/1ER\ CORRIDA/poligonos_definitivos.shp maiz
```

Step-by-step walkthrough

When you run --pipeline, the launcher injects src/ into the Python path and calls run_full_analysis() in src/pipeline/__init__.py. The following steps execute sequentially:
1. GEE authentication — init_gee()

Authenticates with Google Earth Engine using the GEE_PROJECT_ID set in config/.env. This step initialises the ee client that all subsequent satellite queries depend on. If credentials are missing or expired, the pipeline stops here with an authentication error.
2. Shapefile validation — validar_shapefile()

Reads the file with GeoPandas and validates geometry type, CRS, and spatial coherence. It reprojects the polygon to a dynamic UTM zone calculated from the centroid’s longitude, which is required for accurate area calculations in hectares.
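The dynamic UTM zone selection can be sketched in pure Python. The zone arithmetic follows the standard UTM convention; the function name is hypothetical, and `validar_shapefile()` may implement it differently:

```python
import math

def utm_epsg_from_centroid(lon: float, lat: float) -> int:
    """Pick the EPSG code of the UTM zone containing a centroid (illustrative)."""
    zone = int(math.floor((lon + 180) / 6)) + 1     # 6-degree zones, numbered 1-60
    return (32600 if lat >= 0 else 32700) + zone    # EPSG 326xx = north, 327xx = south

# A field near Cordoba, Argentina falls in UTM zone 20 south:
print(utm_epsg_from_centroid(-63.5, -31.4))  # → 32720
```

The polygon would then be reprojected with something like `gdf.to_crs(epsg=utm_epsg_from_centroid(lon, lat))` before computing areas in hectares.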
3. Climate data — get_nasa_climate_safe()

Fetches daily temperature records from the NASA POWER API for the field’s centroid coordinates, covering the last six years. From these records it computes accumulated heat hours using a sinusoidal formula calibrated per crop type. This feeds the Clima (10%) component of the AgroIA Score.
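The page does not give the exact sinusoidal formula, so the sketch below uses a common form: each day's temperature is modelled as a sine wave between the daily minimum and maximum, and the hours above the crop's base threshold are accumulated. Treat it as illustrative, not the actual `get_nasa_climate_safe()` internals:

```python
import math

def heat_hours_day(tmin: float, tmax: float, base: float, steps: int = 24) -> float:
    """Hours in one day with modelled temperature above `base` (illustrative)."""
    mean, amp = (tmin + tmax) / 2, (tmax - tmin) / 2
    hours = 0.0
    for h in range(steps):
        temp = mean + amp * math.sin(2 * math.pi * h / steps)  # sinusoidal daily cycle
        if temp > base:
            hours += 24 / steps
    return hours

# Summing heat_hours_day over six years of daily records, with a per-crop
# base threshold, gives the accumulated heat hours fed into the Clima component.
```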
4. NDVI extraction — get_gee_ndvi_validado() + fallback window

Queries Sentinel-2 SR imagery on Google Earth Engine for the critical phenological month of each crop (e.g. flowering for maize). If cloud cover or data gaps prevent a valid reading for a given year, get_gee_ndvi_ventana() expands the search to a ±2-month window. Years with no usable data are excluded and logged.
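The search-window fallback can be sketched as follows. Here `fetch_ndvi` is a hypothetical stand-in for the GEE query, and the status strings mirror the per-year log output mentioned under Troubleshooting; the real `get_gee_ndvi_validado()` / `get_gee_ndvi_ventana()` implementations differ:

```python
def ndvi_with_fallback(fetch_ndvi, year, critical_month, window=2):
    """Try the critical month first, then widen to a ±`window`-month search (sketch)."""
    value = fetch_ndvi(year, critical_month)
    if value is not None:
        return value, "ok"
    for offset in range(1, window + 1):
        for month in (critical_month - offset, critical_month + offset):
            if 1 <= month <= 12:
                value = fetch_ndvi(year, month)
                if value is not None:
                    return value, f"Ventana mes {month}"
    return None, "EXCLUIDO"  # year dropped from the series and logged
```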
5. Score and zonification — calcular_score()

Computes the composite AgroIA Score (0–100) across four weighted components:
  • Vigor (40%) — normalised mean NDVI for the critical month
  • Estabilidad (30%) — inverse of the historical NDVI coefficient of variation
  • Limpieza (20%) — Isolation Forest outlier penalisation (contamination = 0.2)
  • Clima (10%) — accumulated heat hours against crop threshold
If the spatial coefficient of variation exceeds 0.05, K-Means clustering divides the field into Zone A (high potential), Zone B (medium), and Zone C (critical) pixels.
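The weighting can be written out directly. The components are assumed to be pre-normalised to a 0–100 scale, and the function name is illustrative rather than the actual `calcular_score()` signature:

```python
def agroia_score(vigor: float, estabilidad: float, limpieza: float, clima: float) -> float:
    """Composite AgroIA Score from four pre-normalised (0-100) components."""
    return 0.40 * vigor + 0.30 * estabilidad + 0.20 * limpieza + 0.10 * clima

print(agroia_score(80, 70, 90, 60))  # → 77.0
```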
6. PDF report — build_report()

Generates a multi-page PDF report containing the score summary, historical NDVI series, score component breakdown, and zone map. The file is saved to src/outputs/AgroIA_<lote_id>.pdf.
7. Interactive map — generar_mapa_offline()

Produces a self-contained HTML file with a Folium interactive map showing the field boundary and, when zonification is active, coloured A/B/C pixel overlays. Saved to outputs/Mapa_<lote_id>.html.
8. RAG ingestion — enviar_al_rag()

Constructs the v2 payload (ASCII keys only) and POSTs it to the local FastAPI endpoint at http://localhost:8000/ingesta. This step embeds the report with nomic-embed-text via Ollama and stores it in pgvector, making the field immediately available for natural language queries.
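A minimal sketch of the payload construction, using only the standard library. The field names inside the payload are invented for illustration; only the ASCII-keys requirement and the /ingesta endpoint come from this page, and the actual v2 schema may differ:

```python
import json
from urllib import request

# Hypothetical payload fields -- the real v2 schema may differ.
payload = {
    "lote_id": "mi_lote",
    "cultivo": "maiz",
    "score": 77.0,
    "resumen": "NDVI medio 0.72 en mes critico",
}
assert all(key.isascii() for key in payload)  # v2 payloads use ASCII keys only

body = json.dumps(payload).encode("utf-8")
req = request.Request("http://localhost:8000/ingesta", data=body,
                      headers={"Content-Type": "application/json"})
# request.urlopen(req)  # uncomment with the FastAPI service running
```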

Output files

After a successful run, two files are written:
| File | Location | Contents |
| --- | --- | --- |
| `AgroIA_<lote_id>.pdf` | `src/outputs/` | Full agronomic report with score, NDVI charts, components, and zone map |
| `Mapa_<lote_id>.html` | `outputs/` | Self-contained interactive HTML map (open in any browser) |
The lote_id defaults to the shapefile’s basename without extension. If you run the pipeline on mi_lote.shp, the outputs are AgroIA_mi_lote.pdf and Mapa_mi_lote.html.
If the file contains multiple polygons, run_full_analysis() selects the one with the largest area. Use the batch processing guide if you need to analyse every polygon individually.
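The naming convention can be checked with `pathlib`; the stem of the input path becomes the lote_id:

```python
from pathlib import Path

lote_id = Path("Poligonizacion/mi_lote.shp").stem  # basename without extension
print(f"AgroIA_{lote_id}.pdf")   # → AgroIA_mi_lote.pdf
print(f"Mapa_{lote_id}.html")    # → Mapa_mi_lote.html
```

For multi-polygon files, the largest-area selection corresponds to something like `gdf.loc[gdf.geometry.area.idxmax()]` in GeoPandas (illustrative, and only meaningful in a projected CRS).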

Skip prerequisite checks

If you need to run the pipeline without going through the prerequisite validation (for example, in a scripted environment where you know the services are up):
```bash
python start.py --skip-checks --pipeline mi_lote.shp soja
```
--skip-checks bypasses the PostgreSQL, Ollama, and token validation checks. If any service is actually down, the pipeline will still fail, but with a less descriptive error message.

Troubleshooting

**GEE authentication fails.** Run earthengine authenticate in your terminal to refresh credentials. Also verify that GEE_PROJECT_ID is set correctly in config/.env and that the project has the Earth Engine API enabled in Google Cloud Console.

**All years are excluded from the NDVI series.** This usually means the field is covered by persistent cloud or the polygon is in an unsupported region. Check that the shapefile CRS is valid and the geometry is not corrupt. You can inspect which years were excluded in the pipeline’s console output — each year logs its status (ok, Ventana mes X, or EXCLUIDO).

**Shapefile has no CRS.** The pipeline requires the shapefile to have a defined CRS. If validar_shapefile() raises an error, open the file in QGIS or run geopandas.read_file('mi_lote.shp').crs to inspect the coordinate system. Reproject to EPSG:4326 before passing it to the pipeline.

**RAG ingestion fails.** The enviar_al_rag() step requires the FastAPI service to be running on port 8000. Start it separately with python start.py --api and confirm http://localhost:8000/health returns 200 before re-running the pipeline.

**Charts fail to render in headless environments.** This can happen when matplotlib cannot find a display backend. Ensure the Agg backend is set by adding import matplotlib; matplotlib.use('Agg') at the top of any custom runner script, or run via python start.py, which handles this automatically.
