The AgroIA pipeline is the core analysis engine of the system. It takes a single shapefile or GeoJSON representing a field boundary and produces a complete agronomic report: a scored assessment of crop health backed by six years of Sentinel-2 NDVI data, NASA POWER climate records, and a spatial zonification of the field into A/B/C risk zones. The entire process runs locally and ends with automatic ingestion of results into the RAG knowledge base so the data becomes immediately queryable from the dashboard or Telegram bot.
Prerequisites
Before running the pipeline, confirm the following are in place:
- Google Earth Engine authenticated — run earthengine authenticate and set GEE_PROJECT_ID in config/.env.
- Services running — the FastAPI ingestion API must be up so the pipeline can push results to the RAG. Start it with python start.py --api or verify with python start.py --check.
- Shapefile ready — your .shp file (with its companion .dbf, .shx, and .prj files in the same directory) or a .geojson file containing the field polygon.
Command syntax
| Argument | Required | Default | Description |
|---|---|---|---|
| <ruta.shp> | Yes | — | Path to the shapefile or GeoJSON of the field |
| [cultivo] | No | maiz | Crop type: maiz, soja, trigo, or girasol |

Example
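A minimal sketch of building the pipeline invocation from a Python script. It assumes the two positional arguments follow --pipeline in the order the table lists them (path first, then crop); build_pipeline_command is a hypothetical helper, not part of AgroIA.

```python
import sys

# Crop names accepted by the pipeline, taken from the argument table.
CROPS = ("maiz", "soja", "trigo", "girasol")


def build_pipeline_command(ruta, cultivo="maiz"):
    """Return the argv list for one pipeline run (hypothetical helper).

    Assumes the path and crop are passed positionally after --pipeline.
    """
    if cultivo not in CROPS:
        raise ValueError(f"unknown crop {cultivo!r}; expected one of {CROPS}")
    return [sys.executable, "start.py", "--pipeline", str(ruta), cultivo]


# Show the command without the interpreter path.
print(" ".join(build_pipeline_command("mi_lote.shp", "soja")[1:]))
# start.py --pipeline mi_lote.shp soja
```

To actually run it, the list can be passed to subprocess.run(..., check=True) from the repository root.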
Step-by-step walkthrough
When you run --pipeline, the launcher injects src/ into the Python path and calls run_full_analysis() in src/pipeline/__init__.py. The following steps execute sequentially:
GEE authentication — init_gee()
Authenticates with Google Earth Engine using the GEE_PROJECT_ID set in config/.env. This step initialises the ee client that all subsequent satellite queries depend on. If credentials are missing or expired, the pipeline stops here with an authentication error.

Shapefile validation — validar_shapefile()
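What validar_shapefile() checks internally is not described on this page. As a rough illustration of one check a validation step like this might include, the sketch below confirms that the companion files named in the prerequisites sit next to the .shp file. missing_sidecars is a hypothetical helper name, and the logic is an assumption rather than AgroIA's code.

```python
from pathlib import Path

# Companion files a shapefile needs, as listed in the prerequisites.
SIDECAR_SUFFIXES = (".dbf", ".shx", ".prj")


def missing_sidecars(ruta):
    """Return the companion suffixes that are absent next to a .shp file.

    GeoJSON inputs are self-contained, so they never report missing files.
    """
    path = Path(ruta)
    if path.suffix.lower() != ".shp":
        return []
    return [s for s in SIDECAR_SUFFIXES if not path.with_suffix(s).exists()]
```

An empty list means the file set is complete; anything else names the files to restore before running the pipeline.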
Climate data — get_nasa_climate_safe()
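A hedged sketch of how a climate request to the public NASA POWER daily point API could be built and fetched without crashing the run. The variable list, the AG community setting, and both function names are assumptions; this page does not say which parameters get_nasa_climate_safe() requests or how it handles failures.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NASA POWER daily point endpoint.
POWER_URL = "https://power.larc.nasa.gov/api/temporal/daily/point"


def power_request_url(lat, lon, start, end,
                      parameters=("T2M_MAX", "T2M_MIN", "PRECTOTCORR")):
    """Build a daily request URL for one point; start/end are YYYYMMDD strings.

    The default parameter list is illustrative only.
    """
    query = {
        "parameters": ",".join(parameters),
        "community": "AG",
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    # Keep commas unescaped so the parameter list stays readable.
    return f"{POWER_URL}?{urlencode(query, safe=',')}"


def fetch_climate_safe(url, timeout=30):
    """Return the decoded JSON payload, or None on any network or parse error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON is ValueError.
        return None
```

Returning None instead of raising lets the caller decide whether a missing climate record should lower the score or abort the run.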
NDVI extraction — get_gee_ndvi_validado() + fallback window
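If get_gee_ndvi_validado() finds no valid NDVI for the critical month in a given year, get_gee_ndvi_ventana() expands the search to a ±2-month window. Years with no usable data are excluded and logged.

Score and zonification — calcular_score()
The score combines four weighted components: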
- Vigor (40%) — normalised mean NDVI for the critical month
- Estabilidad (30%) — inverse of the historical NDVI coefficient of variation
- Limpieza (20%) — Isolation Forest outlier penalisation (contamination = 0.2)
- Clima (10%) — accumulated heat hours against crop threshold
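The weighting above amounts to a weighted sum. The sketch below shows only that combination step; it assumes each component has already been normalised to the range 0 to 1 and that the final score is reported on a 0 to 100 scale, neither of which is stated here. combine_score is a hypothetical name, not calcular_score() itself.

```python
# Component weights in percent, as listed above.
WEIGHTS = {"vigor": 40, "estabilidad": 30, "limpieza": 20, "clima": 10}


def combine_score(components):
    """Combine component values in [0, 1] into a 0-100 score.

    Only the weights come from the documentation; the input range and
    output scale are assumptions made for this sketch.
    """
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(weight * components[name] for name, weight in WEIGHTS.items())
```

For example, a field with vigor 0.8, estabilidad 0.6, limpieza 0.9 and clima 0.5 scores about 73 under these assumptions, with vigor alone contributing 32 of those points.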
PDF report — build_report()
Writes the PDF report to src/outputs/AgroIA_<lote_id>.pdf.

Interactive map — generar_mapa_offline()
Writes a self-contained interactive HTML map to outputs/Mapa_<lote_id>.html.

RAG ingestion — enviar_al_rag()
Sends the results to the ingestion API at http://localhost:8000/ingesta. This step embeds the report with nomic-embed-text via Ollama and stores it in pgvector, making the field immediately available for natural language queries.

Output files
After a successful run, two files are written:

| File | Location | Contents |
|---|---|---|
| AgroIA_<lote_id>.pdf | src/outputs/ | Full agronomic report with score, NDVI charts, components, and zone map |
| Mapa_<lote_id>.html | outputs/ | Self-contained interactive HTML map (open in any browser) |

lote_id defaults to the shapefile’s basename without extension. If you run the pipeline on mi_lote.shp, the outputs are AgroIA_mi_lote.pdf and Mapa_mi_lote.html.
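The naming rule can be sketched as follows, which is handy when a script needs to locate the outputs after a run. expected_outputs and its repo_root parameter are hypothetical, and the sketch assumes both output directories are relative to the repository root.

```python
from pathlib import Path


def expected_outputs(ruta, repo_root="."):
    """Return (pdf_path, map_path) for a given input file.

    lote_id is the input's basename without extension, as described above.
    """
    lote_id = Path(ruta).stem
    root = Path(repo_root)
    pdf = root / "src" / "outputs" / f"AgroIA_{lote_id}.pdf"
    mapa = root / "outputs" / f"Mapa_{lote_id}.html"
    return pdf, mapa
```

Calling expected_outputs("datos/mi_lote.shp") gives paths ending in AgroIA_mi_lote.pdf and Mapa_mi_lote.html, regardless of the directory the shapefile lives in.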
If the shapefile contains multiple polygons, run_full_analysis() selects the one with the largest area. Use the batch processing guide if you need to analyse every polygon individually.

Skip prerequisite checks
If you need to run the pipeline without going through the prerequisite validation (for example, in a scripted environment where you know the services are up):

Troubleshooting
GEE authentication error: credentials not found or token expired
Run earthengine authenticate in your terminal to refresh credentials. Also verify that GEE_PROJECT_ID is set correctly in config/.env and that the project has the Earth Engine API enabled in Google Cloud Console.
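To confirm that GEE_PROJECT_ID is actually present in config/.env, a small check like the one below can help. It is a stand-in parser written for this example, assuming a plain KEY=VALUE file; read_env_value is hypothetical and is not the loader AgroIA uses.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value of key in a KEY=VALUE .env file, or None if unset.

    Skips blank lines and # comments and strips surrounding quotes.
    """
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            value = value.strip().strip("'\"")
            return value or None
    return None
```

If read_env_value("config/.env", "GEE_PROJECT_ID") returns None, the variable is missing or empty and the authentication step cannot succeed.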
No valid NDVI data — all years excluded
Check the status reported for each year in the pipeline output (ok, Ventana mes X, or EXCLUIDO).
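To make the three statuses concrete, here is a minimal sketch of how one year might be classified, given the ±2-month fallback window from the walkthrough. Everything in it is an assumption for illustration: the function names, the nearest-month-first search order, the wrap-around at the year boundary, and the reading of X in "Ventana mes X" as the month where data was found.

```python
def candidate_months(critical_month, window=2):
    """Months within +/- window of the critical month, nearest first.

    Months are numbered 1-12 and wrap around the year boundary.
    """
    months = []
    for offset in range(1, window + 1):
        for delta in (-offset, offset):
            months.append((critical_month - 1 + delta) % 12 + 1)
    return months


def classify_year(ndvi_by_month, critical_month, window=2):
    """Return (status, ndvi) for one year.

    ndvi_by_month maps a month number to a mean NDVI, or to None when
    no valid data exists for that month.
    """
    value = ndvi_by_month.get(critical_month)
    if value is not None:
        return "ok", value
    for month in candidate_months(critical_month, window):
        value = ndvi_by_month.get(month)
        if value is not None:
            return f"Ventana mes {month}", value
    return "EXCLUIDO", None
```

Under this reading, a run where every year comes back EXCLUIDO means no usable NDVI was found in the critical month or in any neighbouring month of any year.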
Shapefile CRS error or UTM reprojection failure
If validar_shapefile() raises an error, open the file in QGIS or run geopandas.read_file('mi_lote.shp').crs to inspect the coordinate system. Reproject to EPSG:4326 before passing it to the pipeline.
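The reprojection can also be done with geopandas instead of QGIS. The sketch below writes an EPSG:4326 copy next to the original; the helper names and the _wgs84 suffix are choices made for this example, and geopandas must be installed in the environment.

```python
from pathlib import Path


def wgs84_path(src):
    """Sibling path with a _wgs84 suffix, e.g. mi_lote.shp -> mi_lote_wgs84.shp."""
    src = Path(src)
    return src.with_name(f"{src.stem}_wgs84{src.suffix}")


def reproject_to_wgs84(src):
    """Write an EPSG:4326 copy of src and return the new path."""
    import geopandas as gpd  # third-party; imported lazily

    gdf = gpd.read_file(src)
    if gdf.crs is None:
        # Without a CRS there is nothing to reproject from.
        raise ValueError(f"{src} has no CRS; check that the .prj file is present")
    dst = wgs84_path(src)
    gdf.to_crs(epsg=4326).to_file(dst)
    return dst
```

Writing to a new file keeps the original untouched, so the reprojected copy can be passed to the pipeline and discarded afterwards.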
RAG ingestion fails with connection error
The enviar_al_rag() step requires the FastAPI service to be running on port 8000. Start it separately with python start.py --api and confirm http://localhost:8000/health returns 200 before re-running the pipeline.
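The health check can be scripted with the standard library. This is a minimal sketch; api_is_up is a hypothetical helper, and the default URL is the health endpoint mentioned above.

```python
from urllib.request import urlopen


def api_is_up(url="http://localhost:8000/health", timeout=3.0):
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, DNS failures, and
        # 4xx/5xx responses (URLError and HTTPError subclass OSError).
        return False
```

A wrapper script could call api_is_up() in a short retry loop after starting the API and only launch the pipeline once it returns True.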
Pipeline completes but PDF is empty or missing charts
matplotlib cannot find a display backend in headless environments. Ensure the Agg backend is set by adding import matplotlib; matplotlib.use('Agg') at the top of any custom runner script, or run via python start.py which handles this automatically.