In Latin America, the absence of a public Land Parcel Identification System (LPIS) forces agronomists and insurers to digitize field boundaries by hand, a process that takes 15 to 30 minutes per lot and is prone to error. AgroIA removes that friction by automating the entire agronomic diagnostic cycle. From a single GPS point or shapefile, the system delineates the field polygon, extracts six years of Sentinel-2 NDVI history from Google Earth Engine, adds NASA POWER climate data, computes a multivariable AgroIA Score (0–100), and stores the result in a vector database that an LLM can query in natural language. The project was built for the Hackathon COPERNICUS LAC 2026 and is validated against two real-world datasets: TAYPE Siniestros (313 points) and INTA Balcarce (454 points).
The problem
Manual lot digitization in LatAm creates compounding costs across the AgTech and agricultural insurance sectors:
- Time: 15–30 minutes per field boundary, per update cycle.
- Accuracy: Hand-traced polygons introduce area errors that cascade into incorrect NDVI readings and insurance valuations.
- Scale: No public LPIS means every operator maintains private, inconsistent records.
The solution
AgroIA automates the full diagnostic lifecycle in under 60 seconds:
- Automatic delineation: SAM (Segment Anything Model) converts a GPS point or shapefile into a precise GeoJSON polygon.
- Satellite analysis: Google Earth Engine queries Sentinel-2 surface reflectance (SR) imagery for the lot's NDVI history over the last six years.
- Climate enrichment: NASA POWER provides accumulated heat-stress data per crop type.
- Score computation: the AgroIA Score engine (0–100) combines four weighted components (Vigor, Stability, Cleanliness, and Climate).
- RAG ingestion: results are embedded with `nomic-embed-text` and stored in PostgreSQL + pgvector for natural-language retrieval via `gemma3:4b`.
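The climate-enrichment step can be pictured with a short sketch. It is not AgroIA's code. It assumes heat stress is accumulated as degree-days of daily maximum air temperature (NASA POWER parameter `T2M_MAX`) above a crop-specific threshold. The request URL targets NASA POWER's daily point endpoint. The threshold values in `HEAT_THRESHOLDS_C` and both function names are hypothetical placeholders, not the project's calibration.

```python
from urllib.parse import urlencode

# NASA POWER daily point endpoint.
POWER_DAILY_URL = "https://power.larc.nasa.gov/api/temporal/daily/point"

# Hypothetical per-crop thresholds in °C: placeholders, not AgroIA's values.
HEAT_THRESHOLDS_C = {"maize": 35.0, "soybean": 35.0, "wheat": 32.0}

# NASA POWER marks missing days with this fill value.
POWER_FILL_VALUE = -999.0


def power_request_url(lat: float, lon: float, start: str, end: str) -> str:
    """Build a daily NASA POWER request for max 2 m air temperature.

    `start` and `end` are dates formatted as YYYYMMDD.
    """
    query = {
        "parameters": "T2M_MAX",
        "community": "AG",
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return f"{POWER_DAILY_URL}?{urlencode(query)}"


def accumulated_heat_stress(power_json: dict, crop: str) -> float:
    """Sum degrees above the crop threshold over all days in a POWER response."""
    threshold = HEAT_THRESHOLDS_C[crop]
    daily_tmax = power_json["properties"]["parameter"]["T2M_MAX"]
    total = 0.0
    for tmax in daily_tmax.values():
        if tmax == POWER_FILL_VALUE:
            continue  # skip missing observations instead of counting them
        total += max(0.0, tmax - threshold)
    return total
```

You can fetch the URL with any HTTP client and pass the decoded JSON to `accumulated_heat_stress(data, "maize")`. This yields a single number per lot and period, which can then feed the Climate component of the score.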
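In the delineation step, a GPS point must be converted into a pixel prompt before SAM can segment an image. The sketch below makes two assumptions: the raster is north-up and described by a GDAL-style six-element geotransform, and the point is already expressed in the raster's coordinate reference system. `world_to_pixel` and `sam_point_prompt` are hypothetical helper names. The `(x, y)` pixel order and the foreground label `1` follow what the `segment_anything` package's `SamPredictor.predict` expects for `point_coords` and `point_labels`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def world_to_pixel(x: float, y: float, geotransform: Sequence[float]) -> Tuple[int, int]:
    """Map a coordinate in the raster's CRS to (col, row) pixel indices.

    `geotransform` uses GDAL's convention for a north-up image:
    (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
    """
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((y - origin_y) / pixel_h)  # pixel_h < 0 for north-up images
    return col, row


def sam_point_prompt(x: float, y: float, geotransform: Sequence[float]) -> Dict[str, List]:
    """Build a single foreground point prompt in SAM's (x, y) pixel order."""
    col, row = world_to_pixel(x, y, geotransform)
    return {"point_coords": [[col, row]], "point_labels": [1]}
```

With `segment_anything`, these lists would be wrapped in NumPy arrays and passed to `predictor.predict(point_coords=..., point_labels=...)` after `predictor.set_image(...)`. The resulting mask would then be vectorized into the GeoJSON polygon.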
Key features
AgroIA Score
Multivariable score (0–100) combining NDVI vigor, historical stability, anomaly cleanliness via IsolationForest, and accumulated heat stress.
SAM delineation
Automatic polygon delineation from a GPS point using the Segment Anything Model (SAM) and Sentinel-2 imagery, producing GeoJSON output in seconds.
RAG engine
pgvector-powered semantic retrieval with Ollama (gemma3:4b) so agronomists can query the full lot history in natural language.
REST API
FastAPI backend on port 8000 with endpoints for ingestion, lot history, bulk GeoJSON upload, and health checks.
Streamlit dashboard
Interactive web dashboard on port 8501 showing rankings, NDVI maps, score breakdowns, and the RAG chat interface.
Telegram bot
Polling-based Telegram bot that exposes RAG queries and lot lookups to mobile users without opening the dashboard.
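The four AgroIA Score components could be combined along the lines below. This is a sketch under stated assumptions, not the AgroIA engine. The page does not publish the weights or the per-component formulas, so the equal `WEIGHTS`, the coefficient-of-variation stability term, and the `heat_cap` normalization are all hypothetical. NDVI itself is the standard normalized difference of Sentinel-2 bands B8 (near-infrared) and B4 (red). The anomaly flags stand in for an IsolationForest output, for example scikit-learn's `fit_predict(...) == -1`.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

# Hypothetical equal weights: the real weighting is not documented here.
WEIGHTS = {"vigor": 0.25, "stability": 0.25, "cleanliness": 0.25, "climate": 0.25}


def ndvi(nir: float, red: float) -> float:
    """Standard NDVI from Sentinel-2 B8 (NIR) and B4 (red) reflectances."""
    return (nir - red) / (nir + red)


def agro_score(
    ndvi_series: Sequence[float],
    anomaly_flags: Sequence[bool],
    heat_stress: float,
    heat_cap: float = 100.0,  # hypothetical stress level at which Climate hits 0
) -> Tuple[float, Dict[str, float]]:
    """Combine four components, each scaled to [0, 1], into a 0-100 score."""
    avg = mean(ndvi_series)
    # Coefficient of variation of the NDVI history; lower means more stable.
    cv = pstdev(ndvi_series) / avg if avg > 0 else float("inf")
    components = {
        # Vigor: mean NDVI clipped to [0, 1].
        "vigor": min(max(avg, 0.0), 1.0),
        # Stability: 1 minus the coefficient of variation, floored at 0.
        "stability": max(0.0, 1.0 - cv),
        # Cleanliness: share of observations not flagged as anomalies.
        "cleanliness": 1.0 - sum(anomaly_flags) / len(anomaly_flags),
        # Climate: less accumulated heat stress gives a higher value.
        "climate": max(0.0, 1.0 - heat_stress / heat_cap),
    }
    score = 100.0 * sum(WEIGHTS[name] * value for name, value in components.items())
    return round(score, 1), components
```

Because NDVI is a ratio, a scale factor applied equally to both bands cancels out. An additive offset does not, so bands must be offset-corrected before calling `ndvi`.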
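The RAG engine retrieves context by ranking stored embeddings on their distance to the embedded question. The in-memory sketch below reproduces what pgvector's cosine-distance operator `<=>` computes, so you can check the ranking logic without a database. The `SQL_TOP_K` string shows the equivalent query. Its table and column names (`lot_chunks`, `embedding`, `content`, `lot_id`) are hypothetical, not AgroIA's actual schema, and the toy two-dimensional vectors stand in for real `nomic-embed-text` embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical equivalent SQL (table and column names are illustrative).
SQL_TOP_K = (
    "SELECT lot_id, content FROM lot_chunks "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(
    query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """In-memory equivalent of ORDER BY embedding <=> query LIMIT k."""
    scored = [(doc_id, cosine_distance(query, emb)) for doc_id, emb in docs.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]
```

In a deployment, PostgreSQL would handle this ordering itself. An HNSW or IVFFlat index on the embedding column can speed up the `ORDER BY ... <=>` scan.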
Validation metrics
The system was validated against two independent real-world datasets before the hackathon submission.

| Dataset / metric | Points | Result | Notes |
|---|---|---|---|
| TAYPE Siniestros | 313 | 85.6% hit rate | Maize lots, Buenos Aires province |
| INTA Balcarce | 454 | 74.9% hit rate | Mixed crops, pivot irrigation fields |
| SAM score (avg) | n/a | 0.962 | Polygon quality metric across both runs |
| Area error (avg) | n/a | 9.8% | Versus manual reference polygons |
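The area-error row compares each delineated polygon against its manual reference. The shoelace formula below gives one simple way to compute such a figure. It assumes planar coordinates in metres (for example after projecting to UTM) and a relative error of `abs(A_pred - A_ref) / A_ref`. Whether AgroIA averages exactly this quantity is an assumption, and the function names are illustrative.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def shoelace_area(ring: Sequence[Point]) -> float:
    """Planar polygon area; coordinates must be projected (e.g. UTM metres)."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def relative_area_error(predicted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Relative area error of a predicted polygon versus a reference polygon."""
    ref_area = shoelace_area(reference)
    return abs(shoelace_area(predicted) - ref_area) / ref_area
```

GeoJSON rings repeat their first vertex at the end. That closing vertex contributes a zero term here, so you can pass rings with or without it. Raw longitude/latitude degrees must be projected first; otherwise the result comes out in square degrees.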
The lower 74.9% hit rate on INTA Balcarce reflects its pivot-irrigated circular fields, which are geometrically more demanding for SAM. A dedicated pivot-optimized run (340 polygons) is included in `Poligonizacion/2DA CORRIDA PIVOTES/`.
Data sovereignty
All compute, including the LLM, can run on local infrastructure via Ollama, so crop data and diagnostic results never need to leave the operator's environment. The exception is Google Earth Engine, the only external dependency that cannot be self-hosted: it requires a registered GEE project and active credentials, and lot geometries must be sent to it for the Sentinel-2 queries.
Next steps
Quickstart
Get the full stack running with Docker Compose in five steps, then run your first pipeline analysis.
System architecture
Understand the end-to-end data flow, component responsibilities, and the database schema.