In Latin America, the absence of a public Land Parcel Identification System (LPIS) forces agronomists and insurers to digitize field boundaries by hand. This process takes 15 to 30 minutes per lot and is prone to error. AgroIA removes that friction by automating the entire agronomic diagnostic cycle. From a single GPS point or shapefile input, the system:

  • delineates the field polygon,
  • extracts six years of Sentinel-2 NDVI history from Google Earth Engine,
  • incorporates NASA POWER climate data,
  • computes a multivariable AgroIA Score (0–100), and
  • stores the result in a vector database that an LLM can query in natural language.

The project was built for the Hackathon COPERNICUS LAC 2026 and has been validated against two real-world datasets: TAYPE Siniestros (313 points) and INTA Balcarce (454 points).

The problem

Manual lot digitization in LatAm creates compounding costs across the AgTech and agricultural insurance sectors:
  • Time: 15–30 minutes per field boundary, per update cycle.
  • Accuracy: Hand-traced polygons introduce area errors that cascade into incorrect NDVI readings and insurance valuations.
  • Scale: No public LPIS means every operator maintains private, inconsistent records.
Without a reliable polygon, satellite analysis has no valid geographic anchor, which makes automated crop monitoring practically impossible at scale.

The solution

AgroIA automates the full diagnostic lifecycle in under 60 seconds:
  1. Automatic delineation: SAM (Segment Anything Model) converts a GPS point or shapefile into a precise GeoJSON polygon.
  2. Satellite analysis: Google Earth Engine queries Sentinel-2 surface reflectance (SR) imagery for the lot's NDVI history over the last six years.
  3. Climate enrichment: NASA POWER supplies the climate data used to compute accumulated heat stress for each crop type.
  4. Score computation: the AgroIA Score engine (0–100) weights Vigor, Stability, Cleanliness, and Climate.
  5. RAG ingestion: results are embedded with nomic-embed-text and stored in PostgreSQL + pgvector for semantic retrieval. gemma3:4b then generates natural-language answers from the retrieved context.
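Steps 2 and 3 rest on two small formulas. The sketch below shows them in plain Python. It assumes that NDVI is the standard normalized difference of Sentinel-2 bands B8 (NIR) and B4 (red). It also assumes that heat stress is accumulated as the degrees by which the daily maximum temperature (for example NASA POWER's `T2M_MAX` parameter) exceeds a crop-specific threshold. The function names and the threshold value are illustrative assumptions, not AgroIA's code. In Earth Engine itself, per-pixel NDVI is usually computed server-side with `image.normalizedDifference(['B8', 'B4'])`.

```python
import math

# Illustrative helpers (hypothetical names), not AgroIA's implementation.

def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).

    For Sentinel-2, NIR is band B8 and Red is band B4.
    """
    total = nir + red
    if total == 0:
        return math.nan  # no-data pixel: NDVI is undefined
    return (nir - red) / total


def accumulated_heat_stress(tmax_daily_c: list[float], threshold_c: float) -> float:
    """Sum of degrees (°C) by which each day's maximum temperature exceeds threshold_c.

    threshold_c is assumed to be crop-specific; the value used below is a placeholder.
    """
    return sum(max(0.0, t - threshold_c) for t in tmax_daily_c)


if __name__ == "__main__":
    print(accumulated_heat_stress([33.0, 36.5, 38.0], threshold_c=35.0))  # 4.5
```

The heat-stress sum is a threshold-based degree-day measure: days below the threshold contribute nothing, and hotter days contribute in proportion to how far they exceed it.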

Key features

AgroIA Score

Multivariable score (0–100) combining NDVI vigor, historical stability, anomaly cleanliness via IsolationForest, and accumulated heat stress.
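This overview does not state how the four components are normalized or weighted. The following hypothetical sketch makes three assumptions:

  • each component is scaled to [0, 1],
  • the components are combined with equal weights, and
  • the result is multiplied by 100.

Cleanliness is taken as the share of observations that an IsolationForest labels as normal. scikit-learn's `IsolationForest.predict` returns `-1` for anomalies and `1` for inliers. The vigor, stability, and climate formulas are placeholders chosen only to illustrate the combination.

```python
from statistics import mean, pstdev

# Hypothetical equal weights: the real AgroIA weighting is not given in this overview.
WEIGHTS = {"vigor": 0.25, "stability": 0.25, "cleanliness": 0.25, "climate": 0.25}


def clip01(x: float) -> float:
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def agroia_score_sketch(ndvi_series: list[float], anomaly_labels: list[int],
                        heat_stress: float, max_heat_stress: float):
    """Return (score in 0-100, component dict) for one lot. Hypothetical sketch.

    anomaly_labels follows scikit-learn IsolationForest.predict conventions:
    -1 marks an anomalous observation, 1 a normal one.
    """
    m = mean(ndvi_series)
    components = {
        # Placeholder: mean NDVI as vigor.
        "vigor": clip01(m),
        # Placeholder: 1 - coefficient of variation, so steadier histories score higher.
        "stability": clip01(1 - pstdev(ndvi_series) / m) if m > 0 else 0.0,
        # Share of observations the anomaly detector considers normal.
        "cleanliness": anomaly_labels.count(1) / len(anomaly_labels),
        # Placeholder: less accumulated heat stress gives a higher climate component.
        "climate": 1 - clip01(heat_stress / max_heat_stress),
    }
    weighted = sum(WEIGHTS[name] * value for name, value in components.items())
    return 100 * weighted / sum(WEIGHTS.values()), components
```

Because the weighted sum is divided by the total weight, plugging in the project's real weights only means editing `WEIGHTS`. The weights then do not need to add up to 1.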

SAM delineation

Automatic polygon delineation from a GPS point using Segment Anything Model and Sentinel-2 imagery, producing GeoJSON output in seconds.
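SAM returns a raster mask, so producing GeoJSON first requires vectorizing that mask into vertices. That step is not shown here. Assume it has already produced `(longitude, latitude)` vertices. The hypothetical helper below then wraps them as an RFC 7946 Polygon Feature. It closes the ring so that the first and last positions are identical. It also orients the exterior ring counterclockwise, using the sign of the shoelace area to detect the orientation.

```python
# Hypothetical helpers, not AgroIA's code: assume the SAM mask is already vectorized.

def signed_area(ring: list[list[float]]) -> float:
    """Shoelace formula over a closed ring of [lon, lat] positions; > 0 means counterclockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def to_geojson_feature(vertices, properties=None) -> dict:
    """Wrap (lon, lat) vertices as a GeoJSON Polygon Feature following RFC 7946."""
    ring = [[float(lon), float(lat)] for lon, lat in vertices]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # RFC 7946: first and last positions are identical
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    if signed_area(ring) < 0:
        ring.reverse()  # RFC 7946: exterior rings are counterclockwise
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties or {},
    }
```

Passing the result to `json.dumps` yields a Feature that standard GIS tools can read. Computing orientation in raw lon/lat is adequate for field-sized polygons, where the curvature of the Earth does not flip the sign.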

RAG engine

pgvector-powered semantic retrieval with Ollama (gemma3:4b) so agronomists can query the full lot history in natural language.
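Retrieval in a RAG setup of this kind ranks stored chunks by vector distance to the embedded question. pgvector exposes cosine distance as the `<=>` operator, so a query of the form `ORDER BY embedding <=> query LIMIT k` returns the nearest chunks. The sketch below mirrors that ranking in pure Python. It then assembles a non-streaming request body for Ollama's `/api/generate` endpoint with gemma3:4b. The table and column names in `TOP_K_SQL`, the prompt wording, and the helper names are hypothetical.

```python
import math

# Hypothetical schema: AgroIA's real table and column names may differ.
TOP_K_SQL = """
SELECT lot_id, content
FROM lot_chunks
ORDER BY embedding <=> %s::vector  -- pgvector cosine distance
LIMIT %s;
"""


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=3):
    """chunks: (text, embedding) pairs; returns the k texts nearest to query_vec."""
    ranked = sorted(chunks, key=lambda chunk: cosine_distance(query_vec, chunk[1]))
    return [text for text, _ in ranked[:k]]


def build_generate_request(question, context_chunks, model="gemma3:4b"):
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    prompt = f"Answer using only this lot history:\n{context}\n\nQuestion: {question}"
    return {"model": model, "prompt": prompt, "stream": False}
```

In a running stack, two things would differ from this sketch. The query vector would come from nomic-embed-text instead of being written by hand. The request body would be POSTed to the local Ollama server, which listens on `http://localhost:11434` by default.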

REST API

FastAPI backend on port 8000 with endpoints for ingestion, lot history, bulk GeoJSON upload, and health checks.

Streamlit dashboard

Interactive web dashboard on port 8501 showing rankings, NDVI maps, score breakdowns, and the RAG chat interface.

Telegram bot

Polling-based Telegram bot that exposes RAG queries and lot lookups to mobile users without opening the dashboard.

Validation metrics

The system was validated against two independent real-world datasets before the hackathon submission.
| Dataset / metric | Points | Hit rate / value | Notes |
|---|---|---|---|
| TAYPE Siniestros | 313 | 85.6% | Maize lots, Buenos Aires province |
| INTA Balcarce | 454 | 74.9% | Mixed crops, pivot irrigation fields |
| SAM Score (avg) | n/a | 0.962 | Polygon quality metric across both runs |
| Area error (avg) | n/a | 9.8% | Versus manual reference polygons |
The lower hit rate on INTA Balcarce (74.9%) reflects its pivot-irrigated circular fields, which are geometrically more demanding for SAM. A dedicated pivot-optimized run (340 polygons) is included in Poligonizacion/2DA CORRIDA PIVOTES/.
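This page does not define what counts as a hit, nor how the average area error is aggregated. The sketch below assumes two things:

  • a per-point hit flag is produced upstream, and
  • the reported area error is the mean of per-lot relative errors against the manual reference polygons.

Under those assumptions, the two headline metrics reduce to a few lines. The function names are hypothetical.

```python
# Hypothetical metric helpers; the hit criterion itself is assumed to be defined upstream.

def hit_rate(hits: list[bool]) -> float:
    """Fraction of validation points flagged as hits."""
    return sum(hits) / len(hits)


def relative_area_error(pred_area: float, ref_area: float) -> float:
    """|A_pred - A_ref| / A_ref for one lot; both areas in the same unit (e.g. hectares)."""
    return abs(pred_area - ref_area) / ref_area


def mean_area_error(pairs: list[tuple[float, float]]) -> float:
    """Average relative error over (predicted, reference) area pairs."""
    return sum(relative_area_error(p, r) for p, r in pairs) / len(pairs)
```

Taking the absolute value means that over- and under-delineation do not cancel out. A polygon that is 10% too large and another that is 10% too small average to a 10% error, not 0%.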

Data sovereignty

All compute — including the LLM — can run entirely on local infrastructure via Ollama. No crop data or lot coordinates need to leave the operator’s environment. Google Earth Engine is the only external dependency that cannot be self-hosted; it requires a registered GEE project and active credentials.

Next steps

Quickstart

Get the full stack running with Docker Compose in five steps, then run your first pipeline analysis.

System architecture

Understand the end-to-end data flow, component responsibilities, and the database schema.
