AgroIA is composed of several loosely coupled subsystems that together automate the full agronomic diagnostic cycle. At the entry point, a GPS coordinate or shapefile is converted into a precise GeoJSON polygon by the SAM delineator. That polygon feeds into the local analysis pipeline, which queries Google Earth Engine for Sentinel-2 NDVI history, calls NASA POWER for climate data, computes the AgroIA Score, generates PDF and HTML reports, and pushes a vector-embedded summary into PostgreSQL. From there, a FastAPI backend exposes the data via REST, a Streamlit dashboard renders it visually, a Telegram bot makes it queryable on mobile, and a RAG engine powered by Ollama enables natural-language lot interrogation.
End-to-end data flow
The diagram below maps the complete flow from raw field data to consumable outputs, matching the architecture described in AGENTS.md:
Component breakdown
FastAPI backend
Runs on port 8000. Handles ingestion (/ingesta, /ingesta/geojson), lot queries (/lotes, /lotes/{lote_id}), and health checks. Authentication uses a bearer token from INGESTA_SECRET_KEY.
PostgreSQL + pgvector
Stores lot reports (informes_lotes) and time-series data (lote_historial). The pgvector extension enables cosine similarity search over 768-dimensional embeddings generated by nomic-embed-text.
Google Earth Engine
Provides Sentinel-2 SR imagery for NDVI extraction over the last six years. Authenticated via the earthengine CLI. Requires GEE_PROJECT_ID in .env.
SAM delineator
Segment Anything Model converts GPS points into crop field polygons. Two production runs are stored: 268 maize polygons (TAYPE zone) and 340 pivot-irrigated polygons (Tandil/Balcarce).
Ollama (LLM)
Runs nomic-embed-text for embedding generation and gemma3:4b for RAG response generation. Both models run locally for full data sovereignty.
Streamlit dashboard
Runs on port 8501. Displays lot rankings, NDVI time series, Folium HTML maps, score breakdowns, and the RAG chat interface. Sourced from src/streamlit_app.py.
Telegram bot
Polling-based bot defined in src/bot/telegram_main.py. Exposes RAG queries and lot lookups to mobile users. Started via python start.py --bot.
NASA POWER
Provides historical climate data for heat stress calculation. Called via get_nasa_climate_safe() in src/pipeline/nasa_power.py. No authentication required.
Pipeline module internals
The analysis pipeline lives in src/pipeline/ and is invoked via run_full_analysis(). Each step is a discrete module:
| Module | File | Responsibility |
|---|---|---|
| GEE extractor | src/pipeline/gee_extractor.py | init_gee(), Sentinel-2 SR NDVI queries |
| NASA POWER | src/pipeline/nasa_power.py | get_nasa_climate_safe(), six-year climate history |
| Agro math | src/pipeline/agro_math.py | calcular_score(), get_gee_ndvi_validado(), crop config |
| Reporter | src/pipeline/reporter.py | build_report() PDF, generar_mapa_offline() HTML |
| Comparative reporter | src/pipeline/comparative_reporter.py | Multi-lot ranking PDF |
| Ingestion | src/pipeline/ingesta.py | construir_payload_v2(), enviar_al_rag() |
| Utilities | src/pipeline/utils.py | validar_shapefile() CRS and geometry checks |
src/pipeline_local.py is a legacy entry point retained for compatibility. Use python start.py --pipeline for all new work; it injects src/ into the Python path and calls run_full_analysis() from src/pipeline/__init__.py.
AgroIA Score formula
The score aggregates four dimensions into a single 0–100 index. The weights reflect agronomic importance, with crop vigor carrying the highest weight:
| Component | Weight | Source | Method |
|---|---|---|---|
| Vigor | 40% | Sentinel-2 SR NDVI | Normalized mean NDVI during the critical crop month |
| Stability | 30% | GEE NDVI history (6 years) | Inverse of the coefficient of variation |
| Cleanliness | 20% | GEE NDVI series | IsolationForest (contamination=0.2) penalizes satellite outliers |
| Climate | 10% | NASA POWER | Accumulated heat hours using sinusoidal formula |
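
As a rough illustration of how the four weighted components could combine, the sketch below computes a 0–100 index from sub-scores that are already normalized to the 0–1 range. The function names, the 0–1 normalization, and the reading of "inverse of the coefficient of variation" as 1 minus CV (clamped) are assumptions made for illustration; the actual calcular_score() in src/pipeline/agro_math.py may work differently, and the IsolationForest and heat-hour steps are not reproduced here.

```python
from statistics import mean, pstdev

# Weights taken from the table above; they sum to 1.0.
WEIGHTS = {"vigor": 0.40, "stability": 0.30, "cleanliness": 0.20, "climate": 0.10}


def stability_from_ndvi(yearly_ndvi):
    """Hypothetical stability sub-score: 1 - coefficient of variation, clamped to [0, 1]."""
    avg = mean(yearly_ndvi)
    if avg <= 0:
        return 0.0
    cv = pstdev(yearly_ndvi) / avg
    return max(0.0, min(1.0, 1.0 - cv))


def weighted_score(components):
    """Combine sub-scores in [0, 1] into a 0-100 index using WEIGHTS."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(100.0 * total, 2)


# Illustrative values only, not real lot data.
example = {
    "vigor": 0.8,
    "stability": stability_from_ndvi([0.62, 0.58, 0.65, 0.60, 0.61, 0.59]),
    "cleanliness": 0.9,
    "climate": 0.7,
}
print(weighted_score(example))
```

Because vigor carries 40% of the weight, a lot with perfect vigor and zero in every other component would still score 40 under this sketch.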
Database schema
informes_lotes
Primary store for aggregated lot reports. One row per lot (UNIQUE(lote_id)).
| Column | Type | Description |
|---|---|---|
| lote_id | text | Unique lot identifier (primary key) |
| metadata | jsonb | Score breakdown, crop type, area, zone classification |
| embedding | vector(768) | nomic-embed-text embedding for semantic search |
| created_at | timestamptz | Ingestion timestamp |
lote_historial
Time-series table for annual NDVI and climate records. One row per lot per year (UNIQUE(lote_id, anio)).
| Column | Type | Description |
|---|---|---|
| lote_id | text | Foreign reference to informes_lotes |
| anio | integer | Year (ASCII field, no diacritics) |
| ndvi_promedio | float | Mean NDVI for the year |
| stress_termico | float | Accumulated heat stress hours |
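
The UNIQUE(lote_id, anio) constraint implies that re-ingesting a year for the same lot should update the existing row instead of adding a second one. The sketch below demonstrates that upsert pattern with Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it runs without a database server. The helper name upsert_historial and the sample values are hypothetical, and the project's actual ingestion code is not shown here.

```python
import sqlite3

# Simplified table mirroring the columns listed above (SQLite types, not PostgreSQL).
DDL = """
CREATE TABLE lote_historial (
    lote_id        TEXT    NOT NULL,
    anio           INTEGER NOT NULL,
    ndvi_promedio  REAL,
    stress_termico REAL,
    UNIQUE (lote_id, anio)
)
"""

UPSERT = """
INSERT INTO lote_historial (lote_id, anio, ndvi_promedio, stress_termico)
VALUES (?, ?, ?, ?)
ON CONFLICT (lote_id, anio) DO UPDATE SET
    ndvi_promedio  = excluded.ndvi_promedio,
    stress_termico = excluded.stress_termico
"""


def upsert_historial(conn, lote_id, anio, ndvi_promedio, stress_termico):
    """Insert one yearly record, or overwrite it if (lote_id, anio) already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (lote_id, anio, ndvi_promedio, stress_termico))


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
upsert_historial(conn, "lote-001", 2023, 0.61, 42.0)
upsert_historial(conn, "lote-001", 2023, 0.64, 40.5)  # same lot and year: updated in place
upsert_historial(conn, "lote-001", 2024, 0.58, 55.0)
rows = conn.execute(
    "SELECT anio, ndvi_promedio FROM lote_historial ORDER BY anio"
).fetchall()
print(rows)
```

The ON CONFLICT ... DO UPDATE clause has the same shape in PostgreSQL, although the placeholder style depends on the driver (for example %s in psycopg).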
Technology stack
| Layer | Technology | Version |
|---|---|---|
| API framework | FastAPI | ≥ 0.100.0 |
| Web server | Uvicorn | ≥ 0.23.0 |
| Database | PostgreSQL + pgvector | pg16 |
| Vector search | pgvector Python client | ≥ 0.2.0 |
| ORM / settings | pydantic-settings | ≥ 2.0.0 |
| Satellite imagery | Google Earth Engine API | ≥ 0.1.340 |
| SAM | segment-anything | ≥ 1.0 |
| CV backend | OpenCV (headless) | ≥ 4.8.0 |
| ML / anomaly detection | scikit-learn | ≥ 1.3.0 |
| Deep learning runtime | PyTorch | ≥ 2.0.0 |
| Geospatial processing | GeoPandas + Shapely | ≥ 0.13.0 / 2.0.0 |
| Map rendering | Folium | ≥ 0.14.0 |
| LLM runtime | Ollama | latest |
| Frontend | Streamlit | ≥ 1.28.0 |
| Bot framework | python-telegram-bot | ≥ 20.0 |
| Container runtime | Docker + Compose | — |
| Python runtime | Python | 3.10 (slim) |
RAG engine
The RAG module lives in src/rag/core.py. It uses pgvector cosine similarity to retrieve the most relevant lot reports and passes them as context to gemma3:4b via Ollama.
Key exported functions:
- consultar_agente(lote_id, pregunta, top_k=3): returns an LLM response with RAG context for a specific lot.
- fetch_context(lote_id, pregunta, top_k=3): retrieval only, no LLM call.
- listar_lotes(): lists all lots stored in the database.
- get_historial_lote_raw(lote_id): returns the raw time series for a lot.
- get_datos_lote_raw(lote_id): returns the full report for a lot.
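
To make the retrieval step concrete, the sketch below ranks stored embeddings by cosine similarity and returns the top_k matches in plain Python. It is a toy stand-in for what pgvector does inside PostgreSQL, not the code in src/rag/core.py. The function names and the 3-dimensional vectors are hypothetical; real embeddings from nomic-embed-text have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_lots(query_embedding, stored, top_k=3):
    """Return the top_k (lote_id, similarity) pairs, most similar first."""
    scored = [
        (lote_id, cosine_similarity(query_embedding, embedding))
        for lote_id, embedding in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings for illustration only.
stored = {
    "lote-a": [1.0, 0.0, 0.0],
    "lote-b": [0.7, 0.7, 0.0],
    "lote-c": [0.0, 0.0, 1.0],
}
ranking = top_k_lots([1.0, 0.1, 0.0], stored, top_k=2)
print([lote_id for lote_id, _ in ranking])
```

In pgvector, the <=> operator returns cosine distance (1 minus cosine similarity), so ordering by it in ascending order yields the same ranking as sorting by similarity in descending order.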
Deployment topology
host.docker.internal. Pipeline runs (python start.py --pipeline) execute on the host directly, outside Docker, to maintain file system access to shapefile inputs and output directories.
Next steps
AgroIA Score concepts
Detailed explanation of each score component, normalization formulas, and crop-specific parameters.
SAM delineation
How SAM converts GPS points to field polygons and what the two production runs cover.
API reference
Full endpoint documentation for ingestion, lot queries, and pipeline module APIs.
Configuration reference
Complete .env variable reference with defaults and validation rules.