

AgroIA is composed of several loosely coupled subsystems that together automate the full agronomic diagnostic cycle. At the entry point, a GPS coordinate or shapefile is converted into a precise GeoJSON polygon by the SAM delineator. That polygon feeds into the local analysis pipeline, which queries Google Earth Engine for Sentinel-2 NDVI history, calls NASA POWER for climate data, computes the AgroIA Score, generates PDF and HTML reports, and pushes a vector-embedded summary into PostgreSQL. From there, a FastAPI backend exposes the data via REST, a Streamlit dashboard renders it visually, a Telegram bot makes it queryable on mobile, and a RAG engine powered by Ollama enables natural-language lot interrogation.

End-to-end data flow

The diagram below maps the complete flow from raw field data to consumable outputs, matching the architecture described in AGENTS.md:
[GPS points / shapefile]          [Bulk GeoJSON file]
         │                                │
         ▼  (SAM delineation)             ▼  (API ingestion)
[SAM Polygon delineator]     [FastAPI — POST /ingesta/geojson]
  Output: GeoJSON polygon                 │
         │                                │
         └──────────────┬─────────────────┘
                        ▼
         [Analysis Pipeline — Motor Local v2.5]
          python start.py --pipeline <ruta.shp> [cultivo]
          ├─ GEE Sentinel-2 SR  (historical NDVI, 6 years)
          ├─ NASA POWER         (heat stress, sinusoidal formula)
          ├─ IsolationForest    (satellite outlier cleaning)
          ├─ AgroIA Score       (0–100) + K-Means A/B/C zoning
          ├─ build_report()     → PDF in src/outputs/
          ├─ generar_mapa_offline() → HTML map in outputs/
          └─ enviar_al_rag()    → upsert to PostgreSQL + pgvector
                        │
                        ▼
              [PostgreSQL + pgvector]
               informes_lotes  (UNIQUE lote_id)
               lote_historial  (UNIQUE lote_id, anio)
                        │
          ┌─────────────┼──────────────┐
          ▼             ▼              ▼
    [FastAPI REST]  [Streamlit]  [Telegram bot]
     port 8000      port 8501     polling
          │             │
          └──────┬──────┘
                 ▼
          [Ollama — RAG engine]
           nomic-embed-text (embeddings)
           gemma3:4b        (generation)

Component breakdown

FastAPI backend

Runs on port 8000. Handles ingestion (/ingesta, /ingesta/geojson), lot queries (/lotes, /lotes/{lote_id}), and health checks. Authentication uses a bearer token from INGESTA_SECRET_KEY.
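A minimal client sketch for the ingestion endpoint, assuming the default port and a bearer token read from INGESTA_SECRET_KEY. The empty FeatureCollection is a placeholder; the actual payload schema is documented in the API reference:

```python
import os

API_URL = "http://localhost:8000/ingesta/geojson"  # default FastAPI port

def build_ingest_request(geojson: dict, token: str) -> dict:
    """Assemble keyword arguments for requests.post() against /ingesta/geojson."""
    return {
        "url": API_URL,
        "headers": {"Authorization": f"Bearer {token}"},
        "json": geojson,
    }

if __name__ == "__main__":
    import requests  # assumes the requests package is available

    token = os.environ["INGESTA_SECRET_KEY"]
    payload = {"type": "FeatureCollection", "features": []}  # placeholder body
    resp = requests.post(**build_ingest_request(payload, token))
    print(resp.status_code)
```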

PostgreSQL + pgvector

Stores lot reports (informes_lotes) and time-series data (lote_historial). The pgvector extension enables cosine similarity search over 768-dimensional embeddings generated by nomic-embed-text.
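A hedged sketch of what such a retrieval query can look like with psycopg; `<=>` is pgvector's cosine-distance operator. The project's real query lives in src/rag/core.py and may differ:

```python
# Cosine-similarity retrieval over informes_lotes; `<=>` is pgvector's
# cosine-distance operator. Illustrative sketch only.
TOP_K_QUERY = """
SELECT lote_id, metadata, embedding <=> %(q)s::vector AS distance
FROM informes_lotes
ORDER BY distance
LIMIT %(k)s;
"""

def top_k(conn, query_embedding: list, k: int = 3):
    """Run the query with a psycopg connection; the embedding is passed in
    pgvector's textual '[x, y, ...]' form."""
    with conn.cursor() as cur:
        cur.execute(TOP_K_QUERY, {"q": str(query_embedding), "k": k})
        return cur.fetchall()
```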

Google Earth Engine

Provides Sentinel-2 SR imagery for NDVI extraction over the last six years. Authenticated via the earthengine CLI. Requires GEE_PROJECT_ID in .env.
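NDVI itself is just a band ratio; for reference, the quantity the pipeline extracts from Sentinel-2's B8 (near-infrared) and B4 (red) bands reduces to:

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - red) / (NIR + red), here from Sentinel-2 bands B8 and B4."""
    denom = nir + red
    if denom == 0.0:
        return 0.0  # avoid division by zero over no-data pixels
    return (nir - red) / denom
```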

SAM delineator

Segment Anything Model converts GPS points into crop field polygons. Two production runs are stored: 268 maize polygons (TAYPE zone) and 340 pivot-irrigated polygons (Tandil/Balcarce).

Ollama (LLM)

Runs nomic-embed-text for embedding generation and gemma3:4b for RAG response generation. Both models run locally for full data sovereignty.
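Clients reach Ollama over its local HTTP API. A stdlib-only sketch of the embeddings call (endpoint and field names per the public Ollama API; confirm against the version in use):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embed_request(text: str) -> urllib.request.Request:
    """Build the POST request Ollama expects for nomic-embed-text embeddings."""
    body = json.dumps({"model": "nomic-embed-text", "prompt": text}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

# urllib.request.urlopen(embed_request("lote 42")) returns a JSON body with
# a 768-dimensional "embedding" array when Ollama is running.
```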

Streamlit dashboard

Runs on port 8501. Displays lot rankings, NDVI time series, Folium HTML maps, score breakdowns, and the RAG chat interface. Sourced from src/streamlit_app.py.

Telegram bot

Polling-based bot defined in src/bot/telegram_main.py. Exposes RAG queries and lot lookups to mobile users. Started via python start.py --bot.

NASA POWER

Provides historical climate data for heat stress calculation. Called via get_nasa_climate_safe() in src/pipeline/nasa_power.py. No authentication required.
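The exact formula is internal to src/pipeline/nasa_power.py; as an illustration of the general technique, daily hours above a heat threshold can be approximated by assuming temperature follows a sinusoid between the daily minimum and maximum:

```python
import math

def heat_hours_above(t_min: float, t_max: float, threshold: float = 35.0) -> float:
    """Approximate daily hours above `threshold`, modeling temperature as a
    sinusoid between t_min and t_max. Illustrative only; not the exact
    formula used by the pipeline."""
    if t_max <= threshold:
        return 0.0
    if t_min >= threshold:
        return 24.0
    amplitude = (t_max - t_min) / 2.0
    mean = (t_max + t_min) / 2.0
    # Fraction of the sine cycle where mean + amplitude*sin(x) exceeds threshold
    x = math.asin((threshold - mean) / amplitude)
    return 24.0 * (math.pi - 2.0 * x) / (2.0 * math.pi)
```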

Pipeline module internals

The analysis pipeline lives in src/pipeline/ and is invoked via run_full_analysis(). Each step is a discrete module:
| Module | File | Responsibility |
| --- | --- | --- |
| GEE extractor | src/pipeline/gee_extractor.py | init_gee(), Sentinel-2 SR NDVI queries |
| NASA POWER | src/pipeline/nasa_power.py | get_nasa_climate_safe(), six-year climate history |
| Agro math | src/pipeline/agro_math.py | calcular_score(), get_gee_ndvi_validado(), crop config |
| Reporter | src/pipeline/reporter.py | build_report() PDF, generar_mapa_offline() HTML |
| Comparative reporter | src/pipeline/comparative_reporter.py | Multi-lot ranking PDF |
| Ingestion | src/pipeline/ingesta.py | construir_payload_v2(), enviar_al_rag() |
| Utilities | src/pipeline/utils.py | validar_shapefile() CRS and geometry checks |
src/pipeline_local.py is a legacy entry point retained for compatibility. Use python start.py --pipeline for all new work — it injects src/ into the Python path and calls run_full_analysis() from src/pipeline/__init__.py.

AgroIA Score formula

The score aggregates four dimensions into a single 0–100 index. The weights reflect agronomic importance, with crop vigor carrying the highest weight:
Score (0–100) = Vigor (40%) + Stability (30%) + Cleanliness (20%) + Climate (10%)

| Component | Weight | Source | Method |
| --- | --- | --- | --- |
| Vigor | 40% | Sentinel-2 SR NDVI | Normalized mean NDVI during the critical crop month |
| Stability | 30% | GEE NDVI history (6 years) | Inverse of the coefficient of variation |
| Cleanliness | 20% | GEE NDVI series | IsolationForest (contamination=0.2) penalizes satellite outliers |
| Climate | 10% | NASA POWER | Accumulated heat hours using sinusoidal formula |
Lots are also classified into zones A, B, and C using K-Means clustering on the NDVI spatial distribution within the polygon.
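In code, the aggregation is a plain weighted sum. A sketch (the real implementation is calcular_score() in src/pipeline/agro_math.py; the English component names here are illustrative):

```python
# Weights per the score formula above; each component is a 0-100 sub-score.
WEIGHTS = {"vigor": 0.40, "stability": 0.30, "cleanliness": 0.20, "climate": 0.10}

def agroia_score(components: dict) -> float:
    """Weighted aggregate of the four 0-100 sub-scores into one 0-100 index."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```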

Database schema

informes_lotes

Primary store for aggregated lot reports. One row per lot (UNIQUE(lote_id)).
| Column | Type | Description |
| --- | --- | --- |
| lote_id | text | Unique lot identifier (primary key) |
| metadata | jsonb | Score breakdown, crop type, area, zone classification |
| embedding | vector(768) | nomic-embed-text embedding for semantic search |
| created_at | timestamptz | Ingestion timestamp |

lote_historial

Time-series table for annual NDVI and climate records. One row per lot per year (UNIQUE(lote_id, anio)).
| Column | Type | Description |
| --- | --- | --- |
| lote_id | text | Foreign reference to informes_lotes |
| anio | integer | Year (ASCII field, no diacritics) |
| ndvi_promedio | float | Mean NDVI for the year |
| stress_termico | float | Accumulated heat stress hours |
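Hypothetical DDL matching the column descriptions in this section, held as Python strings so they can be executed via psycopg during setup; the project's real migration scripts may differ:

```python
# Sketch of the two tables as described above; constraint names and defaults
# are assumptions, not taken from the actual migrations.
INFORMES_LOTES_DDL = """
CREATE TABLE IF NOT EXISTS informes_lotes (
    lote_id    text PRIMARY KEY,
    metadata   jsonb NOT NULL,
    embedding  vector(768),
    created_at timestamptz DEFAULT now()
);
"""

LOTE_HISTORIAL_DDL = """
CREATE TABLE IF NOT EXISTS lote_historial (
    lote_id        text REFERENCES informes_lotes (lote_id),
    anio           integer NOT NULL,
    ndvi_promedio  double precision,
    stress_termico double precision,
    UNIQUE (lote_id, anio)
);
"""
```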
All JSON payload keys must be ASCII. The anio field uses the ASCII spelling (no tilde) throughout the codebase. Using año in any payload or query will cause a schema mismatch. See the v2 migration notes in AGENTS.md for the full list of renamed keys.
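A small guard of this kind can catch the mistake before ingestion (illustrative helper, not part of the codebase):

```python
def check_ascii_keys(payload: dict) -> None:
    """Raise if any payload key contains non-ASCII characters (e.g. 'año')."""
    bad = [key for key in payload if not key.isascii()]
    if bad:
        raise ValueError(f"non-ASCII payload keys: {bad}")
```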

Technology stack

| Layer | Technology | Version |
| --- | --- | --- |
| API framework | FastAPI | ≥ 0.100.0 |
| Web server | Uvicorn | ≥ 0.23.0 |
| Database | PostgreSQL + pgvector | pg16 |
| Vector search | pgvector Python client | ≥ 0.2.0 |
| ORM / settings | pydantic-settings | ≥ 2.0.0 |
| Satellite imagery | Google Earth Engine API | ≥ 0.1.340 |
| SAM | segment-anything | ≥ 1.0 |
| CV backend | OpenCV (headless) | ≥ 4.8.0 |
| ML / anomaly detection | scikit-learn | ≥ 1.3.0 |
| Deep learning runtime | PyTorch | ≥ 2.0.0 |
| Geospatial processing | GeoPandas + Shapely | ≥ 0.13.0 / 2.0.0 |
| Map rendering | Folium | ≥ 0.14.0 |
| LLM runtime | Ollama | latest |
| Frontend | Streamlit | ≥ 1.28.0 |
| Bot framework | python-telegram-bot | ≥ 20.0 |
| Container runtime | Docker + Compose | |
| Python runtime | Python | 3.10 (slim) |

RAG engine

The RAG module lives in src/rag/core.py. It uses pgvector cosine similarity to retrieve the most relevant lot reports and passes them as context to gemma3:4b via Ollama. Key exported functions:
  • consultar_agente(lote_id, pregunta, top_k=3) — returns an LLM response with RAG context for a specific lot.
  • fetch_context(lote_id, pregunta, top_k=3) — retrieval only, no LLM call.
  • listar_lotes() — lists all lots stored in the database.
  • get_historial_lote_raw(lote_id) — returns the raw time series for a lot.
  • get_datos_lote_raw(lote_id) — returns the full report for a lot.
Import functions directly from src/rag/core.py rather than re-implementing retrieval logic. The module is the single source of truth for all RAG operations.
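Between retrieval and generation sits a prompt-assembly step. A hypothetical sketch of how fetch_context() results might be stitched into a prompt for gemma3:4b (the resumen field name is an assumption, not the module's actual return shape):

```python
def build_rag_prompt(pregunta: str, context_rows: list) -> str:
    """Stitch retrieved lot summaries into a single prompt for the generator."""
    context = "\n\n".join(
        f"[{row['lote_id']}] {row['resumen']}" for row in context_rows
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {pregunta}"
    )
```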

Deployment topology

Host machine
├── Ollama (localhost:11434)
│   ├── nomic-embed-text
│   └── gemma3:4b
├── PostgreSQL + pgvector (localhost:5432)
│   ├── informes_lotes
│   └── lote_historial
└── Docker Compose
    ├── agroia_api  (→ 0.0.0.0:8000)
    └── agroia_ui   (→ 0.0.0.0:8501)
The containers reach the host-side services via host.docker.internal. Pipeline runs (python start.py --pipeline) execute on the host directly, outside Docker, to maintain file system access to shapefile inputs and output directories.

Next steps

AgroIA Score concepts

Detailed explanation of each score component, normalization formulas, and crop-specific parameters.

SAM delineation

How SAM converts GPS points to field polygons and what the two production runs cover.

API reference

Full endpoint documentation for ingestion, lot queries, and pipeline module APIs.

Configuration reference

Complete .env variable reference with defaults and validation rules.
