
AgroIA ships as a Docker Compose stack — a FastAPI backend on port 8000, a Streamlit dashboard on port 8501, and an optional Telegram bot — plus an Ollama instance that runs on the host machine. The steps below take you from a fresh clone to a verified running system and your first pipeline analysis against a real shapefile. The whole sequence takes about ten minutes, excluding Ollama model download time.

Prerequisites

Before you begin, confirm the following are available on your machine:
  • Docker and Docker Compose — Docker Desktop ≥ 4.x or Docker Engine with the Compose plugin.
  • Ollama running locally (ollama serve) with the two required models pulled (see below).
  • Google Earth Engine credentials — an authenticated earthengine CLI and a registered GEE cloud project.
  • Python 3.10+ — only required if you plan to run start.py outside Docker.
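A quick way to confirm the tooling is in place (version output will vary; earthengine just needs to be on PATH):
docker --version
docker-compose --version
ollama --version
earthengine --help
python3 --version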
Ollama must run on the host machine, not inside Docker. The containers reach it via host.docker.internal:11434. On Linux, host.docker.internal is not resolved automatically: add --add-host=host.docker.internal:host-gateway to your docker run command, or the equivalent extra_hosts entry in docker-compose.yml (see the sketch below).
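A minimal sketch of that extra_hosts entry, using the agroia_api service name from the Compose table later in this guide (the shipped docker-compose.yml may already include it):
services:
  agroia_api:
    extra_hosts:
      - "host.docker.internal:host-gateway"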

Pull the required Ollama models

ollama pull nomic-embed-text
ollama pull gemma3:4b
nomic-embed-text generates the 768-dimensional embeddings stored in pgvector. gemma3:4b is the generation model used by the RAG engine for natural-language lot queries.
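Before starting the stack, confirm both models are present:
ollama list | grep -E 'nomic-embed-text|gemma3'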

Setup

1. Clone the repository

git clone https://github.com/tu-usuario/agroia-rag.git
cd agroia-rag

2. Configure environment variables

Copy the example configuration file and fill in your credentials.
cp config/.env.example config/.env
Open config/.env and set at minimum the following variables:
config/.env
# Google Earth Engine
GEE_PROJECT_ID=your-gee-project-id

# PostgreSQL
DB_HOST=localhost
DB_PORT=5432
DB_NAME=agroia
DB_USER=postgres
DB_PASSWORD=your-password

# Ollama
OLLAMA_URL=http://localhost:11434

# API security
INGESTA_SECRET_KEY=change-me-before-exposing

# Telegram (optional — bot will not start if missing)
TELEGRAM_TOKEN=your-bot-token
Never commit config/.env to version control. It contains GEE credentials and the API ingestion secret key. The file is already listed in .gitignore.
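One quick way to generate a strong INGESTA_SECRET_KEY value (any sufficiently random string works):
openssl rand -hex 32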

3. Start PostgreSQL with pgvector

The database must be running before the API container starts. Spin up a pgvector-enabled PostgreSQL instance with Docker:
docker run -d \
  --name postgres-agri \
  -e POSTGRES_PASSWORD=your-password \
  -e POSTGRES_DB=agroia \
  -p 5432:5432 \
  pgvector/pgvector:pg16
Then apply the schema migration (idempotent — safe to run multiple times):
psql -h localhost -U postgres -d agroia -f 01_migrate_schema.sql
This creates the informes_lotes and lote_historial tables with all required indexes and the vector extension.
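To confirm the migration applied, check that the vector extension and both tables exist:
psql -h localhost -U postgres -d agroia -c "\dx vector"
psql -h localhost -U postgres -d agroia -c "\dt"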

4. Start the stack with Docker Compose

docker-compose up -d
This starts two containers defined in docker-compose.yml:
| Container  | Service                           | Port |
| ---------- | --------------------------------- | ---- |
| agroia_api | FastAPI (ingestion + lotes + RAG) | 8000 |
| agroia_ui  | Streamlit dashboard               | 8501 |
Both containers read config/.env via env_file and connect to host.docker.internal for the database and Ollama.
The Telegram bot is not included in the Compose file. Start it separately with python start.py --bot once the API is running.
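To confirm both containers came up, and to follow the API logs while debugging:
docker-compose ps
docker-compose logs -f agroia_api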

5. Verify all services are healthy

curl http://localhost:8000/health
The /health endpoint returns {"status": "ok"} when the API is connected to PostgreSQL and Ollama is reachable on the configured OLLAMA_URL.
Use the built-in prerequisite checker to diagnose connection issues before starting services: python start.py --check. It verifies Python dependencies, PostgreSQL connectivity, Ollama model availability, and token configuration in one pass.
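If you script the startup, a simple poll until the API reports healthy works well (a sketch; add timeout handling to taste):
until curl -sf http://localhost:8000/health | grep -q '"ok"'; do
  echo "Waiting for API..."
  sleep 2
done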

Running services outside Docker

If you prefer to run all services directly on your machine — useful for pipeline development or debugging — install the Python dependencies and use the unified start.py launcher:
pip install -r requirements.txt
python start.py
This starts the FastAPI server, Streamlit dashboard, and Telegram bot in parallel. Press Ctrl+C to stop all processes cleanly.
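The same launcher covers the partial modes referenced elsewhere in this guide:
python start.py --check   # verify prerequisites only
python start.py --bot     # Telegram bot only (the API must already be running)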

Running your first pipeline analysis

The pipeline takes a shapefile and a crop type, runs the full GEE + NASA POWER + Score analysis, and automatically ingests the result into the RAG database.

1. Authenticate Google Earth Engine

earthengine authenticate
Complete the browser OAuth prompt. Credentials are cached locally. Make sure GEE_PROJECT_ID in config/.env matches your registered GEE cloud project.
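As a quick sanity check outside the pipeline, you can initialize the client directly (a one-liner using the earthengine-api package; substitute your project ID):
python -c "import ee; ee.Initialize(project='your-gee-project-id'); print('GEE OK')"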

2. Run the pipeline on a shapefile

python start.py --pipeline Poligonizacion/1ER\ CORRIDA/poligonos_definitivos.shp maiz
Valid crop values: maiz, soja, trigo, girasol. If omitted, the default is maiz. The pipeline executes these steps internally:
  1. init_gee() — GEE authentication and project binding.
  2. validar_shapefile() — CRS validation and dynamic UTM projection.
  3. get_nasa_climate_safe() — six years of NASA POWER climate data.
  4. get_gee_ndvi_validado() — Sentinel-2 SR NDVI with window fallback.
  5. calcular_score() — AgroIA Score (0–100) and K-Means A/B/C zoning.
  6. build_report() — PDF written to src/outputs/.
  7. generar_mapa_offline() — interactive HTML map written to outputs/.
  8. enviar_al_rag() — automatic ingestion into pgvector.
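After a successful run, the artifacts from steps 6 and 7 are on disk and the lot is queryable through the dashboard:
ls src/outputs/   # PDF report from build_report()
ls outputs/       # interactive HTML map from generar_mapa_offline()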

3. Process a batch from SAM GeoJSON output

To process all polygons produced by the SAM delineator in one operation:
# Full batch — 268 polygons, Maize crop
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz

# Test run — first 5 polygons only
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz 5
GEE is initialized once and reused across all polygons. The final output prints a success/failure count.

4. Explore results in the dashboard

Open http://localhost:8501 to explore the ingested lots. You can browse the lot ranking by AgroIA Score, inspect NDVI time series charts, view the Folium interactive map, and query the RAG agent in natural language — for example: “Which lots had the lowest NDVI stability in 2024?”

Service URLs reference

| Service              | URL                          | Notes                       |
| -------------------- | ---------------------------- | --------------------------- |
| FastAPI REST         | http://localhost:8000        | Base URL for all API calls  |
| Interactive API docs | http://localhost:8000/docs   | Swagger UI (auto-generated) |
| Health check         | http://localhost:8000/health | Returns {"status": "ok"}    |
| Streamlit dashboard  | http://localhost:8501        | Lot explorer + RAG chat     |

Next steps

System architecture

Understand how each component connects and how data flows from shapefile input to RAG-powered report.

Pipeline guide

Deep dive into pipeline configuration, crop (cultivo) parameters, output formats, and batch processing.

Environment configuration

Full reference for all .env variables, their defaults, and validation rules.

API overview

Complete API reference including authentication, request schemas, and response formats.
