
AgroIA ships as a Docker Compose stack — a FastAPI backend on port 8000, a Streamlit dashboard on port 8501, and an optional Telegram bot — plus an Ollama instance that runs on the host machine. The steps below take you from a fresh clone to a verified running system and your first pipeline analysis against a real shapefile. The whole sequence takes about ten minutes, excluding Ollama model download time.

Prerequisites

Before you begin, confirm the following are available on your machine:
  • Docker and Docker Compose — Docker Desktop ≥ 4.x or Docker Engine with the Compose plugin.
  • Ollama running locally (ollama serve) with the two required models pulled (see below).
  • Google Earth Engine credentials — an authenticated earthengine CLI and a registered GEE cloud project.
  • Python 3.10+ — only required if you plan to run start.py outside Docker.
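A quick way to confirm the tooling is in place (version output will vary; earthengine just needs to be on PATH):
docker --version
docker-compose --version
ollama --version
earthengine --help
python3 --version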
Ollama must run on the host machine, not inside Docker. The containers reach it via host.docker.internal:11434. On Linux, host.docker.internal is not resolved automatically: add --add-host=host.docker.internal:host-gateway to your docker run command, or the equivalent extra_hosts entry in docker-compose.yml (see the sketch below).
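A minimal sketch of that extra_hosts entry, using the agroia_api service name from the Compose table later in this guide (the shipped docker-compose.yml may already include it):
services:
  agroia_api:
    extra_hosts:
      - "host.docker.internal:host-gateway"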

Pull the required Ollama models

ollama pull nomic-embed-text
ollama pull gemma3:4b
nomic-embed-text generates the 768-dimensional embeddings stored in pgvector. gemma3:4b is the generation model used by the RAG engine for natural-language lot queries.
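Before starting the stack, confirm both models are present:
ollama list | grep -E 'nomic-embed-text|gemma3'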

Setup

1. Clone the repository

git clone https://github.com/tu-usuario/agroia-rag.git
cd agroia-rag

2. Configure environment variables

Copy the example configuration file and fill in your credentials.
cp config/.env.example config/.env
Open config/.env and set at minimum the following variables:
config/.env
# Google Earth Engine
GEE_PROJECT_ID=your-gee-project-id

# PostgreSQL
DB_HOST=localhost
DB_PORT=5432
DB_NAME=agroia
DB_USER=postgres
DB_PASSWORD=your-password

# Ollama
OLLAMA_URL=http://localhost:11434

# API security
INGESTA_SECRET_KEY=change-me-before-exposing

# Telegram (optional — bot will not start if missing)
TELEGRAM_TOKEN=your-bot-token
Never commit config/.env to version control. It contains GEE credentials and the API ingestion secret key. The file is already listed in .gitignore.
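One quick way to generate a strong INGESTA_SECRET_KEY value (any sufficiently random string works):
openssl rand -hex 32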

3. Start PostgreSQL with pgvector

The database must be running before the API container starts. Spin up a pgvector-enabled PostgreSQL instance with Docker:
docker run -d \
  --name postgres-agri \
  -e POSTGRES_PASSWORD=your-password \
  -e POSTGRES_DB=agroia \
  -p 5432:5432 \
  pgvector/pgvector:pg16
Then apply the schema migration (idempotent — safe to run multiple times):
psql -h localhost -U postgres -d agroia -f 01_migrate_schema.sql
This creates the informes_lotes and lote_historial tables with all required indexes and the vector extension.
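To confirm the migration applied, check that the vector extension and both tables exist:
psql -h localhost -U postgres -d agroia -c "\dx vector"
psql -h localhost -U postgres -d agroia -c "\dt"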

4. Start the stack with Docker Compose

docker-compose up -d
This starts two containers defined in docker-compose.yml:
| Container  | Service                           | Port |
| ---------- | --------------------------------- | ---- |
| agroia_api | FastAPI (ingestion + lotes + RAG) | 8000 |
| agroia_ui  | Streamlit dashboard               | 8501 |
Both containers read config/.env via env_file and connect to host.docker.internal for the database and Ollama.
The Telegram bot is not included in the Compose file. Start it separately with python start.py --bot once the API is running.
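To confirm both containers came up, and to follow the API logs while debugging:
docker-compose ps
docker-compose logs -f agroia_api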

5. Verify all services are healthy

curl http://localhost:8000/health
The /health endpoint returns {"status": "ok"} when the API is connected to PostgreSQL and Ollama is reachable on the configured OLLAMA_URL.
Use the built-in prerequisite checker to diagnose connection issues before starting services: python start.py --check. It verifies Python dependencies, PostgreSQL connectivity, Ollama model availability, and token configuration in one pass.
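If you script the startup, a simple poll until the API reports healthy works well (a sketch; add timeout handling to taste):
until curl -sf http://localhost:8000/health | grep -q '"ok"'; do
  echo "Waiting for API..."
  sleep 2
done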

Running services outside Docker

If you prefer to run all services directly on your machine — useful for pipeline development or debugging — install the Python dependencies and use the unified start.py launcher:
pip install -r requirements.txt
python start.py
This starts the FastAPI server, Streamlit dashboard, and Telegram bot in parallel. Press Ctrl+C to stop all processes cleanly.
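The same launcher covers the partial modes referenced elsewhere in this guide:
python start.py --check   # verify prerequisites only
python start.py --bot     # Telegram bot only (the API must already be running)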

Running your first pipeline analysis

The pipeline takes a shapefile and a crop type, runs the full GEE + NASA POWER + Score analysis, and automatically ingests the result into the RAG database.

1. Authenticate Google Earth Engine

earthengine authenticate
Complete the browser OAuth prompt. Credentials are cached locally. Make sure GEE_PROJECT_ID in config/.env matches your registered GEE cloud project.
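As a quick sanity check outside the pipeline, you can initialize the client directly (a one-liner using the earthengine-api package; substitute your project ID):
python -c "import ee; ee.Initialize(project='your-gee-project-id'); print('GEE OK')"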

2. Run the pipeline on a shapefile

python start.py --pipeline Poligonizacion/1ER\ CORRIDA/poligonos_definitivos.shp maiz
Valid crop values: maiz, soja, trigo, girasol. If omitted, the default is maiz. The pipeline executes these steps internally:
  1. init_gee() — GEE authentication and project binding.
  2. validar_shapefile() — CRS validation and dynamic UTM projection.
  3. get_nasa_climate_safe() — six years of NASA POWER climate data.
  4. get_gee_ndvi_validado() — Sentinel-2 SR NDVI with window fallback.
  5. calcular_score() — AgroIA Score (0–100) and K-Means A/B/C zoning.
  6. build_report() — PDF written to src/outputs/.
  7. generar_mapa_offline() — interactive HTML map written to outputs/.
  8. enviar_al_rag() — automatic ingestion into pgvector.
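After a successful run, the artifacts from steps 6 and 7 are on disk and the lot is queryable through the dashboard:
ls src/outputs/   # PDF report from build_report()
ls outputs/       # interactive HTML map from generar_mapa_offline()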

3. Process a batch from SAM GeoJSON output

To process all polygons produced by the SAM delineator in one operation:
# Full batch — 268 polygons, Maize crop
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz

# Test run — first 5 polygons only
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz 5
GEE is initialized once and reused across all polygons. The final output prints a success/failure count.

4. Explore results in the dashboard

Open http://localhost:8501 to explore the ingested lots. You can browse the lot ranking by AgroIA Score, inspect NDVI time series charts, view the Folium interactive map, and query the RAG agent in natural language — for example: “Which lots had the lowest NDVI stability in 2024?”

Service URLs reference

| Service              | URL                          | Notes                       |
| -------------------- | ---------------------------- | --------------------------- |
| FastAPI REST         | http://localhost:8000        | Base URL for all API calls  |
| Interactive API docs | http://localhost:8000/docs   | Swagger UI (auto-generated) |
| Health check         | http://localhost:8000/health | Returns {"status": "ok"}    |
| Streamlit dashboard  | http://localhost:8501        | Lot explorer + RAG chat     |

Next steps

System architecture

Understand how each component connects and how data flows from shapefile input to RAG-powered report.

Pipeline guide

Deep dive into pipeline configuration, crop (cultivo) parameters, output formats, and batch processing.

Environment configuration

Full reference for all .env variables, their defaults, and validation rules.

API overview

Complete API reference including authentication, request schemas, and response formats.
