
When you have a large GeoJSON produced by the SAM poligonizador — potentially containing hundreds of field polygons from a single zone — running the pipeline individually for each field is impractical. The batch mode reads the GeoJSON once, initialises Google Earth Engine a single time, and then iterates over every polygon in sequence, applying the full AgroIA analysis to each one. Results are automatically pushed to the RAG knowledge base as each polygon completes, so the data is available for querying in the dashboard or bot without any extra steps.

When to use batch vs. single pipeline

Use --batch-geojson when one GeoJSON contains many polygons (the typical SAM poligonizador output) and you want all of them processed and ingested in a single run. Use --pipeline when you have a single shapefile or GeoJSON for one specific field and want detailed console output for that field only.
python start.py --pipeline mi_lote.shp maiz

Command syntax

python start.py --batch-geojson <ruta.geojson> [cultivo] [limit]
| Argument | Required | Default | Description |
|---|---|---|---|
| `<ruta.geojson>` | Yes | (none) | Path to the GeoJSON file from the SAM poligonizador |
| `[cultivo]` | No | `maiz` | Default crop if a polygon has no `cultivo` property |
| `[limit]` | No | all polygons | Process only the first N polygons (useful for testing) |
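The defaulting rules in the table can be sketched as a small parser. This is an illustrative sketch only; `parse_batch_args` is a hypothetical helper, not the actual argument handling in start.py.

```python
def parse_batch_args(argv):
    """Parse: --batch-geojson <ruta.geojson> [cultivo] [limit].

    Hypothetical sketch of the defaulting rules; the real CLI lives in start.py.
    """
    if len(argv) < 2 or argv[0] != "--batch-geojson":
        raise SystemExit("usage: start.py --batch-geojson <ruta.geojson> [cultivo] [limit]")
    ruta = argv[1]
    cultivo = argv[2] if len(argv) > 2 else "maiz"    # default crop
    limit = int(argv[3]) if len(argv) > 3 else None   # None = process all polygons
    return ruta, cultivo, limit

print(parse_batch_args(["--batch-geojson", "zona.geojson"]))
# ('zona.geojson', 'maiz', None)
```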

Real examples

# Process every polygon in the TAYPE zone batch (maíz)
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz
Always test with a small limit (5–10 polygons) before running a full batch. This lets you verify GEE authentication, NASA POWER connectivity, and RAG ingestion are all working without consuming the full processing time.
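Following the syntax above, such a trial run could look like the command below, where the final `5` is the `limit` argument (an illustrative invocation, not verified project output):

```shell
# Trial run: first 5 polygons only, then inspect the results before the full batch
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz 5
```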

How batch processing works internally

The batch runner in src/pipeline/__init__.py::run_batch_from_geojson() follows this sequence:
1. Load and validate the GeoJSON

The file is read with GeoPandas. If the CRS is not EPSG:4326, it is reprojected automatically before iteration begins.
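The loading step can be sketched with the standard library alone. This is a minimal stand-in: the real runner uses GeoPandas, which additionally reprojects geometries to EPSG:4326 when the file declares another CRS. `load_features` is a hypothetical name.

```python
import json

def load_features(source):
    """Return the feature list from a GeoJSON FeatureCollection.

    `source` may be a file path or an already-parsed dict. Stdlib sketch
    only; the actual runner reads the file with GeoPandas and handles CRS.
    """
    if isinstance(source, str):
        with open(source, encoding="utf-8") as fh:
            data = json.load(fh)
    else:
        data = source
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data.get("features", [])

collection = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"id": "L001"}, "geometry": None},
]}
print(len(load_features(collection)))  # 1
```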
2. Apply the limit (if set)

If you passed a limit argument, the GeoDataFrame is truncated to the first N rows with .head(limit). All subsequent steps operate only on those rows.
3. Initialise GEE once

init_gee() is called a single time before the loop begins. This avoids the authentication overhead on every polygon and significantly reduces total processing time for large batches.
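The initialise-once pattern can be illustrated with a simple guard. This is a generic sketch of the idea, not the project's actual `init_gee()`; in the real runner the call simply happens once before the loop.

```python
_gee_ready = False

def init_gee_once(init_fn):
    """Run the (expensive) init function only on the first invocation."""
    global _gee_ready
    if not _gee_ready:
        init_fn()  # stands in for the real GEE authentication/initialisation
        _gee_ready = True

calls = []
for _ in range(3):  # simulate iterating over 3 polygons
    init_gee_once(lambda: calls.append("init"))
print(calls)  # ['init'], initialised exactly once despite three iterations
```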
4. Iterate and derive identifiers

For each feature in the GeoJSON, the runner derives two values from the polygon’s properties:
  • lote_id — read from the id property. If absent, falls back to <id_prefix>_<index> (e.g. POLIGONO_001).
  • cultivo — read from the cultivo property. If absent or unrecognised, uses the cultivo_default argument.
MultiPolygon geometries are automatically reduced to the largest component polygon.
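The fallback rules for the two identifiers can be sketched as follows; `derive_identifiers` and its parameter names are illustrative, not the runner's actual signature.

```python
VALID_CROPS = {"maiz", "soja", "trigo", "girasol"}

def derive_identifiers(props, index, cultivo_default="maiz", id_prefix="POLIGONO"):
    """Derive (lote_id, cultivo) from a feature's properties.

    Hypothetical sketch of the fallback rules described above.
    """
    lote_id = props.get("id") or f"{id_prefix}_{index:03d}"  # auto-ID if missing
    cultivo = props.get("cultivo")
    if cultivo not in VALID_CROPS:  # absent or unrecognised crop
        cultivo = cultivo_default
    return str(lote_id), cultivo

print(derive_identifiers({"id": "LOTE_7", "cultivo": "soja"}, 1))  # ('LOTE_7', 'soja')
print(derive_identifiers({}, 1))                                   # ('POLIGONO_001', 'maiz')
```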
5. Run the full analysis per polygon

Each polygon goes through the same steps as the single pipeline: NASA POWER climate data, Sentinel-2 NDVI extraction, AgroIA Score calculation, PDF report generation, HTML map generation, and RAG ingestion. Results for that polygon are logged to the console immediately.
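The shape of the loop can be sketched like this, with the full analysis abstracted behind an `analyse` callback; this is a hedged outline, not the actual `run_batch_from_geojson()` body.

```python
def run_batch(features, analyse):
    """Run `analyse` on each feature, collecting one result dict per polygon.

    Illustrative sketch: in the real runner, `analyse` covers NASA POWER,
    NDVI extraction, scoring, report/map generation, and RAG ingestion.
    """
    results = []
    for i, feature in enumerate(features, start=1):
        lote_id = feature.get("properties", {}).get("id") or f"POLIGONO_{i:03d}"
        try:
            analyse(feature)
            results.append({"lote_id": lote_id, "status": "OK"})
        except Exception as exc:  # one failed polygon must not stop the batch
            results.append({"lote_id": lote_id, "status": f"ERROR: {exc}"})
    return results

def fake_analyse(feature):
    if feature["properties"].get("bad"):
        raise ValueError("sin datos")

results = run_batch([{"properties": {"id": "A"}}, {"properties": {"bad": True}}], fake_analyse)
print([r["status"] for r in results])  # ['OK', 'ERROR: sin datos']
```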
6. Print the batch summary

After all polygons are processed, a summary table is printed showing total processed, successful, and failed counts. Failed polygon IDs are listed for quick follow-up.

GeoJSON property expectations

The batch runner reads specific properties from each feature. Ensure your GeoJSON conforms to the following structure:
| Property | Type | Required | Description |
|---|---|---|---|
| `id` | string or number | Recommended | Unique field identifier. Used as `lote_id` in the database. |
| `cultivo` | string | Optional | Crop type: `maiz`, `soja`, `trigo`, or `girasol`. Falls back to `cultivo_default`. |
| `localidad` | string | Optional | Locality or zone name. Stored in metadata for context. |
If the id property is missing from a polygon, the runner generates an automatic identifier using the index (e.g. POLIGONO_001, POLIGONO_002). These auto-generated IDs are valid but harder to track across runs — it is recommended to include an id in your GeoJSON from the SAM output.

Batch results summary

After the batch completes, the console prints a summary like:
=======================================================
  BATCH COMPLETADO
  Procesados : 268
  Exitosos   : 261
  Con error  : 7
  Fallidos   : POLIGONO_014, POLIGONO_089, POLIGONO_103...
=======================================================
Each result in the internal list has a status field:
| Status | Meaning |
|---|---|
| `OK` | Analysis completed successfully and ingested into RAG |
| `SIN_DATOS_SATELITALES` | No valid NDVI data found for any year |
| `GEOMETRIA_INVALIDA` | Polygon geometry failed validation (skipped) |
| `ERROR: <message>` | Unexpected exception during processing |
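Tallying the status fields into the printed summary can be sketched as below; `summarise` and its key names are illustrative, not the runner's internals.

```python
def summarise(results):
    """Count OK vs failed results and collect failed IDs, as in the console summary."""
    failed = [r for r in results if r["status"] != "OK"]
    return {
        "procesados": len(results),
        "exitosos": len(results) - len(failed),
        "con_error": len(failed),
        "fallidos": [r["lote_id"] for r in failed],
    }

results = [
    {"lote_id": "POLIGONO_001", "status": "OK"},
    {"lote_id": "POLIGONO_002", "status": "SIN_DATOS_SATELITALES"},
    {"lote_id": "POLIGONO_003", "status": "OK"},
]
print(summarise(results))
# {'procesados': 3, 'exitosos': 2, 'con_error': 1, 'fallidos': ['POLIGONO_002']}
```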
Batch runs against large GeoJSON files (100+ polygons) can take several hours depending on GEE API rate limits and NASA POWER response times. Plan accordingly and avoid interrupting the process once started, as partial runs still ingest completed polygons into the RAG.
