
When you have a large GeoJSON produced by the SAM poligonizador — potentially containing hundreds of field polygons from a single zone — running the pipeline individually for each field is impractical. The batch mode reads the GeoJSON once, initialises Google Earth Engine a single time, and then iterates over every polygon in sequence, applying the full AgroIA analysis to each one. Results are automatically pushed to the RAG knowledge base as each polygon completes, so the data is available for querying in the dashboard or bot without any extra steps.

When to use batch vs. single pipeline

Use --batch-geojson when one GeoJSON contains many polygons (the typical SAM poligonizador output) and you want all of them processed and ingested in a single run. Use --pipeline when you have a single shapefile or GeoJSON for one specific field and want detailed console output for that field only.
python start.py --pipeline mi_lote.shp maiz

Command syntax

python start.py --batch-geojson <ruta.geojson> [cultivo] [limit]
| Argument | Required | Default | Description |
|---|---|---|---|
| `<ruta.geojson>` | Yes | (none) | Path to the GeoJSON file from the SAM poligonizador |
| `[cultivo]` | No | `maiz` | Default crop if a polygon has no `cultivo` property |
| `[limit]` | No | all polygons | Process only the first N polygons (useful for testing) |
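The defaulting rules in the table can be sketched as a small parser. This is an illustrative sketch only; `parse_batch_args` is a hypothetical helper, not the actual argument handling in start.py.

```python
def parse_batch_args(argv):
    """Parse: --batch-geojson <ruta.geojson> [cultivo] [limit].

    Hypothetical sketch of the defaulting rules; the real CLI lives in start.py.
    """
    if len(argv) < 2 or argv[0] != "--batch-geojson":
        raise SystemExit("usage: start.py --batch-geojson <ruta.geojson> [cultivo] [limit]")
    ruta = argv[1]
    cultivo = argv[2] if len(argv) > 2 else "maiz"    # default crop
    limit = int(argv[3]) if len(argv) > 3 else None   # None = process all polygons
    return ruta, cultivo, limit

print(parse_batch_args(["--batch-geojson", "zona.geojson"]))
# ('zona.geojson', 'maiz', None)
```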

Real examples

# Process every polygon in the TAYPE zone batch (maíz)
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz
Always test with a small limit (5–10 polygons) before running a full batch. This lets you verify GEE authentication, NASA POWER connectivity, and RAG ingestion are all working without consuming the full processing time.
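Following the syntax above, such a trial run could look like the command below, where the final `5` is the `limit` argument (an illustrative invocation, not verified project output):

```shell
# Trial run: first 5 polygons only, then inspect the results before the full batch
python start.py --batch-geojson "Poligonizacion/1ER CORRIDA/poligonos_definitivos.geojson" maiz 5
```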

How batch processing works internally

The batch runner in src/pipeline/__init__.py::run_batch_from_geojson() follows this sequence:
1. Load and validate the GeoJSON

The file is read with GeoPandas. If the CRS is not EPSG:4326, it is reprojected automatically before iteration begins.
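The loading step can be sketched with the standard library alone. This is a minimal stand-in: the real runner uses GeoPandas, which additionally reprojects geometries to EPSG:4326 when the file declares another CRS. `load_features` is a hypothetical name.

```python
import json

def load_features(source):
    """Return the feature list from a GeoJSON FeatureCollection.

    `source` may be a file path or an already-parsed dict. Stdlib sketch
    only; the actual runner reads the file with GeoPandas and handles CRS.
    """
    if isinstance(source, str):
        with open(source, encoding="utf-8") as fh:
            data = json.load(fh)
    else:
        data = source
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data.get("features", [])

collection = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"id": "L001"}, "geometry": None},
]}
print(len(load_features(collection)))  # 1
```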
2. Apply the limit (if set)

If you passed a limit argument, the GeoDataFrame is truncated to the first N rows with .head(limit). All subsequent steps operate only on those rows.
3. Initialise GEE once

init_gee() is called a single time before the loop begins. This avoids the authentication overhead on every polygon and significantly reduces total processing time for large batches.
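The initialise-once pattern can be illustrated with a simple guard. This is a generic sketch of the idea, not the project's actual `init_gee()`; in the real runner the call simply happens once before the loop.

```python
_gee_ready = False

def init_gee_once(init_fn):
    """Run the (expensive) init function only on the first invocation."""
    global _gee_ready
    if not _gee_ready:
        init_fn()  # stands in for the real GEE authentication/initialisation
        _gee_ready = True

calls = []
for _ in range(3):  # simulate iterating over 3 polygons
    init_gee_once(lambda: calls.append("init"))
print(calls)  # ['init'], initialised exactly once despite three iterations
```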
4. Iterate and derive identifiers

For each feature in the GeoJSON, the runner derives two values from the polygon’s properties:
  • lote_id — read from the id property. If absent, falls back to <id_prefix>_<index> (e.g. POLIGONO_001).
  • cultivo — read from the cultivo property. If absent or unrecognised, uses the cultivo_default argument.
MultiPolygon geometries are automatically reduced to the largest component polygon.
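The fallback rules for the two identifiers can be sketched as follows; `derive_identifiers` and its parameter names are illustrative, not the runner's actual signature.

```python
VALID_CROPS = {"maiz", "soja", "trigo", "girasol"}

def derive_identifiers(props, index, cultivo_default="maiz", id_prefix="POLIGONO"):
    """Derive (lote_id, cultivo) from a feature's properties.

    Hypothetical sketch of the fallback rules described above.
    """
    lote_id = props.get("id") or f"{id_prefix}_{index:03d}"  # auto-ID if missing
    cultivo = props.get("cultivo")
    if cultivo not in VALID_CROPS:  # absent or unrecognised crop
        cultivo = cultivo_default
    return str(lote_id), cultivo

print(derive_identifiers({"id": "LOTE_7", "cultivo": "soja"}, 1))  # ('LOTE_7', 'soja')
print(derive_identifiers({}, 1))                                   # ('POLIGONO_001', 'maiz')
```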
5. Run the full analysis per polygon

Each polygon goes through the same steps as the single pipeline: NASA POWER climate data, Sentinel-2 NDVI extraction, AgroIA Score calculation, PDF report generation, HTML map generation, and RAG ingestion. Results for that polygon are logged to the console immediately.
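The shape of the loop can be sketched like this, with the full analysis abstracted behind an `analyse` callback; this is a hedged outline, not the actual `run_batch_from_geojson()` body.

```python
def run_batch(features, analyse):
    """Run `analyse` on each feature, collecting one result dict per polygon.

    Illustrative sketch: in the real runner, `analyse` covers NASA POWER,
    NDVI extraction, scoring, report/map generation, and RAG ingestion.
    """
    results = []
    for i, feature in enumerate(features, start=1):
        lote_id = feature.get("properties", {}).get("id") or f"POLIGONO_{i:03d}"
        try:
            analyse(feature)
            results.append({"lote_id": lote_id, "status": "OK"})
        except Exception as exc:  # one failed polygon must not stop the batch
            results.append({"lote_id": lote_id, "status": f"ERROR: {exc}"})
    return results

def fake_analyse(feature):
    if feature["properties"].get("bad"):
        raise ValueError("sin datos")

results = run_batch([{"properties": {"id": "A"}}, {"properties": {"bad": True}}], fake_analyse)
print([r["status"] for r in results])  # ['OK', 'ERROR: sin datos']
```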
6. Print the batch summary

After all polygons are processed, a summary table is printed showing total processed, successful, and failed counts. Failed polygon IDs are listed for quick follow-up.

GeoJSON property expectations

The batch runner reads specific properties from each feature. Ensure your GeoJSON conforms to the following structure:
| Property | Type | Required | Description |
|---|---|---|---|
| `id` | string or number | Recommended | Unique field identifier. Used as `lote_id` in the database. |
| `cultivo` | string | Optional | Crop type: `maiz`, `soja`, `trigo`, or `girasol`. Falls back to `cultivo_default`. |
| `localidad` | string | Optional | Locality or zone name. Stored in metadata for context. |
If the id property is missing from a polygon, the runner generates an automatic identifier using the index (e.g. POLIGONO_001, POLIGONO_002). These auto-generated IDs are valid but harder to track across runs — it is recommended to include an id in your GeoJSON from the SAM output.

Batch results summary

After the batch completes, the console prints a summary like:
=======================================================
  BATCH COMPLETADO
  Procesados : 268
  Exitosos   : 261
  Con error  : 7
  Fallidos   : POLIGONO_014, POLIGONO_089, POLIGONO_103...
=======================================================
Each result in the internal list has a status field:
| Status | Meaning |
|---|---|
| `OK` | Analysis completed successfully and ingested into RAG |
| `SIN_DATOS_SATELITALES` | No valid NDVI data found for any year |
| `GEOMETRIA_INVALIDA` | Polygon geometry failed validation (skipped) |
| `ERROR: <message>` | Unexpected exception during processing |
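Tallying the status fields into the printed summary can be sketched as below; `summarise` and its key names are illustrative, not the runner's internals.

```python
def summarise(results):
    """Count OK vs failed results and collect failed IDs, as in the console summary."""
    failed = [r for r in results if r["status"] != "OK"]
    return {
        "procesados": len(results),
        "exitosos": len(results) - len(failed),
        "con_error": len(failed),
        "fallidos": [r["lote_id"] for r in failed],
    }

results = [
    {"lote_id": "POLIGONO_001", "status": "OK"},
    {"lote_id": "POLIGONO_002", "status": "SIN_DATOS_SATELITALES"},
    {"lote_id": "POLIGONO_003", "status": "OK"},
]
print(summarise(results))
# {'procesados': 3, 'exitosos': 2, 'con_error': 1, 'fallidos': ['POLIGONO_002']}
```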
Batch runs against large GeoJSON files (100+ polygons) can take several hours depending on GEE API rate limits and NASA POWER response times. Plan accordingly and avoid interrupting the process once started, as partial runs still ingest completed polygons into the RAG.
