When you have a large GeoJSON produced by the SAM poligonizador, potentially containing hundreds of field polygons from a single zone, running the pipeline individually for each field is impractical. Batch mode reads the GeoJSON once, initialises Google Earth Engine (GEE) a single time, and then iterates over every polygon in sequence, applying the full AgroIA analysis to each one. Results are pushed to the RAG knowledge base as each polygon completes, so the data is available for querying in the dashboard or bot without any extra steps.
When to use batch vs. single pipeline
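- Single pipeline: use `--pipeline` when you have a single shapefile or GeoJSON for one specific field and want detailed console output for that field only.
- Batch GeoJSON: use batch mode when a GeoJSON from the SAM poligonizador contains many field polygons that should all be analysed in one run.
Command syntax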
| Argument | Required | Default | Description |
|---|---|---|---|
| `<ruta.geojson>` | Yes | None | Path to the GeoJSON file from the SAM poligonizador |
| `[cultivo]` | No | `maiz` | Default crop applied when a polygon has no `cultivo` property |
| `[limit]` | No | All polygons | Process only the first N polygons (useful for testing) |
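To make the argument table concrete, here is how the three positional arguments and their defaults could be modelled with Python's `argparse`. This is a hypothetical sketch, not the project's parser: the real entry point and the flag that selects batch mode are not reproduced here. Because the arguments are positional, in this sketch a `limit` can only be given after a `cultivo`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented <ruta.geojson> [cultivo] [limit] syntax."""
    parser = argparse.ArgumentParser(description="Batch AgroIA analysis over a SAM GeoJSON")
    parser.add_argument("ruta_geojson", help="GeoJSON file from the SAM poligonizador")
    # nargs="?" makes a positional optional; its default applies when omitted.
    parser.add_argument("cultivo", nargs="?", default="maiz",
                        help="Default crop when a polygon has no cultivo property")
    parser.add_argument("limit", nargs="?", type=int, default=None,
                        help="Process only the first N polygons (None = all)")
    return parser


# Example: a test run over the first 5 polygons with soja as the default crop.
args = build_parser().parse_args(["lotes.geojson", "soja", "5"])
print(args.cultivo, args.limit)  # soja 5
```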
Real examples
How batch processing works internally
The batch runner in `src/pipeline/__init__.py::run_batch_from_geojson()` follows this sequence:
Load and validate the GeoJSON
The file is read with GeoPandas. If the CRS is not EPSG:4326, it is reprojected automatically before iteration begins.
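In GeoPandas this step amounts to inspecting the layer's `crs` and calling `GeoDataFrame.to_crs(epsg=4326)` when it differs. To show the decision itself without a GeoPandas dependency, here is a stdlib-only sketch that inspects a raw GeoJSON dict; it is an illustration, not the runner's code. RFC 7946 GeoJSON has no `crs` member and is always WGS84 longitude/latitude, while older exports may declare a named CRS such as a UTM zone (`urn:ogc:def:crs:EPSG::32720` below is an example chosen for illustration).

```python
import re

# Both names describe WGS84 longitude/latitude coordinates.
WGS84_NAMES = {"EPSG:4326", "OGC:CRS84"}


def declared_crs(fc: dict) -> str:
    """Return a normalised CRS name for a GeoJSON FeatureCollection dict."""
    crs = fc.get("crs")
    if not crs:
        # RFC 7946: no 'crs' member means WGS84 lon/lat.
        return "OGC:CRS84"
    name = (crs.get("properties") or {}).get("name", "")
    # Matches 'EPSG:4326' as well as the URN form 'urn:ogc:def:crs:EPSG::4326'.
    match = re.search(r"EPSG:{1,2}(\d+)", name, flags=re.IGNORECASE)
    if match:
        return f"EPSG:{match.group(1)}"
    if name.upper().endswith("CRS84"):
        return "OGC:CRS84"
    return name  # unknown or empty name: treated as "not WGS84" below


def needs_reprojection(fc: dict) -> bool:
    """True when coordinates must be reprojected to EPSG:4326 before analysis."""
    return declared_crs(fc) not in WGS84_NAMES


utm = {"type": "FeatureCollection", "features": [],
       "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::32720"}}}
print(declared_crs(utm), needs_reprojection(utm))  # EPSG:32720 True
```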
Apply the limit (if set)
If you passed a `limit` argument, the GeoDataFrame is truncated to the first N rows with `.head(limit)`. All subsequent steps operate only on those rows.
Initialise GEE once
`init_gee()` is called a single time before the loop begins. This avoids repeating the authentication overhead for every polygon and significantly reduces total processing time for large batches.
Iterate and derive identifiers
For each feature in the GeoJSON, the runner derives two values from the polygon's properties:
- `lote_id`: read from the `id` property. If absent, falls back to `<id_prefix>_<index>` (e.g. `POLIGONO_001`).
- `cultivo`: read from the `cultivo` property. If absent or unrecognised, uses the `cultivo_default` argument.
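The two fallback rules above can be sketched as a single function. Several details are assumptions inferred from the examples rather than taken from the runner: the `"POLIGONO"` prefix, a 1-based index zero-padded to three digits (to produce `POLIGONO_001`), the crop list copied from the property table below, and case normalisation of `cultivo`.

```python
# Crops listed in the property table; treated here as the set of recognised values.
VALID_CROPS = {"maiz", "soja", "trigo", "girasol"}


def derive_identifiers(properties, index, id_prefix="POLIGONO", cultivo_default="maiz"):
    """Return (lote_id, cultivo) for one feature using the documented fallbacks.

    `index` is assumed to be 1-based so the first polygon becomes POLIGONO_001.
    """
    props = properties or {}
    raw_id = props.get("id")
    if raw_id is None or str(raw_id).strip() == "":
        lote_id = f"{id_prefix}_{index:03d}"  # three-digit padding is an assumption
    else:
        lote_id = str(raw_id)
    # Lower-casing and stripping is an assumed normalisation step.
    cultivo = str(props.get("cultivo") or "").strip().lower()
    if cultivo not in VALID_CROPS:
        cultivo = cultivo_default
    return lote_id, cultivo


print(derive_identifiers({"cultivo": "Soja"}, 1))  # ('POLIGONO_001', 'soja')
```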
Run the full analysis per polygon
Each polygon goes through the same steps as the single pipeline: NASA POWER climate data, Sentinel-2 NDVI extraction, AgroIA Score calculation, PDF report generation, HTML map generation, and RAG ingestion. Results for that polygon are logged to the console immediately.
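The whole sequence can be summarised as a short loop. This is a stdlib-only sketch, not the code in `run_batch_from_geojson()`: features are plain GeoJSON dicts rather than GeoDataFrame rows, and `init` and `analyse` are hypothetical callables standing in for `init_gee()` and the per-polygon analysis (climate, NDVI, Score, PDF, map, RAG ingestion). It shows the two properties that matter for large batches: initialisation happens exactly once, and an exception in one polygon is recorded as an `ERROR: <message>` status instead of aborting the run. The `POLIGONO` prefix and zero padding are assumptions.

```python
from typing import Callable, List, Optional


def run_batch(
    features: List[dict],
    analyse: Callable[[dict, str, str], str],  # hypothetical: returns a status string
    init: Callable[[], None],                   # hypothetical stand-in for init_gee()
    cultivo_default: str = "maiz",
    limit: Optional[int] = None,
) -> List[dict]:
    """Initialise once, then analyse every polygon in order, recording a status for each."""
    if limit is not None:
        features = features[:limit]  # same effect as GeoDataFrame.head(limit)
    init()  # runs exactly once, before the loop
    results = []
    for i, feature in enumerate(features, start=1):
        props = feature.get("properties") or {}
        raw_id = props.get("id")
        lote_id = str(raw_id) if raw_id is not None else f"POLIGONO_{i:03d}"
        cultivo = props.get("cultivo") or cultivo_default
        try:
            status = analyse(feature, lote_id, cultivo)
        except Exception as exc:  # one failing polygon must not stop the batch
            status = f"ERROR: {exc}"
        results.append({"lote_id": lote_id, "cultivo": cultivo, "status": status})
        print(f"[{i}/{len(features)}] {lote_id} ({cultivo}): {status}")  # log immediately
    return results
```

For a dry run you can pass stubs, e.g. `run_batch(features, analyse=lambda f, lote, cultivo: "OK", init=lambda: None, limit=3)`, which exercises the iteration and logging without touching GEE.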
GeoJSON property expectations
The batch runner reads specific properties from each feature. Make sure your GeoJSON follows this structure:
| Property | Type | Required | Description |
|---|---|---|---|
| `id` | string or number | Recommended | Unique field identifier. Used as `lote_id` in the database. |
| `cultivo` | string | Optional | Crop type: `maiz`, `soja`, `trigo`, or `girasol`. Falls back to `cultivo_default`. |
| `localidad` | string | Optional | Locality or zone name. Stored in metadata for context. |
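Before launching a long batch, it can help to check the SAM output against this table. The following is a hypothetical pre-flight helper, not part of AgroIA: it reports features that would trigger the `id` or `cultivo` fallbacks, plus duplicate `id` values (the table describes `id` as unique). Exact-match comparison against the lowercase crop names is an assumption.

```python
VALID_CROPS = {"maiz", "soja", "trigo", "girasol"}


def preflight_warnings(fc: dict) -> list:
    """List property problems that would make the batch runner fall back to defaults."""
    warnings = []
    seen_ids = set()
    for i, feature in enumerate(fc.get("features", []), start=1):
        props = feature.get("properties") or {}
        fid = props.get("id")
        if fid is None:
            warnings.append(f"feature {i}: no 'id', an automatic POLIGONO_ ID will be used")
        elif not isinstance(fid, (str, int, float)) or isinstance(fid, bool):
            warnings.append(f"feature {i}: 'id' should be a string or number")
        elif str(fid) in seen_ids:
            warnings.append(f"feature {i}: duplicate id {fid!r}")
        else:
            seen_ids.add(str(fid))
        cultivo = props.get("cultivo")
        if cultivo is not None and cultivo not in VALID_CROPS:
            warnings.append(f"feature {i}: cultivo {cultivo!r} not recognised, default will be used")
    return warnings
```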
If the `id` property is missing from a polygon, the runner generates an automatic identifier from the polygon's index (e.g. `POLIGONO_001`, `POLIGONO_002`). These auto-generated IDs are valid but harder to track across runs, so it is recommended to include an `id` in your GeoJSON from the SAM output.
Batch results summary
After the batch completes, the console prints a summary. Each polygon's result carries a `status` field with one of these values:
| Status | Meaning |
|---|---|
| `OK` | Analysis completed successfully and ingested into RAG |
| `SIN_DATOS_SATELITALES` | No valid NDVI data found for any year |
| `GEOMETRIA_INVALIDA` | Polygon geometry failed validation (skipped) |
| `ERROR: <message>` | Unexpected exception during processing |
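When a batch covers hundreds of polygons, it is convenient to tally these statuses and collect the fields that need attention. The sketch below assumes each result is a dict with hypothetical `lote_id` and `status` keys; the runner's actual result structure may differ. Every `ERROR: <message>` variant is grouped under a single `ERROR` key.

```python
from collections import Counter


def summarise(results):
    """Count results per status and list the lote_ids whose status is not OK."""
    counts = Counter()
    to_review = []
    for r in results:
        status = r["status"]
        key = "ERROR" if status.startswith("ERROR:") else status
        counts[key] += 1
        if key != "OK":
            to_review.append(r["lote_id"])
    return dict(counts), to_review
```

`SIN_DATOS_SATELITALES` and `GEOMETRIA_INVALIDA` usually point to problems in the input (cloud cover, invalid geometry), while `ERROR` entries are the ones most likely to succeed on a re-run.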