Quickstart
This guide walks you through the complete Rasteret workflow: building a Collection from STAC, filtering scenes, and fetching pixels in multiple formats.
Time to complete: ~5 minutes
Prerequisites: Python 3.12+, rasteret installed
Overview
You’ll learn how to:
Build a Collection from the built-in dataset catalog
Inspect and filter the Collection
Fetch pixels as xarray, NumPy, and GeoPandas
Compute NDVI from the retrieved data
Setup
First, import Rasteret and define an area of interest:
from pathlib import Path

import rasteret
from shapely.geometry import Polygon

# Define a small AOI over Bengaluru, India
aoi = Polygon([
    (77.55, 13.01),
    (77.58, 13.01),
    (77.58, 13.08),
    (77.55, 13.08),
    (77.55, 13.01),
])
Coordinates are in longitude, latitude (EPSG:4326). The polygon defines a ~3.3 km × 7.8 km area.
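You can verify that size estimate from the bounds alone. A quick sketch in plain Python (no Rasteret required), using the spherical approximation that one degree of arc is about 111.32 km and a degree of longitude shrinks with the cosine of latitude:

```python
import math

# AOI bounds in EPSG:4326, matching the polygon above: (minx, miny, maxx, maxy)
minx, miny, maxx, maxy = 77.55, 13.01, 77.58, 13.08

# Spherical approximation: ~111.32 km per degree of arc;
# longitude degrees shrink with cos(latitude)
mid_lat = (miny + maxy) / 2
width_km = (maxx - minx) * 111.32 * math.cos(math.radians(mid_lat))
height_km = (maxy - miny) * 111.32

print(f"{width_km:.1f} km x {height_km:.1f} km")  # 3.3 km x 7.8 km
```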
Step 1: Build a Collection
build() picks a dataset from the catalog, queries STAC, parses COG headers, and caches everything as Parquet:
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",  # Dataset ID from catalog
    name="bangalore_jan",          # Logical name for this collection
    bbox=aoi.bounds,               # (minx, miny, maxx, maxy)
    date_range=("2024-01-01", "2024-01-31"),
)

print(collection)
# Output: Collection('bangalore_jan', source='sentinel-2-l2a', bands=13, records=8, crs=32643)
First Run
Rasteret queries the STAC API, finds matching scenes, parses COG headers for all bands, and writes a Parquet index to ~/rasteret_workspace/bangalore_jan_*/. Time: 3-10 seconds, depending on scene count.
Subsequent Runs
The Parquet index is loaded from disk in milliseconds. No STAC query, no header parsing. Time: <100 ms
What Just Happened?
✅ Queried Earth Search STAC API for Sentinel-2 L2A scenes
✅ Parsed COG headers for all 13 bands (B01-B12, SCL)
✅ Cached metadata in ~/rasteret_workspace/bangalore_jan_sentinel-2-l2a_2024-01-01_2024-01-31_stac/
✅ Created a Collection object ready for filtering and reads
Step 2: Inspect the Collection
Explore the Collection’s metadata:
# Basic info
print(f"Name: {collection.name}")
print(f"Bands: {collection.bands}")
print(f"Records: {len(collection)}")
print(f"CRS: {collection.crs}")

# Output:
# Name: bangalore_jan
# Bands: ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'SCL']
# Records: 8
# CRS: EPSG:32643
View Collection as DataFrame
import pandas as pd

# Convert to pandas for inspection
df = collection.dataset.to_table().to_pandas()
print(df[["id", "datetime", "cloud_cover"]].head())

                         id    datetime  cloud_cover
0  S2B_32643_20240103_0_L2A  2024-01-03         12.4
1  S2A_32643_20240108_0_L2A  2024-01-08         45.2
2  S2B_32643_20240113_0_L2A  2024-01-13          8.1
3  S2A_32643_20240118_0_L2A  2024-01-18         67.9
4  S2B_32643_20240123_0_L2A  2024-01-23          3.2
Step 3: Filter Scenes
Filter the Collection without any network calls (pure in-memory filtering):
# Filter by cloud cover and date
filtered = collection.subset(
cloud_cover_lt = 20 ,
date_range = ( "2024-01-10" , "2024-01-31" ),
)
print ( f "Original: { len (collection) } scenes" )
print ( f "Filtered: { len (filtered) } scenes" )
# Output:
# Original: 8 scenes
# Filtered: 3 scenes
subset() returns a new Collection object with the filtered dataset. The original Collection is unchanged.
Available Filters
| Parameter | Description | Example |
| --- | --- | --- |
| `cloud_cover_lt` | Maximum cloud cover (%) | `cloud_cover_lt=15` |
| `date_range` | Temporal window | `date_range=("2024-01", "2024-06")` |
| `bbox` | Spatial bounds | `bbox=(west, south, east, north)` |
| `split` | Filter by split column value | `split="train"` |
| `split_column` | Custom split column name | `split_column="dataset_split"` |
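Conceptually, `subset()` is just a row filter over the Collection's metadata table. This toy pandas sketch (scene IDs and values are made up, not real records) shows what `cloud_cover_lt` and `date_range` do:

```python
import pandas as pd

# Toy stand-in for a Collection's metadata index
df = pd.DataFrame({
    "id": ["scene_a", "scene_b", "scene_c"],
    "cloud_cover": [12.4, 45.2, 8.1],
    "datetime": pd.to_datetime(["2024-01-03", "2024-01-08", "2024-01-13"]),
})

# Equivalent of subset(cloud_cover_lt=20, date_range=("2024-01-10", "2024-01-31"))
mask = (
    (df["cloud_cover"] < 20)
    & (df["datetime"] >= "2024-01-10")
    & (df["datetime"] <= "2024-01-31")
)
filtered = df[mask]
print(list(filtered["id"]))  # ['scene_c']
```

scene_a passes the cloud test but falls outside the date window, and scene_b fails the cloud test, so only scene_c survives.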
For complex queries, use PyArrow expressions:
import pyarrow.compute as pc
import pyarrow.dataset as ds

# Custom filter: cloud cover < 10 OR date after Jan 20
expr = (
    (ds.field("cloud_cover") < 10) |
    (ds.field("datetime") >= pc.strptime("2024-01-20", "%Y-%m-%d", "us"))
)
filtered = collection.where(expr)
Step 4: Fetch Pixels
Now fetch actual pixel data for your AOI. Rasteret reads only the tiles that intersect your geometry.
As xarray Dataset
ds = filtered.get_xarray(
    geometries=[aoi],
    bands=["B04", "B08"],  # Red and NIR
)
print(ds)

<xarray.Dataset>
Dimensions:  (scene: 3, y: 867, x: 334)
Coordinates:
  * scene    (scene) <U24 'S2B_32643_20240113_0_L2A' ...
  * y        (y) float64 1.446e+06 1.446e+06 ... 1.437e+06
  * x        (x) float64 6.593e+05 6.593e+05 ... 6.627e+05
Data variables:
    B04      (scene, y, x) uint16 ...
    B08      (scene, y, x) uint16 ...
Attributes:
    crs:     EPSG:32643
Features:
Spatial coordinates (x, y) in native CRS
Scene dimension for temporal stacking
CRS metadata for geospatial operations
Lazy evaluation (data loaded on access)
As NumPy Array
arr = filtered.get_numpy(
    geometries=[aoi],
    bands=["B04", "B08"],
)

print(f"Shape: {arr.shape}")  # (N, C, H, W)
print(f"Dtype: {arr.dtype}")  # uint16 (native)
# Output:
# Shape: (3, 2, 867, 334)
# Dtype: uint16
Shape dimensions:
N = number of scenes (3)
C = number of bands (2: B04, B08)
H = height in pixels (867)
W = width in pixels (334)
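These axes can be indexed directly with plain NumPy. A sketch using a zero-filled stand-in array of the same shape (no Rasteret call involved):

```python
import numpy as np

# Stand-in for the array above: (scenes, bands, height, width)
arr = np.zeros((3, 2, 867, 334), dtype=np.uint16)

first_scene = arr[0]      # (2, 867, 334): both bands of the first scene
nir_stack = arr[:, 1]     # (3, 867, 334): band index 1 (B08) across all scenes
pixel = arr[0, 0, 0, 0]   # single B04 value at the top-left corner

print(first_scene.shape, nir_stack.shape)  # (2, 867, 334) (3, 867, 334)
```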
Rasteret preserves native dtypes: Sentinel-2 uint16 data stays uint16, with no unnecessary float32 promotion unless you use xarray with NaN filling.
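If you need physical reflectance rather than raw digital numbers, convert explicitly. A hedged sketch: dividing by 10000 is the conventional Sentinel-2 L2A scaling, but check your product's processing baseline before relying on it (newer baselines also apply an offset):

```python
import numpy as np

# Raw uint16 digital numbers, as returned natively (sample values)
dn = np.array([[1234, 2145], [1245, 2178]], dtype=np.uint16)

# Conventional Sentinel-2 L2A scaling: reflectance = DN / 10000
reflectance = dn.astype(np.float32) / 10000.0

print(reflectance.dtype)   # float32
print(reflectance[0, 0])   # 0.1234
```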
As GeoPandas DataFrame
gdf = filtered.get_gdf(
    geometries=[aoi],
    bands=["B04", "B08"],
)
print(gdf[["id", "datetime", "B04", "B08"]].head())

                         id    datetime                               B04  \
0  S2B_32643_20240113_0_L2A  2024-01-13  [[1234, 1245, ...], [1256, ...]]
1  S2A_32643_20240118_0_L2A  2024-01-18  [[1198, 1207, ...], [1221, ...]]
2  S2B_32643_20240123_0_L2A  2024-01-23  [[1276, 1289, ...], [1301, ...]]

                                B08
0  [[2145, 2178, ...], [2201, ...]]
1  [[2098, 2134, ...], [2156, ...]]
2  [[2234, 2267, ...], [2289, ...]]
Structure and use cases:
One row per geometry-scene pair
Band arrays stored as columns
Join with external metadata
Export to GeoPackage or Parquet
Step 5: Compute NDVI
Now compute NDVI from the fetched data:
# Convert to float and compute NDVI
nir = ds["B08"].astype(float)
red = ds["B04"].astype(float)
ndvi = (nir - red) / (nir + red)

# Plot time series
ndvi.mean(dim=["y", "x"]).plot()
Or with the NumPy array, adding a small epsilon to avoid division by zero:

# Channels follow requested band order: [B04, B08]
red = arr[:, 0].astype("float32")
nir = arr[:, 1].astype("float32")
ndvi = (nir - red) / (nir + red + 1e-6)

print(f"Mean NDVI: {ndvi.mean():.3f}")
# Output: Mean NDVI: 0.342
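As a quick sanity check of the formula, take a single red/NIR pixel pair like those shown in the sample GeoDataFrame output earlier (values illustrative):

```python
# Single-pixel NDVI: (NIR - Red) / (NIR + Red)
red, nir = 1234.0, 2145.0
ndvi = (nir - red) / (nir + red)
print(round(ndvi, 3))  # 0.27
```

Values near 0 indicate bare soil or built-up areas; dense vegetation typically falls above 0.5.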
Step 6: Share Your Collection
Collections are portable, so you can export them and share them with colleagues:
# Export to a directory
collection.export("./shared_collections/bangalore_jan/")

# On another machine, load directly
shared = rasteret.load("./shared_collections/bangalore_jan/")
print(shared.bands)
# Output: ['B01', 'B02', ..., 'SCL']
The exported Parquet files contain all metadata and COG headers. No STAC queries or header parsing needed on the receiving end.
Next Steps
TorchGeo Integration Use Rasteret with PyTorch DataLoader and TorchGeo samplers
Build from Parquet Ingest custom Parquet tables with COG URLs
Dataset Catalog Explore built-in datasets and add your own
API Reference Complete function signatures and parameters
Complete Example
Here’s the full workflow in one script:
from pathlib import Path

import rasteret
from shapely.geometry import Polygon

# 1. Define AOI
aoi = Polygon([
    (77.55, 13.01),
    (77.58, 13.01),
    (77.58, 13.08),
    (77.55, 13.08),
    (77.55, 13.01),
])

# 2. Build Collection (cached after first run)
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore_jan",
    bbox=aoi.bounds,
    date_range=("2024-01-01", "2024-01-31"),
)
print(f"Built collection: {collection.name}")
print(f"Found {len(collection)} scenes")

# 3. Filter by cloud cover
filtered = collection.subset(cloud_cover_lt=20)
print(f"After cloud filter: {len(filtered)} scenes")

# 4. Fetch pixels
ds = filtered.get_xarray(geometries=[aoi], bands=["B04", "B08"])
arr = filtered.get_numpy(geometries=[aoi], bands=["B04", "B08"])
gdf = filtered.get_gdf(geometries=[aoi], bands=["B04", "B08"])

# 5. Compute NDVI
nir = ds["B08"].astype(float)
red = ds["B04"].astype(float)
ndvi = (nir - red) / (nir + red)
print(f"Mean NDVI: {ndvi.mean().values:.3f}")
print(f"NumPy shape: {arr.shape}")
print(f"GeoPandas rows: {len(gdf)}")
Free datasets only: This example uses Sentinel-2 on Earth Search, which is free and requires no authentication. For requester-pays datasets (Landsat, NAIP), install rasteret[aws] and configure AWS credentials.
Troubleshooting
“No scenes found”
Check that your bbox and date_range intersect actual data:
# Verify bbox format: (minx, miny, maxx, maxy)
print(aoi.bounds)

# Try a wider date range
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore",
    bbox=aoi.bounds,
    date_range=("2023-01-01", "2024-12-31"),  # Full 2 years
)
“Collection already exists” when you want to rebuild
Use force=True to rebuild:
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore_jan",
    bbox=aoi.bounds,
    date_range=("2024-01-01", "2024-01-31"),
    force=True,  # Rebuild even if cache exists
)
Memory issues with large collections
For collections with 1000+ scenes, use PyArrow Dataset (lazy) instead of in-memory tables:
# Collections use lazy datasets by default
print(type(collection.dataset))  # <class 'pyarrow.dataset.FileSystemDataset'>

# Avoid calling .to_table() unless necessary

# ❌ Bad: loads everything into memory
df = collection.dataset.to_table().to_pandas()

# ✅ Good: lazy evaluation
ds = collection.get_xarray(geometries=[aoi], bands=["B04"])
Summary
You’ve learned:
✅ How to build a Collection from the built-in catalog
✅ How to filter scenes without network calls
✅ How to fetch pixels in xarray, NumPy, and GeoPandas formats
✅ How to compute indices like NDVI
✅ How to share Collections as portable Parquet files
Rasteret handles STAC queries, COG header parsing, and parallel tile I/O so you can focus on your analysis or ML pipeline.