
Quickstart

This guide walks you through the complete Rasteret workflow: building a Collection from STAC, filtering scenes, and fetching pixels in multiple formats.
Time to complete: ~5 minutes
Prerequisites: Python 3.12+, rasteret installed

Overview

You’ll learn how to:
  1. Build a Collection from the built-in dataset catalog
  2. Inspect and filter the Collection
  3. Fetch pixels as xarray, NumPy, and GeoPandas
  4. Compute NDVI from the retrieved data

Setup

First, import Rasteret and define an area of interest:
from pathlib import Path
import rasteret
from shapely.geometry import Polygon

# Define a small AOI over Bengaluru, India
aoi = Polygon([
    (77.55, 13.01),
    (77.58, 13.01),
    (77.58, 13.08),
    (77.55, 13.08),
    (77.55, 13.01),
])
Coordinates are in longitude, latitude (EPSG:4326). The polygon defines a ~3.3 km × 7.8 km area.
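If you want to sanity-check the AOI size yourself, a rough back-of-the-envelope conversion from degrees to kilometres (a standalone sketch, not part of Rasteret) looks like this:

```python
import math

# AOI bounds in lon/lat (EPSG:4326): (minx, miny, maxx, maxy)
minx, miny, maxx, maxy = 77.55, 13.01, 77.58, 13.08

# One degree of latitude is ~111.32 km; a degree of longitude
# shrinks by cos(latitude) away from the equator
mid_lat = (miny + maxy) / 2
width_km = (maxx - minx) * 111.32 * math.cos(math.radians(mid_lat))
height_km = (maxy - miny) * 111.32

print(f"~{width_km:.1f} km x {height_km:.1f} km")  # ~3.3 km x 7.8 km
```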

Step 1: Build a Collection

build() looks up the dataset ID in the catalog, queries STAC, parses COG headers, and caches everything as Parquet:
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",  # Dataset ID from catalog
    name="bangalore_jan",           # Logical name for this collection
    bbox=aoi.bounds,                # (minx, miny, maxx, maxy)
    date_range=("2024-01-01", "2024-01-31"),
)

print(collection)
# Output: Collection('bangalore_jan', source='sentinel-2-l2a', bands=13, records=8, crs=32643)
First Run

Rasteret queries the STAC API, finds matching scenes, parses COG headers for all bands, and writes a Parquet index to ~/rasteret_workspace/bangalore_jan_*/. Time: 3-10 seconds, depending on scene count.

Subsequent Runs

The Parquet index is loaded from disk in milliseconds. No STAC query, no header parsing. Time: <100 ms.

What Just Happened?

  • ✅ Queried Earth Search STAC API for Sentinel-2 L2A scenes
  • ✅ Parsed COG headers for all 13 bands (B01-B12, SCL)
  • ✅ Cached metadata in ~/rasteret_workspace/bangalore_jan_sentinel-2-l2a_2024-01-01_2024-01-31_stac/
  • ✅ Created a Collection object ready for filtering and reads

Step 2: Inspect the Collection

Explore the Collection’s metadata:
# Basic info
print(f"Name: {collection.name}")
print(f"Bands: {collection.bands}")
print(f"Records: {len(collection)}")
print(f"CRS: {collection.crs}")

# Output:
# Name: bangalore_jan
# Bands: ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'SCL']
# Records: 8
# CRS: EPSG:32643

View Collection as DataFrame

import pandas as pd

# Convert to pandas for inspection
df = collection.dataset.to_table().to_pandas()
print(df[["id", "datetime", "cloud_cover"]].head())
                         id    datetime  cloud_cover
0  S2B_32643_20240103_0_L2A  2024-01-03         12.4
1  S2A_32643_20240108_0_L2A  2024-01-08         45.2
2  S2B_32643_20240113_0_L2A  2024-01-13          8.1
3  S2A_32643_20240118_0_L2A  2024-01-18         67.9
4  S2B_32643_20240123_0_L2A  2024-01-23          3.2
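Since the index converts to a plain DataFrame, ordinary pandas operations apply. For example, picking the least-cloudy scene (using the values shown above as a stand-in for the real index):

```python
import pandas as pd

# Stand-in for collection.dataset.to_table().to_pandas(),
# using the ids and cloud-cover values from the output above
df = pd.DataFrame({
    "id": [
        "S2B_32643_20240103_0_L2A",
        "S2A_32643_20240108_0_L2A",
        "S2B_32643_20240113_0_L2A",
        "S2A_32643_20240118_0_L2A",
        "S2B_32643_20240123_0_L2A",
    ],
    "cloud_cover": [12.4, 45.2, 8.1, 67.9, 3.2],
})

# Row with the minimum cloud cover
clearest = df.loc[df["cloud_cover"].idxmin()]
print(clearest["id"], clearest["cloud_cover"])  # S2B_32643_20240123_0_L2A 3.2
```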

Step 3: Filter Scenes

Filter the Collection without any network calls (pure in-memory filtering):
# Filter by cloud cover and date
filtered = collection.subset(
    cloud_cover_lt=20,
    date_range=("2024-01-10", "2024-01-31"),
)

print(f"Original: {len(collection)} scenes")
print(f"Filtered: {len(filtered)} scenes")
# Output:
# Original: 8 scenes
# Filtered: 3 scenes
subset() returns a new Collection object with the filtered dataset. The original Collection is unchanged.

Available Filters

Parameter        Description                    Example
cloud_cover_lt   Maximum cloud cover (%)        cloud_cover_lt=15
date_range       Temporal window                date_range=("2024-01", "2024-06")
bbox             Spatial bounds                 bbox=(west, south, east, north)
split            Filter by split column value   split="train"
split_column     Custom split column name       split_column="dataset_split"
For complex queries, use PyArrow expressions:
import pyarrow.compute as pc
import pyarrow.dataset as ds

# Custom filter: cloud cover < 10 OR date after Jan 20
expr = (
    (ds.field("cloud_cover") < 10) | 
    (ds.field("datetime") >= pc.strptime("2024-01-20", "%Y-%m-%d", "us"))
)
filtered = collection.where(expr)

Step 4: Fetch Pixels

Now fetch actual pixel data for your AOI. Rasteret reads only the tiles that intersect your geometry.

As xarray Dataset

ds = filtered.get_xarray(
    geometries=[aoi],
    bands=["B04", "B08"],  # Red and NIR
)

print(ds)
<xarray.Dataset>
Dimensions:  (scene: 3, y: 867, x: 334)
Coordinates:
  * scene    (scene) <U24 'S2B_32643_20240113_0_L2A' ...
  * y        (y) float64 1.446e+06 1.446e+06 ... 1.437e+06
  * x        (x) float64 6.593e+05 6.593e+05 ... 6.627e+05
Data variables:
    B04      (scene, y, x) uint16 ...
    B08      (scene, y, x) uint16 ...
Attributes:
    crs:      EPSG:32643
Features:
  • Spatial coordinates (x, y) in native CRS
  • Scene dimension for temporal stacking
  • CRS metadata for geospatial operations
  • Lazy evaluation (data loaded on access)
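Because the result is a standard xarray Dataset, the usual label-based selection and reductions work on it directly. A self-contained sketch using a tiny synthetic Dataset with the same (scene, y, x) layout:

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the (scene, y, x) Dataset returned above
ds_small = xr.Dataset(
    {"B04": (("scene", "y", "x"),
             np.array([[[1200, 1250]], [[1300, 1350]]], dtype=np.uint16))},
    coords={"scene": ["scene_a", "scene_b"], "y": [0.0], "x": [0.0, 10.0]},
)

# Label-based selection of one scene, then a spatial reduction
mean_red = ds_small["B04"].sel(scene="scene_a").mean().item()
print(mean_red)  # 1225.0
```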

As NumPy Array

arr = filtered.get_numpy(
    geometries=[aoi],
    bands=["B04", "B08"],
)

print(f"Shape: {arr.shape}")   # (N, C, H, W)
print(f"Dtype: {arr.dtype}")   # uint16 (native)
# Output:
# Shape: (3, 2, 867, 334)
# Dtype: uint16
Shape dimensions:
  • N = number of scenes (3)
  • C = number of bands (2: B04, B08)
  • H = height in pixels (867)
  • W = width in pixels (334)
Rasteret preserves native dtypes. Sentinel-2 uint16 data stays uint16—no unnecessary float32 promotion unless you use xarray with NaN filling.
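Because the array comes back in its native dtype, the cast to physical units stays under your control. A sketch of the usual conversion, assuming the conventional Sentinel-2 L2A scale factor of 10,000 (check your processing baseline, which may also apply an offset):

```python
import numpy as np

# Synthetic stand-in for filtered.get_numpy(...): shape (N, C, H, W), uint16
arr = np.array([[[[1234, 1245], [1256, 1270]],
                 [[2145, 2178], [2201, 2230]]]], dtype=np.uint16)

# Cast to float only when physical reflectance is needed;
# 10,000 is the conventional Sentinel-2 L2A scale factor (an assumption here)
reflectance = arr.astype(np.float32) / 10_000.0

print(arr.dtype, reflectance.dtype)  # uint16 float32
```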

As GeoPandas DataFrame

gdf = filtered.get_gdf(
    geometries=[aoi],
    bands=["B04", "B08"],
)

print(gdf[["id", "datetime", "B04", "B08"]].head())
                         id   datetime                               B04  \
0  S2B_32643_20240113_0_L2A 2024-01-13  [[1234, 1245, ...], [1256, ...]]
1  S2A_32643_20240118_0_L2A 2024-01-18  [[1198, 1207, ...], [1221, ...]]
2  S2B_32643_20240123_0_L2A 2024-01-23  [[1276, 1289, ...], [1301, ...]]

                                B08
0  [[2145, 2178, ...], [2201, ...]]
1  [[2098, 2134, ...], [2156, ...]]
2  [[2234, 2267, ...], [2289, ...]]
Use cases:
  • One row per geometry-scene pair
  • Band arrays stored as columns
  • Join with external metadata
  • Export to GeoPackage or Parquet
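The per-row band arrays are plain NumPy arrays, so collapsing them to scalar statistics for joins or exports is a one-liner. A sketch with synthetic rows shaped like the output above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for filtered.get_gdf(...): one row per geometry-scene
# pair, with pixel arrays stored in the band columns
gdf = pd.DataFrame({
    "id": ["scene_a", "scene_b"],
    "B04": [np.array([[1234, 1245], [1256, 1270]]),
            np.array([[1198, 1207], [1221, 1230]])],
})

# Collapse each array to a scalar so the table can be joined or exported
gdf["B04_mean"] = gdf["B04"].map(lambda a: float(a.mean()))
print(gdf["B04_mean"].tolist())  # [1251.25, 1214.0]
```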

Step 5: Compute NDVI

Now compute NDVI from the fetched data:
# Convert to float and compute NDVI
nir = ds["B08"].astype(float)
red = ds["B04"].astype(float)
ndvi = (nir - red) / (nir + red)

# Plot time series
ndvi.mean(dim=["y", "x"]).plot()
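On real scenes the denominator can be zero (nodata or fully dark pixels), which makes the division above emit warnings and uncontrolled NaNs. One defensive pattern, shown here on plain NumPy arrays rather than the xarray Dataset:

```python
import numpy as np

# Synthetic red/NIR pixels; the second pixel is nodata (all zeros)
red = np.array([[1200.0, 0.0], [1300.0, 1250.0]])
nir = np.array([[2400.0, 0.0], [2600.0, 2500.0]])

# Mask zero denominators explicitly instead of dividing blindly
denom = red + nir
ndvi = np.where(denom > 0, (nir - red) / np.where(denom > 0, denom, 1), np.nan)

print(round(float(np.nanmean(ndvi)), 4))  # 0.3333
```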

Step 6: Share Your Collection

Collections are portable—export and share with colleagues:
# Export to a directory
collection.export("./shared_collections/bangalore_jan/")

# On another machine, load directly
shared = rasteret.load("./shared_collections/bangalore_jan/")
print(shared.bands)
# Output: ['B01', 'B02', ..., 'SCL']
The exported Parquet files contain all metadata and COG headers. No STAC queries or header parsing needed on the receiving end.

Next Steps

TorchGeo Integration

Use Rasteret with PyTorch DataLoader and TorchGeo samplers

Build from Parquet

Ingest custom Parquet tables with COG URLs

Dataset Catalog

Explore built-in datasets and add your own

API Reference

Complete function signatures and parameters

Complete Example

Here’s the full workflow in one script:
quickstart.py
from pathlib import Path
import rasteret
from shapely.geometry import Polygon

# 1. Define AOI
aoi = Polygon([
    (77.55, 13.01),
    (77.58, 13.01),
    (77.58, 13.08),
    (77.55, 13.08),
    (77.55, 13.01),
])

# 2. Build Collection (cached after first run)
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore_jan",
    bbox=aoi.bounds,
    date_range=("2024-01-01", "2024-01-31"),
)

print(f"Built collection: {collection.name}")
print(f"Found {len(collection)} scenes")

# 3. Filter by cloud cover
filtered = collection.subset(cloud_cover_lt=20)
print(f"After cloud filter: {len(filtered)} scenes")

# 4. Fetch pixels
ds = filtered.get_xarray(geometries=[aoi], bands=["B04", "B08"])
arr = filtered.get_numpy(geometries=[aoi], bands=["B04", "B08"])
gdf = filtered.get_gdf(geometries=[aoi], bands=["B04", "B08"])

# 5. Compute NDVI
nir = ds["B08"].astype(float)
red = ds["B04"].astype(float)
ndvi = (nir - red) / (nir + red)

print(f"Mean NDVI: {ndvi.mean().values:.3f}")
print(f"NumPy shape: {arr.shape}")
print(f"GeoPandas rows: {len(gdf)}")
Run it:
python quickstart.py
Free datasets only: This example uses Sentinel-2 on Earth Search, which is free and requires no authentication. For requester-pays datasets (Landsat, NAIP), install rasteret[aws] and configure AWS credentials.

Troubleshooting

“No scenes found”

Check that your bbox and date_range intersect actual data:
# Verify bbox format: (minx, miny, maxx, maxy)
print(aoi.bounds)

# Try a wider date range
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore",
    bbox=aoi.bounds,
    date_range=("2023-01-01", "2024-12-31"),  # Full 2 years
)

“Collection already exists” when you want to rebuild

Use force=True to rebuild:
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore_jan",
    bbox=aoi.bounds,
    date_range=("2024-01-01", "2024-01-31"),
    force=True,  # Rebuild even if cache exists
)

Memory issues with large collections

For collections with 1000+ scenes, use PyArrow Dataset (lazy) instead of in-memory tables:
# Collections use lazy datasets by default
print(type(collection.dataset))  # <class 'pyarrow.dataset.FileSystemDataset'>

# Avoid calling .to_table() unless necessary
# ❌ Bad: loads everything into memory
df = collection.dataset.to_table().to_pandas()

# ✅ Good: lazy evaluation
ds = collection.get_xarray(geometries=[aoi], bands=["B04"])

Summary

You’ve learned:
  • ✅ How to build a Collection from the built-in catalog
  • ✅ How to filter scenes without network calls
  • ✅ How to fetch pixels in xarray, NumPy, and GeoPandas formats
  • ✅ How to compute indices like NDVI
  • ✅ How to share Collections as portable Parquet files
Rasteret handles STAC queries, COG header parsing, and parallel tile I/O so you can focus on your analysis or ML pipeline.