Documentation Index
Fetch the complete documentation index at: https://mintlify.com/terrafloww/rasteret/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Rasteret integrates seamlessly with xarray for scientific data analysis. The get_xarray() method converts COG tiles into labeled xarray Datasets with:
- Named dimensions (
x, y, band)
- Coordinate arrays with spatial/spectral metadata
- CF-compliant CRS encoding (spatial_ref coordinate)
- Automatic multi-CRS reprojection and mosaicking
Basic Usage
Loading Data
Convert any collection region to an xarray Dataset:
import rasteret
# Build or load a collection
collection = rasteret.build_from_stac(
name="bangalore",
stac_api="https://earth-search.aws.element84.com/v1",
collection="sentinel-2-l2a",
bbox=(77.55, 13.01, 77.58, 13.08),
date_range=("2024-01-01", "2024-01-31"),
)
# Load RGB bands as xarray Dataset
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
)
print(ds)
# <xarray.Dataset>
# Dimensions: (band: 3, y: 768, x: 683)
# Coordinates:
# * band (band) <U3 'B04' 'B03' 'B02'
# * y (y) float64 ...
# * x (x) float64 ...
# spatial_ref int64 32643
# Data variables:
# B04 (band, y, x) uint16 ...
# B03 (band, y, x) uint16 ...
# B02 (band, y, x) uint16 ...
Key Parameters:
geometries: Bounding box (minx, miny, maxx, maxy), Shapely geometry, or GeoJSON
bands: List of band codes to load
max_concurrent: Number of parallel HTTP requests (default: 50)
Dataset Structure
The returned xarray Dataset contains:
# Dimensions
ds.dims
# {'band': 3, 'y': 768, 'x': 683}
# Coordinates
ds.coords
# Coordinates:
# * band (band) <U3 'B04' 'B03' 'B02'
# * y (y) float64 1.444e+06 1.444e+06 ... 1.436e+06
# * x (x) float64 7.755e+05 7.755e+05 ... 7.823e+05
# spatial_ref int64 32643
# Data variables (one per band)
ds.data_vars
# Data variables:
# B04 (band, y, x) uint16 ...
# B03 (band, y, x) uint16 ...
# B02 (band, y, x) uint16 ...
# CRS metadata
ds.spatial_ref.attrs
# {'crs_wkt': 'PROJCS["WGS 84 / UTM zone 43N", ...]',
# 'GeoTransform': '775500.0 10.0 0.0 1444000.0 0.0 -10.0'}
Analysis Workflows
Band Math (NDVI)
Perform normalized difference vegetation index calculation:
# Load NIR and Red bands
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B08", "B04"], # NIR, Red
)
# Calculate NDVI
ndvi = (ds["B08"] - ds["B04"]) / (ds["B08"] + ds["B04"])
print(f"NDVI range: [{float(ndvi.min()):.3f}, {float(ndvi.max()):.3f}]")
# NDVI range: [-0.234, 0.876]
# Add as new variable
ds["NDVI"] = ndvi
Spectral Indices
Create derived products from multiple bands:
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B08", "B04", "B03"], # NIR, Red, Green
)
# Enhanced Vegetation Index (EVI)
L = 1.0
C1 = 6.0
C2 = 7.5
evi = 2.5 * (
(ds["B08"] - ds["B04"]) /
(ds["B08"] + C1 * ds["B04"] - C2 * ds["B03"] + L)
)
ds["EVI"] = evi
# Green Chlorophyll Index (GCI)
gci = (ds["B08"] / ds["B03"]) - 1
ds["GCI"] = gci
Spatial Aggregation
Compute zonal statistics:
import numpy as np
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B08", "B04"],
)
# Mean NDVI across entire region
ndvi = (ds["B08"] - ds["B04"]) / (ds["B08"] + ds["B04"])
mean_ndvi = float(ndvi.mean())
print(f"Mean NDVI: {mean_ndvi:.3f}")
# Standard deviation
std_ndvi = float(ndvi.std())
print(f"Std NDVI: {std_ndvi:.3f}")
# Percentiles
percentiles = np.percentile(ndvi.values, [25, 50, 75])
print(f"NDVI quartiles: {percentiles}")
Visualization
Integrate with matplotlib:
import matplotlib.pyplot as plt
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
)
# Plot single band
fig, ax = plt.subplots(figsize=(10, 8))
ds["B04"].plot(ax=ax, cmap="gray")
ax.set_title("Sentinel-2 Band 4 (Red)")
plt.show()
# RGB composite (requires scaling to 0-1)
rgb = np.stack([
ds["B04"].values / 3000, # Red
ds["B03"].values / 3000, # Green
ds["B02"].values / 3000, # Blue
], axis=-1)
rgb = np.clip(rgb, 0, 1)
plt.figure(figsize=(12, 10))
plt.imshow(rgb)
plt.title("True Color Composite")
plt.axis("off")
plt.show()
Temporal Analysis
Analyze change over time:
import pandas as pd
# Build collection spanning multiple months
collection = rasteret.build_from_stac(
name="temporal-study",
stac_api="https://earth-search.aws.element84.com/v1",
collection="sentinel-2-l2a",
bbox=(77.55, 13.01, 77.58, 13.08),
date_range=("2024-01-01", "2024-06-30"),
)
# Load data for each month
monthly_ndvi = []
for month in pd.date_range("2024-01", "2024-06", freq="MS"):
start = month.strftime("%Y-%m-01")
end = (month + pd.DateOffset(months=1) - pd.DateOffset(days=1)).strftime("%Y-%m-%d")
subset = collection.subset(date_range=(start, end))
if len(subset) == 0:
continue
ds = subset.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B08", "B04"],
cloud_cover_lt=20,
)
ndvi = (ds["B08"] - ds["B04"]) / (ds["B08"] + ds["B04"])
monthly_ndvi.append({
"date": month,
"mean_ndvi": float(ndvi.mean()),
"std_ndvi": float(ndvi.std()),
})
# Plot temporal trend
df = pd.DataFrame(monthly_ndvi)
plt.figure(figsize=(10, 6))
plt.plot(df["date"], df["mean_ndvi"], marker="o")
plt.fill_between(
df["date"],
df["mean_ndvi"] - df["std_ndvi"],
df["mean_ndvi"] + df["std_ndvi"],
alpha=0.3,
)
plt.xlabel("Date")
plt.ylabel("NDVI")
plt.title("Vegetation Index Over Time")
plt.grid(True)
plt.show()
Advanced Features
Multi-Geometry Queries
Load data for multiple polygons:
import shapely
# Define multiple study areas
geometries = [
shapely.box(77.55, 13.01, 77.57, 13.03), # Area 1
shapely.box(77.56, 13.04, 77.58, 13.06), # Area 2
]
ds = collection.get_xarray(
geometries=geometries,
bands=["B04", "B03", "B02"],
)
# Returns union of all geometries
Multi-CRS Collections
Automatically reproject mixed-CRS data:
# Collection spans multiple UTM zones
collection = rasteret.load("global-dataset")
print(f"CRS zones: {collection.epsg}") # [32643, 32644]
# Option 1: Auto-select most common CRS
ds = collection.get_xarray(
geometries=(77.0, 13.0, 78.0, 14.0),
bands=["B04", "B03", "B02"],
)
# Uses most common CRS (majority scenes)
# Option 2: Force specific target CRS
ds = collection.get_xarray(
geometries=(77.0, 13.0, 78.0, 14.0),
bands=["B04", "B03", "B02"],
target_crs=32643, # All data reprojected to this CRS
)
Filtering During Load
Combine spatial query with filters:
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
cloud_cover_lt=10, # Only low-cloud scenes
date_range=("2024-03-01", "2024-03-31"),
)
Export to NetCDF
Save for later analysis:
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
)
# Save to NetCDF
ds.to_netcdf("bangalore_rgb.nc")
# Reload later
import xarray as xr
ds_reloaded = xr.open_dataset("bangalore_rgb.nc")
Concurrent Requests
Tune parallelism for your network:
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
max_concurrent=100, # Default: 50
)
# Higher values improve throughput but increase memory
Cloud Configuration
For requester-pays or private buckets:
from rasteret.cloud import CloudConfig
cloud_config = CloudConfig(
requester_pays=True,
region="us-west-2",
)
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B4", "B5"],
cloud_config=cloud_config,
)
Backend Selection
Use native cloud storage clients:
try:
from obstore.store import S3Store
backend = S3Store.from_url("s3://bucket/prefix")
except ImportError:
backend = None
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
backend=backend,
)
API Reference
Collection.get_xarray()
Defined in source/src/rasteret/core/collection.py:1098.
Signature:
def get_xarray(
self,
geometries: Any,
bands: list[str],
*,
max_concurrent: int = 50,
cloud_config: Any = None,
data_source: str | None = None,
backend: Any = None,
target_crs: int | None = None,
**filters: Any,
) -> xr.Dataset
Parameters:
geometries: Bounding box tuple, Shapely geometry, GeoJSON dict, or Arrow array
bands: List of band codes (e.g., ["B04", "B03", "B02"])
max_concurrent: Maximum concurrent HTTP requests
cloud_config: Optional CloudConfig for authentication/URL rewriting
data_source: Override inferred data source
backend: Optional StorageBackend instance
target_crs: EPSG code for reprojection
**filters: Additional kwargs passed to Collection.subset()
Returns:
xarray.Dataset: Multi-band array with spatial coordinates
Coordinate Reference System
CRS information is stored in the spatial_ref coordinate:
ds.spatial_ref.attrs["crs_wkt"] # WKT2 string
ds.spatial_ref.attrs["GeoTransform"] # GDAL GeoTransform
ds.spatial_ref.attrs["spatial_ref"] # PROJ.4 string (if available)
Examples
See the complete example at source/examples/landsat_xarray.py:1 for a full Landsat workflow.
Common Patterns
Masking by Quality Band
Use scene quality layers:
# Load data + quality band
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02", "SCL"], # Scene Classification Layer
)
# Mask clouds (SCL values 8, 9, 10)
import xarray as xr
cloud_mask = xr.where(
(ds["SCL"] == 8) | (ds["SCL"] == 9) | (ds["SCL"] == 10),
True,
False,
)
# Apply mask to all bands
masked_ds = ds[["B04", "B03", "B02"]].where(~cloud_mask)
Resampling to Common Grid
Align different resolution bands:
# Load 10m and 20m bands
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B05"], # 10m, 20m
)
# Resample B05 to B04 grid
ds["B05_resampled"] = ds["B05"].interp_like(ds["B04"])
Troubleshooting
Empty Dataset
Error: No data returned from get_xarray()
Solution: Check collection coverage:
print(collection.bounds) # Should overlap with geometries
print(len(collection)) # Should be > 0
CRS Mismatch
Warning: Collection spans multiple CRS zones
Solution: Specify target_crs:
ds = collection.get_xarray(
geometries=(77.55, 13.01, 77.58, 13.08),
bands=["B04", "B03", "B02"],
target_crs=32643,
)
Memory Issues
Error: Out of memory loading large regions
Solution: Load smaller spatial extents or fewer bands:
# Split into smaller tiles
for x in range(77, 78, 0.1):
for y in range(13, 14, 0.1):
bbox = (x, y, x + 0.1, y + 0.1)
ds = collection.get_xarray(geometries=bbox, bands=["B04"])
# Process tile