TorchGeo provides a comprehensive collection of datasets for geospatial machine learning. These datasets are divided into two main categories based on whether they contain geospatial metadata.

Dataset Categories

Geospatial Datasets

Datasets with coordinate information that can be spatially indexed and combined

Non-Geospatial Datasets

Benchmark datasets with pre-defined image chips for various computer vision tasks

Key Differences

Geospatial Datasets

Geospatial datasets (GeoDataset) contain rich geospatial metadata including:
  • Coordinates (for example, latitude and longitude)
  • Coordinate Reference System (CRS)
  • Resolution
  • Temporal information
This metadata lets datasets be indexed spatially, combined with one another, and queried by location and time:
from torchgeo.datasets import Landsat8, CDL

# Load geospatial datasets
landsat = Landsat8(paths='data/landsat')
cdl = CDL(paths='data/cdl')

# Combine using spatial intersection
dataset = landsat & cdl

# Query by bounding box and time
sample = dataset[xmin:xmax:xres, ymin:ymax:yres, tmin:tmax]
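
The indexing syntax above relies on standard Python behavior: a comma-separated subscript reaches `__getitem__` as a tuple of `slice` objects. The sketch below uses a hypothetical `QueryEcho` class (not part of TorchGeo) and made-up coordinate values to show how the x, y, and time ranges could be unpacked. How TorchGeo interprets these slices internally is not reproduced here.

```python
class QueryEcho:
    """Hypothetical stand-in that only echoes the query it receives."""

    def __getitem__(self, query):
        # A subscript like obj[a:b:c, d:e:f, g:h] arrives as a tuple of slices.
        x, y, t = query
        return {
            "x": (x.start, x.stop, x.step),
            "y": (y.start, y.stop, y.step),
            "t": (t.start, t.stop, t.step),
        }


# Illustrative values only; real queries use the units of the dataset's CRS.
xmin, xmax, xres = 0.0, 100.0, 10.0
ymin, ymax, yres = 50.0, 150.0, 10.0
tmin, tmax = 2020, 2021

parsed = QueryEcho()[xmin:xmax:xres, ymin:ymax:yres, tmin:tmax]
print(parsed)
```

Omitting the step, as in the time slice, leaves `step` as `None`, which a dataset is free to treat as "use the native resolution".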

Non-Geospatial Datasets

Non-geospatial datasets (NonGeoDataset) are pre-chipped benchmark datasets without coordinate information:
from torchgeo.datasets import EuroSAT
from torch.utils.data import DataLoader

# Load benchmark dataset
dataset = EuroSAT(root='data/eurosat', split='train', download=True)

# Use with standard PyTorch DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for sample in dataloader:
    image = sample['image']  # [B, C, H, W]
    label = sample['label']  # [B]

Common Patterns

Combining Geospatial Datasets

TorchGeo provides two operators for combining geospatial datasets:
Intersection (&): Samples must exist in both datasets
# Combine image and labels from same location
dataset = imagery & labels
Union (|): Samples can exist in either dataset
# Combine data from different sensors or locations
dataset = landsat7 | landsat8
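
As a rough mental model, the two operators can be pictured as rules for which query locations are valid. The sketch below is an assumption-laden illustration on plain bounding boxes: the `Box` type and the helper functions are hypothetical, and TorchGeo's actual implementation is not reproduced here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned extent of a dataset."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(box: Box, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the box (edges included)."""
    return box.xmin <= x <= box.xmax and box.ymin <= y <= box.ymax


def in_intersection(a: Box, b: Box, x: float, y: float) -> bool:
    """Intersection: the location must be covered by both datasets."""
    return contains(a, x, y) and contains(b, x, y)


def in_union(a: Box, b: Box, x: float, y: float) -> bool:
    """Union: the location may be covered by either dataset."""
    return contains(a, x, y) or contains(b, x, y)


imagery = Box(0, 0, 10, 10)
labels = Box(5, 5, 15, 15)

print(in_intersection(imagery, labels, 7, 7))  # covered by both
print(in_intersection(imagery, labels, 2, 2))  # covered by imagery only
print(in_union(imagery, labels, 2, 2))         # covered by at least one
```

This is why intersection suits image and label pairs (every sample needs both), while union suits mosaicking sources that cover different areas.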

Sampling Strategies

For geospatial datasets, use samplers to generate random queries:
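from torch.utils.data import DataLoader
from torchgeo.datasets import stack_samples
from torchgeo.samplers import RandomGeoSampler

# Draw 1000 random patches of size 256 from the dataset
sampler = RandomGeoSampler(dataset, size=256, length=1000)
# stack_samples collates the sample dictionaries into a batch
dataloader = DataLoader(dataset, sampler=sampler, batch_size=4, collate_fn=stack_samples)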
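
Conceptually, a random sampler of this kind draws fixed-size windows that fit inside the dataset's extent. The following is a minimal sketch of that idea under stated assumptions: `random_windows` is a hypothetical helper, `size` is taken to be in the same units as `bounds`, and TorchGeo's own handling of units and spatial indexes is not reproduced.

```python
import random


def random_windows(bounds, size, length, seed=None):
    """Yield `length` random square windows of side `size` inside `bounds`.

    `bounds` is (xmin, ymin, xmax, ymax). All names here are hypothetical.
    """
    xmin, ymin, xmax, ymax = bounds
    if xmax - xmin < size or ymax - ymin < size:
        raise ValueError("bounds are smaller than the requested window size")
    rng = random.Random(seed)
    for _ in range(length):
        # Pick the lower-left corner so the whole window stays inside the bounds.
        x0 = rng.uniform(xmin, xmax - size)
        y0 = rng.uniform(ymin, ymax - size)
        yield (x0, y0, x0 + size, y0 + size)


windows = list(random_windows((0.0, 0.0, 1000.0, 1000.0), size=256, length=5, seed=0))
for window in windows:
    print(window)
```

Each yielded window plays the role of one query, so `length` controls how many samples make up an epoch.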

Transforms

All datasets support transforms for data augmentation:
import kornia.augmentation as K
from torchgeo.transforms import AugmentationSequential

transforms = AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomVerticalFlip(p=0.5),
    data_keys=['image', 'mask']
)

# MyDataset is a placeholder for any dataset class that accepts `transforms`
dataset = MyDataset(paths='data', transforms=transforms)
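
Because `transforms` is a callable that maps a sample dictionary to a sample dictionary, simple transforms can also be written by hand. The sketch below is illustrative only: `compose`, `scale_image`, and `flip_horizontal` are hypothetical helpers, and nested Python lists stand in for tensors so the example runs without extra dependencies.

```python
def compose(*transforms):
    """Chain several sample -> sample callables into one."""
    def apply(sample):
        for transform in transforms:
            sample = transform(sample)
        return sample
    return apply


def scale_image(factor):
    """Build a transform that multiplies every image value by `factor`."""
    def transform(sample):
        out = dict(sample)  # shallow copy so the input dictionary is not mutated
        out["image"] = [[value * factor for value in row] for row in sample["image"]]
        return out
    return transform


def flip_horizontal(sample):
    """Flip the image and, if present, the mask so the two stay aligned."""
    out = dict(sample)
    out["image"] = [row[::-1] for row in sample["image"]]
    if "mask" in sample:
        out["mask"] = [row[::-1] for row in sample["mask"]]
    return out


pipeline = compose(flip_horizontal, scale_image(0.5))
sample = {"image": [[1, 2], [3, 4]], "mask": [[0, 1], [1, 0]]}
result = pipeline(sample)
print(result)
```

Applying the flip to both keys mirrors the role of `data_keys=['image', 'mask']` above: geometric augmentations must be applied to the image and its mask together.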

Common Parameters

Most datasets share these common parameters:
  • root (str | PathLike, default: 'data'): Root directory where dataset is stored (for NonGeoDatasets)
  • paths (str | PathLike | Iterable[str | PathLike], default: 'data'): One or more root directories to search or files to load (for GeoDatasets)
  • transforms (Callable[[Sample], Sample] | None, default: None): Function to transform samples after loading
  • download (bool, default: False): If True, download dataset if not found (for benchmark datasets)
  • checksum (bool, default: False): If True, verify file integrity using MD5 checksums
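
To make the `checksum` option concrete, here is a minimal sketch of MD5 verification using only the standard library. `md5_matches` is a hypothetical helper written for illustration and is not TorchGeo's own routine; the temporary file stands in for a downloaded archive.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_matches(path, expected_md5, chunk_size=1024 * 1024):
    """Return True if the file's MD5 hex digest equals `expected_md5`."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "archive.bin"  # stand-in for a downloaded file
    sample_file.write_bytes(b"hello")
    ok = md5_matches(sample_file, hashlib.md5(b"hello").hexdigest())
    bad = md5_matches(sample_file, "0" * 32)

print(ok, bad)
```

A mismatch usually indicates a truncated or corrupted download, which is why the check is paired with `download`.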

Sample Format

All datasets return samples as dictionaries with standardized keys:
Image datasets:
{
    'image': Tensor,      # [C, H, W] or [T, C, H, W]
    'bounds': Tensor,     # Spatiotemporal bounds (GeoDatasets only)
    'transform': Tensor   # Affine transform matrix (GeoDatasets only)
}
Segmentation datasets:
{
    'image': Tensor,  # [C, H, W]
    'mask': Tensor    # [H, W] or [C, H, W]
}
Classification datasets:
{
    'image': Tensor,  # [C, H, W]
    'label': Tensor   # Scalar or [1]
}
Object detection datasets:
{
    'image': Tensor,      # [C, H, W]
    'bbox_xyxy': Tensor,  # [N, 4]
    'label': Tensor       # [N]
}
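
Since the keys are standardized per task, a quick sanity check on a sample can catch mismatches before training. The sketch below is one possible way to do this: the task names and the `missing_keys` helper are hypothetical, and the key sets are taken from the formats listed above.

```python
# Required keys per task, copied from the sample formats listed above.
REQUIRED_KEYS = {
    "segmentation": {"image", "mask"},
    "classification": {"image", "label"},
    "detection": {"image", "bbox_xyxy", "label"},
}


def missing_keys(sample, task):
    """Return the sorted list of keys that `sample` lacks for `task`."""
    return sorted(REQUIRED_KEYS[task] - set(sample))


# Placeholder values stand in for tensors in this illustration.
classification_sample = {"image": "tensor", "label": "tensor"}
incomplete_detection_sample = {"image": "tensor"}

print(missing_keys(classification_sample, "classification"))
print(missing_keys(incomplete_detection_sample, "detection"))
```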

Next Steps

Geospatial Datasets

Explore RasterDataset, VectorDataset, and other geospatial base classes

Non-Geospatial Datasets

Browse benchmark datasets for classification, segmentation, and detection
