TorchGeo provides a comprehensive collection of datasets for geospatial machine learning. These datasets are divided into two main categories based on whether they contain geospatial metadata.
Dataset Categories
Geospatial Datasets
Datasets with coordinate information that can be spatially indexed and combined
Non-Geospatial Datasets
Benchmark datasets with pre-defined image chips for various computer vision tasks
Key Differences
Geospatial Datasets
Geospatial datasets (GeoDataset) contain rich geospatial metadata including:
- Coordinates (latitude, longitude)
- Coordinate Reference System (CRS)
- Resolution
- Temporal information
Non-Geospatial Datasets
Non-geospatial datasets (NonGeoDataset) are pre-chipped benchmark datasets without coordinate information.
Common Patterns
Combining Geospatial Datasets
TorchGeo provides two operators for combining geospatial datasets:
- Intersection (&): Samples must exist in both datasets
- Union (|): Samples can exist in either dataset
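To make the two operators concrete, here is a minimal, library-free sketch of the spatial logic only. `Extent`, `intersection`, and `union` are hypothetical names introduced for illustration and are not TorchGeo's API; in TorchGeo the `&` and `|` operators are applied directly to dataset objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extent:
    """Axis-aligned spatial extent (hypothetical helper, not a TorchGeo class)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def intersection(a: Extent, b: Extent) -> Optional[Extent]:
    """Area covered by both extents, or None if they do not overlap."""
    minx, maxx = max(a.minx, b.minx), min(a.maxx, b.maxx)
    miny, maxy = max(a.miny, b.miny), min(a.maxy, b.maxy)
    if minx >= maxx or miny >= maxy:
        return None
    return Extent(minx, maxx, miny, maxy)


def union(a: Extent, b: Extent) -> Extent:
    """Smallest extent enclosing everything covered by either input."""
    return Extent(
        min(a.minx, b.minx),
        max(a.maxx, b.maxx),
        min(a.miny, b.miny),
        max(a.maxy, b.maxy),
    )


# Made-up extents for an imagery layer and a label layer
imagery = Extent(0, 10, 0, 10)
labels = Extent(5, 15, 5, 15)

both = intersection(imagery, labels)  # only where imagery AND labels overlap
either = union(imagery, labels)       # bounding extent of imagery OR labels
```

Intersection is the usual choice when pairing imagery with labels, since a training sample needs both. Union suits merging several sources of the same kind, where data from any one of them is enough.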
Sampling Strategies
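As a rough illustration of what a random sampler does, the sketch below draws fixed-size query windows uniformly from within a dataset's extent. It uses only the standard library; `random_queries` and its parameters are hypothetical and simplified, and TorchGeo's own samplers should be used in practice.

```python
import random
from typing import Iterator, Tuple

# (minx, maxx, miny, maxy) in the units of the dataset's CRS
Bounds = Tuple[float, float, float, float]


def random_queries(
    bounds: Bounds, size: float, length: int, seed: int = 0
) -> Iterator[Bounds]:
    """Yield `length` random square windows of side `size` inside `bounds`."""
    minx, maxx, miny, maxy = bounds
    if maxx - minx < size or maxy - miny < size:
        raise ValueError("window size exceeds dataset extent")
    rng = random.Random(seed)  # seeded so the sequence is reproducible
    for _ in range(length):
        # Pick the lower-left corner so the whole window stays in bounds
        x = rng.uniform(minx, maxx - size)
        y = rng.uniform(miny, maxy - size)
        yield (x, x + size, y, y + size)


# Made-up extent and window size for demonstration
queries = list(random_queries(bounds=(0.0, 1000.0, 0.0, 500.0), size=256.0, length=5))
```

Each query window would then be used to index the geospatial dataset, which returns the data falling inside that window.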
For geospatial datasets, use samplers to generate random queries.
Transforms
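A transform is a function that takes a sample and returns a modified sample. The sketch below assumes a sample is a dictionary with an `image` entry and uses nested lists in place of tensors to stay dependency-free. The helpers `scale`, `hflip`, and `compose` are hypothetical names for illustration; real pipelines would operate on tensors.

```python
from typing import Any, Callable, Dict

Sample = Dict[str, Any]
Transform = Callable[[Sample], Sample]


def scale(factor: float) -> Transform:
    """Return a transform that multiplies every pixel value by `factor`."""

    def _apply(sample: Sample) -> Sample:
        out = dict(sample)  # shallow copy so the input sample is not mutated
        out["image"] = [[v * factor for v in row] for row in sample["image"]]
        return out

    return _apply


def hflip(sample: Sample) -> Sample:
    """Flip the image left to right (a simple augmentation)."""
    out = dict(sample)
    out["image"] = [row[::-1] for row in sample["image"]]
    return out


def compose(*transforms: Transform) -> Transform:
    """Chain transforms so they are applied in the order given."""

    def _apply(sample: Sample) -> Sample:
        for t in transforms:
            sample = t(sample)
        return sample

    return _apply


# Made-up 2x2 single-band image with an illustrative label
pipeline = compose(scale(1 / 255), hflip)
sample = {"image": [[0, 255], [51, 102]], "label": 3}
result = pipeline(sample)
```

Because each transform returns a dictionary of the same shape, the composed pipeline can be passed wherever a single transform function is expected.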
All datasets support transforms for data augmentation.
Common Parameters
Most datasets share these common parameters:
- root: Root directory where the dataset is stored (for NonGeoDatasets)
- paths: One or more root directories to search or files to load (for GeoDatasets)
- transforms: Function to transform samples after loading
- download: If True, download the dataset if not found (for benchmark datasets)
- checksum: If True, verify file integrity using MD5 checksums
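For reference, an MD5 integrity check of the kind described above can be sketched with the standard library. `md5_matches` is a hypothetical helper written for illustration, not TorchGeo's internal function, and the demo file and its contents are made up.

```python
import hashlib
import os
import tempfile


def md5_matches(path: str, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's MD5 digest equals `expected_md5`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Demo on a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example archive contents")
    demo_path = tmp.name

expected = hashlib.md5(b"example archive contents").hexdigest()
ok = md5_matches(demo_path, expected)         # digest agrees with the expected value
corrupted = md5_matches(demo_path, "0" * 32)  # digest differs from the expected value
os.remove(demo_path)
```

MD5 is adequate for detecting corrupted or truncated downloads, but it is not collision-resistant and should not be relied on as a security measure.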
Sample Format
All datasets return samples as dictionaries with standardized keys.
Next Steps
Geospatial Datasets
Explore RasterDataset, VectorDataset, and other geospatial base classes
Non-Geospatial Datasets
Browse benchmark datasets for classification, segmentation, and detection