TorchVision’s dataset module gives you ready-to-use torch.utils.data.Dataset implementations for the most widely used computer vision benchmarks. Every dataset follows a consistent interface that slots directly into torch.utils.data.DataLoader, making it straightforward to swap one benchmark for another without changing your training loop.
VisionDataset Base Class
All datasets inherit from torchvision.datasets.VisionDataset, which extends torch.utils.data.Dataset and enforces a consistent transform contract.
Transform parameters
| Parameter | Applied to | Notes |
|---|---|---|
| transform | Input image only | Receives a PIL Image (or Tensor, depending on the loader) and returns the transformed image |
| target_transform | Target/label only | Receives the raw label and returns the transformed label |
| transforms | (image, target) jointly | Receives and returns an (image, target) pair; mutually exclusive with the two above |
transforms and the transform/target_transform pair are mutually exclusive. Passing both raises a ValueError.
Generic Folder Loaders
When your data is already organized into class subdirectories, you don’t need a specialized dataset class.
DatasetFolder
DatasetFolder scans a root directory for class subdirectories and builds a flat list of (sample_path, class_index) tuples. It accepts any file type via an extensions allow-list or a custom is_valid_file predicate.
| Attribute | Type | Description |
|---|---|---|
| classes | list[str] | Sorted list of class folder names |
| class_to_idx | dict[str, int] | Maps class name → integer label |
| samples | list[tuple[str, int]] | All (path, class_index) pairs |
| targets | list[int] | Class index for every sample |
ImageFolder
ImageFolder is a thin specialization of DatasetFolder pre-configured for common image extensions (.jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp).
Organize your images
Create one subdirectory per class under your root directory. Subdirectory names become the class labels.
Using wrap_dataset_for_transforms_v2
The torchvision.transforms.v2 API can operate on richer tensor types (BoundingBoxes, Mask, etc.), but many built-in datasets return plain PIL Images and dicts. The wrap_dataset_for_transforms_v2 helper wraps a supported built-in dataset so that its __getitem__ returns those typed tensors automatically.
Using Datasets with DataLoader
All TorchVision datasets are standard torch.utils.data.Dataset objects, so you can use the full PyTorch DataLoader API.
Dataset Categories
TorchVision ships with datasets across six task categories:
| Category | Representative datasets | Page |
|---|---|---|
| Image Classification | CIFAR-10/100, ImageNet, MNIST, Flowers102, Food101, STL10 … | Classification |
| Object Detection | CocoDetection, VOCDetection, Kitti, WIDERFace … | Detection & Segmentation |
| Semantic Segmentation | VOCSegmentation, Cityscapes, SBDataset … | Detection & Segmentation |
| Video / Action Recognition | Kinetics (400/600/700), HMDB51, UCF101, MovingMNIST | Video & Flow |
| Optical Flow | Sintel, KittiFlow, FlyingChairs, FlyingThings3D, HD1K | Video & Flow |
| Stereo Matching | Kitti2012Stereo, Kitti2015Stereo, Middlebury2014Stereo … | Video & Flow |
Classification
CIFAR, ImageNet, MNIST, fine-grained recognition, scene datasets, and more.
Detection & Segmentation
COCO, Pascal VOC, Cityscapes, Kitti, and others with bounding box or mask targets.
Video & Flow
Kinetics, HMDB51, UCF101, optical flow, and stereo disparity datasets.