TorchVision provides more than 25 image-classification datasets covering digit recognition, natural scenes, fine-grained categories, satellite imagery, and large-scale benchmarks. By default each dataset returns (image, label) tuples and can be passed directly to torch.utils.data.DataLoader.
Most datasets support download=True for automatic setup. ImageNet (access restrictions), PCAM (manual Google Drive download), and LFW (no longer auto-downloadable) require you to obtain the files yourself and place them in the root directory before constructing the dataset.

Quick Start

from torchvision.datasets import CIFAR10
import torchvision.transforms.v2 as T
import torch

transform = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Normalize(mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.2010]),
])

train_data = CIFAR10(root="./data", train=True,  download=True, transform=transform)
test_data  = CIFAR10(root="./data", train=False, download=True, transform=transform)

loader = torch.utils.data.DataLoader(train_data, batch_size=128, shuffle=True, num_workers=4)

Standard Benchmarks

CIFAR-10 and CIFAR-100

Small 32×32 colour images from the CIFAR collection.
from torchvision.datasets import CIFAR10, CIFAR100

CIFAR10(
    root: str | Path,
    train: bool = True,          # True → 50 000 train images, False → 10 000 test images
    transform=None,
    target_transform=None,
    download: bool = False,
)

CIFAR100(               # identical signature; 100 fine-grained classes instead of 10
    root, train, transform, target_transform, download
)

| Dataset | Classes | Train | Test | Image size |
|---|---|---|---|---|
| CIFAR-10 | 10 | 50 000 | 10 000 | 32×32 RGB |
| CIFAR-100 | 100 | 50 000 | 10 000 | 32×32 RGB |

With no transform, __getitem__ returns (PIL.Image, int).

MNIST family

Handwritten digit and character datasets, all sharing the same constructor signature.
MNIST(
    root: str | Path,
    train: bool = True,
    transform=None,
    target_transform=None,
    download: bool = False,
)

| Class | Description | Classes | Train | Test |
|---|---|---|---|---|
| MNIST | Handwritten digits 0–9 | 10 | 60 000 | 10 000 |
| FashionMNIST | Zalando clothing articles | 10 | 60 000 | 10 000 |
| KMNIST | Japanese Kuzushiji characters | 10 | 60 000 | 10 000 |
| EMNIST | Extended MNIST (letters + digits) | varies | varies | varies |

EMNIST takes a required split argument in addition to train:
from torchvision.datasets import EMNIST

# split: "byclass" | "bymerge" | "balanced" | "letters" | "digits" | "mnist"
dataset = EMNIST(root="./data", split="balanced", download=True)

ImageNet

The ILSVRC-2012 large-scale classification benchmark.
from torchvision.datasets import ImageNet

ImageNet(
    root: str | Path,              # must contain ILSVRC2012_devkit_t12.tar.gz
    split: str = "train",          # "train" or "val"
    **kwargs,                      # forwarded to ImageFolder: transform, target_transform, loader, etc.
)
ImageNet requires manual download from image-net.org. Place ILSVRC2012_img_train.tar, ILSVRC2012_img_val.tar, and ILSVRC2012_devkit_t12.tar.gz in your root directory before constructing the dataset. There is no download=True option.
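
Because there is no automatic download, it can help to check that the archives are in place before constructing the dataset for the first time. The following is a minimal sketch using only the standard library; missing_imagenet_archives is a hypothetical helper, not part of torchvision, and the file names are the ones listed in the note above. It assumes nothing has been extracted yet, and a single split may not need every archive.

```python
import tempfile
from pathlib import Path

# Archive names taken from the note above; they are expected directly in `root`.
REQUIRED_ARCHIVES = (
    "ILSVRC2012_img_train.tar",
    "ILSVRC2012_img_val.tar",
    "ILSVRC2012_devkit_t12.tar.gz",
)


def missing_imagenet_archives(root):
    """Return the required archive names that are not present in `root`.

    Hypothetical helper for illustration only; not a torchvision function.
    """
    root = Path(root)
    return [name for name in REQUIRED_ARCHIVES if not (root / name).is_file()]


# Demonstration on a temporary directory that holds only the devkit archive.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ILSVRC2012_devkit_t12.tar.gz").touch()
    still_missing = missing_imagenet_archives(tmp)

print(still_missing)
```

If the returned list is non-empty, the listed files still need to be copied into root.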
Extra attributes provided by ImageNet:

| Attribute | Description |
|---|---|
| wnids | List of WordNet IDs (synset strings) |
| wnid_to_idx | Maps WordNet ID → class index |
| classes | List of human-readable class-name tuples |

ImageFolder / DatasetFolder

Use these when your images are already laid out in class subdirectories but don’t belong to a named benchmark. See the Datasets Overview page for full details.
from torchvision.datasets import ImageFolder

dataset = ImageFolder(root="/path/to/images", transform=transform)
# dataset.classes      → ['cat', 'dog', ...]
# dataset.class_to_idx → {'cat': 0, 'dog': 1, ...}

STL10

96×96 colour images designed for semi-supervised learning, with an additional large unlabelled pool.
from torchvision.datasets import STL10

STL10(
    root: str | Path,
    split: str = "train",   # "train" | "test" | "unlabeled" | "train+unlabeled"
    folds: int | None = None,
    transform=None,
    target_transform=None,
    download: bool = False,
)
__getitem__ returns (PIL.Image, int). Label is -1 for unlabelled samples.
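
One way to handle the -1 marker is to drop unlabelled samples before supervised training. The sketch below works on a plain label array. labelled_indices is a hypothetical helper, not part of torchvision, and it assumes the labels are available as an array (for example through an attribute such as dataset.labels).

```python
import numpy as np


def labelled_indices(labels):
    """Return the positions whose label is not the -1 'unlabelled' marker."""
    labels = np.asarray(labels)
    return np.flatnonzero(labels != -1).tolist()


# Made-up label array standing in for the labels of a "train+unlabeled" split.
example_labels = [3, -1, 0, -1, -1, 9]
keep = labelled_indices(example_labels)

print(keep)
```

The resulting index list could then be passed to torch.utils.data.Subset to obtain a view containing only labelled samples.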

SVHN

Street View House Numbers — digit recognition in natural scene images.
from torchvision.datasets import SVHN

SVHN(
    root: str | Path,
    split: str = "train",   # "train" | "test" | "extra"
    transform=None,
    target_transform=None,
    download: bool = False,
)
Requires scipy to load .mat files. Labels are remapped from the raw format so that digit 0 has index 0 (the dataset originally encodes it as 10).
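
The remapping described above amounts to replacing every raw label 10 with 0 and leaving 1 to 9 unchanged. The following sketch shows the idea on a made-up label array. remap_svhn_labels is a hypothetical helper for illustration; the dataset class already does this for you.

```python
import numpy as np


def remap_svhn_labels(raw_labels):
    """Map the raw label 10 (digit zero) to class index 0; keep 1-9 unchanged."""
    labels = np.asarray(raw_labels, dtype=np.int64).copy()
    labels[labels == 10] = 0
    return labels


raw = [10, 1, 9, 10, 5]  # made-up labels in the original encoding
remapped = remap_svhn_labels(raw).tolist()

print(remapped)
```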

Imagenette

A 10-class subset of ImageNet selected by fast.ai for rapid prototyping.
from torchvision.datasets import Imagenette

Imagenette(
    root: str | Path,
    split: str = "train",    # "train" | "val"
    size: str = "full",      # "full" | "320px" | "160px"
    download: bool = False,
    transform=None,
    target_transform=None,
)

Fine-Grained Recognition

Flowers102

102 flower categories photographed in the United Kingdom.
from torchvision.datasets import Flowers102

Flowers102(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)
Requires scipy to parse the .mat split files.

Food101

101 food categories, each with 750 training and 250 test images.
from torchvision.datasets import Food101

Food101(
    root: str | Path,
    split: str = "train",    # "train" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

GTSRB

German Traffic Sign Recognition Benchmark — 43 sign categories.
from torchvision.datasets import GTSRB

GTSRB(
    root: str | Path,
    split: str = "train",    # "train" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

DTD

Describable Textures Dataset — 47 texture categories with 10 predefined partitions.
from torchvision.datasets import DTD

DTD(
    root: str | Path,
    split: str = "train",      # "train" | "val" | "test"
    partition: int = 1,        # 1–10
    transform=None,
    target_transform=None,
    download: bool = False,
)

FGVCAircraft

Fine-grained recognition of aircraft variants.
from torchvision.datasets import FGVCAircraft

FGVCAircraft(
    root: str | Path,
    split: str = "trainval",            # "train" | "val" | "trainval" | "test"
    annotation_level: str = "variant",  # "variant" | "family" | "manufacturer"
    transform=None,
    target_transform=None,
    download: bool = False,
)

OxfordIIITPet

37 categories of cat and dog breeds; supports both classification and segmentation targets.
from torchvision.datasets import OxfordIIITPet

OxfordIIITPet(
    root: str | Path,
    split: str = "trainval",              # "trainval" | "test"
    target_types: str | list = "category",  # "category" | "binary-category" | "segmentation"
    transforms=None,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Caltech101 and Caltech256

Classic multi-category object recognition datasets.
from torchvision.datasets import Caltech101, Caltech256

Caltech101(
    root: str | Path,
    target_type: str = "category",  # "category" | "annotation"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Caltech256(
    root: str | Path,
    transform=None,
    target_transform=None,
    download: bool = False,
)
Caltech101 and Caltech256 require gdown for automatic download. Install it with pip install gdown before passing download=True.

PCAM

PatchCamelyon — 327 680 histopathology patches for binary cancer classification.
from torchvision.datasets import PCAM

PCAM(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

EuroSAT

Satellite imagery in 10 land-use / land-cover classes.
from torchvision.datasets import EuroSAT

EuroSAT(
    root: str | Path,
    transform=None,
    target_transform=None,
    download: bool = False,
)
Inherits from ImageFolder; the dataset has no predefined split — use torch.utils.data.random_split to create train/val/test subsets.
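
A minimal sketch of such a split is shown below. To keep it runnable without a download, it uses a small TensorDataset as a stand-in for EuroSAT. The 80/10/10 ratio, the tensor shapes, and the seed are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for EuroSAT(root=..., download=True): 100 fake samples, labels 0-9.
dataset = TensorDataset(torch.zeros(100, 3, 8, 8), torch.arange(100) % 10)

# Integer lengths must sum to len(dataset).
n_train = len(dataset) * 8 // 10
n_val = len(dataset) // 10
n_test = len(dataset) - n_train - n_val

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=generator
)

print(len(train_set), len(val_set), len(test_set))
```

Note that the resulting subsets all wrap the same underlying dataset object, so they share whatever transform was passed to its constructor.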

Scene and Places

Places365

Large-scale scene recognition benchmark with 365 scene categories.
from torchvision.datasets import Places365

Places365(
    root: str | Path,
    split: str = "train-standard",  # "train-standard" | "train-challenge" | "val"
    small: bool = False,             # True → 256×256 resized images
    download: bool = False,
    transform=None,
    target_transform=None,
)

SUN397

Scene Understanding Benchmark covering 397 scene types.
from torchvision.datasets import SUN397

SUN397(
    root: str | Path,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Country211

211-class geolocation dataset released by OpenAI.
from torchvision.datasets import Country211

Country211(
    root: str | Path,
    split: str = "train",    # "train" | "valid" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Large-Scale / Specialized

INaturalist

Biodiversity observations across plants, animals, and fungi with hierarchical taxonomic labels.
from torchvision.datasets import INaturalist

INaturalist(
    root: str | Path,
    version: str = "2021_train",  # "2017" | "2018" | "2019" | "2021_train" | "2021_train_mini" | "2021_valid"
    target_type: str | list = "full",
    transform=None,
    target_transform=None,
    download: bool = False,
)

RenderedSST2

Sentiment classification rendered as images (positive / negative sentences).
from torchvision.datasets import RenderedSST2

RenderedSST2(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

CLEVRClassification

Object-counting classification task from the CLEVR synthetic dataset.
from torchvision.datasets import CLEVRClassification

CLEVRClassification(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Dataset Summary Table

| Class | Splits | Classes | download=True |
|---|---|---|---|
| CIFAR10 | train / test (via train bool) | 10 | ✅ |
| CIFAR100 | train / test (via train bool) | 100 | ✅ |
| MNIST | train / test (via train bool) | 10 | ✅ |
| FashionMNIST | train / test (via train bool) | 10 | ✅ |
| KMNIST | train / test (via train bool) | 10 | ✅ |
| EMNIST | byclass, bymerge, balanced, letters, digits, mnist | varies | ✅ |
| ImageNet | train / val | 1 000 | ❌ Manual |
| ImageFolder | user-defined | user-defined | N/A |
| STL10 | train / test / unlabeled / train+unlabeled | 10 (+unlabelled) | ✅ |
| SVHN | train / test / extra | 10 | ✅ |
| Imagenette | train / val | 10 | ✅ |
| Flowers102 | train / val / test | 102 | ✅ |
| Food101 | train / test | 101 | ✅ |
| GTSRB | train / test | 43 | ✅ |
| DTD | train / val / test | 47 | ✅ |
| FGVCAircraft | train / val / trainval / test | 100 (variant) | ✅ |
| OxfordIIITPet | trainval / test | 37 | ✅ |
| Caltech101 | (no official split) | 101 | ✅ |
| Caltech256 | (no official split) | 257 | ✅ |
| PCAM | train / val / test | 2 | ❌ Manual (Google Drive) |
| EuroSAT | (no official split) | 10 | ✅ |
| Country211 | train / valid / test | 211 | ✅ |
| INaturalist | multiple year versions | 10 000+ | ✅ |
| Places365 | train-standard / train-challenge / val | 365 | ✅ |
| SUN397 | (no official split) | 397 | ✅ |
| RenderedSST2 | train / val / test | 2 | ✅ |
| CLEVRClassification | train / val / test | 8 (object count) | ✅ |
