Image Classification Datasets in TorchVision

TorchVision provides more than 25 image-classification datasets covering digit recognition, natural scenes, fine-grained categories, satellite imagery, and large-scale benchmarks. Every dataset returns (image, label) tuples and works directly with torch.utils.data.DataLoader. Most datasets support download=True for automatic setup; a few, most notably ImageNet, must be downloaded manually because of access restrictions.

ImageNet, PCAM (manual Google Drive download), and LFW (no longer auto-downloadable) require you to obtain the files yourself and place them in the root directory before constructing the dataset.

Quick Start

from torchvision.datasets import CIFAR10
import torchvision.transforms.v2 as T
import torch

transform = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Normalize(mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.2010]),
])

train_data = CIFAR10(root="./data", train=True,  download=True, transform=transform)
test_data  = CIFAR10(root="./data", train=False, download=True, transform=transform)

loader = torch.utils.data.DataLoader(train_data, batch_size=128, shuffle=True, num_workers=4)

Standard Benchmarks

CIFAR-10 and CIFAR-100

Small 32×32 colour images labelled with 10 classes (CIFAR-10) or 100 classes (CIFAR-100).

from torchvision.datasets import CIFAR10, CIFAR100

CIFAR10(
    root: str | Path,
    train: bool = True,          # True → 50 000 train images, False → 10 000 test images
    transform=None,
    target_transform=None,
    download: bool = False,
)

CIFAR100(               # identical signature; 100 fine-grained classes instead of 10
    root, train, transform, target_transform, download
)

Dataset	Classes	Train	Test	Image size
CIFAR-10	10	50 000	10 000	32×32 RGB
CIFAR-100	100	50 000	10 000	32×32 RGB

__getitem__ returns (PIL.Image, int) when no transform is set; with a transform, the first element is whatever the transform returns.
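The mean and std values passed to Normalize in the Quick Start are per-channel statistics. A minimal sketch of how such statistics can be computed is shown below. It assumes the images are available as a uint8 NumPy array of shape (N, H, W, C), which is how CIFAR10 exposes them through its data attribute. The helper name channel_stats is hypothetical, and a random array stands in for the real data so the example runs offline.

```python
import numpy as np

def channel_stats(data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std of a uint8 image array shaped (N, H, W, C)."""
    # Scale to [0, 1] to match ToDtype(torch.float32, scale=True).
    scaled = data.astype(np.float64) / 255.0
    # Reduce over images, rows, and columns; keep the channel axis.
    mean = scaled.mean(axis=(0, 1, 2))
    std = scaled.std(axis=(0, 1, 2))
    return mean, std

# Offline stand-in with the same layout as CIFAR10(...).data (assumption for illustration).
rng = np.random.default_rng(0)
toy = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

mean, std = channel_stats(toy)
print(mean.shape, std.shape)
```

With a real dataset you would call channel_stats(train_data.data). Published constants may have been computed with a slightly different procedure, so small differences from values quoted elsewhere are to be expected.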

MNIST family

Handwritten digit and character datasets. MNIST, FashionMNIST, and KMNIST share the constructor signature below; EMNIST adds a required split argument.

MNIST(
    root: str | Path,
    train: bool = True,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Class	Description	Classes	Train	Test
`MNIST`	Handwritten digits 0–9	10	60 000	10 000
`FashionMNIST`	Zalando clothing articles	10	60 000	10 000
`KMNIST`	Japanese Kuzushiji characters	10	60 000	10 000
`EMNIST`	Extended MNIST (letters + digits)	varies	varies	varies

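EMNIST takes a required split argument that selects which subset to load; the train flag still chooses between that subset's training and test portions: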

from torchvision.datasets import EMNIST

# split: "byclass" | "bymerge" | "balanced" | "letters" | "digits" | "mnist"
dataset = EMNIST(root="./data", split="balanced", download=True)

ImageNet

The ILSVRC-2012 large-scale classification benchmark.

from torchvision.datasets import ImageNet

ImageNet(
    root: str | Path,              # must contain ILSVRC2012_devkit_t12.tar.gz
    split: str = "train",          # "train" or "val"
    **kwargs,                      # forwarded to ImageFolder: transform, target_transform, loader, etc.
)

ImageNet requires manual download from image-net.org. Place ILSVRC2012_img_train.tar, ILSVRC2012_img_val.tar, and ILSVRC2012_devkit_t12.tar.gz in your root directory before constructing the dataset. There is no download=True option.
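Because there is no automatic download, a missing archive only shows up when the dataset is constructed. The sketch below is one way to check for the three files named above before a first-time setup. The helper name missing_imagenet_archives and the "./data" path are hypothetical, and the check only looks at file names, not at archive contents.

```python
from pathlib import Path

# File names as listed in the paragraph above.
REQUIRED_ARCHIVES = (
    "ILSVRC2012_img_train.tar",
    "ILSVRC2012_img_val.tar",
    "ILSVRC2012_devkit_t12.tar.gz",
)

def missing_imagenet_archives(root) -> list[str]:
    """Return the required archive names that are not present in root."""
    root = Path(root)
    return [name for name in REQUIRED_ARCHIVES if not (root / name).is_file()]

# "./data" is a placeholder path (assumption for illustration).
missing = missing_imagenet_archives("./data")
if missing:
    print("Missing before first use:", ", ".join(missing))
```

An empty list means all three archives are in place and ImageNet(root=...) can be constructed.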

Extra attributes provided by ImageNet:

Attribute	Description
`wnids`	List of WordNet IDs (synset strings)
`wnid_to_idx`	Maps WordNet ID → class index
`classes`	List of human-readable class-name tuples

ImageFolder / DatasetFolder

Use these when your images are already laid out in class subdirectories but don’t belong to a named benchmark. See the Datasets Overview page for full details.

from torchvision.datasets import ImageFolder

dataset = ImageFolder(root="/path/to/images", transform=transform)
# dataset.classes      → ['cat', 'dog', ...]
# dataset.class_to_idx → {'cat': 0, 'dog': 1, ...}

STL10

96×96 colour images designed for semi-supervised learning, with an additional large unlabelled pool.

from torchvision.datasets import STL10

STL10(
    root: str | Path,
    split: str = "train",   # "train" | "test" | "unlabeled" | "train+unlabeled"
    folds: int | None = None,
    transform=None,
    target_transform=None,
    download: bool = False,
)

__getitem__ returns (PIL.Image, int). Label is -1 for unlabelled samples.
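When loading a split that mixes labelled and unlabelled samples, a supervised loss needs the -1 entries filtered out. One possible approach, sketched below, wraps the dataset in torch.utils.data.Subset using the label array. The helper name labelled_subset is hypothetical, and a small list stands in for the dataset so the example runs offline.

```python
import numpy as np
from torch.utils.data import Subset

def labelled_subset(dataset, labels) -> Subset:
    """Keep only the samples whose label is not -1."""
    keep = np.flatnonzero(np.asarray(labels) != -1)
    return Subset(dataset, keep.tolist())

# Toy stand-in: (sample, label) pairs where -1 marks an unlabelled sample.
toy_labels = [3, -1, 7, -1, 0]
toy_data = list(zip("abcde", toy_labels))

subset = labelled_subset(toy_data, toy_labels)
print(len(subset))                               # → 3
print([subset[i] for i in range(len(subset))])   # → [('a', 3), ('c', 7), ('e', 0)]
```

With the real dataset, the call would look like labelled_subset(stl, stl.labels), assuming stl was built with split="train+unlabeled" and exposes its labels through the labels attribute.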

SVHN

Street View House Numbers — digit recognition in natural scene images.

from torchvision.datasets import SVHN

SVHN(
    root: str | Path,
    split: str = "train",   # "train" | "test" | "extra"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Requires scipy to load .mat files. Labels are remapped from the raw format so that digit 0 has index 0 (the dataset originally encodes it as 10).
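The remapping described above is easy to picture on a small array. The sketch below only illustrates the idea on raw-style labels; it is not needed in practice because the dataset class already returns remapped labels, and the function name remap_svhn_labels is hypothetical.

```python
import numpy as np

def remap_svhn_labels(raw_labels) -> np.ndarray:
    """Map the raw encoding (digit 0 stored as 10) onto class indices 0-9."""
    labels = np.asarray(raw_labels).astype(np.int64)  # astype returns a copy
    labels[labels == 10] = 0
    return labels

raw = np.array([10, 1, 9, 10, 5])
print(remap_svhn_labels(raw))   # → [0 1 9 0 5]
```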

Imagenette

A 10-class subset of ImageNet selected by fast.ai for rapid prototyping.

from torchvision.datasets import Imagenette

Imagenette(
    root: str | Path,
    split: str = "train",    # "train" | "val"
    size: str = "full",      # "full" | "320px" | "160px"
    download: bool = False,
    transform=None,
    target_transform=None,
)

Fine-Grained Recognition

Flowers102

102 categories of flowers commonly occurring in the United Kingdom.

from torchvision.datasets import Flowers102

Flowers102(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Requires scipy to parse the .mat split files.

Food101

101 food categories, each with 750 training and 250 test images.

from torchvision.datasets import Food101

Food101(
    root: str | Path,
    split: str = "train",    # "train" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

GTSRB

German Traffic Sign Recognition Benchmark — 43 sign categories.

from torchvision.datasets import GTSRB

GTSRB(
    root: str | Path,
    split: str = "train",    # "train" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

DTD

Describable Textures Dataset — 47 texture categories with 10 predefined partitions.

from torchvision.datasets import DTD

DTD(
    root: str | Path,
    split: str = "train",      # "train" | "val" | "test"
    partition: int = 1,        # 1–10
    transform=None,
    target_transform=None,
    download: bool = False,
)

FGVCAircraft

Fine-grained recognition of aircraft variants.

from torchvision.datasets import FGVCAircraft

FGVCAircraft(
    root: str | Path,
    split: str = "trainval",            # "train" | "val" | "trainval" | "test"
    annotation_level: str = "variant",  # "variant" | "family" | "manufacturer"
    transform=None,
    target_transform=None,
    download: bool = False,
)

OxfordIIITPet

37 categories of cat and dog breeds; supports both classification and segmentation targets.

from torchvision.datasets import OxfordIIITPet

OxfordIIITPet(
    root: str | Path,
    split: str = "trainval",              # "trainval" | "test"
    target_types: str | list = "category",  # "category" | "binary-category" | "segmentation"
    transforms=None,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Caltech101 and Caltech256

Classic multi-category object recognition datasets.

from torchvision.datasets import Caltech101, Caltech256

Caltech101(
    root: str | Path,
    target_type: str = "category",  # "category" | "annotation"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Caltech256(
    root: str | Path,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Caltech101 and Caltech256 require gdown for automatic download. Install it with pip install gdown before passing download=True.
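If you would rather fail early with a clear message than partway through a download, you can check for the package first. This is a minimal sketch using only the standard library; the helper name gdown_available is hypothetical.

```python
import importlib.util

def gdown_available() -> bool:
    """Return True if the gdown package can be imported in this environment."""
    return importlib.util.find_spec("gdown") is not None

if not gdown_available():
    print("gdown is not installed; run `pip install gdown` before passing download=True")
```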

PCAM

PatchCamelyon — 327 680 histopathology patches for binary cancer classification.

from torchvision.datasets import PCAM

PCAM(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

EuroSAT

Satellite imagery in 10 land-use / land-cover classes.

from torchvision.datasets import EuroSAT

EuroSAT(
    root: str | Path,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Inherits from ImageFolder; the dataset has no predefined split — use torch.utils.data.random_split to create train/val/test subsets.
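A reproducible split along these lines might look like the sketch below. A TensorDataset of blank 64×64 images stands in for EuroSAT so the example runs offline; the 80/10/10 proportions and the seed are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Offline stand-in for EuroSAT(root=..., download=True): 100 blank RGB images, labels 0-9.
dataset = TensorDataset(torch.zeros(100, 3, 64, 64), torch.arange(100) % 10)

# A seeded generator makes the split identical on every run.
generator = torch.Generator().manual_seed(42)

n_train, n_val = 80, 10
n_test = len(dataset) - n_train - n_val
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=generator
)
print(len(train_set), len(val_set), len(test_set))   # → 80 10 10
```

All three subsets index into the same underlying dataset, so they share its transform. If training needs augmentation that validation should not get, one option is to construct the dataset twice with different transforms and reuse the same indices.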

Scene and Places

Places365

Large-scale scene recognition benchmark with 365 scene categories.

from torchvision.datasets import Places365

Places365(
    root: str | Path,
    split: str = "train-standard",  # "train-standard" | "train-challenge" | "val"
    small: bool = False,             # True → 256×256 resized images
    download: bool = False,
    transform=None,
    target_transform=None,
)

SUN397

Scene Understanding Benchmark covering 397 scene types.

from torchvision.datasets import SUN397

SUN397(
    root: str | Path,
    transform=None,
    target_transform=None,
    download: bool = False,
)

Country211

211-class geolocation dataset released by OpenAI.

from torchvision.datasets import Country211

Country211(
    root: str | Path,
    split: str = "train",    # "train" | "valid" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Large-Scale / Specialized

INaturalist

Biodiversity observations across plants, animals, and fungi with hierarchical taxonomic labels.

from torchvision.datasets import INaturalist

INaturalist(
    root: str | Path,
    version: str = "2021_train",  # "2017" | "2018" | "2019" | "2021_train" | "2021_train_mini" | "2021_valid"
    target_type: str | list = "full",
    transform=None,
    target_transform=None,
    download: bool = False,
)

RenderedSST2

Sentences from the SST-2 sentiment benchmark rendered as images, each labelled positive or negative.

from torchvision.datasets import RenderedSST2

RenderedSST2(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

CLEVRClassification

Object-counting classification task from the CLEVR synthetic dataset.

from torchvision.datasets import CLEVRClassification

CLEVRClassification(
    root: str | Path,
    split: str = "train",    # "train" | "val" | "test"
    transform=None,
    target_transform=None,
    download: bool = False,
)

Dataset Summary Table

Class	Splits	Classes	`download=True`
`CIFAR10`	`train` / `test` (via `train` bool)	10	✅
`CIFAR100`	`train` / `test` (via `train` bool)	100	✅
`MNIST`	`train` / `test` (via `train` bool)	10	✅
`FashionMNIST`	`train` / `test` (via `train` bool)	10	✅
`KMNIST`	`train` / `test` (via `train` bool)	10	✅
`EMNIST`	`byclass`, `bymerge`, `balanced`, `letters`, `digits`, `mnist`	varies	✅
`ImageNet`	`train` / `val`	1 000	❌ Manual
`ImageFolder`	user-defined	user-defined	N/A
`STL10`	`train` / `test` / `unlabeled` / `train+unlabeled`	10 (+unlabelled)	✅
`SVHN`	`train` / `test` / `extra`	10	✅
`Imagenette`	`train` / `val`	10	✅
`Flowers102`	`train` / `val` / `test`	102	✅
`Food101`	`train` / `test`	101	✅
`GTSRB`	`train` / `test`	43	✅
`DTD`	`train` / `val` / `test`	47	✅
`FGVCAircraft`	`train` / `val` / `trainval` / `test`	100 (variant)	✅
`OxfordIIITPet`	`trainval` / `test`	37	✅
`Caltech101`	(no official split)	101	✅
`Caltech256`	(no official split)	257	✅
`PCAM`	`train` / `val` / `test`	2	❌ Manual (Google Drive)
`EuroSAT`	(no official split)	10	✅
`Country211`	`train` / `valid` / `test`	211	✅
`INaturalist`	multiple year versions	10 000+	✅
`Places365`	`train-standard` / `train-challenge` / `val`	365	✅
`SUN397`	(no official split)	397	✅
`RenderedSST2`	`train` / `val` / `test`	2	✅
`CLEVRClassification`	`train` / `val` / `test`	8 (object count)	✅
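The table shows that datasets differ in how a split is selected: some take a train bool, some take a split string, and some have no split at all. When writing generic loading code, one way to smooth over this is a small helper that turns a split name into the right keyword arguments. The sketch below is a hypothetical helper based only on the table above. It deliberately leaves out EMNIST, INaturalist, and ImageFolder, whose arguments do not fit this simple pattern.

```python
# Groupings taken from the summary table above.
TRAIN_FLAG_DATASETS = {"CIFAR10", "CIFAR100", "MNIST", "FashionMNIST", "KMNIST"}
NO_SPLIT_DATASETS = {"Caltech101", "Caltech256", "EuroSAT", "SUN397"}
SPLIT_ARG_DATASETS = {
    "ImageNet", "STL10", "SVHN", "Imagenette", "Flowers102", "Food101", "GTSRB",
    "DTD", "FGVCAircraft", "OxfordIIITPet", "PCAM", "Country211", "Places365",
    "RenderedSST2", "CLEVRClassification",
}

def split_kwargs(name: str, split: str) -> dict:
    """Translate a split name into constructor keyword arguments for a dataset class."""
    if name in TRAIN_FLAG_DATASETS:
        if split not in ("train", "test"):
            raise ValueError(f"{name} only has 'train' and 'test', got {split!r}")
        return {"train": split == "train"}
    if name in NO_SPLIT_DATASETS:
        return {}  # no predefined split; partition the data yourself
    if name in SPLIT_ARG_DATASETS:
        return {"split": split}
    raise ValueError(f"{name} is not handled by this sketch")

print(split_kwargs("CIFAR10", "test"))   # → {'train': False}
print(split_kwargs("SVHN", "extra"))     # → {'split': 'extra'}
print(split_kwargs("EuroSAT", "train"))  # → {}
```

A caller could then write getattr(torchvision.datasets, name)(root, **split_kwargs(name, split)). The helper does not validate split strings for the split-argument datasets; that is left to the dataset constructors.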
