TorchVision is PyTorch’s official computer vision library. It ships over 60 pre-trained model architectures, 50+ benchmark datasets, a composable transforms pipeline, and low-level vision operators — everything you need to build, train, and deploy computer vision models.

Installation

Install TorchVision alongside PyTorch for your platform and CUDA version.

Quickstart

Run inference with a pre-trained model in under 10 lines of code.

Models

Browse 60+ architectures for classification, detection, segmentation, and more.

Transforms

Build augmentation pipelines for images, bounding boxes, masks, and keypoints.

Datasets

Load 50+ standard benchmarks with a single line of code.

Ops & Utilities

NMS, RoI Align, Feature Pyramid Networks, and visualization helpers.

What’s in TorchVision

TorchVision is organized into several focused modules:

torchvision.models

Pre-trained ResNet, EfficientNet, ViT, Faster R-CNN, DeepLabV3, RAFT, and more. Load weights and their matching preprocessing transforms in one call.

torchvision.transforms.v2

Type-safe transforms that work on images, segmentation masks, bounding boxes, keypoints, and videos — all in the same pipeline.

torchvision.datasets

Ready-to-use wrappers for CIFAR, ImageNet, COCO, VOC, Cityscapes, Kinetics, and dozens more. Compatible with torch.utils.data.DataLoader.

torchvision.ops

Hardware-accelerated vision primitives: NMS, RoI Align, Deformable Conv, FPN, focal loss, and stochastic depth.

torchvision.io

Fast image and video I/O supporting JPEG, PNG, AVIF, HEIC, WebP, GIF, and video formats via PyAV.

torchvision.utils

Visualize predictions: draw bounding boxes, segmentation masks, keypoints, and optical flow on images.

Get started in 3 steps

1

Install TorchVision

Install with pip alongside a matching version of PyTorch:
pip install torch torchvision

2

Load a pre-trained model

Use the Weights API to get a model and its recommended preprocessing transforms:
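from torchvision.models import resnet50, ResNet50_Weights

# DEFAULT resolves to the best available weights for this architecture.
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()  # switch to inference mode (affects layers such as batch norm)

# Preprocessing that matches how these weights were trained.
preprocess = weights.transforms()

3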

Run inference

Apply the transforms and pass your image through the model:
import torch
from torchvision.io import read_image

# Assumes `model`, `weights`, and `preprocess` from step 2.
img = read_image("image.jpg")         # uint8 tensor, shape (C, H, W)
batch = preprocess(img).unsqueeze(0)  # add a batch dimension

with torch.no_grad():  # gradients are not needed for inference
    prediction = model(batch).squeeze(0).softmax(0)

class_id = prediction.argmax().item()
label = weights.meta["categories"][class_id]
print(f"Predicted: {label} ({prediction[class_id]:.1%})")

TorchVision versions are tied to specific PyTorch releases. Check the installation guide for the compatibility matrix before upgrading.