The torchvision.utils module provides a collection of visualization helpers that operate directly on PyTorch tensors, with no manual conversion to PIL or NumPy required. You can annotate images with detection boxes, segmentation overlays, and pose keypoints, render optical flow as a colour image, then save the results individually or arranged into a grid. The draw_* functions accept uint8 or floating-point image tensors and return a tensor of the same dtype; flow_to_image takes a float flow field and returns a uint8 image.

make_grid

torchvision.utils.make_grid(
    tensor: Tensor | list[Tensor],
    nrow: int = 8,
    padding: int = 2,
    normalize: bool = False,
    value_range: tuple[int, int] | None = None,
    scale_each: bool = False,
    pad_value: float = 0.0,
) -> Tensor
Arrange a batch of images into a single grid image. Input can be a 4-D mini-batch tensor (B, C, H, W) or a list of equal-size 3-D tensors (C, H, W). The grid has nrow images per row, for a total layout of ⌈B / nrow⌉ rows.
tensor
Tensor[B, C, H, W] | list[Tensor[C, H, W]]
required
A 4-D batch tensor or a list of 3-D image tensors. All images must have identical spatial dimensions when a list is provided. Single-channel images are automatically replicated to 3 channels.
nrow
int
default:"8"
Number of images per row. The final grid width is nrow * (W + padding) + padding.
padding
int
default:"2"
Pixel padding between images and around the border of the grid.
normalize
bool
default:"False"
If True, shift and scale all values to [0, 1] using the range set by value_range (or the tensor’s own min/max when value_range is None).
value_range
tuple[int, int] | None
default:"None"
(min, max) tuple used when normalize=True. Values are clamped to this range before rescaling. Ignored when normalize=False.
scale_each
bool
default:"False"
When True and normalize=True, normalise each image in the batch independently rather than using global min/max. Useful when images have very different dynamic ranges.
pad_value
float
default:"0.0"
Pixel value used to fill the padding region. 0.0 → black; 1.0 → white (for float tensors). For uint8 tensors, use 255 for white.
Returns Tensor[C, H_grid, W_grid] — a single image tensor containing the arranged grid.
from torchvision.utils import make_grid
import torch

batch = torch.randint(0, 256, (16, 3, 64, 64), dtype=torch.uint8)

# 4 columns, 4 rows, white padding (255 is white for a uint8 batch)
grid = make_grid(batch, nrow=4, padding=4, pad_value=255)
print(grid.shape)  # torch.Size([3, 276, 276])
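
The size arithmetic described for nrow and padding can be checked by hand. The sketch below is a minimal pure-Python helper; expected_grid_shape is a hypothetical name introduced here for illustration, not a torchvision function. It assumes a batch of at least two images and that the column count is capped at the batch size when the batch is smaller than nrow.

```python
import math


def expected_grid_shape(batch_size, height, width, nrow=8, padding=2, channels=3):
    """Predict the (C, H_grid, W_grid) shape of a make_grid() result.

    Hypothetical helper for illustration only; it is not a torchvision API.
    Assumes batch_size >= 2.
    """
    # Assumption: a batch smaller than nrow yields fewer columns
    cols = min(nrow, batch_size)
    rows = math.ceil(batch_size / cols)
    # Each cell carries its own leading padding, plus one trailing border
    grid_h = rows * (height + padding) + padding
    grid_w = cols * (width + padding) + padding
    return (channels, grid_h, grid_w)


print(expected_grid_shape(16, 64, 64, nrow=4, padding=4))  # (3, 276, 276)
print(expected_grid_shape(10, 32, 32))  # (3, 70, 274)
```

Comparing the predicted tuple with tuple(make_grid(...).shape) is a quick sanity check when sizing figures or preallocating buffers.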

save_image

torchvision.utils.save_image(
    tensor: Tensor | list[Tensor],
    fp: str | pathlib.Path | BinaryIO,
    format: str | None = None,
    **kwargs,
) -> None
A thin convenience wrapper that calls make_grid() on the input, converts to a PIL image, and saves to disk. Extra **kwargs (e.g. nrow, padding, normalize) are forwarded directly to make_grid().
tensor
Tensor | list[Tensor]
required
Image or batch of images to save. Same accepted shapes as make_grid().
fp
str | pathlib.Path | BinaryIO
required
File path or open file-like object to save to. The format is inferred from the filename extension unless format is set.
format
str | None
default:"None"
Image format string understood by PIL, e.g. "PNG", "JPEG". Required when fp is a file object with no inferrable extension.
**kwargs
any
Additional keyword arguments passed to make_grid() (e.g. nrow, normalize, value_range).
from torchvision.utils import make_grid, save_image
import torch

batch = torch.rand(16, 3, 64, 64)  # float32 in [0, 1]

# Save a normalised 4-column grid directly
save_image(batch, "grid.png", nrow=4, normalize=True)

# Equivalent explicit approach
grid = make_grid(batch, nrow=4, normalize=True)
save_image(grid, "grid_explicit.png")

draw_bounding_boxes

torchvision.utils.draw_bounding_boxes(
    image: Tensor,
    boxes: Tensor,
    labels: list[str] | None = None,
    colors: list[str | tuple] | str | tuple | None = None,
    fill: bool | None = False,
    width: int = 1,
    font: str | None = None,
    font_size: int | None = None,
    label_colors: list[str | tuple] | str | tuple | None = None,
    label_background_colors: list[str | tuple] | str | tuple | None = None,
    fill_labels: bool = False,
) -> Tensor
Draw bounding boxes (and optional text labels) onto an RGB or grayscale image tensor. Accepts both axis-aligned boxes (N, 4) in [xmin, ymin, xmax, ymax] format and oriented (rotated) boxes (N, 8) as [x1, y1, x2, y2, x3, y3, x4, y4].
image
Tensor[C, H, W]
required
Input image as a uint8 or float tensor. C must be 1 (grayscale, auto-expanded to RGB) or 3 (RGB). Grayscale images are converted to RGB for drawing. Float images are expected in [0, 1].
boxes
Tensor[N, 4] | Tensor[N, 8]
required
Bounding boxes in absolute pixel coordinates. Shape (N, 4) for axis-aligned XYXY boxes; shape (N, 8) for rotated boxes defined by 4 corner points.
labels
list[str] | None
default:"None"
List of N label strings drawn at the top-left corner of each box. Omit for unlabelled boxes.
colors
list | str | tuple | None
default:"None"
Box outline colours. Accepts PIL colour names ("red"), hex strings ("#FF0000"), or RGB tuples ((255, 0, 0)). A single value applies to all boxes; a list assigns one colour per box. Random colours are generated when None.
fill
bool
default:"False"
Fill box interiors with a semi-transparent version of the border colour. Save the result as PNG when using fill=True so that lossy compression does not degrade the overlay.
width
int
default:"1"
Border line width in pixels.
font
str | None
default:"None"
Path to a TrueType font file for label text. Falls back to PIL’s default bitmap font when None.
font_size
int | None
default:"None"
Font size in points. Ignored if font is None.
label_colors
list | str | tuple | None
default:"None"
Text colour for each label. Defaults to the same colour used for the box outline, or black when fill_labels=True.
label_background_colors
list | str | tuple | None
default:"None"
Fill colour for label background boxes. Defaults to the box border colour. Only relevant when fill_labels=True.
fill_labels
bool
default:"False"
Fill label text backgrounds with label_background_colors.
Returns Tensor[C, H, W] — annotated image with the same dtype as image.
import torch
from torchvision.utils import draw_bounding_boxes
from torchvision.io import decode_image

img = decode_image("image.jpg")  # uint8 Tensor[3, H, W]
boxes = torch.tensor(
    [[100, 50, 300, 250], [400, 100, 600, 350]],
    dtype=torch.float,
)
labels = ["cat", "dog"]
colors = ["red", "blue"]

result = draw_bounding_boxes(img, boxes, labels=labels, colors=colors, width=3)

# Convert to PIL for display or further saving
from torchvision.transforms.v2.functional import to_pil_image
pil_img = to_pil_image(result)
pil_img.save("annotated.jpg")
When boxes are filled (fill=True), save the result with write_png() rather than write_jpeg(): JPEG compression will degrade the semi-transparent overlay.

draw_segmentation_masks

torchvision.utils.draw_segmentation_masks(
    image: Tensor,
    masks: Tensor,
    alpha: float = 0.8,
    colors: list[str | tuple] | str | tuple | None = None,
) -> Tensor
Blend coloured mask overlays onto an RGB image. Each boolean mask in masks is filled with a distinct colour, pixels covered by more than one mask are set to black in the overlay to highlight overlaps, and the overlay is then composited with alpha transparency over the base image.
image
Tensor[3, H, W]
required
RGB image as a uint8 or float tensor. Must be exactly 3 channels.
masks
Tensor[N, H, W] | Tensor[H, W]
required
Boolean (dtype=torch.bool) mask tensor. Shape (N, H, W) for N instance masks, or (H, W) for a single mask. Spatial dimensions must match image.
alpha
float
default:"0.8"
Mask opacity in [0, 1]. 0 → fully transparent (masks invisible); 1 → fully opaque (original image hidden under masks).
colors
list | str | tuple | None
default:"None"
Colour(s) for each mask. Accepts PIL colour names, hex strings, or RGB tuples. A single value is applied to all masks. Random colours are generated when None.
Returns Tensor[C, H, W] with the same dtype as image.
import torch
from torchvision.utils import draw_segmentation_masks
from torchvision.io import decode_image

img = decode_image("scene.jpg", mode="RGB")  # uint8 Tensor[3, H, W]
H, W = img.shape[1], img.shape[2]

# Two binary masks (e.g. from a segmentation model)
masks = torch.zeros(2, H, W, dtype=torch.bool)
masks[0, 50:200, 80:300] = True   # person 1
masks[1, 100:350, 250:500] = True  # person 2

result = draw_segmentation_masks(
    img, masks, alpha=0.6, colors=["cyan", "magenta"]
)
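Pixels where two or more masks overlap are set to black (0) in the overlay regardless of colour, which highlights ambiguous regions. The blending formula is output = image * (1 - alpha) + mask_image * alpha, so with alpha below 1 an overlap pixel shows the original image darkened by a factor of (1 - alpha) rather than pure black.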
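
The blending formula can be worked through for a single pixel. This is a minimal pure-Python sketch; blend_pixel is a hypothetical helper introduced here, and rounding to the nearest integer is a simplification that may differ from how torchvision converts back to uint8.

```python
def blend_pixel(image_rgb, overlay_rgb, alpha):
    """Apply output = image * (1 - alpha) + overlay * alpha to one RGB pixel.

    Hypothetical helper for illustration; not part of torchvision.
    """
    return tuple(
        round(i * (1 - alpha) + o * alpha) for i, o in zip(image_rgb, overlay_rgb)
    )


pixel = (200, 100, 50)

# Pixel covered by a single cyan mask
print(blend_pixel(pixel, (0, 255, 255), alpha=0.6))  # (80, 193, 173)

# Pixel covered by two masks: the overlay colour there is black
print(blend_pixel(pixel, (0, 0, 0), alpha=0.6))  # (80, 40, 20)
```

With alpha=1.0 the overlay replaces the pixel entirely, and with alpha=0.0 the original pixel is returned unchanged.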

draw_keypoints

torchvision.utils.draw_keypoints(
    image: Tensor,
    keypoints: Tensor,
    connectivity: list[tuple[int, int]] | None = None,
    colors: str | tuple | None = None,
    radius: int = 2,
    width: int = 3,
    visibility: Tensor | None = None,
) -> Tensor
Draw keypoints (and optionally skeleton connections) onto an RGB image tensor. Supports multiple instances simultaneously and respects per-keypoint visibility flags.
image
Tensor[3, H, W]
required
RGB image as a uint8 or float tensor. Must have exactly 3 channels.
keypoints
Tensor[N, K, 2]
required
N instances, each with K keypoints in [x, y] pixel coordinates. Integer coordinates are expected.
connectivity
list[tuple[int, int]] | None
default:"None"
Skeleton definition as a list of (start_idx, end_idx) tuples referencing keypoint indices. A line is drawn between connected keypoints only when both are visible. Connections are evaluated per-instance.
colors
str | tuple | None
default:"None"
Colour for both keypoint circles and skeleton lines. Accepts a PIL colour string or an RGB tuple.
radius
int
default:"2"
Radius in pixels for the keypoint ellipses.
width
int
default:"3"
Line width in pixels for skeleton connections.
visibility
Tensor[N, K] | None
default:"None"
Boolean tensor indicating which keypoints are visible. True → draw point and eligible connections; False → skip. Defaults to all-visible when None.
Returns Tensor[C, H, W] with the same dtype as image.
import torch
from torchvision.utils import draw_keypoints
from torchvision.io import decode_image

img = decode_image("person.jpg", mode="RGB")

# Simplified upper-body skeleton (5 keypoints per person; not the COCO index order)
# 0=nose, 1=l_shoulder, 2=r_shoulder, 3=l_elbow, 4=r_elbow
keypoints = torch.tensor([
    [[320, 80], [280, 160], [360, 160], [240, 240], [400, 240]],  # person 1
    [[150, 90], [120, 170], [180, 170], [100, 250], [200, 250]],  # person 2
], dtype=torch.float)

skeleton = [(0, 1), (0, 2), (1, 3), (2, 4)]  # nose→shoulders→elbows

result = draw_keypoints(
    img,
    keypoints,
    connectivity=skeleton,
    colors="lime",
    radius=4,
    width=2,
)

flow_to_image

torchvision.utils.flow_to_image(flow: Tensor) -> Tensor
Convert an optical flow field to a colour-coded RGB image for visualisation. Direction is encoded as hue (angle on a colour wheel) and magnitude as saturation, following the Middlebury evaluation colour scheme.
flow
Tensor[N, 2, H, W] | Tensor[2, H, W]
required
Optical flow field as a torch.float32 tensor. The two channels are the horizontal (u) and vertical (v) displacement components in pixels. Accepts an unbatched (2, H, W) tensor or a batched (N, 2, H, W) tensor.
Returns Tensor[N, 3, H, W] or Tensor[3, H, W]: a uint8 RGB visualisation matching the input shape convention.
import torch
from torchvision.utils import flow_to_image

# Synthetic flow field (e.g. from an optical flow model)
flow = torch.randn(1, 2, 256, 256)  # Tensor[N=1, 2, H, W] float32

rgb = flow_to_image(flow)  # Tensor[1, 3, 256, 256] uint8
print(rgb.shape, rgb.dtype)
# torch.Size([1, 3, 256, 256]) torch.uint8

# Unbatched usage
flow_single = torch.randn(2, 256, 256)
rgb_single = flow_to_image(flow_single)  # Tensor[3, 256, 256]
flow_to_image() normalises the flow by its global maximum magnitude before mapping to colours. If your flow values are already normalised to [-1, 1], the resulting colours will still be correct but the magnitude shading will reflect the relative distribution within the batch.

Full Detection Visualization Example

Step 1: Load image and run detection

import torch
from torchvision.io import decode_image

img = decode_image("street.jpg", mode="RGB")  # uint8 Tensor[3, H, W]

# Simulated model outputs
boxes = torch.tensor([
    [100,  50, 300, 250],
    [400, 100, 600, 350],
    [ 20,  80, 180, 300],
], dtype=torch.float)

scores = torch.tensor([0.92, 0.87, 0.78])
labels = ["car", "person", "bicycle"]
Step 2: Draw bounding boxes

from torchvision.utils import draw_bounding_boxes

annotated = draw_bounding_boxes(
    img,
    boxes,
    labels=[f"{name} {score:.2f}" for name, score in zip(labels, scores.tolist())],
    colors=["red", "blue", "green"],
    width=3,
    fill=False,
)
Step 3: Overlay segmentation masks

from torchvision.utils import draw_segmentation_masks

H, W = img.shape[1], img.shape[2]
masks = torch.zeros(3, H, W, dtype=torch.bool)
masks[0, 50:250, 100:300] = True   # car mask
masks[1, 100:350, 400:600] = True  # person mask
masks[2, 80:300, 20:180] = True    # bicycle mask

annotated = draw_segmentation_masks(
    annotated, masks, alpha=0.4, colors=["red", "blue", "green"]
)
Step 4: Save the result

from torchvision.transforms.v2.functional import to_pil_image

pil_img = to_pil_image(annotated)
pil_img.save("detections.png")
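
The walkthrough draws every simulated detection. With real model outputs, low-confidence detections are commonly dropped before drawing. A minimal sketch of that filtering step, reusing the simulated values from the first step; score_threshold is a hypothetical cut-off chosen for illustration, not a torchvision parameter.

```python
import torch

# Simulated model outputs, as in the first step of the walkthrough
boxes = torch.tensor([
    [100,  50, 300, 250],
    [400, 100, 600, 350],
    [ 20,  80, 180, 300],
], dtype=torch.float)
scores = torch.tensor([0.92, 0.87, 0.78])
labels = ["car", "person", "bicycle"]

score_threshold = 0.8  # hypothetical cut-off; tune per model and task

# Boolean mask selecting detections at or above the threshold
keep = scores >= score_threshold
kept_boxes = boxes[keep]
kept_labels = [
    f"{name} {score:.2f}"
    for name, score, kept in zip(labels, scores.tolist(), keep.tolist())
    if kept
]

print(kept_boxes.shape)  # torch.Size([2, 4])
print(kept_labels)  # ['car 0.92', 'person 0.87']
```

kept_boxes and kept_labels can then be passed to draw_bounding_boxes() in place of the unfiltered values, with the colour list shortened to match.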

Grid Visualization

Create a compact preview of a dataset batch or augmented images:
from torchvision.utils import make_grid, save_image
import torch

# Simulate a batch of 16 RGB images at 64×64
batch = torch.rand(16, 3, 64, 64)

# Arrange into a 4×4 grid with normalisation
grid = make_grid(batch, nrow=4, padding=4, normalize=True, pad_value=1.0)
save_image(grid, "batch_preview.png")

# Comparison grid: original vs augmented (2 rows of 8)
originals  = torch.rand(8, 3, 64, 64)
augmented  = originals + 0.1 * torch.randn_like(originals)
comparison = make_grid(
    torch.cat([originals, augmented], dim=0),
    nrow=8, padding=2, normalize=True,
)
save_image(comparison, "augmentation_comparison.png")
Set scale_each=True in make_grid() when comparing images with very different value ranges (e.g. model activations at different layers) so each image is normalised independently rather than sharing a global scale.

API Quick Reference

make_grid

Tile a batch of images into a single grid tensor. Supports normalisation, padding colour, and per-image scaling.

save_image

One-liner to call make_grid() and save the result. Accepts all make_grid kwargs.

draw_bounding_boxes

Annotate images with axis-aligned or rotated boxes, optional labels, fill, custom fonts and colours.

draw_segmentation_masks

Blend coloured boolean masks onto an image with configurable transparency.

draw_keypoints

Render human-pose or landmark keypoints with optional skeleton connectivity and per-keypoint visibility.

flow_to_image

Map an optical flow tensor to a hue-encoded uint8 RGB image for inspection.
