The torchvision.utils module provides a collection of visualization helpers that operate directly on PyTorch tensors, with no conversion to PIL or NumPy required. You can annotate images with detection boxes, segmentation overlays, pose keypoints, and optical flow, then save them individually or arranged into a grid. The draw_* functions accept uint8 or floating-point tensors and return a tensor of the same dtype.
make_grid
Tiles a batch of images into a single grid image. The input is a 4-D tensor of shape (B, C, H, W) or a list of equal-size 3-D tensors of shape (C, H, W). The grid has nrow images per row, for a total layout of ⌈B / nrow⌉ rows.
tensor: A 4-D batch tensor or a list of 3-D image tensors. All images must have identical spatial dimensions when a list is provided. Single-channel images are automatically replicated to 3 channels.
nrow: Number of images per row. The final grid width is nrow * (W + padding) + padding (when the batch holds at least nrow images).
padding: Pixel padding between images and around the border of the grid.
normalize: If True, shift and scale all values to [0, 1] using the range set by value_range (or the tensor's own min/max when value_range is None).
value_range: (min, max) tuple used when normalize=True. Values are clamped to this range before rescaling. Ignored when normalize=False.
scale_each: When True and normalize=True, normalise each image in the batch independently rather than using the global min/max. Useful when images have very different dynamic ranges.
pad_value: Pixel value used to fill the padding region. 0.0 → black; 1.0 → white (for float tensors).
Returns: Tensor[C, H_grid, W_grid], a single image tensor containing the arranged grid.
save_image
Calls make_grid() on the input, converts the result to a PIL image, and saves it to disk. Extra **kwargs (e.g. nrow, padding, normalize) are forwarded directly to make_grid().
tensor: Image or batch of images to save. Same accepted shapes as make_grid().
fp: File path or open file-like object to save to. The format is inferred from the filename extension unless format is set.
format: Image format string understood by PIL, e.g. "PNG", "JPEG". Required when fp is a file object with no inferrable extension.
**kwargs: Additional keyword arguments passed to make_grid() (e.g. nrow, normalize, value_range).
draw_bounding_boxes
Draws bounding boxes on an image. Supports axis-aligned boxes of shape (N, 4) in [xmin, ymin, xmax, ymax] format and oriented (rotated) boxes of shape (N, 8) as [x1, y1, x2, y2, x3, y3, x4, y4].
image: Input image as a uint8 or float tensor of shape (C, H, W). C must be 1 (grayscale, auto-expanded to RGB for drawing) or 3 (RGB). Float images are expected in [0, 1].
boxes: Bounding boxes in absolute pixel coordinates. Shape (N, 4) for axis-aligned XYXY boxes; shape (N, 8) for rotated boxes defined by 4 corner points.
labels: List of N label strings drawn at the top-left corner of each box. Omit for unlabelled boxes.
colors: Box outline colours. Accepts PIL colour names ("red"), hex strings ("#FF0000"), or RGB tuples ((255, 0, 0)). A single value applies to all boxes; a list assigns one colour per box. Random colours are generated when None.
fill: Fill box interiors with a semi-transparent version of the border colour. The fill is alpha-blended into the returned RGB tensor, which carries no separate alpha channel.
width: Border line width in pixels.
font: Path to a TrueType font file for label text. Falls back to PIL's default bitmap font when None.
font_size: Font size in points. Ignored if font is None.
label_colors: Text colour for each label. Defaults to the same colour used for the box outline, or black when fill_labels=True.
label_background_colors: Fill colour for label background boxes. Defaults to the box border colour. Only relevant when fill_labels=True.
fill_labels: Fill label text backgrounds with label_background_colors.
Returns: Tensor[C, H, W], the annotated image with the same dtype as image.
draw_segmentation_masks
Each mask in masks is filled with a distinct colour, then composited with alpha transparency over the base image. Pixels covered by multiple masks are set to black to highlight overlaps.
image: RGB image as a uint8 or float tensor. Must have exactly 3 channels.
masks: Boolean (dtype=torch.bool) mask tensor. Shape (N, H, W) for N instance masks, or (H, W) for a single mask. Spatial dimensions must match image.
alpha: Mask opacity in [0, 1]. 0 → fully transparent (masks invisible); 1 → fully opaque (original image hidden under masks).
colors: Colour(s) for each mask. Accepts PIL colour names, hex strings, or RGB tuples. A single value is applied to all masks. Random colours are generated when None.
Returns: Tensor[C, H, W] with the same dtype as image.
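Pixels where two or more masks overlap are set to black (0) in the mask layer, regardless of the mask colours, before blending. This highlights ambiguous regions. The blending formula is output = image * (1 - alpha) + mask_image * alpha, so with alpha below 1 an overlap shows up as a darkened version of the image rather than pure black.
draw_keypoints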
image: RGB image as a uint8 or float tensor. Must have exactly 3 channels.
keypoints: Tensor of shape (N, K, 2) holding N instances, each with K keypoints in [x, y] pixel coordinates. Integer coordinates are expected.
connectivity: Skeleton definition as a list of (start_idx, end_idx) tuples referencing keypoint indices. A line is drawn between connected keypoints only when both are visible. Connections are evaluated per-instance.
colors: Colour for both keypoint circles and skeleton lines. Accepts a PIL colour string or an RGB tuple.
radius: Radius in pixels for the keypoint ellipses.
width: Line width in pixels for skeleton connections.
visibility: Boolean tensor indicating which keypoints are visible. True → draw point and eligible connections; False → skip. Defaults to all-visible when None.
Returns: Tensor[C, H, W] with the same dtype as image.
flow_to_image
flow: Optical flow field as a torch.float32 tensor. The two channels are the horizontal (u) and vertical (v) displacement components in pixels. Accepts an unbatched (2, H, W) tensor or a batched (N, 2, H, W) tensor.
Returns: Tensor[N, 3, H, W] or Tensor[3, H, W], a uint8 RGB visualisation matching the input shape convention.
flow_to_image() normalises the flow by its global maximum magnitude, taken over the whole input including every item of a batch, before mapping to colours. If your flow values are already normalised to [-1, 1], the resulting colours will still be correct but the magnitude shading will reflect the relative distribution within the batch.
Full Detection Visualization Example
Grid Visualization
make_grid() and save_image() together give a compact preview of a dataset batch or a set of augmented images.
API Quick Reference
make_grid
Tile a batch of images into a single grid tensor. Supports normalisation, padding colour, and per-image scaling.
save_image
One-liner to call make_grid() and save the result. Accepts all make_grid() kwargs.
draw_bounding_boxes
Annotate images with axis-aligned or rotated boxes, optional labels, fill, custom fonts and colours.
draw_segmentation_masks
Blend coloured boolean masks onto an image with configurable transparency.
draw_keypoints
Render human-pose or landmark keypoints with optional skeleton connectivity and per-keypoint visibility.
flow_to_image
Map an optical flow tensor to a hue-encoded uint8 RGB image for inspection.