Optical flow estimation predicts the per-pixel 2D displacement field between two consecutive video frames, representing where each pixel moved. TorchVision provides RAFT (Recurrent All-Pairs Field Transforms, Teed & Deng 2020), a state-of-the-art recurrent architecture that iteratively refines flow predictions using a 4D all-pairs cost volume. Two model variants are available: a full-capacity raft_large and a lightweight raft_small.
All optical flow models live in torchvision.models.optical_flow. RAFT outputs a list of flow tensors, one per recurrent iteration, with the final prediction at list_of_flows[-1].
Models
RAFT Large
Full RAFT architecture. 5.26 M parameters, 211 GFLOPs. Multiple weight checkpoints covering different fine-tuning stages (FlyingChairs + FlyingThings3D, Sintel, KITTI).
RAFT Small
Compact RAFT variant. 0.99 M parameters — roughly 5× smaller. Suitable for latency-sensitive applications. Weights available for FlyingChairs + FlyingThings3D training.
Pretrained Weights
RAFT follows a multi-stage training curriculum. The weight enum names encode the datasets used, with underscores separating the training stages:
- C = FlyingChairs
- T = FlyingThings3D
- S = Sintel
- K = KITTI
- H = HD1K
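As an illustration of the naming scheme, the sketch below decodes an enum key into its training stages. It assumes that underscores separate stages, that each letter within a stage stands for one dataset, and that the last segment is the version tag. decode_weight_name and DATASET_CODES are hypothetical helpers, not part of TorchVision.

```python
# Letter codes as listed above.
DATASET_CODES = {
    "C": "FlyingChairs",
    "T": "FlyingThings3D",
    "S": "Sintel",
    "K": "KITTI",
    "H": "HD1K",
}


def decode_weight_name(name):
    """Split a key such as 'C_T_SKHT_K_V2' into per-stage dataset lists and a version tag."""
    *stages, version = name.split("_")
    decoded = [[DATASET_CODES[letter] for letter in stage] for stage in stages]
    return decoded, version


stages, version = decode_weight_name("C_T_SKHT_K_V2")
for index, datasets in enumerate(stages, start=1):
    print(f"stage {index}: {' + '.join(datasets)}")
print("version:", version)
```

Read this way, C_T_SKHT_K_V2 has four stages: FlyingChairs, then FlyingThings3D, then a mixed Sintel/KITTI/HD1K/FlyingThings3D stage, then a final KITTI stage.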
Raft_Large_Weights
| Enum key | Trained on | Sintel-Train Clean EPE | Sintel-Train Final EPE | Notes |
|---|---|---|---|---|
| C_T_V1 | FlyingChairs + FlyingThings3D (ported) | 1.44 | 2.79 | Ported from original paper |
| C_T_V2 | FlyingChairs + FlyingThings3D | 1.38 | 2.72 | Trained from scratch |
| C_T_SKHT_V1 | +Sintel fine-tune (ported) | n/a (Sintel-Test Clean: 1.94) | n/a (Sintel-Test Final: 3.18) | Ported |
| C_T_SKHT_V2 (DEFAULT) | +Sintel fine-tune | n/a (Sintel-Test Clean: 1.82) | n/a (Sintel-Test Final: 3.07) | Trained from scratch |
| C_T_SKHT_K_V1 | +Sintel +KITTI (ported) | n/a | n/a | KITTI-Test fl-all: 5.10% |
| C_T_SKHT_K_V2 | +Sintel +KITTI | n/a | n/a | KITTI-Test fl-all: 5.19% |
Raft_Small_Weights
| Enum key | Trained on | Sintel-Train Clean EPE | Sintel-Train Final EPE | Notes |
|---|---|---|---|---|
| C_T_V1 | FlyingChairs + FlyingThings3D (ported) | 2.12 | 3.28 | Ported from original paper |
| C_T_V2 (DEFAULT) | FlyingChairs + FlyingThings3D | 1.99 | 3.28 | ~5× fewer params than Large |
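For readers unfamiliar with the metrics in these tables, the following is a minimal sketch of how EPE (endpoint error) and the KITTI fl-all outlier rate are commonly defined. It assumes flows of shape (B, 2, H, W) and an optional boolean validity mask of shape (B, H, W). The function names are illustrative, and TorchVision's own evaluation scripts may aggregate results differently (for example per image rather than per pixel).

```python
import torch


def endpoint_error(pred, target):
    """Per-pixel Euclidean distance between two flow fields, shape (B, H, W)."""
    return torch.linalg.vector_norm(pred - target, dim=1)


def epe(pred, target, valid=None):
    """Mean endpoint error in pixels over all (valid) pixels."""
    err = endpoint_error(pred, target)
    if valid is not None:
        err = err[valid]
    return err.mean().item()


def fl_all(pred, target, valid=None):
    """Percentage of outliers: error above 3 px and above 5% of the true magnitude."""
    err = endpoint_error(pred, target)
    magnitude = torch.linalg.vector_norm(target, dim=1)
    outlier = (err > 3.0) & (err > 0.05 * magnitude)
    if valid is not None:
        outlier = outlier[valid]
    return 100.0 * outlier.float().mean().item()


# Toy check: every pixel is off by (3, 4), i.e. a 5-pixel error.
target = torch.zeros(1, 2, 4, 4)
pred = target.clone()
pred[:, 0] = 3.0
pred[:, 1] = 4.0
print(epe(pred, target))     # 5.0
print(fl_all(pred, target))  # 100.0
```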
Quick Start
Output Format
RAFT returns a list of Tensor[B, 2, H, W] tensors, one per recurrent refinement step. The two channels represent:
| Channel | Meaning |
|---|---|
| flow[:, 0, :, :] | Horizontal displacement u (in pixels) |
| flow[:, 1, :, :] | Vertical displacement v (in pixels) |
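The list length equals the number of recurrent updates, set by the num_flow_updates argument of the forward pass (default: 12 for raft_large, 12 for raft_small). Use list_of_flows[-1] for the highest-quality prediction.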
Visualizing Flow
TorchVision provides a built-in utility, torchvision.utils.flow_to_image, that converts a flow tensor to an RGB image using the HSV color wheel convention (hue = direction, saturation = magnitude).
Using RAFT Small
raft_small has the same API as raft_large but with a reduced encoder and correlation volume.
Complete Example with Visualization
Selecting the Right Weights
Sintel evaluation
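Use Raft_Large_Weights.C_T_SKHT_V2 (DEFAULT). Its fine-tuning stage mixes Sintel, KITTI, HD1K, and FlyingThings3D, and it has the best Sintel-Test scores among the checkpoints listed above (1.82 clean, 3.07 final).
KITTI evaluation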
Use Raft_Large_Weights.C_T_SKHT_K_V2, the KITTI fine-tuned checkpoint (KITTI-Test fl-all: 5.19%). The ported C_T_SKHT_K_V1 reports a slightly lower fl-all of 5.10%.
Fast prototyping
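Use raft_small with Raft_Small_Weights.DEFAULT. It has roughly 5× fewer parameters (0.99 M vs. 5.26 M) and faster inference, at the cost of a higher EPE (1.99 vs. 1.38 on Sintel-Train Clean for the C_T_V2 checkpoints).
Reproducibility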
Use the C_T_V1 or C_T_SKHT_V1 variants (ported from the original Princeton RAFT repo) to stay as close as possible to the numbers reported in the paper.