The Ultralytics plugin provides real-time pose detection using YOLO models with detailed hand and wrist tracking.

Installation

uv add vision-agents-plugins-ultralytics
Alternatively:
pip install vision-agents-plugins-ultralytics

Components

YOLOPoseProcessor

Real-time pose detection with hand tracking:
from vision_agents.plugins import ultralytics
from vision_agents.core import Agent

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    conf_threshold=0.5,
    device="cpu",
    enable_hand_tracking=True,
    enable_wrist_highlights=True
)

agent = Agent(
    processors=[processor],
    llm=your_llm,  # placeholder: your configured LLM instance
    # ... other config
)

Parameters:

model_path (string, default: "yolo11n-pose.pt")
    Path to the YOLO pose model file. The model is downloaded automatically on first use if it is not present.
conf_threshold (float, default: 0.5)
    Confidence threshold for pose detection (0.0 to 1.0).
device (string, default: "cpu")
    Device to run inference on: cpu or cuda.
imgsz (int, default: 512)
    Image size for YOLO inference. Larger values improve accuracy but reduce speed.
max_workers (int, default: 2)
    Number of worker threads for processing.
interval (float, default: 0)
    Processing interval in seconds. 0 processes every frame.
enable_hand_tracking (bool, default: False)
    Whether to draw detailed hand connections and keypoints.
enable_wrist_highlights (bool, default: False)
    Whether to highlight wrist positions with circles.
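
The documented ranges can be checked before a processor is constructed. The following is a minimal sketch: validate_pose_config is a hypothetical helper, not part of the plugin, and the plugin may apply its own (possibly different) validation. It accepts only the two device values listed above, which is an assumption made for illustration.

```python
def validate_pose_config(conf_threshold=0.5, device="cpu", imgsz=512,
                         max_workers=2, interval=0.0):
    """Hypothetical helper: check settings against the documented ranges.

    Not part of the plugin. Returns the settings as a dict that can be
    passed on as keyword arguments.
    """
    if not 0.0 <= conf_threshold <= 1.0:
        raise ValueError("conf_threshold must be between 0.0 and 1.0")
    # Assumption: only the two documented device values are accepted here.
    if device not in ("cpu", "cuda"):
        raise ValueError("device must be 'cpu' or 'cuda'")
    if imgsz <= 0:
        raise ValueError("imgsz must be a positive integer")
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    if interval < 0:
        raise ValueError("interval must be 0 or a positive number of seconds")
    return {
        "conf_threshold": conf_threshold,
        "device": device,
        "imgsz": imgsz,
        "max_workers": max_workers,
        "interval": interval,
    }
```

The returned dict could then be unpacked into the constructor, for example ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt", **settings).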

Usage Examples

Basic Pose Detection

from vision_agents.plugins import ultralytics
from vision_agents.core import Agent

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    conf_threshold=0.5
)

agent = Agent(
    processors=[processor],
    # ... other config
)

Hand Tracking Enabled

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    conf_threshold=0.5,
    enable_hand_tracking=True,
    enable_wrist_highlights=True
)

GPU Acceleration

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    device="cuda",  # Use GPU
    conf_threshold=0.5,
    imgsz=640  # Larger image size for better accuracy
)

Limit Processing Rate

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    interval=0.033,  # Process at most once every ~33 ms (about 30 fps)
    conf_threshold=0.5
)
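
To picture how a non-zero interval thins out frames, here is a minimal sketch of a time-based gate. IntervalGate is a hypothetical class written for illustration; it is not the plugin's implementation, which may schedule frames differently. It assumes that an interval of 0 means every frame is processed, as described in the parameter list.

```python
import time


class IntervalGate:
    """Illustrative time-based gate; not the plugin's actual implementation."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval  # minimum seconds between processed frames
        self._clock = clock       # injectable clock, useful for testing
        self._last = None         # time of the last processed frame

    def should_process(self):
        """Return True if enough time has passed to process another frame."""
        now = self._clock()
        if (
            self.interval <= 0
            or self._last is None
            or now - self._last >= self.interval
        ):
            self._last = now
            return True
        return False
```

With interval=0.033 on a 30 fps stream, such a gate would let nearly every frame through. A larger value such as 0.1 would pass roughly one frame in three.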

Features

  • Real-time Pose Detection: Detect human poses in video streams
  • Hand Tracking: Detailed hand keypoint connections
  • Wrist Highlights: Highlight wrist positions for easy tracking
  • GPU Support: Run on CUDA-enabled GPUs for faster processing
  • Configurable: Adjust confidence, image size, and processing rate
  • Video Annotation: Automatically draws pose overlays on video

Model Options

Ultralytics supports various YOLO pose models:
Model             Size         Speed     Accuracy
yolo11n-pose.pt   Nano         Fastest   Good
yolo11s-pose.pt   Small        Fast      Better
yolo11m-pose.pt   Medium       Medium    High
yolo11l-pose.pt   Large        Slow      Higher
yolo11x-pose.pt   Extra Large  Slowest   Highest
Models are downloaded automatically from Ultralytics on first use.
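
If the model size is chosen from configuration rather than hard-coded, a small lookup keeps the file names in one place. This is a sketch only: POSE_MODELS and pose_model_path are hypothetical names, not part of the plugin, and the file names are copied from the table above.

```python
# Size labels and file names taken from the model table.
POSE_MODELS = {
    "nano": "yolo11n-pose.pt",
    "small": "yolo11s-pose.pt",
    "medium": "yolo11m-pose.pt",
    "large": "yolo11l-pose.pt",
    "extra large": "yolo11x-pose.pt",
}


def pose_model_path(size="nano"):
    """Hypothetical helper: map a size label to a model file name."""
    try:
        return POSE_MODELS[size.lower()]
    except KeyError:
        valid = ", ".join(POSE_MODELS)
        raise ValueError(f"unknown size {size!r}; expected one of: {valid}") from None
```

The result can be passed as model_path, for example model_path=pose_model_path("small").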

Keypoints Detected

The YOLO pose model detects 17 keypoints:
  1. Nose
  2. Left Eye
  3. Right Eye
  4. Left Ear
  5. Right Ear
  6. Left Shoulder
  7. Right Shoulder
  8. Left Elbow
  9. Right Elbow
  10. Left Wrist
  11. Right Wrist
  12. Left Hip
  13. Right Hip
  14. Left Knee
  15. Right Knee
  16. Left Ankle
  17. Right Ankle
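
The list above is numbered from 1, while arrays in Python are indexed from 0, so the left wrist (item 10) sits at index 9. The sketch below builds a name-to-index lookup under the assumption that the model output follows the same order as this list. KEYPOINT_NAMES, KEYPOINT_INDEX, and wrist_positions are hypothetical names used for illustration, not part of the plugin.

```python
# Keypoint names in the order listed above (0-based indices in code).
KEYPOINT_NAMES = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]
KEYPOINT_INDEX = {name: i for i, name in enumerate(KEYPOINT_NAMES)}


def wrist_positions(keypoints):
    """Return (left_wrist, right_wrist) from a sequence of 17 (x, y) pairs."""
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError(f"expected {len(KEYPOINT_NAMES)} keypoints, got {len(keypoints)}")
    return (
        keypoints[KEYPOINT_INDEX["left_wrist"]],
        keypoints[KEYPOINT_INDEX["right_wrist"]],
    )
```

The function accepts any indexable sequence of coordinate pairs for a single person, such as a list of tuples or one row block of a NumPy array.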

Performance Tuning

For Speed

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",  # Nano model
    device="cuda",  # GPU if available
    imgsz=320,  # Smaller image size
    interval=0.1,  # Process every 100ms
    conf_threshold=0.6  # Higher threshold = fewer detections
)

For Accuracy

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11l-pose.pt",  # Large model
    device="cuda",  # GPU strongly recommended for large models
    imgsz=640,  # Larger image size
    interval=0,  # Process every frame
    conf_threshold=0.3  # Lower threshold = more detections
)
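
As a rough guide to what imgsz costs, the number of input pixels grows with the square of the image size. The sketch below compares sizes against the default of 512. It assumes square inputs, and relative_pixel_count is a hypothetical helper. Actual inference time depends on the model and hardware and does not scale exactly with pixel count.

```python
def relative_pixel_count(imgsz, baseline=512):
    """Pixels in an imgsz x imgsz input relative to a baseline size."""
    return (imgsz * imgsz) / (baseline * baseline)


for size in (320, 512, 640):
    print(size, round(relative_pixel_count(size), 2))
# 320 0.39
# 512 1.0
# 640 1.56
```

Going from imgsz=320 to imgsz=640 quadruples the pixel count, which is why the speed and accuracy presets above differ so much in cost.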

Configuration

Device Selection

# CPU (works everywhere)
processor = ultralytics.YOLOPoseProcessor(device="cpu")

# CUDA GPU (fastest)
processor = ultralytics.YOLOPoseProcessor(device="cuda")

# Auto-detect (use GPU if available)
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = ultralytics.YOLOPoseProcessor(device=device)

Dependencies

Required packages (automatically installed):
  • ultralytics>=8.0.0 - YOLO models
  • opencv-python>=4.8.0 - Image processing
  • numpy>=1.24.0 - Array operations
  • pillow>=10.0.0 - Image handling
  • aiortc>=1.6.0 - WebRTC support
  • av>=10.0.0 - Audio/video processing
