Ultralytics

The Ultralytics plugin provides real-time pose detection using YOLO models with detailed hand and wrist tracking.

Installation

uv add vision-agents-plugins-ultralytics

Alternatively:

pip install vision-agents-plugins-ultralytics

Components

YOLOPoseProcessor

Real-time pose detection with hand tracking:

from vision_agents.plugins import ultralytics
from vision_agents.core import Agent

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    conf_threshold=0.5,
    device="cpu",
    enable_hand_tracking=True,
    enable_wrist_highlights=True
)

agent = Agent(
    processors=[processor],
    llm=your_llm,  # placeholder: pass your configured LLM instance here
    # ... other config
)

model_path (string, default: "yolo11n-pose.pt")
Path to YOLO pose model file. Model will be downloaded automatically on first use if not present.

conf_threshold (float, default: 0.5)
Confidence threshold for pose detection (0.0 - 1.0).

device (string, default: "cpu")
Device to run inference on: cpu or cuda.

imgsz (int, default: 512)
Image size for YOLO inference. Larger values improve accuracy but reduce speed.

max_workers (int, default: 2)
Number of worker threads for processing.

interval (float, default: 0)
Processing interval in seconds. 0 processes every frame.

enable_hand_tracking (bool, default: False)
Whether to draw detailed hand connections and keypoints.

enable_wrist_highlights (bool, default: False)
Whether to highlight wrist positions with circles.

Usage Examples

Basic Pose Detection

from vision_agents.plugins import ultralytics
from vision_agents.core import Agent

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    conf_threshold=0.5
)

agent = Agent(
    processors=[processor],
    # ... other config
)

Hand Tracking Enabled

from vision_agents.plugins import ultralytics

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    conf_threshold=0.5,
    enable_hand_tracking=True,
    enable_wrist_highlights=True
)

GPU Acceleration

from vision_agents.plugins import ultralytics

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    device="cuda",  # Use GPU
    conf_threshold=0.5,
    imgsz=640  # Larger image size for better accuracy
)
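
When choosing a custom imgsz, it may help to keep it aligned with the model's stride. The sketch below assumes a stride of 32, which is typical for YOLO models but is not stated on this page. The helper name round_up_to_stride is hypothetical and not part of the plugin.

```python
import math


def round_up_to_stride(imgsz: int, stride: int = 32) -> int:
    """Round an image size up to the nearest multiple of `stride`.

    The stride of 32 is an assumption for illustration only.
    """
    if imgsz <= 0 or stride <= 0:
        raise ValueError("imgsz and stride must be positive")
    return math.ceil(imgsz / stride) * stride
```

The returned value can then be passed as imgsz when constructing the processor.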

Process at a Fixed Interval

from vision_agents.plugins import ultralytics

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    interval=0.033,  # Process at most one frame every ~33 ms (~30 fps)
    conf_threshold=0.5
)
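
To build intuition for the interval setting, here is a minimal sketch of time-based throttling. It is not the plugin's implementation. The class IntervalThrottle and the function count_processed are hypothetical names used only for illustration.

```python
import time
from typing import Callable, List, Optional


class IntervalThrottle:
    """Hypothetical helper that decides whether a frame should be processed."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self._clock = clock
        self._last: Optional[float] = None  # time of the last processed frame

    def should_process(self) -> bool:
        now = self._clock()
        # interval <= 0 means "process every frame"
        if (self.interval <= 0 or self._last is None
                or now - self._last >= self.interval):
            self._last = now
            return True
        return False


def count_processed(timestamps: List[float], interval: float) -> int:
    """Count how many frames arriving at `timestamps` would be processed."""
    it = iter(timestamps)
    throttle = IntervalThrottle(interval, clock=lambda: next(it))
    return sum(throttle.should_process() for _ in timestamps)
```

With frames arriving every 0.25 s and an interval of 0.5, every second frame would be processed under this model; with an interval of 0, every frame would be.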

Features

Real-time Pose Detection: Detect human poses in video streams
Hand Tracking: Detailed hand keypoint connections
Wrist Highlights: Highlight wrist positions for easy tracking
GPU Support: Run on CUDA-enabled GPUs for faster processing
Configurable: Adjust confidence, image size, and processing rate
Video Annotation: Automatically draws pose overlays on video

Model Options

Ultralytics supports various YOLO pose models:

Model	Size	Speed	Accuracy
`yolo11n-pose.pt`	Nano	Fastest	Good
`yolo11s-pose.pt`	Small	Fast	Better
`yolo11m-pose.pt`	Medium	Medium	High
`yolo11l-pose.pt`	Large	Slow	Higher
`yolo11x-pose.pt`	Extra Large	Slowest	Highest

Models are downloaded automatically from Ultralytics on first use.
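
The table above can be captured as a small lookup so that the model is chosen by size name in configuration. This is only a sketch; POSE_MODELS and model_path_for are hypothetical names, not part of the plugin.

```python
# Size name -> model file, taken from the table above.
POSE_MODELS = {
    "nano": "yolo11n-pose.pt",
    "small": "yolo11s-pose.pt",
    "medium": "yolo11m-pose.pt",
    "large": "yolo11l-pose.pt",
    "extra_large": "yolo11x-pose.pt",
}


def model_path_for(size: str) -> str:
    """Return the model file name for a size name, e.g. "nano"."""
    try:
        return POSE_MODELS[size]
    except KeyError:
        valid = ", ".join(POSE_MODELS)
        raise ValueError(f"unknown size {size!r}; expected one of: {valid}") from None
```

The result can be passed as model_path when constructing the processor.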

Keypoints Detected

The YOLO pose model detects 17 keypoints:

Nose
Left Eye
Right Eye
Left Ear
Right Ear
Left Shoulder
Right Shoulder
Left Elbow
Right Elbow
Left Wrist
Right Wrist
Left Hip
Right Hip
Left Knee
Right Knee
Left Ankle
Right Ankle
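
If you work with raw keypoints, a name-to-index mapping is convenient. The sketch below assumes the model returns keypoints in the order listed above (the standard COCO order, indices 0 to 16) and that each keypoint is an (x, y) pair. The names KEYPOINT_NAMES, KEYPOINT_INDEX, and wrist_positions are hypothetical and not part of the plugin.

```python
from typing import Dict, List, Sequence, Tuple

# Order follows the list above; index 0 is the nose, index 16 the right ankle.
KEYPOINT_NAMES: List[str] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
KEYPOINT_INDEX: Dict[str, int] = {name: i for i, name in enumerate(KEYPOINT_NAMES)}


def wrist_positions(
    keypoints: Sequence[Tuple[float, float]],
) -> Dict[str, Tuple[float, float]]:
    """Pick the left and right wrist out of one person's 17 keypoints."""
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError(f"expected {len(KEYPOINT_NAMES)} keypoints, got {len(keypoints)}")
    return {
        name: tuple(keypoints[KEYPOINT_INDEX[name]])
        for name in ("left_wrist", "right_wrist")
    }
```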

Performance Tuning

For Speed

from vision_agents.plugins import ultralytics

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",  # Nano model
    device="cuda",  # GPU if available
    imgsz=320,  # Smaller image size
    interval=0.1,  # Process every 100ms
    conf_threshold=0.6  # Higher threshold = fewer detections
)

For Accuracy

from vision_agents.plugins import ultralytics

processor = ultralytics.YOLOPoseProcessor(
    model_path="yolo11l-pose.pt",  # Large model
    device="cuda",  # GPU strongly recommended for large models
    imgsz=640,  # Larger image size
    interval=0,  # Process every frame
    conf_threshold=0.3  # Lower threshold = more detections
)
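
The effect of conf_threshold in the two configurations above can be illustrated with a minimal filter. This is a sketch of the general idea, not the plugin's code; the (label, confidence) tuple shape and the function filter_by_confidence are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical detection record: (label, confidence in [0.0, 1.0]).
Detection = Tuple[str, float]


def filter_by_confidence(detections: List[Detection],
                         conf_threshold: float) -> List[Detection]:
    """Keep only detections whose confidence meets the threshold."""
    if not 0.0 <= conf_threshold <= 1.0:
        raise ValueError("conf_threshold must be between 0.0 and 1.0")
    return [d for d in detections if d[1] >= conf_threshold]
```

A higher threshold discards low-confidence poses, which reduces false positives and downstream work; a lower threshold keeps more candidates at the cost of more noise.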

Configuration

Device Selection

from vision_agents.plugins import ultralytics

# CPU (works everywhere)
processor = ultralytics.YOLOPoseProcessor(device="cpu")

# CUDA GPU (fastest)
processor = ultralytics.YOLOPoseProcessor(device="cuda")

# Auto-detect (use GPU if available)
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = ultralytics.YOLOPoseProcessor(device=device)
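
The auto-detect snippet above fails if torch cannot be imported. A slightly more defensive variant is sketched below; pick_device is a hypothetical helper name and is not part of the plugin.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed directly as the device argument of YOLOPoseProcessor.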

Dependencies

Required packages (automatically installed):

ultralytics>=8.0.0 - YOLO models
opencv-python>=4.8.0 - Image processing
numpy>=1.24.0 - Array operations
pillow>=10.0.0 - Image handling
aiortc>=1.6.0 - WebRTC support
av>=10.0.0 - Audio/video processing

References

Ultralytics YOLO Documentation
YOLO Pose Models
Model Zoo
Plugin Source: plugins/ultralytics/vision_agents/plugins/ultralytics/__init__.py
