Sprout’s hand tracking feature uses MediaPipe for hand landmark detection and OpenCV for webcam capture. The system runs as a standalone WebSocket server that streams hand data to the frontend.

Overview

The hand tracking service:
  • Captures webcam input at 60 fps (capped)
  • Detects up to 2 hands using MediaPipe
  • Streams hand landmark data via WebSocket (ws://localhost:8765)
  • Supports natural gestures for camera control and node manipulation
Hand tracking is optional. Sprout works without it, but you lose the ability to control the 3D graph with hand gestures.

Prerequisites

1. Install Anaconda or Miniconda

Download and install Miniconda (recommended) or full Anaconda:
# Download Miniconda installer (use the arm64 installer on Apple Silicon)
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

# Install
bash Miniconda3-latest-MacOSX-x86_64.sh

# Verify
conda --version
2. Verify Webcam Access

Ensure your system has a working webcam:
# macOS: Check camera in Photo Booth
# Linux: Use cheese or fswebcam
# Windows: Use Camera app

Environment Setup

Create an isolated conda environment for hand tracking dependencies:
1. Create Conda Environment

cd sprout-backend
conda create -n sprout-cv python=3.11 -y
Use Python 3.10 or 3.11. Avoid 3.12+ as MediaPipe may have compatibility issues.
2. Activate Environment

conda activate sprout-cv
Your prompt should change to show (sprout-cv).
3. Verify Python Version

python --version
# Should output: Python 3.11.x

Install Dependencies

The hand tracking service requires four Python packages with specific versions. Install them inside the activated environment (the Troubleshooting steps below use pip install -r requirements.txt):

Package Details

| Package       | Version   | Purpose                                        |
| ------------- | --------- | ---------------------------------------------- |
| mediapipe     | 0.10.14   | Hand landmark detection (21 points per hand)   |
| opencv-python | 4.13.0.92 | Webcam capture and image processing            |
| websockets    | 12.0      | WebSocket server for streaming data            |
| numpy         | 2.4.2     | Array operations (auto-installed by mediapipe) |
Version Pinning: These exact versions are tested and known to work together. Using different versions may cause compatibility issues.
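The pins above can be sanity-checked from Python before starting the service. A minimal sketch (the check_version helper is illustrative, not part of sprout-backend):

```python
# Sketch: verify the pinned hand-tracking dependencies are installed.
# The pins mirror the table above; check_version is a hypothetical
# helper name, not code from backend.py.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "mediapipe": "0.10.14",
    "opencv-python": "4.13.0.92",
    "websockets": "12.0",
    "numpy": "2.4.2",
}

def check_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg, wanted in PINNED.items():
        found = check_version(pkg)
        status = "OK" if found == wanted else f"MISMATCH (found {found})"
        print(f"{pkg}=={wanted}: {status}")
```

Any mismatch reported here is the first thing to check when the service fails to start.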

Running the Service

Start the WebSocket server:
cd sprout-backend
conda activate sprout-cv
python backend.py
Expected output:
WebSocket Server started on ws://localhost:8765
The service is now running and waiting for connections from the frontend.
Keep this terminal window open while using hand tracking. The service runs continuously until you stop it with Ctrl+C.

Configuration

The hand tracking service is configured in backend.py:

MediaPipe Settings

backend.py
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    max_num_hands=2,              # Detect up to 2 hands
    model_complexity=0,           # 0=fast, 1=accurate
    min_detection_confidence=0.7, # Lower = more sensitive
    min_tracking_confidence=0.8,  # Higher = less jitter
)
max_num_hands (default: 2): Maximum number of hands to detect simultaneously.
max_num_hands=1  # Single hand only
max_num_hands=2  # Both hands (default)

Camera Settings

backend.py
cap = cv2.VideoCapture(0)  # Camera index
Uses the first available camera (index 0).

Performance Settings

backend.py
SEND_INTERVAL = 1 / 60      # 60 fps cap
SMOOTH_ALPHA = 0.35         # Smoothing weight
PALM_HOLD_SECONDS = 3.0     # Open palm hold duration
SEND_INTERVAL (default: 60 fps):
SEND_INTERVAL = 1 / 60   # 60 fps
SEND_INTERVAL = 1 / 30   # 30 fps (lower CPU usage)
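As the Troubleshooting section notes, lowering SMOOTH_ALPHA increases smoothing, which implies an exponential moving average in which alpha weights the newest sample. A minimal sketch under that assumption (the smooth helper name is illustrative):

```python
# Sketch: exponential smoothing consistent with how SMOOTH_ALPHA is
# described. A lower alpha weights the previous value more heavily,
# i.e. more smoothing; this helper is not lifted from backend.py.
SMOOTH_ALPHA = 0.35

def smooth(prev, new, alpha=SMOOTH_ALPHA):
    """Blend a new sample with the previous smoothed value."""
    if prev is None:
        return new  # first sample: nothing to blend with yet
    return alpha * new + (1 - alpha) * prev

# Example: smoothing a noisy x coordinate frame by frame
smoothed = None
for x in [0.50, 0.70, 0.52]:
    smoothed = smooth(smoothed, x)
```

The same update is applied independently to each coordinate, so a single jittery frame moves the reported position only a fraction of the way toward the raw detection.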

Gesture System

The hand tracking service recognizes two main gestures:

1. Camera Control (Normal Hand)

When your hand is not in an open palm position:
  • Index finger tip position controls camera azimuth and elevation
  • Camera orbits around the current focus point
Detection: Any finger is not fully extended.
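For illustration, a normalized fingertip position could be mapped to orbit angles as below; the angle ranges and function name are assumptions, and the real mapping lives in the frontend rather than backend.py:

```python
# Sketch: mapping a normalized index-fingertip position (0-1) to camera
# orbit angles. Ranges are illustrative, not taken from the codebase.
import math

def fingertip_to_orbit(x, y):
    """Map fingertip (x, y) in [0, 1] to (azimuth, elevation) in radians."""
    azimuth = (x - 0.5) * 2 * math.pi      # full horizontal orbit
    elevation = (0.5 - y) * math.pi / 2    # +/- 45 degrees; image y points down
    return azimuth, elevation
```

Centering both ranges on 0.5 means a hand at rest in the middle of the frame leaves the camera stationary.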

2. Grab Mode (Open Palm)

Hold an open palm for 3 seconds to enter grab mode:
  • All four fingers (index, middle, ring, pinky) must be extended
  • Palm center position is tracked
  • Hovering over a node while grabbing allows you to drag it in 3D space
Open Palm Detection (is_open_palm; the wrist lookup and dist helper below are shown for context):
wrist = landmarks[0]  # MediaPipe landmark 0 is the wrist

def dist(a, b):
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5

# (tip, PIP) landmark pairs for the index, middle, ring, and pinky fingers
for tip_idx, pip_idx in [(8, 6), (12, 10), (16, 14), (20, 18)]:
    tip = landmarks[tip_idx]
    pip = landmarks[pip_idx]
    extended = (
        dist(tip, wrist) > dist(pip, wrist) * 1.08  # Tip farther than PIP
        or tip.y < pip.y  # Tip above PIP (y is down in image)
    )
    if not extended:
        return False  # Not an open palm
return True
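The 3-second hold that arms grab mode can be derived per frame with a small timer. A sketch under stated assumptions (PalmHoldTracker is a hypothetical helper, not code from backend.py; the clock is injectable for testing):

```python
# Sketch: deriving palm_hold_duration and is_grabbing from successive
# frames of is_open_palm results. Hypothetical helper, not backend.py code.
import time

PALM_HOLD_SECONDS = 3.0

class PalmHoldTracker:
    def __init__(self, hold_seconds=PALM_HOLD_SECONDS, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.open_since = None  # timestamp when the palm first opened

    def update(self, is_open_palm):
        """Feed one frame's palm state; return (hold_duration, is_grabbing)."""
        now = self.clock()
        if not is_open_palm:
            self.open_since = None  # closing the hand resets the timer
            return 0.0, False
        if self.open_since is None:
            self.open_since = now
        held = now - self.open_since
        return held, held >= self.hold_seconds
```

Resetting the timer the moment the palm closes is what makes grab mode require one continuous hold rather than three accumulated seconds.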

Gesture Flow

Protocol

The WebSocket server sends JSON messages at 60 fps (capped):
{
  "hands": [
    {
      "handedness": "Right",
      "x": 0.512,
      "y": 0.384,
      "z": -0.042,
      "pinch": false,
      "palm_x": 0.501,
      "palm_y": 0.412,
      "palm_z": -0.038,
      "is_open_palm": true,
      "palm_hold_duration": 3.2,
      "is_grabbing": true
    }
  ]
}

Field Definitions

| Field                  | Type    | Description                                             |
| ---------------------- | ------- | ------------------------------------------------------- |
| handedness             | string  | "Left" or "Right"                                       |
| x, y, z                | float   | Index finger tip position (normalized 0-1)              |
| pinch                  | boolean | Thumb and index finger are pinching                     |
| palm_x, palm_y, palm_z | float   | Palm center position (average of wrist + 4 MCP joints)  |
| is_open_palm           | boolean | All 4 fingers are extended                              |
| palm_hold_duration     | float   | Seconds the open palm has been held                     |
| is_grabbing            | boolean | Grab mode is active (palm held for 3 s)                 |
Both hands are sent when detected. The frontend uses handedness (not array index) to track gestures consistently.
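The pinch field can plausibly be derived from the distance between the thumb tip (MediaPipe landmark 4) and the index fingertip (landmark 8); the threshold below is an illustrative value, not taken from backend.py:

```python
# Sketch: one plausible derivation of the `pinch` field. MediaPipe
# indexes the thumb tip as landmark 4 and the index tip as landmark 8;
# the 0.05 threshold is illustrative, not backend.py's actual value.
PINCH_THRESHOLD = 0.05  # distance in normalized image units

def is_pinching(landmarks):
    """True when the thumb tip and index tip are nearly touching."""
    thumb_tip, index_tip = landmarks[4], landmarks[8]
    d = ((thumb_tip.x - index_tip.x) ** 2
         + (thumb_tip.y - index_tip.y) ** 2) ** 0.5
    return d < PINCH_THRESHOLD
```

Because the coordinates are normalized, the threshold is independent of camera resolution, though it still varies with how far the hand is from the lens.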

Frontend Integration

The frontend connects to the WebSocket server from the hand tracking toggle:
const ws = new WebSocket('ws://localhost:8765');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Process hand tracking data
};
Location: Bottom-right corner of the 3D graph view.

Troubleshooting

Error: ModuleNotFoundError: No module named 'mediapipe'

Solution:
  1. Verify you’re in the conda environment:
    conda activate sprout-cv
    
  2. Reinstall dependencies:
    pip install -r requirements.txt
    
Error: cv2.VideoCapture returns None or blank frames.

Solution:
  1. Grant camera permissions in system settings
  2. Close other apps using the webcam
  3. Try a different camera index:
    cap = cv2.VideoCapture(1)
    
Frontend shows "WebSocket error: Connection refused".

Solution:
  1. Ensure python backend.py is running
  2. Check the port is 8765 (default)
  3. Verify firewall isn’t blocking localhost connections
Hand positions jump around erratically.

Solution:
  1. Increase smoothing:
    SMOOTH_ALPHA = 0.2  # More smoothing
    
  2. Increase tracking confidence:
    min_tracking_confidence=0.9
    
  3. Improve lighting conditions
  4. Use a higher quality webcam
Grab mode triggers unintentionally.

Solution: Increase the palm hold duration:
PALM_HOLD_SECONDS = 5.0  # Require 5s hold
Error: ImportError: DLL load failed or compatibility issues.

Solution:
  1. Delete the environment:
    conda deactivate
    conda env remove -n sprout-cv
    
  2. Recreate with Python 3.11:
    conda create -n sprout-cv python=3.11 -y
    conda activate sprout-cv
    pip install -r requirements.txt
    

Advanced Configuration

Custom WebSocket Port

Change the WebSocket port in backend.py:
backend.py
async def main():
    async with websockets.serve(handler, "localhost", 9000):  # Changed from 8765
        print("WebSocket Server started on ws://localhost:9000")
        await asyncio.Future()
If you change the port, update the frontend WebSocket connection URL to match.

Multiple Camera Support

Cycle through cameras to find the correct index:
test_cameras.py
import cv2

for i in range(10):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f"Camera {i}: Available")
        cap.release()
    else:
        print(f"Camera {i}: Not available")

Logging Hand Data

Add logging for debugging:
backend.py
import json

async def handler(websocket, path):
    # ... existing code ...
    
    # Log every 60 frames (1 second at 60fps)
    if frame_count % 60 == 0:
        print(f"Hand data: {json.dumps(frame_data, indent=2)}")

Next Steps

  • Document Uploads: Configure S3 for document storage
  • Running Locally: Start all services in development mode
