Sprout’s hand tracking feature uses MediaPipe for hand landmark detection and OpenCV for webcam capture. The system runs as a standalone WebSocket server that streams hand data to the frontend.

Overview

The hand tracking service:
  • Captures webcam input at 60 fps (capped)
  • Detects up to 2 hands using MediaPipe
  • Streams hand landmark data via WebSocket (ws://localhost:8765)
  • Supports natural gestures for camera control and node manipulation
Hand tracking is optional. Sprout works without it, but you lose the ability to control the 3D graph with hand gestures.

Prerequisites

1. Install Anaconda or Miniconda

Download and install Miniconda (recommended) or full Anaconda:
# Download Miniconda installer (use the arm64 installer on Apple Silicon)
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

# Install
bash Miniconda3-latest-MacOSX-x86_64.sh

# Verify
conda --version
2. Verify Webcam Access

Ensure your system has a working webcam:
# macOS: Check camera in Photo Booth
# Linux: Use cheese or fswebcam
# Windows: Use Camera app

Environment Setup

Create an isolated conda environment for hand tracking dependencies:
1. Create Conda Environment

cd sprout-backend
conda create -n sprout-cv python=3.11 -y
Use Python 3.10 or 3.11. Avoid 3.12+ as MediaPipe may have compatibility issues.
2. Activate Environment

conda activate sprout-cv
Your prompt should change to show (sprout-cv).
3. Verify Python Version

python --version
# Should output: Python 3.11.x

Install Dependencies

The hand tracking service requires four Python packages with specific versions. Install them inside the activated environment (the Troubleshooting steps below use pip install -r requirements.txt):

Package Details

| Package       | Version   | Purpose                                        |
| ------------- | --------- | ---------------------------------------------- |
| mediapipe     | 0.10.14   | Hand landmark detection (21 points per hand)   |
| opencv-python | 4.13.0.92 | Webcam capture and image processing            |
| websockets    | 12.0      | WebSocket server for streaming data            |
| numpy         | 2.4.2     | Array operations (auto-installed by mediapipe) |
Version Pinning: These exact versions are tested and known to work together. Using different versions may cause compatibility issues.
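The pins above can be sanity-checked from Python before starting the service. A minimal sketch (the check_version helper is illustrative, not part of sprout-backend):

```python
# Sketch: verify the pinned hand-tracking dependencies are installed.
# The pins mirror the table above; check_version is a hypothetical
# helper name, not code from backend.py.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "mediapipe": "0.10.14",
    "opencv-python": "4.13.0.92",
    "websockets": "12.0",
    "numpy": "2.4.2",
}

def check_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg, wanted in PINNED.items():
        found = check_version(pkg)
        status = "OK" if found == wanted else f"MISMATCH (found {found})"
        print(f"{pkg}=={wanted}: {status}")
```

Any mismatch reported here is the first thing to check when the service fails to start.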

Running the Service

Start the WebSocket server:
cd sprout-backend
conda activate sprout-cv
python backend.py
Expected output:
WebSocket Server started on ws://localhost:8765
The service is now running and waiting for connections from the frontend.
Keep this terminal window open while using hand tracking. The service runs continuously until you stop it with Ctrl+C.

Configuration

The hand tracking service is configured in backend.py:

MediaPipe Settings

backend.py
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    max_num_hands=2,              # Detect up to 2 hands
    model_complexity=0,           # 0=fast, 1=accurate
    min_detection_confidence=0.7, # Lower = more sensitive
    min_tracking_confidence=0.8,  # Higher = less jitter
)
max_num_hands (default: 2): Maximum number of hands to detect simultaneously.
max_num_hands=1  # Single hand only
max_num_hands=2  # Both hands (default)

Camera Settings

backend.py
cap = cv2.VideoCapture(0)  # Camera index
Uses the first available camera (index 0).

Performance Settings

backend.py
SEND_INTERVAL = 1 / 60      # 60 fps cap
SMOOTH_ALPHA = 0.35         # Smoothing weight
PALM_HOLD_SECONDS = 3.0     # Open palm hold duration
SEND_INTERVAL (default: 60 fps):
SEND_INTERVAL = 1 / 60   # 60 fps
SEND_INTERVAL = 1 / 30   # 30 fps (lower CPU usage)
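As the Troubleshooting section notes, lowering SMOOTH_ALPHA increases smoothing, which implies an exponential moving average in which alpha weights the newest sample. A minimal sketch under that assumption (the smooth helper name is illustrative):

```python
# Sketch: exponential smoothing consistent with how SMOOTH_ALPHA is
# described. A lower alpha weights the previous value more heavily,
# i.e. more smoothing; this helper is not lifted from backend.py.
SMOOTH_ALPHA = 0.35

def smooth(prev, new, alpha=SMOOTH_ALPHA):
    """Blend a new sample with the previous smoothed value."""
    if prev is None:
        return new  # first sample: nothing to blend with yet
    return alpha * new + (1 - alpha) * prev

# Example: smoothing a noisy x coordinate frame by frame
smoothed = None
for x in [0.50, 0.70, 0.52]:
    smoothed = smooth(smoothed, x)
```

The same update is applied independently to each coordinate, so a single jittery frame moves the reported position only a fraction of the way toward the raw detection.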

Gesture System

The hand tracking service recognizes two main gestures:

1. Camera Control (Normal Hand)

When your hand is not in an open palm position:
  • Index finger tip position controls camera azimuth and elevation
  • Camera orbits around the current focus point
Detection: Any finger is not fully extended.
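For illustration, a normalized fingertip position could be mapped to orbit angles as below; the angle ranges and function name are assumptions, and the real mapping lives in the frontend rather than backend.py:

```python
# Sketch: mapping a normalized index-fingertip position (0-1) to camera
# orbit angles. Ranges are illustrative, not taken from the codebase.
import math

def fingertip_to_orbit(x, y):
    """Map fingertip (x, y) in [0, 1] to (azimuth, elevation) in radians."""
    azimuth = (x - 0.5) * 2 * math.pi      # full horizontal orbit
    elevation = (0.5 - y) * math.pi / 2    # +/- 45 degrees; image y points down
    return azimuth, elevation
```

Centering both ranges on 0.5 means a hand at rest in the middle of the frame leaves the camera stationary.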

2. Grab Mode (Open Palm)

Hold an open palm for 3 seconds to enter grab mode:
  • All four fingers (index, middle, ring, pinky) must be extended
  • Palm center position is tracked
  • Hovering over a node while grabbing allows you to drag it in 3D space
Open Palm Detection (is_open_palm; the wrist lookup and dist helper below are shown for context):
wrist = landmarks[0]  # MediaPipe landmark 0 is the wrist

def dist(a, b):
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5

# (tip, PIP) landmark pairs for the index, middle, ring, and pinky fingers
for tip_idx, pip_idx in [(8, 6), (12, 10), (16, 14), (20, 18)]:
    tip = landmarks[tip_idx]
    pip = landmarks[pip_idx]
    extended = (
        dist(tip, wrist) > dist(pip, wrist) * 1.08  # Tip farther than PIP
        or tip.y < pip.y  # Tip above PIP (y is down in image)
    )
    if not extended:
        return False  # Not an open palm
return True
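The 3-second hold that arms grab mode can be derived per frame with a small timer. A sketch under stated assumptions (PalmHoldTracker is a hypothetical helper, not code from backend.py; the clock is injectable for testing):

```python
# Sketch: deriving palm_hold_duration and is_grabbing from successive
# frames of is_open_palm results. Hypothetical helper, not backend.py code.
import time

PALM_HOLD_SECONDS = 3.0

class PalmHoldTracker:
    def __init__(self, hold_seconds=PALM_HOLD_SECONDS, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.open_since = None  # timestamp when the palm first opened

    def update(self, is_open_palm):
        """Feed one frame's palm state; return (hold_duration, is_grabbing)."""
        now = self.clock()
        if not is_open_palm:
            self.open_since = None  # closing the hand resets the timer
            return 0.0, False
        if self.open_since is None:
            self.open_since = now
        held = now - self.open_since
        return held, held >= self.hold_seconds
```

Resetting the timer the moment the palm closes is what makes grab mode require one continuous hold rather than three accumulated seconds.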

Gesture Flow

Protocol

The WebSocket server sends JSON messages at 60 fps (capped):
{
  "hands": [
    {
      "handedness": "Right",
      "x": 0.512,
      "y": 0.384,
      "z": -0.042,
      "pinch": false,
      "palm_x": 0.501,
      "palm_y": 0.412,
      "palm_z": -0.038,
      "is_open_palm": true,
      "palm_hold_duration": 3.2,
      "is_grabbing": true
    }
  ]
}

Field Definitions

| Field                  | Type    | Description                                             |
| ---------------------- | ------- | ------------------------------------------------------- |
| handedness             | string  | "Left" or "Right"                                       |
| x, y, z                | float   | Index finger tip position (normalized 0-1)              |
| pinch                  | boolean | Thumb and index finger are pinching                     |
| palm_x, palm_y, palm_z | float   | Palm center position (average of wrist + 4 MCP joints)  |
| is_open_palm           | boolean | All 4 fingers are extended                              |
| palm_hold_duration     | float   | Seconds the open palm has been held                     |
| is_grabbing            | boolean | Grab mode is active (palm held for 3 s)                 |
Both hands are sent when detected. The frontend uses handedness (not array index) to track gestures consistently.
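The pinch field can plausibly be derived from the distance between the thumb tip (MediaPipe landmark 4) and the index fingertip (landmark 8); the threshold below is an illustrative value, not taken from backend.py:

```python
# Sketch: one plausible derivation of the `pinch` field. MediaPipe
# indexes the thumb tip as landmark 4 and the index tip as landmark 8;
# the 0.05 threshold is illustrative, not backend.py's actual value.
PINCH_THRESHOLD = 0.05  # distance in normalized image units

def is_pinching(landmarks):
    """True when the thumb tip and index tip are nearly touching."""
    thumb_tip, index_tip = landmarks[4], landmarks[8]
    d = ((thumb_tip.x - index_tip.x) ** 2
         + (thumb_tip.y - index_tip.y) ** 2) ** 0.5
    return d < PINCH_THRESHOLD
```

Because the coordinates are normalized, the threshold is independent of camera resolution, though it still varies with how far the hand is from the lens.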

Frontend Integration

The frontend connects to the WebSocket server from the hand tracking toggle:
const ws = new WebSocket('ws://localhost:8765');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Process hand tracking data
};
Location: Bottom-right corner of the 3D graph view.

Troubleshooting

Error: ModuleNotFoundError: No module named 'mediapipe'

Solution:
  1. Verify you’re in the conda environment:
    conda activate sprout-cv
    
  2. Reinstall dependencies:
    pip install -r requirements.txt
    
Error: cv2.VideoCapture returns None or blank frames.

Solution:
  1. Grant camera permissions in system settings
  2. Close other apps using the webcam
  3. Try a different camera index:
    cap = cv2.VideoCapture(1)
    
Frontend shows "WebSocket error: Connection refused".

Solution:
  1. Ensure python backend.py is running
  2. Check the port is 8765 (default)
  3. Verify firewall isn’t blocking localhost connections
Hand positions jump around erratically.

Solution:
  1. Increase smoothing:
    SMOOTH_ALPHA = 0.2  # More smoothing
    
  2. Increase tracking confidence:
    min_tracking_confidence=0.9
    
  3. Improve lighting conditions
  4. Use a higher quality webcam
Grab mode triggers unintentionally.

Solution: Increase the palm hold duration:
PALM_HOLD_SECONDS = 5.0  # Require 5s hold
Error: ImportError: DLL load failed or compatibility issues.

Solution:
  1. Delete the environment:
    conda deactivate
    conda env remove -n sprout-cv
    
  2. Recreate with Python 3.11:
    conda create -n sprout-cv python=3.11 -y
    conda activate sprout-cv
    pip install -r requirements.txt
    

Advanced Configuration

Custom WebSocket Port

Change the WebSocket port in backend.py:
backend.py
async def main():
    async with websockets.serve(handler, "localhost", 9000):  # Changed from 8765
        print("WebSocket Server started on ws://localhost:9000")
        await asyncio.Future()
If you change the port, update the frontend WebSocket connection URL to match.

Multiple Camera Support

Cycle through cameras to find the correct index:
test_cameras.py
import cv2

for i in range(10):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f"Camera {i}: Available")
        cap.release()
    else:
        print(f"Camera {i}: Not available")

Logging Hand Data

Add logging for debugging:
backend.py
import json

async def handler(websocket, path):
    # ... existing code ...
    
    # Log every 60 frames (1 second at 60fps)
    if frame_count % 60 == 0:
        print(f"Hand data: {json.dumps(frame_data, indent=2)}")

Next Steps

  • Document Uploads: Configure S3 for document storage
  • Running Locally: Start all services in development mode
