Overview
Sprout features experimental hand tracking that lets you navigate the 3D knowledge graph using natural hand gestures. Powered by OpenCV and MediaPipe, the system tracks up to 2 hands simultaneously and translates finger positions into camera control and node manipulation. Hand tracking runs as a separate Python WebSocket server (port 8765) that streams landmark data at 60 FPS to the React frontend.
Architecture
Backend: Python WebSocket Server
File: `sprout-backend/backend.py`
OpenCV Capture
Captures webcam feed at native resolution, flipped horizontally for mirror effect.
MediaPipe Hands
Detects hand landmarks with `model_complexity=0` (the fastest model) for minimal latency.
WebSocket Streaming
Streams hand data at 60 FPS via WebSocket (rate-limited to prevent queue buildup).
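The streamed payload might look like the following. This is an illustrative sketch of one frame's message, not the actual Sprout protocol: every field name here is an assumption, chosen to match the features described in this document (handedness, open palm, grab state, pinch distance, palm position, 21 landmarks).

```python
import json

# Hypothetical message shape for one processed frame (up to 2 hands).
# Field names are illustrative, not taken from backend.py.
sample_message = {
    "hands": [
        {
            "handedness": "Right",           # MediaPipe handedness label
            "is_open_palm": True,            # all four fingers extended
            "is_grabbing": False,            # open palm held >= 3 s
            "pinch": 0.12,                   # thumb-index distance, normalized
            "palm": {"x": 0.48, "y": 0.55},  # normalized [0, 1] coordinates
            "landmarks": [[0.5, 0.6, 0.0]],  # 21 [x, y, z] entries in practice
        }
    ],
    "ts": 1712345678.123,
}

# Encode/decode as it would travel over the WebSocket.
encoded = json.dumps(sample_message)
decoded = json.loads(encoded)
```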
Multi-Hand Support
Tracks up to 2 hands simultaneously with independent smoothing and gesture detection.
Frontend: React Hook
File: `sprout-frontend/src/hooks/use-hand-tracking.ts`
- Auto-reconnect: Retries connection every 3 seconds on disconnect
- State parsing: Converts WebSocket JSON to typed `HandSample[]`
- Camera hand selection: First non-grabbing hand controls camera orbit
Hand Landmark Detection
MediaPipe detects 21 landmarks per hand. The key landmarks used by Sprout:
| Index | Landmark | Purpose |
|---|---|---|
| 0 | Wrist | Base reference point |
| 4 | Thumb tip | Pinch gesture detection |
| 8 | Index tip | Primary cursor position |
| 5, 9, 13, 17 | Finger MCPs | Palm center calculation |
| 6, 10, 14, 18 | Finger PIPs | Open palm detection |
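Using the indices in the table above, the palm center described later can be computed as the mean of the wrist and the four finger MCPs. A minimal sketch (the function name is illustrative; coordinates follow MediaPipe's normalized convention):

```python
# Palm center as the mean of the wrist (0) and the four finger MCP
# landmarks (5, 9, 13, 17) -- steadier than using the wrist alone.
PALM_LANDMARKS = (0, 5, 9, 13, 17)

def palm_center(landmarks):
    """landmarks: sequence of 21 (x, y) pairs; returns the averaged (x, y)."""
    xs = [landmarks[i][0] for i in PALM_LANDMARKS]
    ys = [landmarks[i][1] for i in PALM_LANDMARKS]
    n = len(PALM_LANDMARKS)
    return (sum(xs) / n, sum(ys) / n)
```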
Gesture Recognition
Open Palm Detection
Algorithm: All 4 fingers (index, middle, ring, pinky) must be extended. The 1.08 multiplier provides tolerance for slight finger bending, preventing false negatives.
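A sketch of this check, under the common assumption that a finger counts as extended when its tip sits farther from the wrist than its PIP joint, loosened by the 1.08 tolerance (the exact comparison in `backend.py` may differ):

```python
import math

# (tip, pip) landmark indices per finger: index, middle, ring, pinky.
FINGERS = [(8, 6), (12, 10), (16, 14), (20, 18)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_open_palm(landmarks, tolerance=1.08):
    """All four fingers extended: each tip farther from the wrist than its
    PIP joint, with the tolerance allowing slightly bent fingers to pass."""
    wrist = landmarks[0]
    return all(
        dist(landmarks[tip], wrist) > dist(landmarks[pip], wrist) / tolerance
        for tip, pip in FINGERS
    )
```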
Pinch Gesture
Measurement: Euclidean distance between thumb tip (4) and index tip (8).
- pinch < 0.05: Pinched (zoom in)
- pinch > 0.20: Spread (zoom out)
- Maps to camera radius: [0.02, 0.35] → [80px, 550px]
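The radius mapping above can be sketched as a clamp followed by a linear interpolation. The thresholds and ranges come from this document; the function and constant names are illustrative:

```python
# Map the normalized pinch distance [0.02, 0.35] linearly onto the
# camera radius [80, 550] px, clamping out-of-range values.
PINCH_MIN, PINCH_MAX = 0.02, 0.35
RADIUS_MIN, RADIUS_MAX = 80.0, 550.0

def pinch_to_radius(pinch: float) -> float:
    clamped = min(max(pinch, PINCH_MIN), PINCH_MAX)
    t = (clamped - PINCH_MIN) / (PINCH_MAX - PINCH_MIN)
    return RADIUS_MIN + t * (RADIUS_MAX - RADIUS_MIN)
```

A tight pinch thus pulls the camera in toward 80 px, while a full spread pushes it out to 550 px.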
Grab Mode (3-Second Hold)
Holding an open palm for 3 seconds activates grab mode for node manipulation.
Navigation Modes
Mode 1: Camera Orbit (Default)
When not grabbing, your hand controls the camera:
- Pan
- Tilt
- Zoom
Hand X (left-right) → Camera orbits around the graph center (theta angle)
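A minimal sketch of that mapping, with normalized palm X driving theta and palm Y driving phi (tilt). The angular ranges here are illustrative guesses, not the values used in the frontend:

```python
import math

# Map the normalized palm position ([0, 1] in each axis) to orbit angles.
# Ranges are assumptions for illustration only.
def palm_to_orbit(x: float, y: float):
    theta = (x - 0.5) * math.pi           # orbit around the graph center
    phi = math.pi / 2 + (y - 0.5) * 1.0   # tilt about the horizontal plane
    return theta, phi
```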
Mode 2: Node Grabbing
When `is_grabbing = true`, your palm position controls the node position:
- Capture
- Drag
- Release
Trigger: Open palm held for 3 seconds near a node (within a 140px radius).
Node selection:
- Projects all main nodes (root/concept) to screen space
- Finds nearest node to palm position
- Captures that node + all its subconcepts (if concept node)
Only root and concept nodes can be grabbed—subconcepts move with their parent.
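The selection and trigger logic above can be sketched as follows. The node representation, function names, and hold bookkeeping are assumptions; only the 3-second hold, the 140px radius, and the root/concept restriction come from this document:

```python
import math

HOLD_SECONDS = 3.0
GRAB_RADIUS_PX = 140.0

def nearest_grabbable(palm_xy, nodes):
    """nodes: list of (node_id, kind, (x, y)) already projected to screen
    space. Returns the closest root/concept node within 140 px, or None.
    Subconcepts are skipped -- they move with their parent."""
    best, best_d = None, GRAB_RADIUS_PX
    for node_id, kind, (x, y) in nodes:
        if kind not in ("root", "concept"):
            continue
        d = math.hypot(palm_xy[0] - x, palm_xy[1] - y)
        if d <= best_d:
            best, best_d = node_id, d
    return best

def should_grab(open_palm_since, now):
    """True once an open palm has been held for 3 seconds."""
    return open_palm_since is not None and now - open_palm_since >= HOLD_SECONDS
```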
Grab Exclusivity
Only one hand at a time can grab: if both hands are grabbing, the second hand is ignored. This prevents conflicting drag operations.
Visual Feedback
Palm Progress Ring
When an open palm is detected, an SVG ring renders at the palm position:
- Yellow (`#facc15`): Charging (0-3 seconds)
- Green (`#4ade80`): Grabbing (3+ seconds)
Hand Cursor (Optional)
A custom cursor follows the index fingertip when not grabbing.
File: `sprout-frontend/src/components/hand-cursor.tsx`
Performance Optimizations
Backend
Model Complexity 0
Lightest MediaPipe model (smaller network, faster inference) keeps frame time under 16ms.
60 FPS Rate Limit
WebSocket sends capped at 16.67ms intervals to prevent queue buildup.
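A send throttle of this kind can be sketched as below. `SEND_INTERVAL` is named in this document; the class and method names are illustrative:

```python
# Skip a frame when less than SEND_INTERVAL seconds have elapsed since
# the last send, capping the WebSocket at ~60 messages per second.
SEND_INTERVAL = 1.0 / 60.0  # ~16.67 ms

class SendThrottle:
    def __init__(self, interval=SEND_INTERVAL):
        self.interval = interval
        self.last_send = float("-inf")

    def ready(self, now: float) -> bool:
        """Return True (and record the send) if enough time has passed."""
        if now - self.last_send >= self.interval:
            self.last_send = now
            return True
        return False
```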
EMA Smoothing
Alpha = 0.35 balances responsiveness vs. jitter reduction.
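The exponential moving average is a one-liner; only the alpha value comes from this document, the function name is illustrative:

```python
# Blend each new landmark sample into the running estimate.
# alpha = 0.35 weights the newest sample: higher alpha tracks
# faster but passes through more jitter.
ALPHA = 0.35

def ema(previous, current, alpha=ALPHA):
    if previous is None:     # first sample: nothing to smooth against
        return current
    return alpha * current + (1 - alpha) * previous
```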
Per-Connection State
Each WebSocket client has isolated `HandState` instances with no shared buffers.
Frontend
requestAnimationFrame
Camera updates run in rAF loop (60 FPS) independent of WebSocket rate.
Ref-Based State
Hot path reads refs (not state) to avoid triggering re-renders.
Screen-Space Projection
Node grabbing uses `graph2ScreenCoords()` for accurate, camera-aware hit detection.
Smooth Offset Lerp
Drag offset smoothed at 6% per frame to eliminate microjitter.
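Moving 6% of the remaining distance each frame is a standard exponential approach; a sketch (the 0.06 factor is from this document, the function name is illustrative):

```python
# Per-frame lerp toward the drag target: each frame closes 6% of the
# remaining gap, which damps microjitter without feeling laggy.
LERP_FACTOR = 0.06

def lerp_offset(current: float, target: float, factor=LERP_FACTOR) -> float:
    return current + (target - current) * factor
```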
Setup Instructions
1. Install Python Dependencies
2. Start Hand Tracking Server
3. Enable Hand Tracking in UI
Toggle the hand tracking button in the graph view. The frontend will connect to `ws://localhost:8765` and display the connection status.
4. Webcam Permissions
Ensure your browser/OS grants webcam access to the Python process. On macOS, you may need to allow Terminal in System Preferences > Privacy > Camera.
Limitations and Future Work
Single Client
Each Python server instance supports one WebSocket client. Multiple users need separate server instances.
Lighting Sensitivity
MediaPipe performance degrades in low light. Requires well-lit environment for reliable tracking.
Handedness Consistency
If hands swap sides rapidly, handedness labels may flicker (MediaPipe limitation).
CPU Intensive
Video processing is CPU-bound. On low-end machines, consider increasing `SEND_INTERVAL` to reduce load.
Planned improvements:
- GPU acceleration via CUDA or Metal
- Gesture vocabulary expansion (fist, thumbs-up, etc.)
- Multi-user collaboration with hand ID tracking
- Haptic feedback via browser Vibration API
Technical Details
Palm Center Calculation
The palm center is the average of 5 landmarks (the wrist plus the four finger MCPs), which is more stable than the wrist alone.
WebSocket Protocol
Messages are JSON objects streamed once per processed frame.
Force Graph Integration
File: `sprout-frontend/src/components/force-graph-view.tsx`
The hand tracking hook passes data to the force graph component:
Hand tracking is entirely optional—all graph features work perfectly with mouse/trackpad. It’s an experimental enhancement for immersive exploration.