What You’ll Learn
- Using event-driven architecture with processors
- Subscribing to detection events from Roboflow
- Implementing debouncing to control LLM calls
- Combining object detection with realtime video analysis
- Detecting game events (ball disappearance/reappearance)
Features
- Real-time player and ball detection using Roboflow
- Event-driven commentary triggered by game action
- Debounced LLM calls to avoid overwhelming the model
- Annotated video with bounding boxes
- Works with both OpenAI Realtime and Gemini Live
Architecture
The system uses a two-model approach:
- Video input is sent to Roboflow’s RF-DETR model
- Detections emit events and annotate the video with bounding boxes
- When detection criteria are met (ball detected + not debounced), the LLM is prompted
- The realtime model provides commentary on the match event
Prerequisites
You’ll need API keys for:
- Stream (for video/audio infrastructure)
- OpenAI or Gemini (for realtime LLM)
- Roboflow (no key needed when running local models)
Setup
Complete Code
Code Walkthrough
Roboflow Detection Processor
The processor handles real-time object detection:
- `classes`: Filters detections to specific object types
- `conf_threshold=0.5`: Low enough to catch the ball in motion, high enough to avoid false positives
- `fps=5`: Fast enough to track movement without overwhelming the system
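The effect of `classes` and `conf_threshold` can be illustrated with a small filtering function (a standalone sketch; the detection dict shape is assumed, not the processor's real internals):

```python
def filter_detections(detections, classes, conf_threshold=0.5):
    """Keep only detections whose label is in `classes` and whose
    confidence meets the threshold -- the same filtering the
    `classes` and `conf_threshold` options imply."""
    return [
        d for d in detections
        if d["label"] in classes and d["confidence"] >= conf_threshold
    ]


raw = [
    {"label": "ball", "confidence": 0.62},
    {"label": "player", "confidence": 0.91},
    {"label": "referee", "confidence": 0.40},  # dropped: below threshold
    {"label": "crowd", "confidence": 0.88},    # dropped: not a tracked class
]
kept = filter_detections(raw, classes={"ball", "player"})
# kept contains only the ball and player detections
```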
Event Subscription
Subscribe to detection events to trigger actions:
- `event.objects`: List of detected objects with labels, bounding boxes, and confidence scores
- `event.timestamp`: When the detection occurred
- `event.frame`: The video frame (optional)
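The subscription pattern itself is ordinary pub/sub. A minimal self-contained sketch (the `DetectionEvent` fields mirror the list above; `EventBus` is a stand-in, not the framework's event system):

```python
from dataclasses import dataclass


@dataclass
class DetectionEvent:
    objects: list          # detected objects: label, bounding box, confidence
    timestamp: float       # when the detection occurred
    frame: object = None   # the video frame (optional)


class EventBus:
    """Tiny pub/sub stand-in for the framework's event subscription."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def emit(self, event):
        for handler in self._handlers:
            handler(event)


bus = EventBus()
seen = []
bus.subscribe(lambda e: seen.append(len(e.objects)))
bus.emit(DetectionEvent(objects=[{"label": "ball"}], timestamp=12.5))
# seen == [1]
```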
Debouncing
Without debouncing, the agent would call the LLM every time a detection occurs (potentially many times per second). The `Debouncer` class limits calls:
Debouncer utility (from `utils.py`):
Instructions File
The `instructions.md` file provides context for the LLM:
Switching Between LLM Providers
Vision Agents makes it easy to swap models:
Advanced: Event-Based Detection
Instead of time-based debouncing, you can detect specific game events. For example, detecting when the ball reappears after disappearing (suggesting a fast play):
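One way to sketch such a reappearance detector in plain Python (illustrative only; the class name, frame-count threshold, and per-frame boolean input are all assumptions of this sketch):

```python
class BallPresenceTracker:
    """Tracks ball visibility across frames and flags the moment the
    ball reappears after being absent for a while (a likely fast play)."""

    def __init__(self, absence_frames: int = 10):
        # Frames without the ball before it counts as "disappeared".
        self.absence_frames = absence_frames
        self._missing = 0
        self._was_gone = False

    def update(self, ball_visible: bool) -> bool:
        """Feed one frame's detection result; return True on reappearance."""
        if ball_visible:
            reappeared = self._was_gone
            self._missing = 0
            self._was_gone = False
            return reappeared
        self._missing += 1
        if self._missing >= self.absence_frames:
            self._was_gone = True
        return False
```

Feeding `update()` once per processed frame turns raw presence/absence into a single "reappearance" event, which can then trigger a prompt directly instead of relying on a timer.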
Performance Benchmarks
From real-world testing with ~30 prompts per configuration:
| Provider | FPS | Mean TTFA | StdDev | Min | Max |
|---|---|---|---|---|---|
| OpenAI Realtime | 1 | 0.39s | 0.10s | 0.31s | 0.72s |
| OpenAI Realtime | 2 | 0.47s | 0.22s | 0.32s | 1.20s |
| Gemini Live | 1 | 3.06s | 0.88s | 1.52s | 5.05s |
| Gemini Live | 2 | 4.08s | 1.04s | 2.75s | 6.85s |
- OpenAI Realtime is ~8x faster to respond
- Higher FPS doesn’t improve latency (may worsen it slightly)
- OpenAI has more consistent latency (WebRTC vs WebSocket)
Limitations & Future Improvements
Current realtime models struggle with fast-action sports because:
- Limited video context: Models seem to reason over just a few frames
- High-motion inference: Fast action causes accuracy issues
- Latency: 2-4s response time is too slow for live commentary
Possible improvements:
- Static camera angle and better footage quality
- More sophisticated event detection (actual game events, not just ball tracking)
- Potentially replacing realtime models with: Detection → Event logic → LLM → TTS
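The proposed replacement pipeline can be prototyped as a chain of small stages. Everything below is a stub (the stage names and data shapes are invented for illustration; a real version would call RF-DETR, an LLM, and a TTS engine):

```python
def detect(frame):
    """Stub detector: a real pipeline would run RF-DETR here."""
    return frame.get("detections", [])


def to_event(detections):
    """Stub event logic: map raw detections to a named game event."""
    labels = {d["label"] for d in detections}
    return "ball_in_play" if "ball" in labels else None


def llm_commentary(event):
    """Stub LLM call: would prompt a text model with the event."""
    return f"Commentary for {event}" if event else None


def tts(text):
    """Stub TTS: would synthesize speech; here it just returns a marker."""
    return ("audio", text) if text else None


def pipeline(frame):
    # Detection -> Event logic -> LLM -> TTS, as described above.
    return tts(llm_commentary(to_event(detect(frame))))


result = pipeline({"detections": [{"label": "ball"}]})
```

Because each stage only produces output when the previous one did, frames with no interesting event fall through the whole chain as `None` without ever touching the LLM or TTS.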
Use Cases Beyond Sports
This event-driven pattern works well for:
- Security monitoring (alert on specific detections)
- Manufacturing QA (comment on defects)
- Wildlife observation (identify and describe animals)
- Traffic analysis (report on congestion, accidents)
- Retail analytics (customer behavior insights)
Next Steps
- Try the Security Camera Example for more advanced object tracking
- Explore the Golf Coach Example for pose-based analysis
- Read the Processors Guide for building custom processors