The rokid-rfdetr example puts a vision-driven speedrun HUD on Rokid Glasses. The glasses stream live camera video to a FastAPI backend that runs RF-DETR object detection, tracks a configurable set of splits, and pushes state updates back to the HUD over a WebRTC data channel. A sushi-making speedrun is included as the reference configuration, but the system is fully data-driven: define your own objects and splits in a single JSON file to time any physical task. The example also captures annotated frames to disk so you can inspect detections and tune your model.
Features
- Global timer and split timing HUD: monochrome display shows elapsed time per split and overall.
- Configurable speedrun definitions: groups, splits, and object-detection class mappings live in backend/speedrun_config.json.
- Two-hit confirmation: each split requires two consecutive detections before advancing, reducing false positives.
- Annotated frame capture: backend saves JPEG frames with bounding boxes for inspection and model tuning.
- Manual split advance/back: swipe forward or backward on the touchpad to move splits during testing.
- Temple tap to start: tap the temple area to begin the run timer.
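The two-hit confirmation and manual advance/back behaviour listed above can be sketched as a small state machine. This is a minimal illustration, not the backend's code: SplitTracker, its fields, and the shape of the on_detection argument are all hypothetical, and "consecutive" is assumed to mean consecutive processed frames.

```python
from dataclasses import dataclass
from typing import Iterable, List

REQUIRED_HITS = 2  # the "two-hit" rule from the feature list


@dataclass
class SplitTracker:
    """Hypothetical stand-in for a split state machine (not the repo's class)."""

    targets: List[str]  # class name that completes each split, in run order
    index: int = 0      # position of the current split
    hits: int = 0       # consecutive frames in which the current target was seen

    @property
    def finished(self) -> bool:
        return self.index >= len(self.targets)

    def on_detection(self, classes: Iterable[str]) -> bool:
        """Feed the classes detected in one frame; return True if the split advanced."""
        if self.finished:
            return False
        if self.targets[self.index] in set(classes):
            self.hits += 1
        else:
            self.hits = 0  # a frame without the target breaks the streak
        if self.hits >= REQUIRED_HITS:
            self.advance()
            return True
        return False

    def advance(self) -> None:
        """Move to the next split (confirmed detection or manual swipe forward)."""
        self.index = min(self.index + 1, len(self.targets))
        self.hits = 0

    def back(self) -> None:
        """Move to the previous split (manual swipe backward)."""
        self.index = max(self.index - 1, 0)
        self.hits = 0
```

Resetting the hit counter on every miss is what makes a single spurious detection harmless: the target has to appear in two frames in a row before the split advances.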
Architecture
| Component | Location | Language |
|---|---|---|
| Glasses app | rokid/ | Kotlin |
| FastAPI backend + RF-DETR inference | backend/ | Python 3.12 |
The glasses app (rokid/) runs a single WebRTC session to /vision/session. It sends H.264 video and receives config, state, and split events over a data channel. BackendVisionClient owns WebRTC setup and data channel messaging. MainActivity handles HUD rendering, timer management, and touchpad controls.
The backend (backend/) exposes a FastAPI /vision/session endpoint. main.py accepts the WebRTC offer, sets up the peer connection, and wires incoming video tracks to VisionProcessor. The vision processor runs the RF-DETR inference loop on the latest frame and calls SpeedrunController.on_detection(). SpeedrunController implements the two-hit confirmation rule and the split state machine, then broadcasts state events back over the data channel.
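Running inference "on the latest frame" usually implies that stale frames are dropped whenever the detector is slower than the incoming video. The sketch below shows one common way to get that behaviour, assuming an asyncio-based backend. LatestFrameSlot is a hypothetical helper, not a class from the repository, and the real VisionProcessor may be structured differently.

```python
import asyncio
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LatestFrameSlot(Generic[T]):
    """Single-slot buffer: writers overwrite, the reader always gets the newest frame."""

    def __init__(self) -> None:
        self._frame: Optional[T] = None
        self._fresh = asyncio.Event()
        self.dropped = 0  # frames overwritten before anyone read them

    def put(self, frame: T) -> None:
        if self._fresh.is_set():
            self.dropped += 1  # the previous frame was never consumed
        self._frame = frame
        self._fresh.set()

    async def get(self) -> T:
        await self._fresh.wait()  # block until at least one unread frame exists
        self._fresh.clear()
        frame, self._frame = self._frame, None
        return frame


async def demo():
    slot: LatestFrameSlot[str] = LatestFrameSlot()
    for i in range(3):  # three frames arrive before inference is ready
        slot.put(f"frame-{i}")
    latest = await slot.get()
    return latest, slot.dropped
```

With this shape the inference loop never builds a backlog: latency stays bounded by one inference pass, at the cost of skipping frames.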
Requirements
- Rokid Glasses + dev cable
- Android Studio with adb
- Python 3.12 with uv
- Roboflow API key (ROBOFLOW_API_KEY): only needed to download weights once; inference runs locally after that.
Configuration
Configure your speedrun
Edit backend/speedrun_config.json with your speedrun name, split groups, and the detection class that triggers each split. See the Speedrun Config Format section below.
Speedrun Config Format
The entire speedrun definition lives in backend/speedrun_config.json, which ships with the full reference sushi config.
Each complete_on_class value must match a class name your trained RF-DETR model can detect. Splits advance in order within each group; groups run sequentially.
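To make the ordering rule concrete, the sketch below flattens a config into the sequence in which splits would run. Only complete_on_class is a field name taken from this page. The "name", "groups", and "splits" keys, the class names, and the ordered_splits helper are illustrative assumptions, and the embedded JSON is not the shipped sushi config.

```python
import json
from typing import List, Tuple

# Hypothetical config shape; consult backend/speedrun_config.json for the real schema.
EXAMPLE_CONFIG = """
{
  "name": "Example run",
  "groups": [
    {"name": "Prep", "splits": [
      {"name": "Lay out base", "complete_on_class": "base_ready"},
      {"name": "Add filling", "complete_on_class": "filling_added"}
    ]},
    {"name": "Finish", "splits": [
      {"name": "Final cut", "complete_on_class": "cut_done"}
    ]}
  ]
}
"""


def ordered_splits(config: dict) -> List[Tuple[str, str, str]]:
    """Flatten groups into (group, split, complete_on_class) tuples in run order."""
    ordered = []
    for group in config["groups"]:        # groups run sequentially
        for split in group["splits"]:     # splits advance in order within a group
            trigger = split.get("complete_on_class")
            if not trigger:
                raise ValueError(f"split {split.get('name')!r} has no complete_on_class")
            ordered.append((group["name"], split["name"], trigger))
    return ordered


splits = ordered_splits(json.loads(EXAMPLE_CONFIG))
```

Failing fast on a split without a trigger class is a design choice for this sketch; a split that can never complete would otherwise stall the run unless advanced manually.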
Backend Environment Overrides
In addition to ROBOFLOW_API_KEY, the backend supports these optional overrides:
| Variable | Description |
|---|---|
| RFDETR_MODEL_ID | Roboflow model ID to download (defaults to the example sushi model). |
| RFDETR_CONFIDENCE | Minimum detection confidence threshold (0.0–1.0). |
| RFDETR_FRAME_DIR | Directory where annotated frames are saved. |
| RFDETR_HISTORY_LIMIT | Number of annotated frames kept on disk. |
| RFDETR_JPEG_QUALITY | JPEG quality for saved annotated frames (1–95). |
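One way to read these overrides is to parse them once at startup and clamp them to the documented ranges. The variable names and ranges below come from the table; the DetectorSettings class, the load_settings function, and every fallback value are assumptions made for illustration, not the backend's actual defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DetectorSettings:
    model_id: Optional[str]  # None means "keep the backend's built-in default model"
    confidence: float
    frame_dir: str
    history_limit: int
    jpeg_quality: int


def _clamp(value, low, high):
    return max(low, min(high, value))


def load_settings(env: Mapping[str, str] = os.environ) -> DetectorSettings:
    # Fallback values here are illustrative guesses, not the real defaults.
    return DetectorSettings(
        model_id=env.get("RFDETR_MODEL_ID") or None,
        confidence=_clamp(float(env.get("RFDETR_CONFIDENCE", "0.5")), 0.0, 1.0),
        frame_dir=env.get("RFDETR_FRAME_DIR", "frames"),
        history_limit=max(0, int(env.get("RFDETR_HISTORY_LIMIT", "100"))),
        jpeg_quality=_clamp(int(env.get("RFDETR_JPEG_QUALITY", "80")), 1, 95),
    )


# Out-of-range values are clamped rather than rejected.
settings = load_settings({"RFDETR_CONFIDENCE": "1.7", "RFDETR_JPEG_QUALITY": "100"})
```

Passing the environment in as a mapping keeps the function easy to test without mutating os.environ.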
Run the Backend
Run the Glasses App
Key Backend Files
| File | Description |
|---|---|
| backend/main.py | FastAPI app, /vision/session WebRTC endpoint, data channel handling. |
| backend/vision.py | RF-DETR inference loop and annotated frame saving. |
| backend/speedrun.py | Speedrun config loader and split state machine. |
| backend/speedrun_config.json | Speedrun name, groups/splits, and detection class mapping. |
| backend/.env.example | Environment variable template. |
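Annotated frame saving combined with a history limit implies that old frames are pruned at some point. A minimal sketch of such pruning follows, assuming frames are JPEG files whose names sort chronologically. The prune_frames helper and the frame_000000.jpg naming scheme are hypothetical, and backend/vision.py may handle this differently.

```python
import tempfile
from pathlib import Path
from typing import List


def prune_frames(frame_dir: Path, history_limit: int) -> List[Path]:
    """Delete the oldest JPEGs so at most history_limit remain; return removed paths."""
    frames = sorted(frame_dir.glob("*.jpg"))  # assumes names sort oldest-first
    excess = frames[: max(0, len(frames) - history_limit)]
    for path in excess:
        path.unlink()
    return excess


def demo() -> List[str]:
    """Write five empty placeholder frames, keep the newest three."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for i in range(5):
            (folder / f"frame_{i:06d}.jpg").write_bytes(b"")
        prune_frames(folder, history_limit=3)
        return sorted(p.name for p in folder.iterdir())
```

Zero-padded sequence numbers (or timestamps) in the file name keep lexical order and capture order aligned, which is what lets a plain sort stand in for "oldest first".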
How to Prepare a Model
Each speedrun needs a fine-tuned RF-DETR model whose class names match the complete_on_class values in your config.
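A quick pre-flight check can catch a mismatch between the config and the model before a run starts. The sketch below assumes you already have both lists in hand. The missing_classes helper is hypothetical, and exact, case-sensitive matching is an assumption about how the backend compares names.

```python
from typing import Iterable, Set


def missing_classes(required: Iterable[str], model_classes: Iterable[str]) -> Set[str]:
    """Return complete_on_class values that the model has no matching class for."""
    return set(required) - set(model_classes)


# Hypothetical names: the config needs two classes, the model only knows one of them.
unmatched = missing_classes(["base_ready", "cut_done"], ["base_ready", "filling_added"])
```

Any name returned here identifies a split that detection alone could never complete.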
Record footage
Use the standard Rokid Glasses video recording feature to capture example runs of your physical task, without running the app.
Train the model
Fine-tune an RF-DETR model on your footage. Follow the RF-DETR training guide on YouTube for a step-by-step walkthrough using Roboflow.
Choose a weight loading strategy
The backend uses the Roboflow inference library by default, which downloads weights from Roboflow on first run and then caches them locally. ROBOFLOW_API_KEY is only needed for that initial download.
To avoid any Roboflow dependency, train or export weights elsewhere (for example, in a Colab notebook) and switch the backend to use the rfdetr library directly. This lets you load weights from a local path with no API key required.
Related Examples
- IKEA Assembly Assistant: voice-first assistant with OpenAI Realtime; see the rokid-openai-realtime-rfdetr variant for RF-DETR integrated with Realtime.
- Proactive Drink-making Coach: combines Overshoot VLM inference with OpenAI Realtime speech for a full-stack proactive assistant.