The rokid-rfdetr example puts a vision-driven speedrun HUD on Rokid Glasses. The glasses stream live camera video to a FastAPI backend that runs RF-DETR object detection, tracks a configurable set of splits, and pushes state updates back to the HUD over a WebRTC data channel. A sushi-making speedrun is included as the reference configuration, but the system is fully data-driven: define your own objects and splits in a single JSON file to time any physical task. The example also captures annotated frames to disk so you can inspect detections and tune your model.

Features

  • Global timer and split timing HUD — monochrome display shows elapsed time per split and overall.
  • Configurable speedrun definitions — groups, splits, and object-detection class mappings live in backend/speedrun_config.json.
  • Two-hit confirmation — each split requires two consecutive detections before advancing, reducing false positives.
  • Annotated frame capture — backend saves JPEG frames with bounding boxes for inspection and model tuning.
  • Manual split advance/back — swipe forward or backward on the touchpad to move splits during testing.
  • Temple tap to start — tap the temple area to begin the run timer.

Architecture

Component                             Location   Language
Glasses app                           rokid/     Kotlin
FastAPI backend + RF-DETR inference   backend/   Python 3.12

The Android app (rokid/) runs a single WebRTC session to /vision/session. It sends H.264 video and receives config/state/split events over a data channel. BackendVisionClient owns WebRTC setup and data channel messaging. MainActivity handles HUD rendering, timer management, and touchpad controls.

The backend (backend/) exposes a FastAPI /vision/session endpoint. main.py accepts the WebRTC offer, sets up the peer connection, and wires incoming video tracks to VisionProcessor. The vision processor runs the RF-DETR inference loop on the latest frame and calls SpeedrunController.on_detection(). SpeedrunController implements the two-hit confirmation rule and the split state machine, then broadcasts state events back over the data channel.
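The two-hit rule and the split state machine described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in backend/speedrun.py: the class name TwoHitSplitTracker, its fields, and the choice to pass one set of detected class names per frame are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TwoHitSplitTracker:
    """Hypothetical tracker: a split completes after N consecutive matching frames."""

    split_classes: list[str]  # ordered complete_on_class values, one per split
    required_hits: int = 2    # the "two-hit" confirmation rule
    index: int = 0            # position of the split currently being timed
    hits: int = 0             # consecutive frames that matched the current split

    @property
    def finished(self) -> bool:
        return self.index >= len(self.split_classes)

    def on_detection(self, detected: set[str]) -> bool:
        """Feed the classes seen in one frame; return True if a split completed."""
        if self.finished:
            return False
        if self.split_classes[self.index] in detected:
            self.hits += 1
        else:
            self.hits = 0  # a frame without the target class breaks the streak
        if self.hits >= self.required_hits:
            self.index += 1
            self.hits = 0
            return True
        return False


tracker = TwoHitSplitTracker(["rice_in_hand", "nigiri_on_board"])
frames = [{"rice_in_hand"}, set(), {"rice_in_hand"}, {"rice_in_hand"}, {"nigiri_on_board"}]
events = [tracker.on_detection(frame) for frame in frames]
print(events)  # [False, False, False, True, False]
```

The isolated hit in the first frame does not advance the run because the empty second frame resets the streak; only the two back-to-back hits in frames three and four complete the first split.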

Requirements

  • Rokid Glasses + dev cable
  • Android Studio with adb
  • Python 3.12 with uv
  • Roboflow API key (ROBOFLOW_API_KEY) — only needed to download weights once; inference runs locally after that.

Configuration

1. Configure the glasses app

Fill out rokid/local.properties with the backend WebRTC session URL:
VISION_SESSION_URL=http://<YOUR_BACKEND>/vision/session
2. Configure the backend

cd backend
cp .env.example .env
# Set ROBOFLOW_API_KEY in .env
3. Configure your speedrun

Edit backend/speedrun_config.json with your speedrun name, split groups, and the detection class that triggers each split. See the Speedrun Config Format section below.

Speedrun Config Format

The entire speedrun definition lives in backend/speedrun_config.json. Here is the full reference sushi config included with the example:
{
  "name": "Sushi Speedrun: Trio Any%",
  "groups": [
    {
      "name": "Tuna Nigiri",
      "splits": [
        { "label": "Pick up rice", "complete_on_class": "rice_in_hand" },
        { "label": "Top with tuna", "complete_on_class": "nigiri_on_board" }
      ]
    },
    {
      "name": "Cucumber Maki",
      "splits": [
        { "label": "Lay nori", "complete_on_class": "maki_nori_on_makisu" },
        { "label": "Add rice and cucumber", "complete_on_class": "maki_ready_with_rice_cucumber" },
        { "label": "Roll and cut", "complete_on_class": "maki_piece_on_board" }
      ]
    },
    {
      "name": "Ikura Gunkan",
      "splits": [
        { "label": "Form rice, wrap nori", "complete_on_class": "gunkan_rice_nori_base_ready" },
        { "label": "Top with ikura", "complete_on_class": "gunkan_ikura_on_plate" }
      ]
    }
  ]
}
Each complete_on_class value must match a class name your trained RF-DETR model can detect. Splits advance in order within each group; groups run sequentially.
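To make the ordering rule concrete, the sketch below parses a config in this format and flattens it into a single ordered list of splits (groups in sequence, splits in order within each group). The Split dataclass and the parse_speedrun_config function are hypothetical names for this example and are not taken from backend/speedrun.py; the embedded JSON is a shortened copy of the sushi config.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    group: str
    label: str
    complete_on_class: str


def parse_speedrun_config(text: str) -> tuple[str, list[Split]]:
    """Return the run name and all splits in the order they must be completed."""
    data = json.loads(text)
    splits = [
        Split(group["name"], split["label"], split["complete_on_class"])
        for group in data["groups"]      # groups run sequentially
        for split in group["splits"]     # splits advance in order within a group
    ]
    if not splits:
        raise ValueError("speedrun config defines no splits")
    return data["name"], splits


EXAMPLE = """
{
  "name": "Sushi Speedrun: Trio Any%",
  "groups": [
    {"name": "Tuna Nigiri", "splits": [
      {"label": "Pick up rice", "complete_on_class": "rice_in_hand"},
      {"label": "Top with tuna", "complete_on_class": "nigiri_on_board"}
    ]},
    {"name": "Ikura Gunkan", "splits": [
      {"label": "Top with ikura", "complete_on_class": "gunkan_ikura_on_plate"}
    ]}
  ]
}
"""

name, splits = parse_speedrun_config(EXAMPLE)
for position, split in enumerate(splits, start=1):
    print(position, split.group, split.label, split.complete_on_class)
```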

Backend Environment Overrides

In addition to ROBOFLOW_API_KEY, the backend supports these optional overrides:
Variable               Description
RFDETR_MODEL_ID        Roboflow model ID to download (defaults to the example sushi model).
RFDETR_CONFIDENCE      Minimum detection confidence threshold (0.0–1.0).
RFDETR_FRAME_DIR       Directory where annotated frames are saved.
RFDETR_HISTORY_LIMIT   Number of annotated frames kept on disk.
RFDETR_JPEG_QUALITY    JPEG quality for saved annotated frames (1–95).

Run the Backend

cd backend
uv sync
uv run --env-file .env fastapi dev main.py --host 0.0.0.0
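Because the server binds to 0.0.0.0, the glasses need the backend machine's LAN address in VISION_SESSION_URL. The helper below is one way to print a candidate value. It assumes the server listens on port 8000 (the fastapi dev default when no --port is given); guess_lan_ip and session_url are hypothetical names, and the detected address is only a best guess on machines with several network interfaces.

```python
import socket


def guess_lan_ip() -> str:
    """Best-effort LAN address of this machine; falls back to loopback."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only picks the outgoing
        # interface. 192.0.2.1 is a reserved documentation address.
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def session_url(host: str, port: int = 8000) -> str:
    """Build the URL the glasses app expects in rokid/local.properties."""
    return f"http://{host}:{port}/vision/session"


print(f"VISION_SESSION_URL={session_url(guess_lan_ip())}")
```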

Run the Glasses App

1. Connect Rokid Glasses and enable Wi-Fi

adb devices
adb shell cmd wifi status
adb shell cmd wifi set-wifi-enabled enabled
adb shell 'cmd wifi connect-network "NAME" wpa2 "PASSWORD"'
adb shell cmd wifi status
2. Optional: wireless ADB

adb shell ip -f inet addr show wlan0
ping -c 5 -W 3 <IP>
adb tcpip 5555
adb connect <IP>
adb devices
3. Build and run

Open the rokid/ directory in Android Studio, select Rokid Glasses, and run the app. To rebuild after changes:
cd rokid && ./gradlew :app:assembleDebug

Key Backend Files

File                           Description
backend/main.py                FastAPI app, /vision/session WebRTC endpoint, data channel handling.
backend/vision.py              RF-DETR inference loop and annotated frame saving.
backend/speedrun.py            Speedrun config loader and split state machine.
backend/speedrun_config.json   Speedrun name, groups/splits, and detection class mapping.
backend/.env.example           Environment variable template.

How to Prepare a Model

Each speedrun needs a fine-tuned RF-DETR model whose class names match the complete_on_class values in your config.
1. Record footage

Use the standard Rokid Glasses video recording feature to capture example runs of your physical task, without running the app.
2. Train the model

Fine-tune an RF-DETR model on your footage. Follow the RF-DETR training guide on YouTube for a step-by-step walkthrough using Roboflow.
3. Choose a weight loading strategy

The backend uses the Roboflow inference library by default, which downloads weights from Roboflow on first run and then caches them locally. ROBOFLOW_API_KEY is only needed for that initial download.

To avoid any Roboflow dependency, train or export weights elsewhere (for example, in a Colab notebook) and switch the backend to use the rfdetr library directly. This lets you load weights from a local path with no API key required.
4. Write your speedrun config

Create a new speedrun_config.json with group names, split labels, and complete_on_class values that match the class names in your trained model.
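A quick pre-flight check can catch a config whose class names drift from the model's. The sketch below compares the complete_on_class values in a parsed config against a set of class names you supply; missing_classes is a hypothetical helper, and how you obtain the model's class list depends on your training setup.

```python
def missing_classes(config: dict, model_classes: set[str]) -> list[str]:
    """Return complete_on_class values that the model cannot detect, sorted."""
    wanted = {
        split["complete_on_class"]
        for group in config["groups"]
        for split in group["splits"]
    }
    return sorted(wanted - model_classes)


config = {
    "name": "Demo run",
    "groups": [
        {
            "name": "Tuna Nigiri",
            "splits": [
                {"label": "Pick up rice", "complete_on_class": "rice_in_hand"},
                {"label": "Top with tuna", "complete_on_class": "nigiri_on_board"},
            ],
        }
    ],
}

# Class names here are typed by hand for the demo; note the deliberate typo.
model_classes = {"rice_in_hand", "nigiri_on_bord"}
print(missing_classes(config, model_classes))  # ['nigiri_on_board']
```

A split whose class never appears in the model's output can never complete, so an empty result from a check like this is worth confirming before a run.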
Annotated frames saved to RFDETR_FRAME_DIR are useful for debugging false positives. If a split is triggering too early or too late, inspect the saved frames to understand what the model is seeing at those moments.
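When inspecting that directory, it helps to know which frames a history cap would keep. The sketch below shows one way a limit like RFDETR_HISTORY_LIMIT could be enforced by deleting the oldest files first. It is an assumption-laden illustration: prune_frames is a hypothetical helper, and the .jpg extension, the frame_N naming, and ordering by modification time are guesses rather than details of backend/vision.py.

```python
import os
import tempfile
from pathlib import Path


def prune_frames(frame_dir: Path, history_limit: int) -> list[Path]:
    """Delete the oldest .jpg files so at most history_limit remain."""
    frames = sorted(frame_dir.glob("*.jpg"), key=lambda p: p.stat().st_mtime)
    excess = frames[: max(0, len(frames) - history_limit)]
    for path in excess:
        path.unlink()
    return excess


# Demo in a throwaway directory with five empty files and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for i in range(5):
        frame = directory / f"frame_{i}.jpg"
        frame.write_bytes(b"")
        os.utime(frame, (i, i))  # older index means older file
    removed = [p.name for p in prune_frames(directory, history_limit=2)]
    kept = sorted(p.name for p in directory.glob("*.jpg"))

print(removed)  # ['frame_0.jpg', 'frame_1.jpg', 'frame_2.jpg']
print(kept)     # ['frame_3.jpg', 'frame_4.jpg']
```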
  • IKEA Assembly Assistant — voice-first assistant with OpenAI Realtime; see the rokid-openai-realtime-rfdetr variant for RF-DETR integrated with Realtime.
  • Proactive Drink-making Coach — combines Overshoot VLM inference with OpenAI Realtime speech for a full-stack proactive assistant.
