The rokid-openai-realtime example turns Rokid Glasses into a voice-first assembly assistant. The glasses stream microphone audio and camera video over WebRTC directly to the OpenAI Realtime API. A lightweight Node.js backend brokers the WebRTC session and handles sideband tool calls. In the default setup, the assistant walks the wearer through assembling an IKEA wooden box, loading step-by-step instructions on demand via two tool functions. Swap the instruction files or rewrite the system prompt and you have a domain-specific hands-free assistant for any physical task.
Features
- End-to-end WebRTC: mic and camera stream from the glasses to OpenAI Realtime with no intermediate media relay.
- Real-time audio and vision: the assistant observes live video frames and hears the wearer simultaneously.
- Assistant speech playback: the Realtime API streams audio back and the glasses play it through their speaker.
- Sideband tool calls: a backend WebSocket listens for response.done events and handles list_items / load_item_instructions function calls so the model can dynamically load assembly instructions.
Architecture
| Component | Location | Language |
|---|---|---|
| Glasses app | rokid/ | Kotlin |
| Backend session broker + tool handler | backend/ | TypeScript (Node.js 24, ESM) |
On the glasses, MainActivity auto-starts streaming after camera and mic permissions are granted. A temple tap (KEYCODE_DPAD_CENTER / ENTER) toggles start and stop. Media is managed by OpenAIRealtimeClient using the Stream WebRTC library.
The backend entry point is backend/server.ts. The POST /session endpoint accepts a raw SDP offer body, forwards it to OpenAI with the session configuration, and returns the SDP answer. It then opens a sideband WebSocket to the same call ID and handles tool calls from that connection.
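As a minimal sketch of the last step, the helper below derives a sideband WebSocket URL from a call ID. It assumes that OpenAI returns the call ID as the final path segment of a Location header on the SDP response, and that the sideband socket is addressed with a call_id query parameter. Both details, along with the name sidebandUrlFromLocation, are illustrative assumptions and are not taken from backend/server.ts.

```typescript
// Hypothetical helper: derive the sideband WebSocket URL from the Location
// header returned when the SDP offer is accepted.
// Assumed header shape: ".../v1/realtime/calls/<call_id>".
function sidebandUrlFromLocation(location: string): string {
  // Drop any query string and trailing slashes before reading the last segment.
  const queryStart = location.indexOf("?");
  const withoutQuery = queryStart === -1 ? location : location.slice(0, queryStart);
  const path = withoutQuery.replace(/\/+$/, "");
  const callId = path.split("/").pop() ?? "";
  if (callId === "") {
    throw new Error(`No call ID found in Location header: "${location}"`);
  }
  // Assumed sideband endpoint: the Realtime WebSocket URL plus a call_id parameter.
  return `wss://api.openai.com/v1/realtime?call_id=${encodeURIComponent(callId)}`;
}
```

Keeping this as a pure string function makes the call ID handling easy to check without opening a real WebRTC session.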
Requirements
- Rokid Glasses + dev cable
- Android Studio with adb
- Node.js 24
- OpenAI API key (OPENAI_API_KEY)
Configuration
Run the Backend
The backend listens on port 3000 by default. Override it with the PORT environment variable.
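A minimal sketch of that default-and-override behaviour, assuming a hypothetical resolvePort helper. The real server may read PORT differently, and falling back to 3000 for invalid values is an assumption.

```typescript
// Hypothetical helper, not the code in backend/server.ts.
// Returns the port from PORT when it is a valid TCP port, otherwise 3000.
function resolvePort(env: Record<string, string | undefined>): number {
  const raw = env["PORT"];
  if (raw === undefined || raw.trim() === "") {
    return 3000; // documented default
  }
  const port = Number(raw);
  const valid = Number.isInteger(port) && port > 0 && port <= 65535;
  return valid ? port : 3000; // assumed fallback for malformed values
}
```

In a Node.js process, passing process.env as the argument would pick up a PORT value set in the shell that starts the backend.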
Run the Glasses App
Connect Rokid Glasses and enable Wi-Fi
Connect the glasses to your computer using the dev cable, then run:
Session Configuration
The backend sends a sessionConfig object to OpenAI with audio settings, the system prompt, and tool definitions. Key fields from backend/server.ts:
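For orientation only, an object with those three parts might be shaped roughly like the sketch below. This is not the contents of backend/server.ts: the model name, the voice, the prompt text, the tool descriptions, and the item_name parameter are all placeholder assumptions. Only the two tool names come from this page.

```typescript
// Illustrative placeholder values throughout; check backend/server.ts for the real ones.
const SESSION_INSTRUCTIONS =
  "You are a hands-free assembly assistant. Guide the wearer one step at a time.";

const sessionConfig = {
  type: "realtime",
  model: "gpt-realtime", // assumed model name
  instructions: SESSION_INSTRUCTIONS, // system prompt
  audio: { output: { voice: "marin" } }, // assumed audio settings
  tool_choice: "auto",
  tools: [
    {
      type: "function",
      name: "list_items",
      description: "List the items that have assembly instructions.", // assumed wording
      parameters: { type: "object", properties: {}, required: [] as string[] },
    },
    {
      type: "function",
      name: "load_item_instructions",
      description: "Load the assembly instructions for one item.", // assumed wording
      parameters: {
        type: "object",
        // item_name is a hypothetical parameter name.
        properties: { item_name: { type: "string" } },
        required: ["item_name"],
      },
    },
  ],
};
```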
Customize Instructions
To add or replace assembly instructions:
- Add an instruction file: place a .txt file in backend/items/. The filename (without the .txt extension) becomes the item name returned by list_items. The current example is backend/items/ikea-wooden-box.txt.
- Edit the system prompt: change SESSION_INSTRUCTIONS in backend/server.ts to adjust the assistant’s role, personality, and conversation rules.
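The filename-to-item-name rule can be sketched as a small pure function. The name itemNamesFromFiles, the sorting, and the decision to ignore non-.txt files are assumptions made for illustration. The actual list_items implementation may differ.

```typescript
// Hypothetical helper: map directory entries from backend/items/ to item names.
function itemNamesFromFiles(filenames: string[]): string[] {
  const extension = ".txt";
  return filenames
    // Keep only .txt files that have a non-empty base name.
    .filter((name) => name.endsWith(extension) && name.length > extension.length)
    // Strip the extension so "ikea-wooden-box.txt" becomes "ikea-wooden-box".
    .map((name) => name.slice(0, -extension.length))
    .sort();
}
```

Under these assumptions, dropping a new .txt file into backend/items/ would be enough for its name to show up in the list. No code change would be needed.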
How Sideband Tool Calls Work
When the OpenAI Realtime API completes a function call, it emits a response.done event over the sideband WebSocket. The backend parses it, calls runTool() with the function name and arguments, and sends back a conversation.item.create message with the tool output followed by response.create to resume generation.
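The flow above can be sketched as a function that turns one incoming sideband message into the messages to send back. The function name messagesForResponseDone and the synchronous runTool signature are assumptions made to keep the sketch small (the real runTool() may be asynchronous), and the event fields reflect my understanding of the Realtime event format rather than code from backend/server.ts.

```typescript
// Assumed signature; the real runTool() in backend/server.ts may differ.
type ToolRunner = (name: string, args: Record<string, unknown>) => string;

interface FunctionCallItem {
  type: "function_call";
  name: string;
  call_id: string;
  arguments: string; // JSON-encoded arguments
}

function isFunctionCall(item: unknown): item is FunctionCallItem {
  return typeof item === "object" && item !== null &&
    (item as { type?: unknown }).type === "function_call";
}

// Returns the messages to send over the sideband socket, in order.
function messagesForResponseDone(rawEvent: string, runTool: ToolRunner): object[] {
  const event = JSON.parse(rawEvent);
  if (event?.type !== "response.done") return [];

  const output: unknown[] = Array.isArray(event.response?.output) ? event.response.output : [];
  const calls = output.filter(isFunctionCall);
  if (calls.length === 0) return [];

  // One function_call_output item per tool call, matched by call_id.
  const messages: object[] = calls.map((call) => ({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: call.call_id,
      output: runTool(call.name, JSON.parse(call.arguments || "{}")),
    },
  }));
  // Ask the model to resume generation with the tool output in context.
  messages.push({ type: "response.create" });
  return messages;
}
```

Each returned object would be serialized with JSON.stringify and written to the sideband WebSocket. Events that contain no function call produce no messages.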
Related Examples
- rokid-openai-realtime-rfdetr: an updated version of this project that adds RF-DETR object detection on the backend and injects annotated frames into the Realtime conversation for more accurate spatial understanding. See its README.md in the repository for setup steps.
- Speedrun Timer (RF-DETR): vision-only speedrun HUD with RF-DETR detection and split timing.
- Proactive Drink-making Coach: full-stack example combining Overshoot vision inference, OpenAI Realtime speech, and a server-authoritative workflow.