The GlassKit agent skill packages smart-glasses context (device constraints, reference patterns, and a starter template) into a format that AI coding agents can read and act on. This page explains what the skill contains, why it matters for glasses development, which agents support it, and how to install and update it, and it lists example prompts to get started.
Why Coding Agents Need It
Smart-glasses apps have constraints that general-purpose coding agents do not handle well out of the box:
- HUD constraints: Rokid Glasses use a monochrome 480×640 portrait display. Black renders as transparent and white renders as green. There is no touchscreen.
- Input model: The only physical control is a temple touchpad (tap, double-tap, swipe forward, swipe backward). Voice commands are offline and fixed-phrase.
- Camera and microphone: Device-specific capture constraints, codec preferences, and HAL quirks affect WebRTC streaming behavior.
- Battery and performance: The glasses have less CPU and RAM than a phone, so implementations must be efficient.
- Wearer UX: The wearer cannot look at a phone while using the app. Spoken guidance, minimal HUD text, and low-latency feedback matter more than they do in phone apps.
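The display and input constraints above can be written down as a small model. The following is a minimal sketch in Python, not app code for the glasses: `Rect`, `fits_on_hud`, and `next_index` are hypothetical names, and the choice that a forward swipe moves to the next list item is an assumption made for illustration. Only the 480×640 portrait size and the four touchpad gestures come from the list above.

```python
from dataclasses import dataclass
from enum import Enum

# Display size stated in the docs: 480 wide, 640 tall (portrait).
HUD_WIDTH = 480
HUD_HEIGHT = 640


class Gesture(Enum):
    """The four touchpad gestures listed in the docs."""
    TAP = "tap"
    DOUBLE_TAP = "double_tap"
    SWIPE_FORWARD = "swipe_forward"
    SWIPE_BACKWARD = "swipe_backward"


@dataclass(frozen=True)
class Rect:
    """Hypothetical layout rectangle in HUD pixel coordinates."""
    x: int
    y: int
    width: int
    height: int


def fits_on_hud(rect: Rect) -> bool:
    """Return True if the rectangle lies fully inside the 480x640 display."""
    return (
        rect.x >= 0
        and rect.y >= 0
        and rect.width > 0
        and rect.height > 0
        and rect.x + rect.width <= HUD_WIDTH
        and rect.y + rect.height <= HUD_HEIGHT
    )


def next_index(current: int, count: int, gesture: Gesture) -> int:
    """Move a list selection with swipes, clamped to the list bounds.

    Assumption: forward means "next" and backward means "previous".
    Taps leave the selection unchanged.
    """
    if gesture is Gesture.SWIPE_FORWARD:
        return min(current + 1, count - 1)
    if gesture is Gesture.SWIPE_BACKWARD:
        return max(current - 1, 0)
    return current
```

A layout written for a landscape phone screen, such as `Rect(0, 0, 640, 480)`, fails `fits_on_hud`. This is the kind of mistake the skill's context is meant to help an agent avoid.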
Which Agents It Works With
The GlassKit skill is distributed through the Agent Skills CLI. Any agent that supports that CLI can use it, including:
- OpenAI Codex
- Claude Code
- Cursor
- Other editors and CLI agents that support the Agent Skills spec
Installing the Skill
Run the install command once, either in your project directory or globally.
Updating the Skill
When GlassKit ships new references or updates existing ones, pull the latest version with the update command.
What the Skill Includes
After installation the agent can read three things:
SKILL.md
Top-level orientation: Rokid hardware basics, display and input constraints, workflow steps, and a reference index. The agent reads this first.
references/
Detailed reference docs covering Rokid setup, inputs, voice commands, WebRTC, proactive perception, OpenAI Realtime, and object detection.
assets/rokid-hello-world/
A minimal Rokid Glasses starter app the agent can copy into your workspace as a scaffold. Includes HUD layout and touchpad navigation patterns.
| Reference | What it covers |
|---|---|
| rokid-setup.md | Hardware, Wi-Fi/ADB connection, common commands, phone/emulator setup |
| rokid-inputs.md | Touchpad handling, camera access, microphone access |
| vosk-voice-commands.md | Offline command-word recognition setup and implementation |
| rokid-webrtc.md | WebRTC sessions, audio/video tracks, SDP signaling, data channels, ICE/TURN |
| proactive-perception-pattern.md | Continuous observation pattern for workflow-driven apps |
| openai-realtime.md | WebRTC media brokering, VAD, backend-gated turns, sideband events |
| object-detection.md | Backend inference, normalized events, RF-DETR, realtime model augmentation |
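To confirm that an installed copy of the skill matches the layout described above, a quick check along these lines may help. This is a sketch under assumptions: `missing_skill_files` is a hypothetical helper, and `skill_dir` is whatever directory your agent installed the skill into, which this page does not specify. The file names are taken from the table above.

```python
from pathlib import Path

# Reference files named in the table above.
EXPECTED_REFERENCES = [
    "rokid-setup.md",
    "rokid-inputs.md",
    "vosk-voice-commands.md",
    "rokid-webrtc.md",
    "proactive-perception-pattern.md",
    "openai-realtime.md",
    "object-detection.md",
]


def missing_skill_files(skill_dir: Path) -> list[str]:
    """Return the expected paths (relative to skill_dir) that do not exist.

    An empty list means SKILL.md, the starter template directory, and
    every reference file were found.
    """
    expected = ["SKILL.md", "assets/rokid-hello-world"]
    expected += [f"references/{name}" for name in EXPECTED_REFERENCES]
    return [rel for rel in expected if not (skill_dir / rel).exists()]
```

Calling `missing_skill_files(Path("path/to/skill"))` with the real install location returns the names of any missing files, which can indicate an outdated or partial install.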
Example Prompts
Once the skill is installed, use prompts like these with your agent:
What Happens When the Agent Uses the Skill
Agent reads SKILL.md
The agent loads the top-level orientation doc to understand Rokid hardware, display constraints, touchpad input model, and the available reference set.
Agent reads relevant references
Based on the task, the agent reads the specific references it needs, for example rokid-webrtc.md and openai-realtime.md for a voice assistant, or object-detection.md for a detection workflow.
Agent copies the starter template
For a new app, the agent copies assets/rokid-hello-world/ into the target workspace as the initial scaffold, then renames the package and application ID.
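The three steps above can be sketched as plain file operations. This is an illustration of the flow, not how any particular agent implements it: `REFERENCES_BY_TASK`, `plan_reads`, and `copy_starter` are hypothetical names, and the task keys are invented for the example. Only the two reference pairings named in the text are included, and the package and application ID rename is left out because it depends on the template's contents.

```python
import shutil
from pathlib import Path

# Hypothetical task-to-reference mapping. Only the pairings given as
# examples in the docs are listed; a real agent decides this itself.
REFERENCES_BY_TASK = {
    "voice_assistant": ["rokid-webrtc.md", "openai-realtime.md"],
    "detection": ["object-detection.md"],
}


def plan_reads(skill_dir: Path, task: str) -> list[Path]:
    """Return files in read order: SKILL.md first, then task references."""
    names = REFERENCES_BY_TASK.get(task, [])
    return [skill_dir / "SKILL.md"] + [
        skill_dir / "references" / name for name in names
    ]


def copy_starter(skill_dir: Path, workspace: Path, app_name: str) -> Path:
    """Copy the starter template into the workspace under app_name.

    shutil.copytree raises FileExistsError if the target already exists,
    which keeps the sketch from overwriting an existing app.
    """
    source = skill_dir / "assets" / "rokid-hello-world"
    target = workspace / app_name
    shutil.copytree(source, target)
    return target
```

For example, `plan_reads(skill_dir, "voice_assistant")` yields SKILL.md followed by rokid-webrtc.md and openai-realtime.md, matching the order described in the steps.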