The GlassKit agent skill packages smart-glasses context (device constraints, reference patterns, and a starter template) into a format that AI coding agents can read and act on. This page covers what the skill contains, why it matters for glasses development, which agents support it, how to install and update it, and example prompts to get started.

Why Coding Agents Need It

Smart-glasses apps come with constraints that general-purpose coding agents do not handle out of the box:
  • HUD constraints — Rokid Glasses use a monochrome 480×640 portrait display. Black is transparent; white is green. There is no touchscreen.
  • Input model — The only physical controls are a temple touchpad (tap, double-tap, swipe forward, swipe backward). Voice commands are offline and fixed-phrase.
  • Camera and microphone — Device-specific capture constraints, codec preferences, and HAL quirks affect WebRTC streaming behavior.
  • Battery and performance — The glasses have less CPU and RAM than phones. Implementations must be efficient.
  • Wearer UX — The wearer cannot look at a phone while using the app. Spoken guidance, minimal HUD text, and low-latency feedback matter more than they do in phone apps.
Without this context, an agent is likely to generate code that works on a phone but fails or feels wrong on glasses.
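
To make the display and input constraints listed above concrete, here is a minimal, language-neutral sketch in Python; a real app would express the same ideas in Android code. The display size and the four gestures come from the list above. The function names, the luma-based intensity mapping, and the list-navigation behavior are assumptions made for illustration, not part of GlassKit or the Rokid SDK.

```python
from enum import Enum

# HUD facts stated above: monochrome 480x640 portrait display.
HUD_WIDTH = 480
HUD_HEIGHT = 640


class Gesture(Enum):
    """The four temple-touchpad gestures listed above."""
    TAP = "tap"
    DOUBLE_TAP = "double_tap"
    SWIPE_FORWARD = "swipe_forward"
    SWIPE_BACKWARD = "swipe_backward"


def hud_green_level(r: int, g: int, b: int) -> int:
    """Map an RGB color to a 0-255 green intensity (0 renders as transparent).

    Assumption: intensity follows Rec. 601 luma weights. The source only
    states that black is transparent and white is green.
    """
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def next_index(index: int, count: int, gesture: Gesture) -> int:
    """Hypothetical list navigation: swipes move the selection, clamped to the list."""
    if gesture is Gesture.SWIPE_FORWARD:
        return min(index + 1, count - 1)
    if gesture is Gesture.SWIPE_BACKWARD:
        return max(index - 1, 0)
    return index  # taps do not move the selection in this sketch
```

The point of the sketch is that every color collapses to a single intensity and every interaction reduces to four discrete gestures, which is the kind of constraint a phone-oriented agent would otherwise miss.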

Which Agents It Works With

The GlassKit skill is distributed through the Agent Skills CLI. Any agent that supports that CLI can use it.

Installing the Skill

Run the install command once in your project directory or globally:
npx skills add RealComputer/GlassKit

Updating the Skill

When GlassKit ships new references or updates existing ones, pull the latest version with:
npx skills update glasskit

What the Skill Includes

After installation the agent can read three things:

SKILL.md

Top-level orientation: Rokid hardware basics, display and input constraints, workflow steps, and a reference index. The agent reads this first.

references/

Detailed reference docs covering Rokid setup, inputs, voice commands, WebRTC, proactive perception, OpenAI Realtime, and object detection.

assets/rokid-hello-world/

A minimal Rokid Glasses starter app the agent can copy into your workspace as a scaffold. Includes HUD layout and touchpad navigation patterns.
The full reference set the agent reads:
| Reference | What it covers |
| --- | --- |
| rokid-setup.md | Hardware, Wi-Fi/ADB connection, common commands, phone/emulator setup |
| rokid-inputs.md | Touchpad handling, camera access, microphone access |
| vosk-voice-commands.md | Offline command-word recognition setup and implementation |
| rokid-webrtc.md | WebRTC sessions, audio/video tracks, SDP signaling, data channels, ICE/TURN |
| proactive-perception-pattern.md | Continuous observation pattern for workflow-driven apps |
| openai-realtime.md | WebRTC media brokering, VAD, backend-gated turns, sideband events |
| object-detection.md | Backend inference, normalized events, RF-DETR, realtime model augmentation |
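
If you want to confirm that an installed copy of the skill contains everything described above, a small check along these lines may help. This is a sketch under assumptions: the `skill_root` argument is hypothetical (this page does not say where the CLI places the skill, so pass whichever directory your tooling used), and it assumes the reference files sit directly inside `references/` under the names shown in the table.

```python
from pathlib import Path

# File names taken from the reference table above.
REFERENCE_FILES = [
    "rokid-setup.md",
    "rokid-inputs.md",
    "vosk-voice-commands.md",
    "rokid-webrtc.md",
    "proactive-perception-pattern.md",
    "openai-realtime.md",
    "object-detection.md",
]


def missing_skill_entries(skill_root: Path) -> list[str]:
    """Return the expected skill entries that are absent under skill_root.

    skill_root is a hypothetical parameter: the install location is not
    documented here, so the caller has to supply it.
    """
    missing = []
    if not (skill_root / "SKILL.md").is_file():
        missing.append("SKILL.md")
    for name in REFERENCE_FILES:
        if not (skill_root / "references" / name).is_file():
            missing.append(f"references/{name}")
    if not (skill_root / "assets" / "rokid-hello-world").is_dir():
        missing.append("assets/rokid-hello-world/")
    return missing
```

An empty list means all three parts (SKILL.md, the seven references, and the starter template) were found.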

Example Prompts

Once the skill is installed, use prompts like these with your agent:
Create a starter Rokid Glasses app using the glasskit skill.

What Happens When the Agent Uses the Skill

1. Agent reads SKILL.md

The agent loads the top-level orientation doc to understand Rokid hardware, display constraints, touchpad input model, and the available reference set.

2. Agent reads relevant references

Based on the task, the agent reads the specific references it needs: for example, rokid-webrtc.md and openai-realtime.md for a voice assistant, or object-detection.md for a detection workflow.

3. Agent copies the starter template

For a new app, the agent copies assets/rokid-hello-world/ into the target workspace as the initial scaffold, then renames the package and application ID.

4. Agent implements with constraints in mind

The agent writes Android and backend code that accounts for the monochrome HUD, touchpad navigation, camera HAL quirks, codec preferences, and wearer UX patterns from the references.

If the agent produces code that looks like a phone app rather than a glasses app (touchscreen gestures, colorful UI, or missing touchpad navigation), remind it to re-read the SKILL.md orientation and the rokid-inputs.md reference.
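
The first two steps above amount to a simple lookup: read SKILL.md, then only the references the task needs. The sketch below models that reading order. The task labels and the `reading_order` function are hypothetical and exist only for illustration; the file names and the two task-to-reference pairings come from step 2. It does not describe how any particular agent actually selects files.

```python
# Task labels are hypothetical; the file names come from the steps above.
REFERENCES_BY_TASK = {
    "voice_assistant": ["rokid-webrtc.md", "openai-realtime.md"],
    "object_detection": ["object-detection.md"],
}


def reading_order(task: str) -> list[str]:
    """List the files to read for a task, with SKILL.md always first.

    Unknown tasks fall back to SKILL.md alone, whose reference index
    can then guide the choice of further reading.
    """
    refs = REFERENCES_BY_TASK.get(task, [])
    return ["SKILL.md"] + [f"references/{name}" for name in refs]
```

For example, `reading_order("voice_assistant")` yields SKILL.md followed by the WebRTC and OpenAI Realtime references, matching the voice-assistant case in step 2.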

Questions and Contributions

For questions about the skill, or to contribute new references and examples, open an issue in the GlassKit repository or ask in the Discord server.
