OpenClicky ships a local-only HTTP and SSE bridge that lets agents, scripts, and trusted local apps send visual guidance commands directly to the overlay layer — without entering the normal voice, conversation, or agent state machine. The bridge is the foundation for every programmatic interaction with OpenClicky’s on-screen affordances: pointing, captioning, speech, and screenshots all route through it.

What the bridge is and why it exists

When an agent or external tool needs to show the user something on screen (“click this button”, “look at this panel”, “here are three areas to compare”), it should not have to submit a conversation prompt or start a new Clicky session to do so. The bridge provides a direct, lightweight channel for exactly those actions.

The bridge is non-invasive by design. It operates as a side channel into OpenClicky’s overlay and TTS subsystems: nothing sent to it starts a dictation session, modifies the conversation state, or creates or destroys agent sessions. The main Clicky session is completely unaffected.

Address

The bridge always listens on the loopback interface:
http://127.0.0.1:32123
It binds only to 127.0.0.1 — it is not reachable from other machines on the network.
Most bridge endpoints require a bridge token. Configure OPENCLICKY_BRIDGE_TOKEN in your secrets file or in OpenClicky Settings. Pass it as the x-openclicky-token request header or as a standard Authorization: Bearer <token> header. The /health endpoint does not require a token.
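The token rules above can be captured in a small request builder. This is a minimal sketch using only the Python standard library; the bridge_request helper is hypothetical, and it assumes that command endpoints take a JSON body via POST while body-less calls use GET. The docs above do not specify HTTP methods.

```python
import urllib.request
from typing import Optional

# Loopback address documented for the bridge.
BRIDGE_URL = "http://127.0.0.1:32123"


def bridge_request(path: str, token: Optional[str] = None,
                   body: Optional[bytes] = None,
                   use_bearer: bool = False) -> urllib.request.Request:
    """Build (but do not send) a request to the local bridge.

    /health is sent without a token. Every other path gets the token either
    as x-openclicky-token or as Authorization: Bearer <token>.
    ASSUMPTION: requests with a body are POSTed as JSON; others use GET.
    """
    headers = {}
    if path != "/health":
        if not token:
            raise ValueError(f"{path} requires a bridge token")
        if use_bearer:
            headers["Authorization"] = f"Bearer {token}"
        else:
            headers["x-openclicky-token"] = token
    if body is not None:
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(BRIDGE_URL + path, data=body,
                                  headers=headers, method=method)
```

Pass the result to urllib.request.urlopen to send it. urllib normalizes stored header names (for example to X-openclicky-token). This is harmless because HTTP header names are case-insensitive.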

What the bridge can do

  • Drive the overlay — point the primary OpenClicky cursor at any macOS screen coordinate, place simultaneous secondary markers, or show floating captions.
  • Capture screenshots — request JPEG screenshots of all displays or just the focused window, receiving local file paths and AppKit-coordinate display frames back.
  • Speak through TTS — send a short spoken instruction through OpenClicky’s TTS without touching push-to-talk or voice-response mode.
  • Clear overlay elements — remove any bridge-created cursors and captions in one call.
  • Stream bridge events — subscribe to a server-sent event stream to receive ready and command acknowledgements.
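Screenshots and pointing usually work together: an agent finds a control in a captured JPEG, then points at it. The helper below is a minimal sketch of that coordinate conversion. It rests on three assumptions: screenshot pixels use a top-left origin, the returned display frame is in AppKit global points with a bottom-left origin, and /cursor accepts those same AppKit coordinates. The Frame type and pixel_to_appkit name are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Frame:
    """A display frame in AppKit global coordinates (points, origin bottom-left)."""
    x: float
    y: float
    width: float
    height: float


def pixel_to_appkit(px: float, py: float, image_w: int, image_h: int,
                    frame: Frame) -> Tuple[float, float]:
    """Map a pixel in a top-left-origin screenshot to an AppKit screen point.

    Scaling handles Retina captures whose pixel size exceeds the frame's
    point size; the y flip handles AppKit's bottom-left origin.
    """
    scale_x = frame.width / image_w
    scale_y = frame.height / image_h
    x = frame.x + px * scale_x
    y = frame.y + frame.height - py * scale_y
    return (x, y)
```

For example, pixel (100, 50) in a 2880×1800 capture of a 1440×900 display at the origin maps to (50.0, 875.0).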

What the bridge cannot do

The bridge deliberately has no access to the following:
  • Starting or stopping push-to-talk dictation
  • Submitting prompts into the active Clicky conversation
  • Creating or spawning new agent or worker sessions
  • Reading or mutating any part of the conversation state, memory, or context
These restrictions are intentional. The bridge is a display and feedback channel, not a conversation interface.

Primary vs secondary cursor model

Understanding how OpenClicky’s two cursor modes differ is important for building correct interactions.

Primary cursor (mode: "primary", the default) uses OpenClicky’s native smooth pointing choreography, the same animation triggered when a user says “show me the Apple menu” in voice mode. The triangular OpenClicky cursor zips from its current resting position to the target coordinate, displays the caption, and then flies back. Critically, it does not warp the real macOS system pointer and does not draw a duplicate cursor icon at the target. Use the primary cursor when you want the same experience that Clicky’s built-in guidance produces.

Secondary cursors (mode: "secondary" on /cursor, or all cursors created by /cursors) are explicit, temporary colored markers. They appear at the given coordinates, show their captions, and disappear automatically after durationMs milliseconds or when /clear is called. Use secondary cursors for multi-point explanations, side-by-side comparisons, or screen tour overlays where several markers should be visible at once.
For a screen tour, combine both modes: place simultaneous secondary markers with /cursors for context, then step through each important item using the primary /cursor choreography for emphasis.
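A screen tour that combines both modes can be planned as an ordered list of bridge calls: one /cursors overview, followed by one primary /cursor step per item. This is a sketch only. The mode values and durationMs come from the description above, but the "cursors", "x", "y", and "caption" field names, the placement of durationMs, and the 8-second default are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

# A tour point: (x, y, caption) in the screen coordinates /cursor accepts.
Point = Tuple[float, float, str]


def secondary_markers(points: List[Point], duration_ms: int = 8000) -> Dict[str, Any]:
    """Body for /cursors: all points shown at once as secondary markers.

    HYPOTHETICAL field names ("cursors", "x", "y", "caption"); only
    durationMs comes from the docs, and its top-level placement is a guess.
    """
    return {
        "cursors": [{"x": x, "y": y, "caption": c} for x, y, c in points],
        "durationMs": duration_ms,
    }


def primary_step(x: float, y: float, caption: str) -> Dict[str, Any]:
    """Body for /cursor using the default primary choreography."""
    return {"x": x, "y": y, "caption": caption, "mode": "primary"}


def tour_plan(points: List[Point]) -> List[Tuple[str, Dict[str, Any]]]:
    """(path, body) pairs: one overview call, then one primary step per point."""
    plan = [("/cursors", secondary_markers(points))]
    plan += [("/cursor", primary_step(x, y, c)) for x, y, c in points]
    return plan


if __name__ == "__main__":
    for path, body in tour_plan([(40, 880, "Menu bar"), (700, 450, "Editor")]):
        print(path, json.dumps(body))
```

Keeping the plan as data makes it easy to pace the primary steps, for example waiting for each command acknowledgement before sending the next step.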

Checking bridge status

Send a GET /health request to confirm the bridge is running. No token is required. The response reports the port, the transport type, whether a bridge token is required and configured, the available tool names, and the multi-tool endpoints (/mcp/calls and /tools/calls).
```bash
curl -s http://127.0.0.1:32123/health
```
A healthy response looks like:
```json
{
  "ok": true,
  "name": "OpenClicky External Control Bridge",
  "port": 32123,
  "transport": "local-http+sse",
  "bridgeTokenRequired": true,
  "bridgeTokenConfigured": true,
  "tools": ["openclicky_point", "openclicky_point_many", "show_cursor", "show_cursors", "show_caption", "screenshot", "clear", "speak", "notify"],
  "multiToolEndpoints": ["/mcp/calls", "/tools/calls"]
}
```
The test script scripts/test-external-control-bridge.sh exercises the bridge end-to-end: Swift parse/typecheck checks, health, MCP descriptors, screenshot capture, captions, secondary cursors, SSE events, and primary cursor choreography verification.
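A client can run the same health check before sending commands. The sketch below uses only the standard library. Treating bridgeTokenRequired: true with bridgeTokenConfigured: false as an error is this sketch's own assumption: it reads that combination as "authenticated calls will be rejected". The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

HEALTH_URL = "http://127.0.0.1:32123/health"


def parse_health(raw: bytes) -> Dict[str, Any]:
    """Decode a /health body and check the fields a client depends on."""
    info = json.loads(raw)
    if not info.get("ok"):
        raise RuntimeError("bridge reported ok=false")
    # ASSUMPTION: required-but-unconfigured means authenticated calls will fail.
    if info.get("bridgeTokenRequired") and not info.get("bridgeTokenConfigured"):
        raise RuntimeError("bridge requires a token but none is configured")
    return info


def check_health(timeout: float = 2.0) -> Dict[str, Any]:
    """GET /health (no token needed) and return the parsed status."""
    with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
        return parse_health(resp.read())
```

After a successful check_health(), a caller can consult info["tools"] before relying on a specific tool such as "speak".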

SSE event stream

Subscribe to GET /events to receive server-sent events. The connection stays open and delivers:
  • An event: ready event immediately on connect, confirming the bridge is live.
  • An event: command event after every successful command processed by the bridge, carrying ok, path, and (for batch calls) count.
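SSE connections require a valid bridge token like all other authenticated endpoints, so pass it with the request (here assumed to be exported in your shell as OPENCLICKY_BRIDGE_TOKEN):
```bash
curl -N -H "x-openclicky-token: $OPENCLICKY_BRIDGE_TOKEN" http://127.0.0.1:32123/events
```
Example output:
```
event: ready
data: {"ok":true,"port":32123}

event: command
data: {"ok":true,"path":"/cursor"}
```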
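To consume the stream from code rather than curl, a client needs a small SSE parser. The sketch below follows the basic SSE framing: "event:" sets the name, "data:" lines accumulate, and a blank line dispatches the event. It assumes that every data payload is JSON, as in the example output above. parse_sse and follow_events are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Iterable, Iterator, Tuple

EVENTS_URL = "http://127.0.0.1:32123/events"


def _field_value(line: str, field: str) -> str:
    """Text after 'field:', minus one optional leading space (SSE rule)."""
    value = line[len(field) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_json) for each complete SSE event.

    'event:' sets the name, 'data:' lines accumulate, and a blank line
    dispatches the event. Lines starting with ':' are comments.
    """
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


def follow_events(token: str) -> Iterator[Tuple[str, Any]]:
    """Stream bridge events; the token is required, as for other endpoints."""
    req = urllib.request.Request(EVENTS_URL,
                                 headers={"x-openclicky-token": token})
    with urllib.request.urlopen(req) as resp:
        # HTTPResponse iterates line by line as bytes.
        yield from parse_sse(raw.decode("utf-8") for raw in resp)
```

A caller can loop over follow_events(token) and, for instance, wait for a ("command", {...}) acknowledgement before issuing the next primary cursor step.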

Bundled skills

Three bundled agent skills in AppResources/OpenClicky/OpenClickyBundledSkills/ use the bridge directly:
| Skill | Purpose |
|---|---|
| openclicky-screen-control | Quick point, caption, screenshot, speak, and clear commands for immediate visual guidance when a user asks “show me where you mean” or “point to it”. |
| openclicky-screen-tour | Recordable visual tours with multiple simultaneous markers, area-focused overlays, primary cursor choreography, screenshots, captions, and TTS narration. |
| google-workspace-gogcli | Local Google Workspace access through gogcli. Not a visual skill, but it uses the bridge for cursor and caption feedback during Workspace interactions. |

Use cases

  • Visual tutorials: when a user asks how to do something in macOS or an app, an agent can capture a screenshot, identify the relevant control, and call /cursor with a short caption. No new agent session is required; the answer is immediate and on-screen.
  • Multi-point screen tours: an agent or skill can call /cursors with a list of UI points and captions, placing several markers simultaneously. This is ideal for orientation overviews (“here is the menu bar, the editor, the sidebar, and the logs panel”) before zooming into each item with the primary cursor.
  • External agent coordination: a Cursor extension, Claude Desktop integration, or any local MCP-aware agent runtime can call /mcp/call or /mcp/calls to drive OpenClicky’s overlay as part of a larger workflow, using OpenClicky purely as a display layer while the agent does its own reasoning elsewhere.
  • Recordable demos: the screen-tour skill uses a compact coordinate region derived from the current display’s visibleFrame, keeps markers small, and uses short captions, all optimized for screen recordings where the overlay should not occlude important content.
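One way to derive a compact marker region from a display’s visibleFrame (the screen area excluding the menu bar and Dock) is to take a centered sub-rectangle and spread markers across it. This is an illustrative guess, not the screen-tour skill’s actual algorithm. The Rect type, the function names, and the fraction parameter are hypothetical, and the frame is assumed to be in AppKit points.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rect:
    """A rectangle in AppKit points (origin bottom-left)."""
    x: float
    y: float
    width: float
    height: float


def compact_region(visible: Rect, fraction: float = 0.6) -> Rect:
    """Centered sub-rectangle covering `fraction` of each visibleFrame dimension."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    w = visible.width * fraction
    h = visible.height * fraction
    return Rect(visible.x + (visible.width - w) / 2,
                visible.y + (visible.height - h) / 2, w, h)


def row_of_markers(region: Rect, count: int) -> List[Tuple[float, float]]:
    """Spread `count` marker points evenly across the middle of the region."""
    if count <= 0:
        return []
    step = region.width / (count + 1)
    mid_y = region.y + region.height / 2
    return [(region.x + step * (i + 1), mid_y) for i in range(count)]
```

Keeping markers inside a smaller central region leaves the edges of the recording clear. The resulting points could then be captioned and sent as secondary markers.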
