OpenClicky knows what’s on your screen. When you ask a question that involves a visible UI element (a button, a menu, a panel, a block of code), Clicky captures a screenshot, reasons about it, and points at the right thing using a small blue triangle that glides across your display. Screen capture is always on-demand and local: screenshots are saved to disk as task context and never uploaded to any OpenClicky server.
How Screen Capture Works
OpenClicky uses ScreenCaptureKit (SCShareableContent, SCScreenshotManager) to capture the screen on demand. Capture is triggered only when:
- A voice or chat prompt is submitted and screen context is configured to be included
- The agent explicitly requests a screenshot to inform a computer-use step
- An external-control bridge call hits the /screenshot endpoint
The Cursor Overlay
OpenClicky renders a native floating cursor overlay (a small blue triangle) that can zip to any point on your screen to draw attention to a UI element. The triangle is part of OpenClicky’s own window layer and never moves the macOS system pointer.
Primary Cursor
The main OpenClicky triangle. Flies smoothly to a target coordinate, shows a caption label, holds for a configurable duration, then returns. Used for single-target pointing in voice responses and agent guidance.
Secondary Cursors
Temporary marker dots used for multi-point explanations, screen tours, and simultaneous highlights. Each can have its own accent colour and caption. They auto-dismiss after durationMs or on a /clear call.
The [POINT:x,y:label] Directive
When Claude generates a screen-aware response, it appends a [POINT:x,y:label] tag at the end of the text. OpenClicky parses this tag, strips it from the displayed text, and choreographs the triangle to animate to the specified coordinate.
| Format | Meaning |
|---|---|
| [POINT:x,y:label] | Point at (x, y) on the primary display |
| [POINT:x,y:label:screen2] | Point at (x, y) on a secondary display |
| [POINT:none] | No pointing; answer is conceptual |
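The parse-and-strip step can be illustrated with a minimal Python sketch. It is not OpenClicky's actual parser: the names `PointDirective` and `split_point_directive` are hypothetical, and the sketch assumes integer coordinates and labels that contain neither `:` nor `]`.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Matches a trailing [POINT:none] or [POINT:x,y:label] with an optional :screenN suffix.
_POINT_RE = re.compile(
    r"\[POINT:(?:none|(\d+),(\d+):([^\]:]*)(?::([^\]]+))?)\]\s*$"
)


@dataclass
class PointDirective:  # hypothetical container, not an OpenClicky type
    x: int
    y: int
    label: str
    screen: Optional[str] = None  # e.g. "screen2"; None means the primary display


def split_point_directive(text: str) -> Tuple[str, Optional[PointDirective]]:
    """Return (text without the tag, parsed directive or None)."""
    match = _POINT_RE.search(text)
    if match is None:
        return text, None  # no tag at the end of the text; leave it untouched
    display_text = text[: match.start()].rstrip()
    if match.group(1) is None:
        return display_text, None  # [POINT:none]: strip the tag, nothing to point at
    x, y, label, screen = match.groups()
    return display_text, PointDirective(int(x), int(y), label, screen)
```

For example, `split_point_directive("Click Save. [POINT:640,412:Save button]")` yields the display text `"Click Save."` plus a directive for (640, 412) on the primary display.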
Coordinates are in the screenshot’s pixel space, not the macOS logical coordinate space. OpenClicky maps them to the correct screen position accounting for Retina scaling and display arrangement.
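As a rough illustration of that mapping, the sketch below converts a screenshot pixel coordinate into a global logical point. It rests on several assumptions not stated above: the screenshot is captured at the display's native pixel resolution, `scale_factor` is the display's backing scale (2.0 on a typical Retina display), and `display_origin` is the display's top-left corner in a top-left-origin global space. The function name is hypothetical.

```python
from typing import Tuple


def screenshot_to_global_points(
    x_px: float,
    y_px: float,
    scale_factor: float,
    display_origin: Tuple[float, float],
) -> Tuple[float, float]:
    """Map a screenshot pixel to a logical point in the global display arrangement."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    origin_x, origin_y = display_origin
    # Divide by the backing scale to undo Retina scaling, then offset by the
    # display's position within the overall arrangement.
    return (origin_x + x_px / scale_factor, origin_y + y_px / scale_factor)
```

Under these assumptions, pixel (200, 100) on a 2x display whose origin is (1440, 0) lands at logical point (1540.0, 50.0). An implementation that works in AppKit's bottom-left-origin coordinates would also need to flip the y axis.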
The [TYPE:x,y:label] Directive
The [TYPE:x,y:label] directive is the typing-action counterpart. Where [POINT] draws attention, [TYPE] signals that the agent intends to click at the target and begin typing. This is used in computer-use flows where the agent is both guiding and acting.
External Control Bridge: Cursor API
The local bridge at http://127.0.0.1:32123 exposes REST endpoints for external agents and scripts to drive the overlay:
/caption (show a floating text label near a coordinate), /speak (TTS without entering voice mode), and /events (server-sent event stream for bridge activity).
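A script consuming /events has to decode the server-sent event wire format. The sketch below follows the standard SSE framing rules (`event:` and `data:` fields, `:` comment lines, a blank line to dispatch) and works on already-received text, so it makes no network calls. The event names and payloads in the usage note are invented for illustration; the bridge's real event schema is not described here.

```python
from typing import Dict, List


def parse_sse(stream_text: str) -> List[Dict[str, str]]:
    """Decode complete server-sent events from a chunk of stream text."""
    events: List[Dict[str, str]] = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the buffered event, if it carries any data.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return events
```

Feeding it `'event: cursor\ndata: {"x": 10}\n\n'` (a made-up event) returns one entry with `event` set to `"cursor"` and the raw JSON string in `data`. Events that are not yet terminated by a blank line are held back, which matches how a streaming client would wait for the rest of the frame.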
Computer-Use Backends
When the agent needs to actually interact with an app, not just point at it, OpenClicky routes through one of two computer-use backends:
Native CUA Swift
Embedded in OpenClicky. Uses AXIsProcessTrusted() (Accessibility) for app/window enumeration, ScreenCaptureKit for window capture, CGEvent for keyboard input, and optionally a SkyLight private framework path for pid-directed key events. No external process required.
Background Computer Use
Loopback runtime. Connects to a separately running background-computer-use service over HTTP. Provides a richer window state API including accessibility tree nodes. Requires the runtime to be running and its manifest to be present at $TMPDIR/background-computer-use/runtime-manifest.json.
The backend in use is identified by OpenClickyComputerUseBackendID.
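The manifest requirement suggests a simple availability check. The sketch below is an assumption about how a caller might choose between the two backends, not OpenClicky's actual selection logic: the returned strings are hypothetical identifiers, and the fallback to the native backend is a design choice made for this example. A manifest on disk also does not prove the runtime is still running.

```python
import os
from pathlib import Path
from typing import Optional


def manifest_path(tmpdir: Optional[str] = None) -> Path:
    """Location of the background runtime's manifest under $TMPDIR."""
    base = tmpdir if tmpdir is not None else os.environ.get("TMPDIR", "/tmp")
    return Path(base) / "background-computer-use" / "runtime-manifest.json"


def choose_backend(tmpdir: Optional[str] = None) -> str:
    """Pick a backend label; both labels are hypothetical, not real backend IDs."""
    if manifest_path(tmpdir).is_file():
        return "background-computer-use"
    # Assumed fallback: the embedded backend needs no external process.
    return "native-cua-swift"
```

A more careful check would also read the manifest and probe the HTTP endpoint it advertises before committing to the background runtime.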
Permissions Required for Computer Use
Accessibility
AXIsProcessTrusted() must return true. Grant in System Settings → Privacy & Security → Accessibility. Required for window enumeration, app focus detection, and pid-directed keyboard input.
Screen Recording
Required for SCShareableContent and SCScreenshotManager. Grant in System Settings → Privacy & Security → Screen Recording.
Window Targeting
Native CUA Swift finds the target window by:
- Enumerating all on-screen windows via CGWindowListCopyWindowInfo
- Filtering to windows with layer == 0, width > 100 pt, height > 80 pt
- Excluding OpenClicky’s own bundle identifier
- Preferring the frontmost app’s windows by z-index
OpenClickyComputerUseWindowInfo contains the window ID, PID, owner name, title, bounds, and z-index — all passed to the agent as context so it can reason about which window it is operating on.
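The filtering and ordering rules can be sketched in Python over plain records. The `WindowInfo` class below is a stand-in that mirrors the fields listed above. The `layer` and `bundle_id` fields, the `pick_target_window` name, and the convention that a lower `z_index` means closer to the front are all assumptions for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WindowInfo:  # hypothetical stand-in for OpenClickyComputerUseWindowInfo
    window_id: int
    pid: int
    owner_name: str
    title: str
    bounds: Tuple[float, float, float, float]  # x, y, width, height in points
    z_index: int  # assumption: 0 is frontmost
    layer: int = 0  # assumed extra field carrying the window layer
    bundle_id: str = ""  # assumed extra field, resolved from the owning app


def pick_target_window(
    windows: List[WindowInfo], own_bundle_id: str, frontmost_pid: int
) -> Optional[WindowInfo]:
    """Apply the layer, size, and ownership filters, then prefer the frontmost app."""
    candidates = [
        w
        for w in windows
        if w.layer == 0
        and w.bounds[2] > 100  # width in points
        and w.bounds[3] > 80  # height in points
        and w.bundle_id != own_bundle_id
    ]
    # Windows of the frontmost app sort first; ties break on z-index.
    candidates.sort(key=lambda w: (w.pid != frontmost_pid, w.z_index))
    return candidates[0] if candidates else None
```

Sorting on a `(not frontmost, z_index)` key keeps the preference rule in one place: if the frontmost app has no qualifying window, the next window in z-order is chosen instead of returning nothing.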
Privacy: Screenshots Are Local
Screenshots taken by OpenClicky are saved to local disk as task context files. They are never transmitted to any OpenClicky-hosted server. When attached to an agent prompt, they are sent to the AI provider (Anthropic’s Claude API) as part of that specific request — subject to the provider’s standard data handling policy — but are not stored or logged by OpenClicky beyond the local session.