OpenClicky knows what’s on your screen. When you ask a question that involves a visible UI element (a button, a menu, a panel, a block of code), OpenClicky captures a screenshot, reasons about it, and points at the right thing using a small blue triangle that glides across your display. Screen capture is always on-demand and local: screenshots are saved to disk as task context and never uploaded to any OpenClicky server.

How Screen Capture Works

OpenClicky uses ScreenCaptureKit (SCShareableContent, SCScreenshotManager) to capture the screen on demand. Capture is triggered only when:
  • A voice or chat prompt is submitted and screen context is configured to be included
  • The agent explicitly requests a screenshot to inform a computer-use step
  • An external-control bridge call hits the /screenshot endpoint
Capture targets the focused window — the frontmost non-OpenClicky window — unless a full-screen or multi-display capture is requested. Screenshots are encoded as JPEG at 0.82 quality and scaled to a maximum dimension of 1280 pixels to balance fidelity with token cost.
// From OpenClickyComputerUseWindowCaptureUtility
let configuration = SCStreamConfiguration()
let maxDimension = 1280
// Aspect-ratio-preserving scale to maxDimension
configuration.width = ...
configuration.height = ...

let filter = SCContentFilter(desktopIndependentWindow: screenCaptureWindow)
let cgImage = try await SCScreenshotManager.captureImage(
    contentFilter: filter,
    configuration: configuration
)
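// Encode as JPEG with a compression factor of 0.82
guard let imageData = NSBitmapImageRep(cgImage: cgImage)
    .representation(using: .jpeg, properties: [.compressionFactor: 0.82])
else {
    // Failure handling is not shown in this excerpt; returning nil is an assumption
    return nil
}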
The captured image is attached to the prompt as a local file reference. The agent prompt prefix explicitly instructs the model to treat screenshot paths as visual context material — not as files the user wants to retrieve.
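The width and height assignments are elided in the excerpt above. As a minimal sketch of what an aspect-ratio-preserving scale to a 1280-pixel maximum could look like, the helper below is hypothetical (the name `scaledCaptureSize` and the choice never to upscale small windows are assumptions, not taken from the OpenClicky source):

```swift
import Foundation

/// Hypothetical helper (not from the OpenClicky source): shrink a capture so its
/// longer side is at most `maxDimension`, preserving the aspect ratio.
func scaledCaptureSize(width: Int, height: Int, maxDimension: Int = 1280) -> (width: Int, height: Int) {
    let longest = max(width, height)
    // Assumption: never upscale; small windows keep their native size.
    guard longest > maxDimension else { return (width, height) }
    let scale = Double(maxDimension) / Double(longest)
    let scaledWidth = max(1, Int((Double(width) * scale).rounded()))
    let scaledHeight = max(1, Int((Double(height) * scale).rounded()))
    return (scaledWidth, scaledHeight)
}

let size = scaledCaptureSize(width: 2560, height: 1440)
print(size.width, size.height) // 1280 720
```

Scaling by the longer side keeps both dimensions within the limit whether the window is landscape or portrait.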

The Cursor Overlay

OpenClicky renders a native floating cursor overlay — a small blue triangle — that can zip to any point on your screen to draw attention to a UI element. The triangle is part of OpenClicky’s own window layer and never moves the macOS system pointer.

Primary Cursor

The main OpenClicky triangle. Flies smoothly to a target coordinate, shows a caption label, holds for a configurable duration, then returns. Used for single-target pointing in voice responses and agent guidance.

Secondary Cursors

Temporary marker dots used for multi-point explanations, screen tours, and simultaneous highlights. Each can have its own accent colour and caption. They auto-dismiss after durationMs or on a /clear call.

The [POINT:x,y:label] Directive

When Claude generates a screen-aware response, it appends a [POINT:x,y:label] tag at the end of the text. OpenClicky parses this tag, strips it from the displayed text, and choreographs the triangle to animate to the specified coordinate.
// Claude response with pointing directive
"You'll want the color inspector — it's in the top right of the toolbar. 
Click it to get all the color wheels and curves.
[POINT:1100,42:color inspector]"
Coordinate format:

| Format | Meaning |
| --- | --- |
| [POINT:x,y:label] | Point at (x, y) on the primary display |
| [POINT:x,y:label:screen2] | Point at (x, y) on a secondary display |
| [POINT:none] | No pointing; answer is conceptual |
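To make the three tag formats concrete, here is a rough sketch of how a trailing tag could be split from the response text. The `PointDirective` type and `parsePointDirective` function are hypothetical names for illustration; OpenClicky's actual parser is not shown on this page and may handle edge cases differently.

```swift
import Foundation

/// Hypothetical model of a parsed tag; the real OpenClicky types are not shown here.
struct PointDirective: Equatable {
    let x: Int
    let y: Int
    let label: String
    let screen: Int?  // nil means the primary display
}

/// Splits a response into display text and an optional trailing [POINT:...] directive.
func parsePointDirective(_ response: String) -> (text: String, directive: PointDirective?) {
    let trimmed = response.trimmingCharacters(in: .whitespacesAndNewlines)
    guard trimmed.hasSuffix("]"),
          let open = trimmed.range(of: "[POINT:", options: .backwards) else {
        return (trimmed, nil)  // no tag at the end of the text
    }
    let text = String(trimmed[..<open.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let body = String(trimmed[open.upperBound...].dropLast())  // drop the closing "]"
    var parts = body.components(separatedBy: ":")
    let coords = parts.removeFirst().components(separatedBy: ",")
    guard coords.count == 2, let x = Int(coords[0]), let y = Int(coords[1]),
          !parts.isEmpty else {
        return (text, nil)  // [POINT:none] or a malformed tag: strip it, point nowhere
    }
    var screen: Int? = nil
    if parts.count > 1, let last = parts.last, last.hasPrefix("screen"),
       let number = Int(last.dropFirst("screen".count)) {
        screen = number      // e.g. "screen2" selects display 2
        parts.removeLast()
    }
    let label = parts.joined(separator: ":")
    return (text, PointDirective(x: x, y: y, label: label, screen: screen))
}

let reply = "Click it to get all the color wheels and curves.\n[POINT:1100,42:color inspector]"
let parsed = parsePointDirective(reply)
print(parsed.text)
print(parsed.directive?.label ?? "no pointing")
```

Searching backwards for the opening marker means a literal "[POINT:" earlier in the prose does not confuse the sketch; only the final tag is treated as a directive.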
Coordinates are in the screenshot’s pixel space, not the macOS logical coordinate space. OpenClicky maps them to the correct screen position accounting for Retina scaling and display arrangement.
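One way to picture this mapping is as a per-axis scale plus the display's origin offset. The sketch below is illustrative only: `DisplayMapping` and `screenPoint` are hypothetical names, and it assumes a top-left origin in both spaces and a screenshot that covers the whole display, which may not match OpenClicky's real implementation.

```swift
import Foundation

/// Hypothetical description of one display: its frame in global logical points
/// and the pixel size of the screenshot that was sent to the model.
struct DisplayMapping {
    let originX: Double, originY: Double          // display origin in global points
    let widthPoints: Double, heightPoints: Double
    let screenshotWidth: Double, screenshotHeight: Double
}

/// Converts a screenshot-pixel coordinate into a global logical screen point.
/// Assumes both spaces use a top-left origin.
func screenPoint(forScreenshotX x: Double, y: Double,
                 on display: DisplayMapping) -> (x: Double, y: Double) {
    let scaleX = display.widthPoints / display.screenshotWidth
    let scaleY = display.heightPoints / display.screenshotHeight
    return (display.originX + x * scaleX, display.originY + y * scaleY)
}

// A 1440x900 pt display captured as a 1280x800 px screenshot.
let primary = DisplayMapping(originX: 0, originY: 0,
                             widthPoints: 1440, heightPoints: 900,
                             screenshotWidth: 1280, screenshotHeight: 800)
let target = screenPoint(forScreenshotX: 640, y: 400, on: primary)
print(target.x, target.y) // 720.0 450.0
```

Because the ratio is taken between logical points and screenshot pixels, the Retina backing scale never appears explicitly; it is absorbed when the capture is downscaled.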
The model is prompted to be proactive about pointing: if the user’s question has anything to do with a visible UI element, file, button, menu, panel, code on screen, or the words “this”, “that”, or “here”, the model should usually point rather than waiting to be asked.

The [TYPE:x,y:label] Directive

The [TYPE:x,y:label] directive is the typing-action counterpart. Where [POINT] draws attention, [TYPE] signals that the agent intends to click at the target and begin typing. This is used in computer-use flows where the agent is both guiding and acting.

External Control Bridge: Cursor API

The local bridge at http://127.0.0.1:32123 exposes REST endpoints for external agents and scripts to drive the overlay:
curl -s -X POST http://127.0.0.1:32123/cursor \
  -H 'Content-Type: application/json' \
  -d '{"x":640,"y":520,"caption":"Click this menu","durationMs":4500}'
The bridge also supports /caption (show a floating text label near a coordinate), /speak (TTS without entering voice mode), and /events (server-sent event stream for bridge activity).
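For scripts written in Swift rather than shell, the same /cursor call can be built with Foundation. This is a sketch under assumptions: `CursorCommand` and `makeCursorRequest` are hypothetical names, the struct mirrors only the four JSON fields shown in the curl example, and the bridge's response format is not documented here, so the request is built but not interpreted.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Mirrors the JSON fields used in the curl example; other fields are not documented here.
struct CursorCommand: Codable, Equatable {
    let x: Int
    let y: Int
    let caption: String
    let durationMs: Int
}

/// Builds (but does not send) a POST request for the bridge's /cursor endpoint.
func makeCursorRequest(_ command: CursorCommand,
                       baseURL: URL = URL(string: "http://127.0.0.1:32123")!) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("cursor"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(command)
    return request
}

let command = CursorCommand(x: 640, y: 520, caption: "Click this menu", durationMs: 4500)
if let request = try? makeCursorRequest(command) {
    print(request.httpMethod ?? "", request.url?.absoluteString ?? "")
}
```

The resulting request can be sent with URLSession while OpenClicky is running. Keeping construction separate from sending makes the payload easy to inspect or unit test without a live bridge.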

Computer-Use Backends

When the agent needs to actually interact with an app — not just point at it — OpenClicky routes through one of two computer-use backends:

Native CUA Swift

Embedded in OpenClicky. Uses the Accessibility API (permission checked with AXIsProcessTrusted()) for app/window enumeration, ScreenCaptureKit for window capture, CGEvent for keyboard input, and optionally a private SkyLight framework path for pid-directed key events. No external process required.

Background Computer Use

Loopback runtime. Connects to a separately running background-computer-use service over HTTP. Provides a richer window state API including accessibility tree nodes. Requires the runtime to be running and its manifest to be present in $TMPDIR/background-computer-use/runtime-manifest.json.
The active backend is selected in Settings → Computer Use and reflected in OpenClickyComputerUseBackendID.
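Before selecting the Background Computer Use backend, it can be useful to confirm that the runtime manifest is where the docs say it should be. The following sketch only checks for the file's presence; the function names are hypothetical, the manifest's contents are not described on this page so nothing is parsed, and using NSTemporaryDirectory() as a stand-in for $TMPDIR is an assumption.

```swift
import Foundation

/// Builds the manifest path named in the docs from a temporary-directory root.
/// Assumption: NSTemporaryDirectory() matches $TMPDIR for the current user.
func runtimeManifestURL(tmpDir: String = NSTemporaryDirectory()) -> URL {
    URL(fileURLWithPath: tmpDir)
        .appendingPathComponent("background-computer-use")
        .appendingPathComponent("runtime-manifest.json")
}

/// Presence check only; the manifest is not opened or validated.
func backgroundRuntimeManifestExists(tmpDir: String = NSTemporaryDirectory()) -> Bool {
    FileManager.default.fileExists(atPath: runtimeManifestURL(tmpDir: tmpDir).path)
}

print(runtimeManifestURL().path)
print(backgroundRuntimeManifestExists() ? "manifest found" : "manifest missing")
```

A missing manifest usually just means the separate runtime has not been started yet, so a caller might retry or fall back to the native backend.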

Permissions Required for Computer Use

  1. Accessibility: AXIsProcessTrusted() must return true. Grant in System Settings → Privacy & Security → Accessibility. Required for window enumeration, app focus detection, and pid-directed keyboard input.
  2. Screen Recording: Required for SCShareableContent and SCScreenshotManager. Grant in System Settings → Privacy & Security → Screen Recording.
  3. SkyLight keyboard path (optional): When available, OpenClicky uses a private SkyLight.framework symbol (SLEventPostToPid) for more reliable pid-directed key events. This path is detected automatically; the app falls back to the public CGEvent.postToPid path when it is not available.
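The two required permissions can be checked from code without prompting the user. This is a minimal sketch, not OpenClicky's own check: `ComputerUsePermissions` and `currentComputerUsePermissions` are hypothetical names, and using CGPreflightScreenCaptureAccess() for the Screen Recording check is an assumption (the page only names AXIsProcessTrusted()). The optional SkyLight path is left out because its detection logic is not described here.

```swift
import Foundation
#if os(macOS)
import ApplicationServices
#endif

/// Hypothetical summary of the two required permissions.
struct ComputerUsePermissions {
    let accessibility: Bool
    let screenRecording: Bool
    var allGranted: Bool { accessibility && screenRecording }
}

/// Reads the current permission state without showing any system prompt.
func currentComputerUsePermissions() -> ComputerUsePermissions {
    #if os(macOS)
    return ComputerUsePermissions(
        accessibility: AXIsProcessTrusted(),
        screenRecording: CGPreflightScreenCaptureAccess()
    )
    #else
    // These permissions only exist on macOS.
    return ComputerUsePermissions(accessibility: false, screenRecording: false)
    #endif
}

let permissions = currentComputerUsePermissions()
if !permissions.accessibility { print("Grant Accessibility in System Settings") }
if !permissions.screenRecording { print("Grant Screen Recording in System Settings") }
```

Both calls only report state, so they are safe to run at launch or before each computer-use step.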

Window Targeting

Native CUA Swift finds the target window by:
  1. Enumerating all on-screen windows via CGWindowListCopyWindowInfo
  2. Filtering to windows with layer == 0, width > 100 pt, height > 80 pt
  3. Excluding OpenClicky’s own bundle identifier
  4. Preferring the frontmost app’s windows by z-index
The resulting OpenClickyComputerUseWindowInfo contains the window ID, PID, owner name, title, bounds, and z-index — all passed to the agent as context so it can reason about which window it is operating on.
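The four selection steps can be sketched as a filter followed by a ranking. Everything in this block is illustrative: `WindowCandidate` is a simplified, hypothetical stand-in for OpenClickyComputerUseWindowInfo, the bundle identifier is a placeholder, and treating a lower z-index as closer to the front is an assumption.

```swift
import Foundation

/// Hypothetical, simplified stand-in for OpenClickyComputerUseWindowInfo.
struct WindowCandidate: Equatable {
    let windowID: Int
    let pid: Int
    let bundleID: String
    let layer: Int
    let width: Double
    let height: Double
    let zIndex: Int  // assumed: 0 is frontmost
}

func pickTargetWindow(from windows: [WindowCandidate],
                      ownBundleID: String,
                      frontmostPID: Int) -> WindowCandidate? {
    // Steps 2 and 3: normal-layer windows above the size floor, excluding our own.
    let eligible = windows.filter {
        $0.layer == 0 && $0.width > 100 && $0.height > 80 && $0.bundleID != ownBundleID
    }
    // Step 4: frontmost app's windows first, then nearest to the front.
    return eligible.min { a, b in
        let aFront = a.pid == frontmostPID
        let bFront = b.pid == frontmostPID
        if aFront != bFront { return aFront }
        return a.zIndex < b.zIndex
    }
}

let candidates = [
    WindowCandidate(windowID: 1, pid: 50, bundleID: "com.example.openclicky",
                    layer: 0, width: 400, height: 300, zIndex: 0),
    WindowCandidate(windowID: 2, pid: 100, bundleID: "com.example.browser",
                    layer: 0, width: 1200, height: 800, zIndex: 1),
    WindowCandidate(windowID: 3, pid: 200, bundleID: "com.example.editor",
                    layer: 0, width: 1400, height: 900, zIndex: 2),
]
let target = pickTargetWindow(from: candidates,
                              ownBundleID: "com.example.openclicky",
                              frontmostPID: 200)
print(target?.windowID ?? -1) // 3
```

The size floor filters out tooltips and menu-bar extras, which also report layer 0 in some apps but are rarely what the user means.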

Privacy: Screenshots Are Local

Screenshots taken by OpenClicky are saved to local disk as task context files. They are never transmitted to any OpenClicky-hosted server. When attached to an agent prompt, they are sent to the AI provider (Anthropic’s Claude API) as part of that specific request — subject to the provider’s standard data handling policy — but are not stored or logged by OpenClicky beyond the local session.
All screen capture happens on demand in response to a user action. OpenClicky does not run a background screen recording loop or take periodic screenshots.
