OpenClicky ships a native macOS computer-use capability that lets Clicky operate real GUI applications in the background — Finder, browsers, Numbers, Calendar, and anything else accessible through macOS Accessibility APIs — without stealing focus, warping the system pointer, or pulling you away from whatever you’re typing. Computer use is the last-mile fallback: Clicky will always prefer a structured integration route (GitHub via Composio, Google Workspace via gogcli, etc.) before reaching for GUI automation. When no structured route exists, computer use bridges the gap.

When to Use Computer Use

Computer use is a last resort. Before triggering it, Clicky checks whether the task can be handled through a direct API, MCP integration, shell command, or other structured route. For integration-capable apps like Gmail, Slack, GitHub, or Calendar, the structured route always takes priority unless you explicitly ask Clicky to “click”, “type in”, or otherwise operate the visible UI.
Computer use is appropriate when:
  • The target app has no API or MCP connector and the only path is its GUI.
  • You’ve explicitly asked Clicky to operate a specific window or control.
  • You’re asking for something that genuinely requires clicking or typing in a native macOS app.
It is not a substitute for a disconnected or expired integration connector. If a Composio connector is missing or expired, Clicky will ask you to reconnect it from OpenClicky Settings → Integrations rather than silently falling back to GUI automation on private account data.

Two Backends

OpenClicky exposes two computer-use backends, defined in OpenClickyComputerUseModels.swift:
nonisolated enum OpenClickyComputerUseBackendID: String, CaseIterable, Identifiable, Sendable {
    case nativeSwift = "native_swift"
    case backgroundComputerUse = "background_computer_use"

    var label: String {
        switch self {
        case .nativeSwift:
            return "Native CUA Swift"
        case .backgroundComputerUse:
            return "Background Computer Use"
        }
    }

    var executorID: String {
        switch self {
        case .nativeSwift:
            return "native_cua"
        case .backgroundComputerUse:
            return "background_computer_use"
        }
    }

    static let fallback: OpenClickyComputerUseBackendID = .nativeSwift

    static func resolving(_ rawValue: String?) -> OpenClickyComputerUseBackendID {
        guard let rawValue,
              let backend = OpenClickyComputerUseBackendID(rawValue: rawValue) else {
            return fallback
        }
        return backend
    }
}
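
A short usage sketch for resolving(_:) may help here. To stay self-contained it uses a trimmed stand-in named BackendID (a hypothetical name, not the real OpenClickyComputerUseBackendID) that copies only the raw values, executor IDs, and fallback logic from the enum above. It also shows that raw values and executor IDs are different strings, so an executor ID passed as a raw value is treated as unrecognized.

```swift
// Trimmed stand-in for OpenClickyComputerUseBackendID (hypothetical name),
// mirroring only the raw values, executor IDs, and fallback logic.
enum BackendID: String, CaseIterable {
    case nativeSwift = "native_swift"
    case backgroundComputerUse = "background_computer_use"

    var executorID: String {
        switch self {
        case .nativeSwift: return "native_cua"
        case .backgroundComputerUse: return "background_computer_use"
        }
    }

    static let fallback: BackendID = .nativeSwift

    static func resolving(_ rawValue: String?) -> BackendID {
        guard let rawValue, let backend = BackendID(rawValue: rawValue) else {
            return fallback
        }
        return backend
    }
}

// A stored, valid raw value resolves to its own case.
let stored = BackendID.resolving("background_computer_use")
// A missing value falls back to the native backend.
let missing = BackendID.resolving(nil)
// A misspelled value is unrecognized, so it also falls back.
let typo = BackendID.resolving("background-computer-use")
// "native_cua" is an executor ID, not a raw value; it is unrecognized too.
let executorIDAsInput = BackendID.resolving("native_cua")

print(stored.executorID)   // background_computer_use
print(typo.executorID)     // native_cua
```

Because the fallback is applied silently, a caller that needs to detect a bad stored value would have to compare the input against the raw values itself.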

Native CUA Swift (native_cua)

The primary backend. Embedded directly in OpenClicky.app as OpenClickyNativeComputerUseController, it drives macOS apps through Accessibility APIs and ScreenCaptureKit — no external helper binary required. Because it runs inside OpenClicky.app, macOS attributes both Accessibility and Screen Recording usage to OpenClicky itself (not to a separate helper or CLI tool). The executor ID for this backend is native_cua.

Background Computer Use (background_computer_use)

A loopback runtime backed by OpenClickyBackgroundComputerUseController. It communicates with a local background-computer-use runtime over HTTP and looks for a runtime manifest at /tmp/background-computer-use/runtime-manifest.json. The executor ID is background_computer_use. Use this backend when you want the automation to stay entirely offscreen while you continue working in the foreground.

resolving(_:) is the factory method that selects a backend from a raw string value. When the value is absent or unrecognized, it returns the fallback, nativeSwift, whose executor ID is native_cua.
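
The manifest lookup mentioned for the Background Computer Use backend can be sketched as a plain file check. This is a minimal illustration, not OpenClicky's implementation: loadRuntimeManifest is a hypothetical helper, and because the manifest's schema is not described here, the sketch only confirms that the file exists and parses as a JSON object.

```swift
import Foundation

// Path quoted in the text above.
let manifestPath = "/tmp/background-computer-use/runtime-manifest.json"

/// Hypothetical helper: returns the manifest as a loosely typed dictionary,
/// or nil when the file is missing or is not a JSON object.
/// Nothing is assumed about which keys the manifest contains.
func loadRuntimeManifest(atPath path: String) -> [String: Any]? {
    guard let data = FileManager.default.contents(atPath: path),
          let object = try? JSONSerialization.jsonObject(with: data),
          let manifest = object as? [String: Any] else {
        return nil
    }
    return manifest
}

let runtimeDetected = loadRuntimeManifest(atPath: manifestPath) != nil
print(runtimeDetected ? "manifest found" : "no manifest; background runtime not detected")
```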

The cua-driver Skill and MCP Tools

Clicky operates apps through the computer-use MCP server backed by OpenClickyComputerUseRuntime. The bundled cua-driver skill is the instruction surface — it teaches Clicky which tools to call and in what order. You never call cua-driver as a CLI; you call the MCP tools directly.

Available MCP tools

| Tool | Purpose |
| --- | --- |
| launch_app | Launch or attach to an app by bundle ID. Idempotent: safe to call on a running app. Returns pid and a windows array. |
| list_windows | Enumerate a pid’s windows with window_id, title, bounds, z-index, and Space info. |
| get_window_state | Snapshot a window’s AX tree (tree_markdown) and screenshot. Populates the element-index cache for the (pid, window_id) pair. |
| click | AX-dispatch a left click to an element by element_index. |
| right_click | AX-dispatch a right click / context menu to an element by element_index. |
| set_value | Write a value directly to a text field, slider, or stepper. Preferred for keyboard-commit workarounds on minimized windows. |
| type_text | Type text into a focused element via AXSelectedText write with automatic CGEvent fallback. |
| press_key | Send a key to a pid’s current focus, optionally setting AX focus first via element_index. |
| hotkey | Post a modifier-key combo (e.g. ["cmd","c"]) to a pid via CGEvent.postToPid. |
| scroll | Synthesize scroll events (PageUp/PageDown/arrows) via SLEventPostToPid. |
| screenshot | Capture a raw PNG of a window (no AX walk). |
| check_permissions | Check Accessibility and Screen Recording grant status for OpenClicky.app. |
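
For readers who want to see what calling one of these tools looks like on the wire, the sketch below builds a standard MCP tools/call request as JSON-RPC 2.0. The envelope (jsonrpc, id, method, params.name, params.arguments) follows the MCP specification; the helper name toolCallRequest is hypothetical, and how OpenClicky transports these requests is not described here.

```swift
import Foundation

/// Hypothetical helper: builds the JSON-RPC 2.0 body for an MCP `tools/call` request.
/// Returns nil if the arguments cannot be represented as JSON.
func toolCallRequest(id: Int, tool: String, arguments: [String: Any]) -> Data? {
    let body: [String: Any] = [
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": ["name": tool, "arguments": arguments],
    ]
    // Validate first: serializing a non-JSON object is a programmer error, not a thrown error.
    guard JSONSerialization.isValidJSONObject(body) else { return nil }
    return try? JSONSerialization.data(withJSONObject: body, options: [.sortedKeys])
}

// launch_app with the bundle_id argument described in the table above.
if let request = toolCallRequest(
    id: 1,
    tool: "launch_app",
    arguments: ["bundle_id": "com.apple.calculator"]
) {
    print(String(decoding: request, as: UTF8.self))
}
```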

Canonical multi-step workflow

launch_app({"bundle_id": "com.apple.calculator"})
# -> {pid: 844, windows: [{window_id: 10725, ...}]}
get_window_state({"pid": 844, "window_id": 10725})
click({"pid": 844, "window_id": 10725, "element_index": 14})
get_window_state({"pid": 844, "window_id": 10725})
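
The workflow starts with get_window_state because element indices are resolved against a cache keyed by the (pid, window_id) pair. The sketch below models that behavior with hypothetical types (WindowKey, ElementIndexCache, LookupError); plain strings stand in for AX elements, and none of this is OpenClicky's actual cache.

```swift
/// Cache key: indices are only valid for the exact (pid, window_id) that was snapshotted.
struct WindowKey: Hashable {
    let pid: Int
    let windowID: Int
}

enum LookupError: Error, Equatable {
    case noCachedState(WindowKey)       // no snapshot was taken for this window
    case invalidIndex(Int, WindowKey)   // index is stale or out of range
}

/// Hypothetical model of the element-index cache.
struct ElementIndexCache {
    private var elementsByWindow: [WindowKey: [String]] = [:]

    /// What a snapshot does in this model: replace the cached elements for one window.
    mutating func recordSnapshot(_ elements: [String], for key: WindowKey) {
        elementsByWindow[key] = elements
    }

    /// What an element-indexed action does in this model before dispatching.
    func element(at index: Int, for key: WindowKey) throws -> String {
        guard let elements = elementsByWindow[key] else {
            throw LookupError.noCachedState(key)
        }
        guard elements.indices.contains(index) else {
            throw LookupError.invalidIndex(index, key)
        }
        return elements[index]
    }
}

var cache = ElementIndexCache()
let calculatorWindow = WindowKey(pid: 844, windowID: 10725)
cache.recordSnapshot(["AXButton 7", "AXButton 8"], for: calculatorWindow)

// Same window as the snapshot: the index resolves.
let hit = try? cache.element(at: 1, for: calculatorWindow)
// Same pid, different window that was never snapshotted: the lookup fails.
let otherWindow = try? cache.element(at: 1, for: WindowKey(pid: 844, windowID: 999))
```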

The Snapshot-Before-Action Invariant

Every action must be bracketed by get_window_state(pid, window_id) — before and after:
  • Before: The pre-action snapshot populates the element-index cache for that (pid, window_id) pair. Element indices from a previous turn, or from a different window of the same app, are stale and will fail with No cached AX state. Skip this snapshot and element-indexed actions will not work.
  • After: The post-action snapshot verifies the action actually landed. Without it, Clicky cannot distinguish a successful click from a silent no-op. If the AX tree is unchanged after an action, the action likely failed — Clicky will say so rather than reporting false success.
The snapshot-before-action invariant is not optional. Skipping it is the single most common failure mode in GUI automation agents — the agent reports “done” while the action was silently dropped.
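
The bracketing rule can be expressed as a small helper that snapshots, acts, snapshots again, and compares the two trees. This is a sketch under stated assumptions: WindowDriver, WindowRef, ActionOutcome, and FakeCalculator are hypothetical names, the fake stands in for a real app so the example runs anywhere, and comparing tree text for equality is a simplification of whatever check Clicky actually performs.

```swift
/// Identifies one window of one process.
struct WindowRef: Hashable {
    let pid: Int
    let windowID: Int
}

/// Hypothetical abstraction over the MCP tools; not an OpenClicky type.
protocol WindowDriver {
    mutating func getWindowState(_ window: WindowRef) -> String   // stands in for tree_markdown
    mutating func click(_ window: WindowRef, elementIndex: Int)
}

enum ActionOutcome: Equatable {
    case changed     // the AX tree differs after the action
    case unchanged   // likely a silent no-op; report it instead of claiming success
}

/// Snapshot, act, snapshot, then compare.
func clickAndVerify<D: WindowDriver>(
    _ driver: inout D, window: WindowRef, elementIndex: Int
) -> ActionOutcome {
    let before = driver.getWindowState(window)
    driver.click(window, elementIndex: elementIndex)
    let after = driver.getWindowState(window)
    return before == after ? .unchanged : .changed
}

/// In-memory fake: only element 14 changes the display.
struct FakeCalculator: WindowDriver {
    var display = "0"
    mutating func getWindowState(_ window: WindowRef) -> String {
        "- AXStaticText \"\(display)\""
    }
    mutating func click(_ window: WindowRef, elementIndex: Int) {
        if elementIndex == 14 { display = "7" }
    }
}

var fake = FakeCalculator()
let target = WindowRef(pid: 844, windowID: 10725)
let landed = clickAndVerify(&fake, window: target, elementIndex: 14)    // display goes 0 -> 7
let dropped = clickAndVerify(&fake, window: target, elementIndex: 99)   // nothing happens
print(landed, dropped)   // changed unchanged
```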

Background Mode: No Focus Stealing

The entire point of OpenClicky’s computer-use implementation is that the user’s frontmost app must not change. You should be able to keep typing in your editor while Clicky drives another app or browser window in the background. This means Clicky will never use:
  • open -a <App> or any form of the open CLI (routes through LaunchServices, always activates)
  • osascript 'tell application "X" to activate' or any AppleScript that activates a target
  • cliclick (moves the real system pointer)
  • CGEventPost with cghidEventTap over another app’s window
Instead, Clicky uses launch_app with a built-in FocusRestoreGuard that intercepts NSApp.activate(ignoringOtherApps:) calls the target makes during launch and restores the previous frontmost app immediately afterward.
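
To illustrate the restore half of that behavior, here is a simplified save-and-restore sketch. It does not model how FocusRestoreGuard intercepts activation calls; it only records the frontmost app, runs some work, and reactivates the original app if the work changed it. FrontmostAppControlling, withFocusRestored, and FakeDesktop are hypothetical names, and the bundle ID com.example.editor is a placeholder.

```swift
/// Hypothetical abstraction over "which app is frontmost". On macOS it could be
/// backed by NSWorkspace.shared.frontmostApplication and NSRunningApplication.activate(options:).
protocol FrontmostAppControlling: AnyObject {
    var frontmostBundleID: String? { get }
    func activate(bundleID: String)
}

/// Runs `work`, then puts the previously frontmost app back if `work` changed it.
func withFocusRestored<T>(
    using apps: FrontmostAppControlling, _ work: () throws -> T
) rethrows -> T {
    let previous = apps.frontmostBundleID
    defer {
        // Restore even when `work` throws.
        if let previous, apps.frontmostBundleID != previous {
            apps.activate(bundleID: previous)
        }
    }
    return try work()
}

/// In-memory fake so the sketch runs without a window server.
final class FakeDesktop: FrontmostAppControlling {
    var frontmostBundleID: String? = "com.example.editor"
    var activationLog: [String] = []
    func activate(bundleID: String) {
        frontmostBundleID = bundleID
        activationLog.append(bundleID)
    }
}

let desktop = FakeDesktop()
withFocusRestored(using: desktop) {
    // Simulate a target app that activates itself while launching.
    desktop.activate(bundleID: "com.apple.calculator")
}
print(desktop.frontmostBundleID ?? "none")   // com.example.editor
```

A restore-after-the-fact approach like this one still allows a brief focus change, which is presumably why the guard described above intercepts activation rather than only undoing it.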

Browser Automation

For browser tasks, Clicky resolves the user’s default HTTPS browser by bundle ID and opens the target URL in a new background window:
launch_app({
  "bundle_id": "<default_browser_bundle_id>",
  "urls": ["https://example.com"]
})
This preserves your current tabs and uses your normal logged-in browser profile. Clicky will never pass --user-data-dir or other isolated-profile flags — those would log you out of the accounts you expect to use.
Do not use ⌘L (omnibox focus), tab-switching shortcuts (⌘1 through ⌘9, ⌘], ⌘[), or set_value on the omnibox for navigation. These either steal focus or silently fail to commit the URL in a backgrounded browser. Always use launch_app with a urls array for browser navigation.
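
The two steps described in this section, resolving the default browser and building the launch_app arguments, might look like the sketch below. NSWorkspace.urlForApplication(toOpen:) is a real AppKit call; whether OpenClicky resolves the browser this way is an assumption, as are the helper names and the http(s)-only validation.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// Bundle ID of the default handler for https URLs, or nil when it cannot be
/// determined (always nil on platforms without AppKit).
func defaultBrowserBundleID() -> String? {
    #if canImport(AppKit)
    guard let probe = URL(string: "https://example.com"),
          let appURL = NSWorkspace.shared.urlForApplication(toOpen: probe) else {
        return nil
    }
    return Bundle(url: appURL)?.bundleIdentifier
    #else
    return nil
    #endif
}

/// Hypothetical helper: builds launch_app arguments with a urls array.
/// Returns nil if any entry is not an http(s) URL with a host (an added safety
/// check for this sketch, not a documented OpenClicky rule).
func browserLaunchArguments(bundleID: String, urls: [String]) -> [String: Any]? {
    for candidate in urls {
        guard let url = URL(string: candidate),
              let scheme = url.scheme?.lowercased(),
              scheme == "https" || scheme == "http",
              url.host != nil else {
            return nil
        }
    }
    return ["bundle_id": bundleID, "urls": urls]
}

// Fall back to the placeholder from the example above when no browser is resolved.
let browserID = defaultBrowserBundleID() ?? "<default_browser_bundle_id>"
let launchArguments = browserLaunchArguments(bundleID: browserID, urls: ["https://example.com"])
print(launchArguments != nil ? "arguments ready" : "rejected URL list")
```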

Integration Routes First

Before reaching for computer use, Clicky follows this routing order:
  1. Direct answers — for simple questions
  2. Structured integrations — GitHub via Composio MCP, Google Workspace via gogcli
  3. Shell and file tools — for local work inside the configured projects root
  4. Computer use — only as the last-mile fallback for native Mac or browser actions with no structured route
If an integration-capable app (LinkedIn, Gmail, Slack, Notion, Linear, GitHub, Calendar, Drive, Docs, Sheets) is visible on screen, Clicky still prefers the structured MCP/Composio route unless you explicitly ask it to click or operate the visible UI.
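
The routing order can be read as a small decision function. The sketch below is one possible encoding, not Clicky's router: TaskTraits and its fields are hypothetical inputs, and the exact precedence between an explicit UI request and the other checks is an assumption drawn from the prose. It also encodes the rule that a missing or expired connector leads to a reconnect prompt rather than a GUI fallback.

```swift
/// The four routes, declared in priority order.
enum Route: CaseIterable {
    case directAnswer, structuredIntegration, shellAndFileTools, computerUse
}

enum IntegrationState {
    case noConnector        // the app has no API or MCP connector
    case connected
    case missingOrExpired   // a connector exists but needs reconnecting
}

enum Decision: Equatable {
    case use(Route)
    case askToReconnect
}

/// Hypothetical description of a task; the real routing inputs are not documented here.
struct TaskTraits {
    var isSimpleQuestion = false
    var integration = IntegrationState.noConnector
    var isLocalProjectWork = false
    var userAskedToOperateUI = false
}

func decide(_ task: TaskTraits) -> Decision {
    if task.isSimpleQuestion { return .use(.directAnswer) }
    // An explicit request to click or type in the visible UI overrides the structured route.
    if task.userAskedToOperateUI { return .use(.computerUse) }
    switch task.integration {
    case .connected: return .use(.structuredIntegration)
    case .missingOrExpired: return .askToReconnect   // never a silent GUI fallback
    case .noConnector: break
    }
    if task.isLocalProjectWork { return .use(.shellAndFileTools) }
    return .use(.computerUse)
}

print(decide(TaskTraits(integration: .connected)))          // use(...structuredIntegration)
print(decide(TaskTraits(integration: .missingOrExpired)))   // askToReconnect
```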

Permissions

Computer use requires two macOS permissions granted to OpenClicky.app (not to a separate CLI tool):
  • Accessibility — required for reading AX trees and dispatching element_index actions
  • Screen Recording — required for get_window_state window capture (used in every snapshot)
Check permission status at any time:
check_permissions({"prompt": false})
If either permission is missing, Clicky will stop and ask you to grant it in System Settings → Privacy & Security. It will not attempt workarounds or shell out through a different path.
Permissions are attributed to OpenClicky.app because the computer-use runtime runs inside the app bundle rather than as a standalone binary. Granting them once during onboarding covers all computer-use operations for the lifetime of that OpenClicky install.
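
For context, macOS exposes non-prompting checks for both grants: AXIsProcessTrusted() for Accessibility and CGPreflightScreenCaptureAccess() for Screen Recording (macOS 10.15 or later). The sketch below wraps them in a hypothetical PermissionStatus type; whether check_permissions is implemented with these calls is an assumption.

```swift
#if canImport(ApplicationServices)
import ApplicationServices
#endif

/// Hypothetical result type for the two grants computer use needs.
struct PermissionStatus: Equatable {
    var accessibility: Bool
    var screenRecording: Bool

    /// Names of the grants that are still missing, in a fixed order.
    var missing: [String] {
        var names: [String] = []
        if !accessibility { names.append("Accessibility") }
        if !screenRecording { names.append("Screen Recording") }
        return names
    }
}

/// Reads the current grants for the calling process without prompting.
/// Returns nil on platforms where the macOS APIs are unavailable.
func currentPermissionStatus() -> PermissionStatus? {
    #if canImport(ApplicationServices)
    return PermissionStatus(
        accessibility: AXIsProcessTrusted(),
        screenRecording: CGPreflightScreenCaptureAccess()
    )
    #else
    return nil
    #endif
}

if let status = currentPermissionStatus(), !status.missing.isEmpty {
    // Mirror the documented behavior: stop and ask, with no workarounds.
    print("Grant in System Settings → Privacy & Security: \(status.missing.joined(separator: ", "))")
}
```

These calls report on the process that makes them, which matches the point above about grants being attributed to OpenClicky.app.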

Common Error Reference

| Error | Meaning | Fix |
| --- | --- | --- |
| No cached AX state for pid X window_id W | get_window_state was skipped, or a different window_id was used for the action than for the snapshot. | Call get_window_state({pid: X, window_id: W}) first, using the same window_id you intend to act against. |
| Invalid element_index N for pid X window_id W | Index is stale or out of range. | Re-run get_window_state with the same window_id and pick a fresh index from the new tree. |
| AX action AXPress failed | The element doesn’t support AXPress. | Try show_menu, confirm, cancel, or pick as the action. |
| System-alert beep on press_key with no visible change | The target window is minimized; Return/Space/Tab commits don’t establish real renderer focus. | Use set_value to write the field value directly, or AX-click a Go/Submit button instead. |
| Accessibility permission not granted | TCC not granted to OpenClicky.app. | Grant in System Settings → Privacy & Security → Accessibility. |
| Screen Recording permission not granted | TCC not granted to OpenClicky.app. | Grant in System Settings → Privacy & Security → Screen Recording. |
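
An agent loop could turn the message-based rows of this reference into a lookup from error text to next step. The sketch below matches on the message prefixes listed above; the Recovery type and recovery(forError:) function are hypothetical, and real error strings may carry extra detail beyond these prefixes.

```swift
/// Hypothetical recovery actions, one per message-based row of the error reference.
enum Recovery: Equatable {
    case snapshotFirst            // call get_window_state for the same (pid, window_id)
    case resnapshotAndRepick      // re-run get_window_state and choose a fresh element_index
    case tryAlternateAXAction     // show_menu, confirm, cancel, or pick
    case grantPermission(String)  // name of the pane under Privacy & Security
    case unknown
}

/// Maps an error message to a recovery step by prefix.
func recovery(forError message: String) -> Recovery {
    if message.hasPrefix("No cached AX state") { return .snapshotFirst }
    if message.hasPrefix("Invalid element_index") { return .resnapshotAndRepick }
    if message.hasPrefix("AX action AXPress failed") { return .tryAlternateAXAction }
    if message.hasPrefix("Accessibility permission not granted") {
        return .grantPermission("Accessibility")
    }
    if message.hasPrefix("Screen Recording permission not granted") {
        return .grantPermission("Screen Recording")
    }
    return .unknown
}

print(recovery(forError: "No cached AX state for pid 844 window_id 10725"))   // snapshotFirst
```

The system-alert beep row is not covered because it has no error message to match; it has to be detected from the unchanged post-action snapshot instead.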
