Agent Mode: Autonomous Task Execution in OpenClicky

Agent Mode is the autonomous, multi-step task engine built into OpenClicky. While the voice loop is designed for quick back-and-forth answers, Agent Mode is for work that requires a plan, sequential tool calls, file changes, shell commands, or coordination across apps. You describe the task; Clicky breaks it down, executes it step by step, and reports back — while keeping the voice loop available for follow-up questions the whole time.

What Agent Mode Can Do

Shell & File Work

Run shell commands, read and write files, scaffold repos, run builds, generate reports, PDFs, DOCX, and spreadsheets — all inside your configured projects root.

Skills & Bundled Tools

Use bundled skills for Google Workspace (via gogcli), frontend dev, browser automation (Playwright or chrome-devtools), and screen control. Agent Mode can also use learned skills that capture your own recurring workflows.

App & Browser Integration

Work with GitHub via Composio MCP and Google Workspace via the local gog CLI, and use web search and browser tab control. Agent Mode prefers direct API routes before falling back to browser automation.

Computer Use

When no structured route exists, Agent Mode can use OpenClicky’s native computer-use path to click, type, and navigate macOS apps directly — the last-mile fallback.

Invoking Agent Mode

Open the notch panel

Click the OpenClicky menu-bar icon or double-tap Shift to open the compact notch panel.

Switch to Agents tab

Select the Agents tab in the panel to see running sessions and the agent prompt composer.

Type or speak your task

Enter a task description. For complex work, be specific: what the goal is, what files or apps are involved, and any constraints. Voice input works here too — hold the push-to-talk shortcut.

Submit

Press Return to submit; Shift+Return adds a new line instead. OpenClicky creates a new CodexAgentSession, opens the HUD window, and begins work.

You can also submit an agent task from the quick-prompt bar on the Home tab by prefixing your message with @agent or by selecting Agent Mode from the prompt mode picker.

The HUD Window

When an agent task starts, OpenClicky opens the Agent HUD — a dedicated floating window managed by CodexHUDWindowManager. The HUD is separate from the main notch panel so you can keep watching agent progress without the panel getting in the way. The HUD shows:

Transcript entries — the full turn-by-turn conversation between you and the agent, including intermediate command output
Progress stage — one of Starting, Planning, Executing, Composing reply, Completed, or Stopped
Activity status lines — live rolling updates of what the agent is currently doing
Stop button — cancels the current turn immediately

The HUD is a resizable NSPanel (default 980 × 560 pt, minimum 720 × 452 pt). Close it with the traffic-light close button or Escape; it hides but stays alive, so the next show() is instant.

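// HUD panel dimensions from CodexHUDWindowManager (all values in points)
import CoreGraphics

enum OpenClickyHUDLayout {
    // Default size when the HUD first opens
    static let width: CGFloat = 980
    static let height: CGFloat = 560
    // Smallest size the panel can be resized to
    static let minimumWidth: CGFloat = 720
    static let minimumHeight: CGFloat = 452
}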
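
The progress stages listed for the HUD could be modelled as a small enum. This is only a sketch: the type name ProgressStage and the canStop property are assumptions made for illustration, not names taken from OpenClicky's source, and the rule that the Stop button only matters before a terminal stage is likewise assumed.

```swift
// Hypothetical model of the six HUD progress stages, in display order.
enum ProgressStage: String, CaseIterable {
    case starting = "Starting"
    case planning = "Planning"
    case executing = "Executing"
    case composingReply = "Composing reply"
    case completed = "Completed"
    case stopped = "Stopped"

    // Assumption: Stop is only meaningful while a turn is still in flight.
    var canStop: Bool {
        self != .completed && self != .stopped
    }
}
```

The raw values match the labels shown in the HUD, so a view could render `stage.rawValue` directly.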

What the Agent Reads at Task Start

At the beginning of every task, the agent reads three files from Codex home before doing any work:

SOUL.md

Defines OpenClicky’s persona, autonomy level, memory behaviour, and quality bar. The agent treats this as its operating identity.

memory.md

Durable user and project context accumulated across sessions. The agent treats this as ground truth for preferences, project facts, and prior task outcomes.

OpenClickyRuntimeMap.md

Maps exact local paths for logs, memory, skills, widget state, sessions, config, and review comments. The agent consults this whenever it needs to locate OpenClicky’s own storage.

The agent is instructed never to claim it “cannot remember” outside the current conversation. If memory is needed it reads memory.md; if new durable context is learned during a task it updates memory.md before finishing.
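
The startup read and the memory update described above might look roughly like the following. This is a minimal sketch under stated assumptions: StartupContext, StartupContextError, loadStartupContext, and appendMemory are hypothetical names, and the three files are assumed to sit directly inside the Codex home directory.

```swift
import Foundation

// Hypothetical container for the three startup files; not an OpenClicky type.
struct StartupContext {
    let soul: String
    let memory: String
    let runtimeMap: String
}

enum StartupContextError: Error, Equatable {
    case missingFile(String)
}

// Reads SOUL.md, memory.md, and OpenClickyRuntimeMap.md from `codexHome`.
// Assumption: all three live at the top level of that directory.
func loadStartupContext(codexHome: URL) throws -> StartupContext {
    func read(_ name: String) throws -> String {
        let url = codexHome.appendingPathComponent(name)
        guard let text = try? String(contentsOf: url, encoding: .utf8) else {
            throw StartupContextError.missingFile(name)
        }
        return text
    }
    return StartupContext(
        soul: try read("SOUL.md"),
        memory: try read("memory.md"),
        runtimeMap: try read("OpenClickyRuntimeMap.md")
    )
}

// Appends one durable note to memory.md as a bullet, creating the file if needed.
func appendMemory(_ note: String, codexHome: URL) throws {
    let url = codexHome.appendingPathComponent("memory.md")
    let existing = (try? String(contentsOf: url, encoding: .utf8)) ?? ""
    let separator = existing.isEmpty || existing.hasSuffix("\n") ? "" : "\n"
    let updated = existing + separator + "- " + note + "\n"
    try updated.write(to: url, atomically: true, encoding: .utf8)
}
```

Failing loudly on a missing file is a design choice for the sketch; a real runtime might instead fall back to defaults.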

The `<NEXT_ACTIONS>` Block

Every agent response ends with a machine-readable <NEXT_ACTIONS> block. This is not shown as prose in the final answer — it is overlay metadata that OpenClicky parses to populate the suggested follow-up buttons on the HUD and notch panel.

<NEXT_ACTIONS>
- Review the Swift diff
- Run the test suite
</NEXT_ACTIONS>

Rules for <NEXT_ACTIONS>:

One or two bullets only.
Each bullet is under ~40 characters.
Self-contained and immediately executable without asking the user for extra input.
Concrete actions like “Open the first email”, “Test the cursor label”, or “Summarise the page”.
Weak or vague suggestions are omitted rather than padded.
Nothing appears after the closing </NEXT_ACTIONS> tag.
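
To make the rules above concrete, here is one way a client could split a response into visible prose and follow-up suggestions. It is a sketch, not OpenClicky's parser: ParsedResponse and parseNextActions are hypothetical names, and the choices to drop over-long bullets, keep only the first two, and ignore the block entirely when text follows the closing tag are assumptions about how violations might be handled.

```swift
import Foundation

// Hypothetical parse result: the visible answer plus the suggested follow-ups.
struct ParsedResponse: Equatable {
    let body: String
    let nextActions: [String]
}

// Extracts bullets from the last <NEXT_ACTIONS> block in `response`.
func parseNextActions(from response: String,
                      maxBullets: Int = 2,
                      maxLength: Int = 40) -> ParsedResponse {
    let whitespace = CharacterSet.whitespacesAndNewlines
    guard let open = response.range(of: "<NEXT_ACTIONS>", options: .backwards),
          let close = response.range(of: "</NEXT_ACTIONS>",
                                     range: open.upperBound..<response.endIndex)
    else {
        // No complete block: the whole response is prose.
        return ParsedResponse(body: response.trimmingCharacters(in: whitespace),
                              nextActions: [])
    }
    let body = response[..<open.lowerBound].trimmingCharacters(in: whitespace)
    // Rule: nothing may appear after the closing tag.
    let trailing = response[close.upperBound...].trimmingCharacters(in: whitespace)
    guard trailing.isEmpty else {
        return ParsedResponse(body: body, nextActions: [])
    }
    let bullets: [String] = response[open.upperBound..<close.lowerBound]
        .split(whereSeparator: \.isNewline)
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { $0.hasPrefix("- ") }
        .map { String($0.dropFirst(2)).trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty && $0.count <= maxLength }
    return ParsedResponse(body: body, nextActions: Array(bullets.prefix(maxBullets)))
}
```

Applied to the example block above, the function would yield the two suggestions "Review the Swift diff" and "Run the test suite", with the preceding prose returned separately as the body.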

Routing Logic

The agent follows a strict preference order when deciding how to complete a task:

Direct answers

Simple factual questions, explanations, and math are answered immediately without any tool calls.

Structured tool routes

Web search for fresh facts; image gallery for visual results; local shell and file tools for code and document work; bundled skills for known integration patterns.

Integration routes

GitHub via Composio MCP; Google Workspace via gog CLI; browser tools (chrome-devtools for logged-in sessions, playwright for isolated automation).

Computer use (last resort)

OpenClicky’s native CUA Swift or Background Computer Use backend for Mac GUI actions that have no structured route. The agent explicitly labels computer-use steps as such in progress notes.

The agent never uses xcodebuild from the terminal for building the OpenClicky app itself. It uses Xcode for builds and swiftc -parse for lightweight syntax checks to avoid macOS TCC permission loops caused by unsigned or throwaway build products.
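
The preference order above can be expressed as an ordered enum, with route selection reduced to taking the minimum of the routes that apply. The names Route, chooseRoute, and progressNote are hypothetical, and the "[computer use]" label format is an assumption; the docs only state that such steps are labelled.

```swift
// Hypothetical routes in the documented preference order; a lower raw value is preferred.
enum Route: Int, CaseIterable, Comparable {
    case directAnswer = 0
    case structuredTool
    case integration
    case computerUse

    static func < (lhs: Route, rhs: Route) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

// Picks the most-preferred route among those able to handle the task.
// Returns nil when no route applies.
func chooseRoute(from available: Set<Route>) -> Route? {
    available.min()
}

// Builds a progress note, marking computer-use steps explicitly.
func progressNote(for route: Route, step: String) -> String {
    route == .computerUse ? "[computer use] \(step)" : step
}
```

Encoding the order in the type keeps the last-resort status of computer use in one place rather than scattering it across call sites.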

Background vs Foreground Work

OpenClicky can run multiple agent sessions simultaneously. Each CodexAgentSession is independent and maintains its own transcript, status, and file leases. The file lease coordinator (OpenClickyAgentFileLeaseCoordinator) prevents two sessions from writing to the same path at the same time. When a session tries to claim a path already held by another, it either waits (with a configurable timeout) or proceeds with the available paths and reports the conflict.

The panel groups sessions under two filters:

Active — currently running or recently started
All — all sessions including completed ones not yet archived

Completed sessions can be archived (hidden from the default view) individually or in bulk.
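
The file-lease behaviour described earlier in this section can be sketched as follows. FileLeaseCoordinator here is a hypothetical stand-in, not the real OpenClickyAgentFileLeaseCoordinator API, which this page does not document. The sketch covers only the "proceed with the available paths and report the conflict" branch; waiting with a timeout is left out.

```swift
import Foundation

// Hypothetical stand-in for a lease coordinator: maps each path to the session holding it.
final class FileLeaseCoordinator {
    private var holders: [String: String] = [:]   // path -> session identifier
    private let lock = NSLock()

    // Tries to lease every path for `session`.
    // Returns the paths granted and the paths already held by another session.
    func claim(_ paths: [String], for session: String) -> (granted: [String], conflicts: [String]) {
        lock.lock()
        defer { lock.unlock() }
        var granted: [String] = []
        var conflicts: [String] = []
        for path in paths {
            if let holder = holders[path], holder != session {
                conflicts.append(path)
            } else {
                holders[path] = session
                granted.append(path)
            }
        }
        return (granted, conflicts)
    }

    // Releases every lease held by `session`, for example when its turn ends.
    func releaseAll(for session: String) {
        lock.lock()
        defer { lock.unlock() }
        holders = holders.filter { $0.value != session }
    }
}
```

A session would typically claim its target paths before writing, skip or defer anything returned in conflicts, and call releaseAll when it finishes.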

Specialist Agents

Beyond the default Clicky agent, you can define specialist agents with their own system context, instructions, and memory. Each agent has a unique agentSlug and can be selected in the agent prompt with @agent-name, or routed to automatically by automations.

How specialist agents work

A specialist agent is a CodexAgentSession launched with a prependedSystemContext string: the agent’s own SOUL.md, instructions.md, and/or specialist memory prepended to the standard developer instructions. The session also stores the agent’s specialistAgentSlug so automations and routing logic can reference it.

When you type @ in the agent prompt bar, an autocomplete list appears showing available specialist agents. Selecting one creates a new session pre-configured for that agent’s context.

Specialist agents are stored in OpenClickyAgentStore and persisted in ~/Library/Application Support/OpenClicky/.
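
As an illustration of how a prepended context and an @agent-name mention could fit together, consider the sketch below. SpecialistAgent, splitMention, and systemContext are hypothetical; only agentSlug and prependedSystemContext echo names from this page, and the blank-line separator between parts is an assumption.

```swift
import Foundation

// Hypothetical shape of a stored specialist agent.
struct SpecialistAgent {
    let agentSlug: String
    let soul: String?
    let instructions: String?
    let memory: String?

    // Joins whichever parts exist, in SOUL / instructions / memory order.
    var prependedSystemContext: String {
        [soul, instructions, memory].compactMap { $0 }.joined(separator: "\n\n")
    }
}

// Splits "@slug rest of prompt" into (slug, task).
// The slug is nil when the prompt does not start with a mention.
func splitMention(_ prompt: String) -> (slug: String?, task: String) {
    let trimmed = prompt.trimmingCharacters(in: .whitespaces)
    guard trimmed.hasPrefix("@") else { return (nil, trimmed) }
    let parts = trimmed.dropFirst().split(separator: " ", maxSplits: 1)
    guard let slug = parts.first else { return (nil, trimmed) }
    let task = parts.count > 1
        ? String(parts[1]).trimmingCharacters(in: .whitespaces)
        : ""
    return (String(slug), task)
}

// Places the specialist context ahead of the standard developer instructions.
func systemContext(for agent: SpecialistAgent, developerInstructions: String) -> String {
    [agent.prependedSystemContext, developerInstructions]
        .filter { !$0.isEmpty }
        .joined(separator: "\n\n")
}
```

A caller would look up the returned slug in its agent store and, if found, start a session with the combined context.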

Session Status Reference

Status	Meaning
`starting`	Agent runtime is initialising
`running`	Active tool call or LLM generation in progress
`ready`	Last turn completed; waiting for next prompt
`stopped`	Session terminated (user stopped or natural end)
`failed(String)`	Error — error message included in status
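
The `failed(String)` entry suggests an enum with an associated value. A minimal sketch of such a type is shown here; the name SessionStatus and the helper properties are assumptions, as is treating failed as a terminal state alongside stopped.

```swift
// Hypothetical mirror of the status table above.
enum SessionStatus: Equatable {
    case starting
    case running
    case ready
    case stopped
    case failed(String)

    // ready means the last turn completed and the session awaits the next prompt.
    var acceptsPrompt: Bool {
        self == .ready
    }

    // Assumption: stopped and failed sessions do no further work.
    var isTerminal: Bool {
        switch self {
        case .stopped, .failed:
            return true
        case .starting, .running, .ready:
            return false
        }
    }

    // Short text for a status badge; failures carry their error message.
    var label: String {
        switch self {
        case .starting: return "starting"
        case .running: return "running"
        case .ready: return "ready"
        case .stopped: return "stopped"
        case .failed(let message): return "failed: \(message)"
        }
    }
}
```

Because the error message travels inside the case, a UI can show it without a separate error field.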
