Quickstart

Get up and running with Agent Browser quickly. This guide walks you through the core workflow: installing, navigating to a page, taking a snapshot, and interacting with elements using refs.

Prerequisites

Node.js 18+ installed
Basic command line knowledge

Install and Setup

Install Agent Browser

Install globally for best performance:

npm install -g agent-browser

Download Chromium

Agent Browser needs Chromium to run:

agent-browser install

On Linux, you may need system dependencies:

agent-browser install --with-deps

Verify Installation

Test that everything works:

agent-browser open example.com
agent-browser close

Both commands should complete without errors. The browser runs headless by default, so no window appears; add --headed to watch it navigate to example.com.

Your First Automation

Let’s automate a simple form-fill workflow using the snapshot-ref pattern: take a snapshot of the page, then act on elements through the refs it returns.

Navigate to a Page

Open a page with a form:

agent-browser open https://example.com/login

The browser will launch (headless by default) and navigate to the URL.

Take a Snapshot

Get the accessibility tree with element refs:

agent-browser snapshot -i

The -i flag limits the snapshot to interactive elements (buttons, inputs, links). Example output:

- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2] [type=password]
- button "Sign In" [ref=e3]
- link "Forgot password?" [ref=e4]

Each element gets a unique ref, shown as [ref=eN] in the snapshot. Prefix it with @ (for example, @e1) to target that element in a command.
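A script that reads the text snapshot often needs to turn a role and name into a ref. The sketch below is one way to do that with standard POSIX tools. ref_for is a hypothetical helper, not part of Agent Browser, and it assumes the line format shown in the sample output above.

```shell
# ref_for ROLE NAME: read snapshot text on stdin and print the matching
# ref as @eN. Hypothetical helper; assumes lines look like
#   - ROLE "NAME" [ref=eN]
ref_for() {
  grep -F -e "- $1 \"$2\" [ref=" |
    sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/@\1/p' |
    head -n 1
}

# Sample snapshot text copied from the output above.
snapshot='- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2] [type=password]
- button "Sign In" [ref=e3]
- link "Forgot password?" [ref=e4]'

printf '%s\n' "$snapshot" | ref_for button "Sign In"   # prints @e3
```

With a live browser, the same helper could read from agent-browser snapshot -i instead of the sample variable.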

Fill the Form

Use refs to fill inputs and click buttons:

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3

Refs are deterministic: @e1 always refers to the same element in the snapshot that produced it. After the page changes, take a new snapshot before reusing refs.

Get Results

Wait for navigation and check the result:

agent-browser wait --load networkidle
agent-browser get url
agent-browser snapshot -i

These commands wait for network activity to settle, print the current URL, and list the interactive elements on the new page.
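In a script, this check can become a pass/fail step. The following sketch assumes that agent-browser get url prints only the URL on standard output. url_contains and the AB variable are hypothetical names introduced here so that the CLI can be replaced by a stub during testing.

```shell
# AB names the command to run; override it to substitute a stub in tests.
AB="${AB:-agent-browser}"

# url_contains TEXT: succeed if the current page URL contains TEXT.
# Assumes `get url` prints only the URL on stdout.
url_contains() {
  current=$("$AB" get url) || return 1
  case "$current" in
    *"$1"*) return 0 ;;
    *)      return 1 ;;
  esac
}
```

For example, url_contains /dashboard || echo "still on the login page" would flag a failed sign-in, where /dashboard is a placeholder path rather than anything the example site defines.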

Clean Up

Close the browser when done:

agent-browser close

Alternative Selector Methods

You can also use traditional CSS selectors alongside refs:

agent-browser click "#submit-button"
agent-browser fill "#email" "test@example.com"
agent-browser hover ".dropdown-menu"

Refs are recommended for AI agents because they’re deterministic and don’t require DOM knowledge. CSS selectors are useful when you know the page structure.

Command Chaining

Chain multiple commands for efficiency:

agent-browser open example.com && \
  agent-browser snapshot -i && \
  agent-browser fill @e1 "value" && \
  agent-browser click @e2

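The browser daemon persists between commands, so each chained command reuses the same browser instead of launching a new one. Because the commands are joined with &&, the chain stops at the first command that fails.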
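To make the chain reusable, the same four steps can be wrapped in a shell function. This is a minimal sketch: fill_and_click and the AB variable are hypothetical names, and the refs @e1 and @e2 are carried over from the example above rather than looked up.

```shell
# AB names the command to run; AB=echo prints each step instead of running it.
AB="${AB:-agent-browser}"

# fill_and_click URL VALUE: run the chain. && stops at the first failing
# command, and the function returns that command's exit status.
fill_and_click() {
  "$AB" open "$1" &&
    "$AB" snapshot -i &&
    "$AB" fill @e1 "$2" &&
    "$AB" click @e2
}
```

Calling fill_and_click example.com "value" with AB=echo gives a dry run that lists the four commands in order without touching a browser.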

JSON Output for AI Agents

Use --json for machine-readable output:

agent-browser snapshot -i --json

Returns structured JSON with the accessibility tree and refs:

{
  "success": true,
  "data": {
    "snapshot": "- textbox \"Email\" [ref=e1]\n- button \"Submit\" [ref=e2]",
    "refs": {
      "e1": {
        "role": "textbox",
        "name": "Email",
        "selector": "input[type=\"email\"]"
      },
      "e2": {
        "role": "button",
        "name": "Submit"
      }
    }
  }
}
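An agent usually needs only the refs from this payload. The sketch below uses jq, a separate tool that is not part of Agent Browser and has to be installed on its own, to flatten the refs object into one line per ref. refs_table is a hypothetical helper name, and the input is a trimmed copy of the sample response above.

```shell
# refs_table: read the --json snapshot payload on stdin and print
# "@ref<TAB>role<TAB>name" for each entry under .data.refs.
refs_table() {
  jq -r '.data.refs | to_entries[] | "@\(.key)\t\(.value.role)\t\(.value.name)"'
}

# Trimmed copy of the sample response above.
response='{"success":true,"data":{"refs":{
  "e1":{"role":"textbox","name":"Email"},
  "e2":{"role":"button","name":"Submit"}}}}'

if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$response" | refs_table
else
  echo "jq is not installed; skipping" >&2
fi
```

Against a live page, the input would come from agent-browser snapshot -i --json instead of the response variable.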

Sessions for Parallel Browsers

Run multiple isolated browser instances:

# Terminal 1 - First agent
agent-browser --session agent1 open site-a.com
agent-browser --session agent1 snapshot -i

# Terminal 2 - Second agent
agent-browser --session agent2 open site-b.com
agent-browser --session agent2 snapshot -i

Each session has its own browser, cookies, and state.
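The two-terminal setup can also be driven from a single script by running each session as a background job. This sketch assumes that concurrent invocations with different --session names do not interfere with each other, as the isolation described above suggests. snapshot_session, run_both, and the AB variable are hypothetical names.

```shell
# AB names the command to run; override it to substitute a stub in tests.
AB="${AB:-agent-browser}"

# snapshot_session NAME URL: open URL in the named session, then snapshot it.
snapshot_session() {
  "$AB" --session "$1" open "$2" &&
    "$AB" --session "$1" snapshot -i
}

# run_both DIR: run two sessions concurrently, one output file per session.
run_both() {
  snapshot_session agent1 site-a.com > "$1/agent1.txt" &
  snapshot_session agent2 site-b.com > "$1/agent2.txt" &
  # Block until both background jobs finish. Called without arguments,
  # wait does not report whether the jobs succeeded.
  wait
}
```

Writing each session's output to its own file keeps the two snapshots from interleaving on the terminal.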

Headed Mode for Debugging

See what the browser is doing:

agent-browser --headed open example.com

The browser window is shown on screen instead of running headless.

Common Patterns

Wait for Elements

agent-browser wait "#content"           # Wait for an element
agent-browser wait 2000                 # Wait 2000 ms (2 seconds)
agent-browser wait --text "Welcome"     # Wait for text
agent-browser wait --load networkidle   # Wait until the network is idle

Get Information

agent-browser get text @e1              # Get element text
agent-browser get value @e2             # Get input value
agent-browser get attr @e3 "href"       # Get attribute
agent-browser get title                 # Get page title
agent-browser get url                   # Get current URL

Navigate

agent-browser back                      # Go back
agent-browser forward                   # Go forward
agent-browser reload                    # Reload page

Screenshots

agent-browser screenshot page.png       # Screenshot
agent-browser screenshot --full full.png # Full page

Annotated Screenshots

For multimodal AI models that can see images:

agent-browser screenshot --annotate

This overlays numbered labels on interactive elements in the screenshot. The labels correspond to refs ([1] = @e1, [2] = @e2), so you can use the same refs after viewing the annotated screenshot.
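The label-to-ref rule is simple enough to encode directly. The sketch below converts a label read from an annotated screenshot into the ref to pass to a command. label_to_ref is a hypothetical helper, and it relies only on the [N] = @eN correspondence described above.

```shell
# label_to_ref LABEL: map an annotation label such as 3 or [3] to @e3.
label_to_ref() {
  n=${1#\[}    # drop a leading [ if present
  n=${n%\]}    # drop a trailing ] if present
  case "$n" in
    ''|*[!0-9]*) echo "not a numeric label: $1" >&2; return 1 ;;
    *) printf '@e%s\n' "$n" ;;
  esac
}

label_to_ref '[2]'   # prints @e2
```

A model that reports "click label [2]" can then be translated into agent-browser click "$(label_to_ref '[2]')".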

Next Steps

Core Concepts

Learn about the Rust CLI + Node.js daemon architecture

All Commands

Browse the complete command reference

Security Features

Auth vault, domain allowlist, action policies

AI Agent Integration

Use with Claude Code, Cursor, and other AI assistants
