Get up and running with Agent Browser quickly. This guide walks you through the core workflow: installing, navigating to a page, taking a snapshot, and interacting with elements using refs.

Prerequisites

  • Node.js 18+ installed
  • Basic command line knowledge

Install and Setup

1. Install Agent Browser

Install globally for best performance:
npm install -g agent-browser
2. Download Chromium

Agent Browser needs Chromium to run:
agent-browser install
On Linux, you may need system dependencies:
agent-browser install --with-deps
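A setup script could choose between the two install commands above based on the operating system. The sketch below is one way to do that; the helper name `install_command` is hypothetical, and always using `--with-deps` on Linux is an assumption, since the guide only says the flag may be needed there.

```shell
# Print the install command suggested for a given kernel name.
# The kernel name defaults to `uname -s`; passing it explicitly is only
# a convenience for trying the helper out.
install_command() {
  kernel="${1:-$(uname -s)}"
  case "$kernel" in
    Linux) echo "agent-browser install --with-deps" ;;
    *)     echo "agent-browser install" ;;
  esac
}

# Usage: inspect the suggestion, then run it yourself.
# install_command
```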
3. Verify Installation

Test that everything works:
agent-browser open example.com
agent-browser close
You should see the browser launch and navigate to example.com.
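If the test commands fail with "command not found", the CLI is probably not on your PATH. A small guard like the following can make that failure explicit in scripts; `require_command` is a hypothetical helper name, not part of Agent Browser.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH" >&2
  return 1
}

# Usage:
# require_command agent-browser && agent-browser open example.com
```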

Your First Automation

Let’s automate a simple form-fill workflow using the snapshot-ref pattern.
1. Navigate to a Page

Open a page with a form:
agent-browser open https://example.com/login
The browser will launch (headless by default) and navigate to the URL.
2. Take a Snapshot

Get the accessibility tree with element refs:
agent-browser snapshot -i
The -i flag shows only interactive elements (buttons, inputs, links). Output:
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2] [type=password]
- button "Sign In" [ref=e3]
- link "Forgot password?" [ref=e4]
Each element gets a unique @e{N} ref that you can use to interact with it.
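To see which refs a snapshot assigned, you can pull them out of the text output. This is a minimal sketch that assumes every element line carries a marker shaped like `[ref=e1]`, as in the sample output above; `list_refs` is a hypothetical helper name.

```shell
# Read `snapshot -i` text on stdin and print one "@ref" per line.
# Lines without a [ref=eN] marker are skipped.
list_refs() {
  sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/@\1/p'
}

# Usage:
# agent-browser snapshot -i | list_refs
```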
3. Fill the Form

Use refs to fill inputs and click buttons:
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
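Refs are deterministic: @e1 always refers to the same element from the snapshot that assigned it. Once the page changes, take a fresh snapshot before reusing refs, as the next step does.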
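Rather than hard-coding ref numbers, a script can look a ref up by role and accessible name in the latest snapshot. The sketch below assumes the line format shown in the sample output (`- role "name" [ref=eN]`); `ref_for` is a hypothetical helper name.

```shell
# Print the @ref of the first snapshot line matching a role and accessible name.
# Reads snapshot text on stdin; returns non-zero if nothing matches.
ref_for() (
  role=$1
  name=$2
  ref=$(grep -F -- "- $role \"$name\" [ref=" |
    sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/\1/p' |
    head -n 1)
  [ -n "$ref" ] && echo "@$ref"
)

# Usage:
# button_ref=$(agent-browser snapshot -i | ref_for button "Sign In") &&
#   agent-browser click "$button_ref"
```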
4. Get Results

Wait for navigation and check the result:
agent-browser wait --load networkidle
agent-browser get url
agent-browser snapshot -i
This shows the current URL and the new page structure.
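A script can turn this check into a pass/fail result by comparing the URL after submission with the URL of the form page. The sketch below assumes `get url` prints the bare URL on stdout. `left_page` is a hypothetical helper, and the `AGENT_BROWSER` variable is a hypothetical override that makes the CLI swappable.

```shell
# Wait for the page to settle, print the current URL, and succeed only if
# it differs from the URL given as $1 (for example, the login page).
# AGENT_BROWSER is a hypothetical override; it defaults to agent-browser.
left_page() (
  ab=${AGENT_BROWSER:-agent-browser}
  "$ab" wait --load networkidle || exit 1
  current=$("$ab" get url) || exit 1
  echo "$current"
  [ "$current" != "$1" ]
)

# Usage:
# left_page https://example.com/login || echo "still on the login page" >&2
```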
5. Clean Up

Close the browser when done:
agent-browser close

Alternative Selector Methods

You can also use traditional selectors alongside refs:
agent-browser click "#submit-button"
agent-browser fill "#email" "user@example.com"
agent-browser hover ".dropdown-menu"
Refs are recommended for AI agents because they’re deterministic and don’t require DOM knowledge. CSS selectors are useful when you know the page structure.
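A wrapper script that accepts either form may need to tell refs and CSS selectors apart, for example to re-snapshot only when a ref is used. A minimal sketch, assuming refs always look like `@e` followed by digits; `is_ref` is a hypothetical helper name.

```shell
# Succeed if the argument looks like a snapshot ref (@e followed by digits).
is_ref() {
  case "$1" in
    @e*)
      digits=${1#@e}
      case "$digits" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Usage:
# is_ref "$target" && echo "ref" || echo "selector"
```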

Command Chaining

Chain multiple commands for efficiency:
agent-browser open example.com && \
  agent-browser snapshot -i && \
  agent-browser fill @e1 "value" && \
  agent-browser click @e2
The browser daemon persists between commands, so chaining is fast and safe.
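Because `&&` only runs the next command when the previous one exits successfully, a failed step stops the rest of the chain. The same chain can be wrapped in a reusable function. In this sketch, `fill_and_click` and the `AGENT_BROWSER` override are hypothetical names.

```shell
# Run open -> snapshot -> fill -> click, stopping at the first failure.
# Arguments: URL, ref to fill, value, ref to click.
# AGENT_BROWSER is a hypothetical override; it defaults to agent-browser.
fill_and_click() (
  ab=${AGENT_BROWSER:-agent-browser}
  "$ab" open "$1" &&
    "$ab" snapshot -i &&
    "$ab" fill "$2" "$3" &&
    "$ab" click "$4"
)

# Usage:
# fill_and_click example.com @e1 "value" @e2
```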

JSON Output for AI Agents

Use --json for machine-readable output:
agent-browser snapshot -i --json
Returns structured JSON with the accessibility tree and refs:
{
  "success": true,
  "data": {
    "snapshot": "- textbox \"Email\" [ref=e1]\n- button \"Submit\" [ref=e2]",
    "refs": {
      "e1": {
        "role": "textbox",
        "name": "Email",
        "selector": "input[type=\"email\"]"
      },
      "e2": {
        "role": "button",
        "name": "Submit"
      }
    }
  }
}
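One way to consume this response in a script is with jq. Note that jq is a separate tool and an assumption here; it is not part of Agent Browser. The sketch below relies only on the fields shown in the sample response above (`success`, `data.refs`, `role`, `name`), and `refs_from_json` is a hypothetical helper name.

```shell
# Read the --json snapshot response on stdin and print
# "@ref<TAB>role<TAB>name" for each entry under .data.refs.
# Prints nothing and exits non-zero if .success is not true.
refs_from_json() {
  jq -r -e '
    if .success == true
    then .data.refs | to_entries[] | "@\(.key)\t\(.value.role)\t\(.value.name)"
    else empty
    end
  '
}

# Usage:
# agent-browser snapshot -i --json | refs_from_json
```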

Sessions for Parallel Browsers

Run multiple isolated browser instances:
# Terminal 1 - First agent
agent-browser --session agent1 open site-a.com
agent-browser --session agent1 snapshot -i

# Terminal 2 - Second agent
agent-browser --session agent2 open site-b.com
agent-browser --session agent2 snapshot -i
Each session has its own browser, cookies, and state.
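Since each session is addressed by name, the two terminals above could plausibly be collapsed into one script that runs both sessions as background jobs. This is a sketch under that assumption; `snapshot_session` and the `AGENT_BROWSER` override are hypothetical names.

```shell
# Open a URL in a named session and save its interactive snapshot to a file.
# Arguments: session name, URL, output file.
# AGENT_BROWSER is a hypothetical override; it defaults to agent-browser.
snapshot_session() (
  ab=${AGENT_BROWSER:-agent-browser}
  "$ab" --session "$1" open "$2" >/dev/null &&
    "$ab" --session "$1" snapshot -i > "$3"
)

# Usage: start both sessions in the background, then wait for both.
# snapshot_session agent1 site-a.com a.txt &
# snapshot_session agent2 site-b.com b.txt &
# wait
```

Writing each snapshot to its own file keeps the output of the two jobs from interleaving.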

Headed Mode for Debugging

See what the browser is doing:
agent-browser --headed open example.com
The browser window will be visible instead of headless.

Common Patterns

Wait for Elements

agent-browser wait "#content"           # Wait for element
agent-browser wait 2000                 # Wait 2 seconds
agent-browser wait --text "Welcome"     # Wait for text
agent-browser wait --load networkidle   # Wait for network

Get Information

agent-browser get text @e1              # Get element text
agent-browser get value @e2             # Get input value
agent-browser get attr @e3 "href"       # Get attribute
agent-browser get title                 # Get page title
agent-browser get url                   # Get current URL

Navigate History

agent-browser back                      # Go back
agent-browser forward                   # Go forward
agent-browser reload                    # Reload page

Screenshots

agent-browser screenshot page.png       # Screenshot
agent-browser screenshot --full full.png # Full page

Annotated Screenshots

For multimodal AI models that can see images:
agent-browser screenshot --annotate
This overlays numbered labels on interactive elements in the screenshot. The labels correspond to refs ([1] = @e1, [2] = @e2), so you can use the same refs after viewing the annotated screenshot.
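The label-to-ref mapping described above is mechanical, so a script that receives a label from a vision model can convert it directly. A minimal sketch; `label_to_ref` is a hypothetical helper name.

```shell
# Convert an annotated-screenshot label such as "[3]" or "3" into the ref "@e3".
label_to_ref() {
  n=${1#"["}
  n=${n%"]"}
  case "$n" in
    ''|*[!0-9]*) echo "not a numeric label: $1" >&2; return 1 ;;
    *) echo "@e$n" ;;
  esac
}

# Usage:
# agent-browser click "$(label_to_ref '[2]')"
```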

Next Steps

Core Concepts

Learn about the Rust CLI + Node.js daemon architecture

All Commands

Browse the complete command reference

Security Features

Auth vault, domain allowlist, action policies

AI Agent Integration

Use with Claude Code, Cursor, and other AI assistants
