Get up and running with Agent Browser quickly. This guide walks you through the core workflow: installing, navigating to a page, taking a snapshot, and interacting with elements using refs.
Prerequisites
Node.js 18+ installed
Basic command line knowledge
Install and Setup
Install Agent Browser
Install globally for best performance:
npm install -g agent-browser
Download Chromium
Agent Browser needs Chromium to run:
agent-browser install
On Linux, you may need system dependencies:
agent-browser install --with-deps
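If you provision machines with a script, the two setup steps can be combined. This is a minimal sketch, not an official installer: the function name setup_agent_browser is hypothetical, it assumes npm and uname are on the PATH, and it only passes --with-deps when uname -s reports Linux.

```shell
# Hypothetical setup helper (not part of Agent Browser itself).
# 1. Install the CLI globally with npm.
# 2. Download Chromium; on Linux, also install system dependencies.
setup_agent_browser() {
  npm install -g agent-browser || return 1
  if [ "$(uname -s)" = "Linux" ]; then
    agent-browser install --with-deps
  else
    agent-browser install
  fi
}
```

Run it once per machine with setup_agent_browser.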
Verify Installation
Test that everything works:
agent-browser open example.com
agent-browser close
Both commands should complete without errors. Because the browser runs headless by default, no window appears.
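To run the same check from a script or CI job, a small wrapper can fail fast when the CLI is missing. A minimal sketch: verify_agent_browser is a hypothetical name, and it assumes agent-browser exits with a non-zero status when a command fails.

```shell
# Hypothetical smoke test: stop early if the CLI is not installed, then
# open and close a page. Assumes agent-browser exits non-zero on failure.
verify_agent_browser() {
  if ! command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser not found on PATH" >&2
    return 1
  fi
  agent-browser open example.com || return 1
  agent-browser close || return 1
  echo "agent-browser OK"
}
```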
Your First Automation
Let’s automate a simple form fill workflow using the snapshot-ref pattern.
Navigate to a Page
Open a page with a form:
agent-browser open https://example.com/login
The browser will launch (headless by default) and navigate to the URL.
Take a Snapshot
Get the accessibility tree with element refs:
agent-browser snapshot -i
The -i flag shows only interactive elements (buttons, inputs, links). Example output:
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2] [type=password]
- button "Sign In" [ref=e3]
- link "Forgot password?" [ref=e4]
Each element gets a unique @e{N} ref that you can use to interact with it.
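Since the snapshot is plain text, a script can look up a ref by role and accessible name instead of hard-coding @e1. A minimal sketch in POSIX shell: find_ref is a hypothetical helper, and it assumes each element appears on its own line in the format shown above.

```shell
# Hypothetical helper: reads `agent-browser snapshot -i` output on stdin and
# prints the ref (e.g. @e1) of the first element whose role and quoted name
# match. Assumes lines look like: - textbox "Email" [ref=e1]
find_ref() {
  role=$1
  name=$2
  while IFS= read -r line; do
    # Drop any leading indentation before matching.
    line=${line#"${line%%[![:space:]]*}"}
    case $line in
      "- $role \"$name\" [ref="*)
        ref=${line#*\[ref=}   # e.g. 'e2] [type=password]'
        ref=${ref%%\]*}       # e.g. 'e2'
        echo "@$ref"
        return 0
        ;;
    esac
  done
  return 1   # no matching element
}
```

Typical usage:
email_ref=$(agent-browser snapshot -i | find_ref textbox "Email")
agent-browser fill "$email_ref" "user@example.com"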
Fill the Form
Use refs to fill inputs and click buttons:
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
Refs are deterministic: @e1 always refers to the same element from the snapshot that produced it.
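The three steps can be wrapped in a function that stops at the first failure. This is a sketch: fill_login is a hypothetical name, the default refs (@e1, @e2, @e3) only match the example snapshot, and it assumes agent-browser exits non-zero when a command fails.

```shell
# Hypothetical wrapper for the fill-and-submit steps above. Refs default to
# the example snapshot (@e1, @e2, @e3); pass your own if the page differs.
# Assumes agent-browser exits non-zero when a command fails, so && stops
# the sequence at the first failure.
fill_login() {
  email=$1
  password=$2
  email_ref=${3:-@e1}
  password_ref=${4:-@e2}
  submit_ref=${5:-@e3}
  agent-browser fill "$email_ref" "$email" &&
    agent-browser fill "$password_ref" "$password" &&
    agent-browser click "$submit_ref"
}
```

For the example page: fill_login user@example.com password123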
Get Results
Wait for navigation and check the result:
agent-browser wait --load networkidle
agent-browser get url
agent-browser snapshot -i
This shows the current URL and the new page structure.
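To turn the manual check into a pass/fail test, compare the URL against a fragment you expect. A sketch only: expect_url is hypothetical, /dashboard in the usage line is an assumed post-login path, and it assumes get url prints just the URL on stdout.

```shell
# Hypothetical check: wait for the network to settle, then confirm the
# current URL contains the expected fragment passed as $1.
# Assumes `agent-browser get url` prints only the URL.
expect_url() {
  agent-browser wait --load networkidle || return 1
  url=$(agent-browser get url) || return 1
  case $url in
    *"$1"*) echo "ok: $url" ;;
    *) echo "unexpected URL: $url" >&2; return 1 ;;
  esac
}
```

For example: expect_url /dashboard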
Clean Up
Close the browser when done:
agent-browser close
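In a script, an EXIT trap makes sure the browser is closed even when an earlier step fails. A sketch: run_with_cleanup is a hypothetical name, the function body runs in a subshell so the trap does not leak into the caller, and it assumes agent-browser exits non-zero on failure.

```shell
# Hypothetical script skeleton. The function body is a subshell, so the
# EXIT trap fires when the subshell ends: after the last step, or as soon
# as a step fails and calls exit. The trap then runs `agent-browser close`.
run_with_cleanup() (
  trap 'agent-browser close' EXIT
  agent-browser open "$1" || exit 1
  agent-browser snapshot -i || exit 1
  # ...further steps go here, each followed by `|| exit 1`
)
```

For example: run_with_cleanup https://example.com/login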
Alternative Selector Methods
You can also use traditional selectors alongside refs: CSS selectors, text selectors, and semantic locators.
CSS Selectors
agent-browser click "#submit-button"
agent-browser fill "#email" "user@example.com"
agent-browser hover ".dropdown-menu"
Refs are recommended for AI agents because they’re deterministic and don’t require DOM knowledge. CSS selectors are useful when you know the page structure.
Command Chaining
Chain multiple commands for efficiency:
agent-browser open example.com && \
agent-browser snapshot -i && \
agent-browser fill @e1 "value" && \
agent-browser click @e2
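The browser daemon persists between commands, so chaining is fast: each command reuses the running browser instead of launching a new one. The && operator also stops the chain as soon as a command exits with a non-zero status.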
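For longer flows, a loop that reads one command per line can report exactly which step failed. A sketch: run_steps is hypothetical; it uses eval so quoted arguments survive, which means it should only be fed trusted input, and it redirects stdin from /dev/null so agent-browser cannot consume the remaining steps.

```shell
# Hypothetical runner: reads one agent-browser command per line (without the
# leading "agent-browser") and stops at the first failing step.
# Uses eval so quoted arguments survive; only feed it trusted input.
run_steps() {
  n=0
  while IFS= read -r step; do
    n=$((n + 1))
    [ -z "$step" ] && continue            # skip blank lines
    # </dev/null keeps agent-browser from reading the remaining steps.
    if ! eval "agent-browser $step" </dev/null; then
      echo "step $n failed: $step" >&2
      return 1
    fi
  done
  return 0
}
```

Typical usage:
run_steps <<'EOF'
open example.com
snapshot -i
fill @e1 "value"
click @e2
EOF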
JSON Output for AI Agents
Use --json for machine-readable output:
agent-browser snapshot -i --json
Returns structured JSON with the accessibility tree and refs:
{
  "success": true,
  "data": {
    "snapshot": "- textbox \"Email\" [ref=e1]\n- button \"Submit\" [ref=e2]",
    "refs": {
      "e1": {
        "role": "textbox",
        "name": "Email",
        "selector": "input[type=\"email\"]"
      },
      "e2": {
        "role": "button",
        "name": "Submit"
      }
    }
  }
}
Sessions for Parallel Browsers
Run multiple isolated browser instances:
# Terminal 1 - First agent
agent-browser --session agent1 open site-a.com
agent-browser --session agent1 snapshot -i
# Terminal 2 - Second agent
agent-browser --session agent2 open site-b.com
agent-browser --session agent2 snapshot -i
Each session has its own browser, cookies, and state.
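Because each session is isolated, one script can also drive several sessions at once by running them as background jobs. A sketch: browse_in_session and run_two_agents are hypothetical names, site-a.com and site-b.com are the placeholders from the example, and each session's output is written to agent1.txt or agent2.txt in the current directory.

```shell
# Hypothetical helper: open a URL and snapshot it inside one named session.
browse_in_session() {
  agent-browser --session "$1" open "$2" &&
    agent-browser --session "$1" snapshot -i
}

# Hypothetical driver: run two isolated sessions as background jobs, then
# wait for both. Returns non-zero if either session fails.
run_two_agents() {
  browse_in_session agent1 site-a.com > agent1.txt &
  pid1=$!
  browse_in_session agent2 site-b.com > agent2.txt &
  pid2=$!
  rc=0
  wait "$pid1" || rc=1
  wait "$pid2" || rc=1
  return "$rc"
}
```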
Headed Mode for Debugging
See what the browser is doing:
agent-browser --headed open example.com
The browser window will be visible instead of headless.
Common Patterns
Wait for Elements
agent-browser wait "#content" # Wait for element
agent-browser wait 2000 # Wait 2 seconds
agent-browser wait --text "Welcome" # Wait for text
agent-browser wait --load networkidle # Wait for network
Get Information
agent-browser get text @e1 # Get element text
agent-browser get value @e2 # Get input value
agent-browser get attr @e3 "href" # Get attribute
agent-browser get title # Get page title
agent-browser get url # Get current URL
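The wait and get commands combine naturally: wait for a signal that the page is ready, then read values from it. A sketch: page_summary is hypothetical, and it assumes get title and get url print only the value on stdout.

```shell
# Hypothetical helper: wait until the given text appears, then print the
# page title and URL on one line, separated by " | ".
page_summary() {
  agent-browser wait --text "$1" || return 1
  title=$(agent-browser get title) || return 1
  url=$(agent-browser get url) || return 1
  printf '%s | %s\n' "$title" "$url"
}
```

For example: page_summary "Welcome"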
Navigate
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
Screenshots
agent-browser screenshot page.png # Screenshot
agent-browser screenshot --full full.png # Full page
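To capture several pages in one go, loop over URLs and pair each screenshot with a network wait. A sketch: screenshot_all is hypothetical, and the file naming (scheme stripped, other punctuation replaced with underscores) is an arbitrary choice.

```shell
# Hypothetical batch job: for each URL, open it, wait for the network to
# settle, and save a full-page screenshot named after the host and path.
screenshot_all() {
  for url in "$@"; do
    # https://example.com/login -> example_com_login
    name=$(printf '%s' "$url" | sed -e 's|^[a-z]*://||' -e 's|[^A-Za-z0-9]|_|g')
    agent-browser open "$url" &&
      agent-browser wait --load networkidle &&
      agent-browser screenshot --full "$name.png" || return 1
  done
}
```

For example, screenshot_all https://example.com/login example.com writes example_com_login.png and example_com.png.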
Annotated Screenshots
For multimodal AI models that can see images:
agent-browser screenshot --annotate
This overlays numbered labels on interactive elements in the screenshot. The labels correspond to refs ([1] = @e1, [2] = @e2), so you can use the same refs after viewing the annotated screenshot.
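Since label [N] maps to @eN, a number read off the annotated screenshot can be turned into a command directly. A sketch: click_label is hypothetical, and it rejects anything that is not a plain number.

```shell
# Hypothetical helper: map an annotation label number to its ref
# ([3] -> @e3) and click that element.
click_label() {
  case $1 in
    ''|*[!0-9]*) echo "label must be a number" >&2; return 1 ;;
  esac
  agent-browser click "@e$1"
}
```

For example: click_label 3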
Next Steps
Core Concepts: Learn about the Rust CLI + Node.js daemon architecture
All Commands: Browse the complete command reference
Security Features: Auth vault, domain allowlist, action policies
AI Agent Integration: Use with Claude Code, Cursor, and other AI assistants