Documentation Index
Fetch the complete documentation index at: https://mintlify.com/vercel-labs/agent-browser/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Agent Browser supports multiple selector strategies for finding and interacting with elements:
- Refs (
@e1) - Recommended for AI agents
- CSS Selectors (
#id, .class) - Traditional web selectors
- Semantic Locators (
find role button) - Human-readable element queries
- Text Selectors (
text=Submit) - Find by visible text
- XPath (
xpath=//button) - XML path queries
Each has different tradeoffs for stability, readability, and performance.
Refs (Recommended for AI)
Overview
Refs provide deterministic element selection from snapshots:
# Get snapshot
agent-browser snapshot -i
# Output:
# - button "Submit" [ref=e1]
# - textbox "Email" [ref=e2]
# Use refs
agent-browser click @e1
agent-browser fill @e2 "user@example.com"
Why Use Refs?
✓ Deterministic: Points to exact element from snapshot
✓ Fast: No DOM re-query needed
✓ AI-Friendly: Easy for LLMs to generate
✓ Accessible: Based on ARIA tree (screen reader compatible)
Three equivalent formats:
agent-browser click @e1 # Recommended
agent-browser click ref=e1 # Alternative
agent-browser click e1 # Bare format
All three parse to the same reference:
// From snapshot.ts:605-615
export function parseRef(arg: string): string | null {
if (arg.startsWith('@')) return arg.slice(1);
if (arg.startsWith('ref=')) return arg.slice(4);
if (/^e\\d+$/.test(arg)) return arg;
return null;
}
How Refs Resolve
Refs are mapped to Playwright locators:
// From browser.ts:220-240
const refData = this.refMap[ref];
let locator = page.getByRole(refData.role, {
name: refData.name,
exact: true,
});
if (refData.nth !== undefined) {
locator = locator.nth(refData.nth);
}
Example:
// Ref map:
refs = {
"e1": {
role: "button",
name: "Submit",
nth: 0 // First submit button
}
}
// Resolves to:
page.getByRole('button', { name: 'Submit', exact: true }).nth(0)
Ref Scoping
Refs are scoped to a single snapshot. After navigation or page changes, get a fresh snapshot:
agent-browser snapshot -i # Snapshot 1: refs e1, e2, e3
agent-browser click @e1 # Use ref from snapshot 1
agent-browser snapshot -i # Snapshot 2: NEW refs e1, e2, e3
agent-browser fill @e2 "text" # Use ref from snapshot 2
Using stale refs may fail or interact with the wrong element.
See Snapshot Refs for details.
CSS Selectors
Basic CSS
Standard CSS selector syntax:
# ID selector
agent-browser click "#submit-button"
# Class selector
agent-browser click ".btn-primary"
# Attribute selector
agent-browser click "[data-testid='login-button']"
# Combinator
agent-browser click "form > button"
# Pseudo-class
agent-browser click "button:first-child"
When to Use CSS
✓ Stable IDs: When elements have unique, stable IDs
✓ Test IDs: When using data-testid attributes
✓ Simple queries: For one-off scripts
✗ Dynamic classes: Avoid if classes change frequently
✗ AI workflows: Hard for LLMs to generate correctly
CSS selectors are fast (direct DOM query), but may be brittle:
# Fast but brittle
agent-browser click ".submit-btn.primary.large"
# Better - use stable attributes
agent-browser click "[data-testid='submit-button']"
Semantic Locators
Overview
Find elements by their semantic meaning instead of DOM structure:
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "user@example.com"
agent-browser find text "Sign In" click
Role Locators
Find by ARIA role and optional name:
# By role only
agent-browser find role button click
# By role and name
agent-browser find role button click --name "Submit"
# Exact match
agent-browser find role link click --name "Home" --exact
Supported roles:
| Role | Element Examples |
|---|
button | <button>, <input type="button">, role="button" |
link | <a href="..."> |
textbox | <input type="text">, <textarea> |
checkbox | <input type="checkbox"> |
radio | <input type="radio"> |
combobox | <select>, ARIA comboboxes |
heading | <h1>, <h2>, etc. |
See the ARIA roles spec for the full list.
Text Locators
Find by visible text content:
# Contains text
agent-browser find text "Sign In" click
# Exact text match
agent-browser find text "Submit" click --exact
Text matching is case-sensitive by default. Use --exact for strict matching (no substring matches).
Label Locators
Find inputs by their associated label:
agent-browser find label "Email" fill "user@example.com"
agent-browser find label "Password" fill "secret123"
This works for:
<label for="email">Email</label><input id="email">
<label>Email <input></label>
aria-label="Email" attributes
aria-labelledby references
Placeholder Locators
Find inputs by placeholder text:
agent-browser find placeholder "Enter your email" fill "user@example.com"
Alt Text Locators
Find images by alt text:
agent-browser find alt "Company logo" click
Title Locators
Find elements by title attribute:
agent-browser find title "Click to expand" click
Test ID Locators
Find by data-testid attribute:
agent-browser find testid "login-button" click
Positional Locators
Select specific instances when multiple elements match:
# First match
agent-browser find first "button" click
# Last match
agent-browser find last "button" click
# Nth match (0-indexed)
agent-browser find nth 2 "button" click # Third button
Actions
Semantic locators support these actions:
| Action | Description | Example |
|---|
click | Click element | find role button click |
fill | Fill input | find label "Email" fill "user@example.com" |
type | Type into input | find label "Search" type "query" |
hover | Hover element | find role link hover |
focus | Focus element | find role textbox focus |
check | Check checkbox | find role checkbox check |
uncheck | Uncheck checkbox | find role checkbox uncheck |
text | Get text content | find role heading text |
When to Use Semantic Locators
✓ Human-readable: Easy to understand what they select
✓ Stable: Less affected by DOM changes
✓ Accessible: Based on ARIA/semantic HTML
✗ Verbose: Longer than refs
✗ Slower: Requires DOM query on each use
Text Selectors
Exact Text Match
agent-browser click "text=Submit"
agent-browser click "text='Submit Form'" # Exact match
Substring Match
agent-browser click "text=/.*Submit.*/" # Regex
Case-Insensitive
agent-browser click "text=/submit/i" # Case-insensitive
XPath Selectors
Basic XPath
# Absolute path
agent-browser click "xpath=/html/body/div/button"
# Relative path
agent-browser click "xpath=//button[@id='submit']"
# By text
agent-browser click "xpath=//button[text()='Submit']"
# By attribute
agent-browser click "xpath=//button[@data-testid='login']"
When to Use XPath
✓ Complex queries: When CSS can’t express the logic
✓ Text-based selection: XPath has better text functions
✗ Readability: Hard to read and maintain
✗ Performance: Generally slower than CSS
Selector Precedence
When a selector could match multiple strategies, Agent Browser checks in this order:
- Ref:
@e1, ref=e1, e1 (if matches /^e\d+$/)
- Explicit prefix:
text=, xpath=
- CSS: Anything else
# Interpreted as ref
agent-browser click @e1
# Interpreted as text selector
agent-browser click "text=Submit"
# Interpreted as XPath
agent-browser click "xpath=//button"
# Interpreted as CSS
agent-browser click "#submit"
Selector Composition
Combine selectors for more precise targeting:
CSS + Pseudo-Selectors
# First button in a form
agent-browser click "form button:first-child"
# Last item in a list
agent-browser click "ul li:last-child"
# Nth child
agent-browser click "table tr:nth-child(3)"
Chaining Find Commands
# Click the submit button in the login form
agent-browser find role "button" click --name "Submit"
# Fill the email field in the registration section
agent-browser find label "Email" fill "user@example.com"
Special Selectors
Visible Elements Only
Playwright automatically filters to visible elements:
# Only clicks visible buttons
agent-browser click "button"
To include hidden elements, use --force:
agent-browser click "button" --force
Detached Elements
Playwright waits for elements to be attached to the DOM:
# Waits for button to exist before clicking
agent-browser click "button"
Timeout is 25 seconds by default (configurable via AGENT_BROWSER_DEFAULT_TIMEOUT).
Selector Best Practices
For AI Agents
Use refs:
# Good - deterministic and fast
agent-browser snapshot -i
agent-browser click @e1
# Avoid - brittle and hard for AI to generate
agent-browser click "div.container > form#login > div.actions > button:nth-child(2)"
For Manual Scripting
Use semantic locators or stable CSS:
# Good - semantic and readable
agent-browser find role button click --name "Submit"
# Good - stable test ID
agent-browser click "[data-testid='submit-button']"
# Avoid - brittle classes
agent-browser click ".btn.btn-primary.btn-lg.submit"
For Testing
Use data-testid attributes:
<button data-testid="login-button">Log In</button>
agent-browser click "[data-testid='login-button']"
Test IDs are stable across UI changes.
| Selector Type | Speed | Stability | AI-Friendly |
|---|
| Refs | ⚡⚡⚡ (cached) | ⭐⭐⭐ | ⭐⭐⭐ |
| CSS (ID) | ⚡⚡⚡ | ⭐⭐ | ⭐ |
| CSS (class) | ⚡⚡⚡ | ⭐ | ⭐ |
| CSS (data-testid) | ⚡⚡⚡ | ⭐⭐⭐ | ⭐⭐ |
| Semantic (role) | ⚡⚡ | ⭐⭐⭐ | ⭐⭐ |
| Text | ⚡⚡ | ⭐⭐ | ⭐⭐ |
| XPath | ⚡ | ⭐ | ⭐ |
Debugging Selectors
Highlight Elements
Highlight an element to verify your selector:
agent-browser highlight "#submit-button"
The element will be outlined in red in the browser.
Count Matches
Check how many elements match a selector:
agent-browser get count "button"
# Output: 5
If count > 1, your selector is ambiguous and may click the wrong element.
Snapshot Preview
Use snapshots to see what elements are available:
agent-browser snapshot -i
This shows all interactive elements with their refs and roles.
Advanced Techniques
Shadow DOM
Penetrate shadow DOM boundaries:
# Playwright auto-pierces shadow DOM
agent-browser click "button" # Works even inside shadow roots
Iframes
Switch to iframe before selecting:
# Switch to iframe
agent-browser frame "#payment-iframe"
# Select inside iframe
agent-browser click "#submit"
# Switch back to main frame
agent-browser frame main
Dynamic Content
Wait for elements to appear:
# Wait for element
agent-browser wait "#results"
# Then interact
agent-browser click "#results button"
Or use find which auto-waits:
# Auto-waits for button to appear
agent-browser find role button click --name "Load More"
Next Steps
- Architecture - Understand the Rust CLI + Node.js daemon design
- Snapshot Refs - Deep dive into the ref system
- Sessions - Learn about session isolation and persistence