
Overview

Agent Browser supports multiple selector strategies for finding and interacting with elements:
  1. Refs (@e1) - Recommended for AI agents
  2. CSS Selectors (#id, .class) - Traditional web selectors
  3. Semantic Locators (find role button) - Human-readable element queries
  4. Text Selectors (text=Submit) - Find by visible text
  5. XPath (xpath=//button) - XML path queries
Each has different tradeoffs for stability, readability, and performance.

Refs (Recommended)

Overview

Refs provide deterministic element selection from snapshots:
# Get snapshot
agent-browser snapshot -i
# Output:
# - button "Submit" [ref=e1]
# - textbox "Email" [ref=e2]

# Use refs
agent-browser click @e1
agent-browser fill @e2 "user@example.com"

Why Use Refs?

Deterministic: Points to exact element from snapshot
Fast: Resolves from the cached ref map, with no selector to build or search for
AI-Friendly: Easy for LLMs to generate
Accessible: Based on ARIA tree (screen reader compatible)

Ref Formats

Three equivalent formats:
agent-browser click @e1      # Recommended
agent-browser click ref=e1   # Alternative
agent-browser click e1       # Bare format
All three parse to the same reference:
// From snapshot.ts:605-615
export function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);    // "@e1"    -> "e1"
  if (arg.startsWith('ref=')) return arg.slice(4); // "ref=e1" -> "e1"
  if (/^e\d+$/.test(arg)) return arg;              // bare "e1": "e" followed by digits
  return null;                                     // not a ref
}
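A self-contained copy of the same parsing logic, run against a few sample inputs, makes the three formats easy to compare. This is an illustration only; the sample inputs and the loop are not part of agent-browser.

```typescript
// Copy of the parsing logic shown above, for experimentation.
function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);    // "@e1"    -> "e1"
  if (arg.startsWith('ref=')) return arg.slice(4); // "ref=e1" -> "e1"
  if (/^e\d+$/.test(arg)) return arg;              // bare ref: "e" + digits
  return null;                                     // anything else is not a ref
}

for (const input of ['@e1', 'ref=e1', 'e1', '#e1', 'e1x']) {
  console.log(`${input} -> ${parseRef(input)}`);
}
// @e1 -> e1
// ref=e1 -> e1
// e1 -> e1
// #e1 -> null
// e1x -> null
```

The @ and ref= branches strip the prefix without checking what follows, so @foo parses to foo; only the bare form is checked against the "e plus digits" pattern.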

How Refs Resolve

Refs are mapped to Playwright locators:
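// From browser.ts:220-240
// Look up the role and accessible name recorded for this ref in the snapshot
const refData = this.refMap[ref];
// Match by ARIA role plus the exact accessible name
let locator = page.getByRole(refData.role, {
  name: refData.name,
  exact: true,
});

// If several elements share the role and name, pick the recorded index
if (refData.nth !== undefined) {
  locator = locator.nth(refData.nth);
}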
Example:
// Ref map:
const refs = {
  "e1": {
    role: "button",
    name: "Submit",
    nth: 0  // First submit button
  }
}

// Resolves to:
page.getByRole('button', { name: 'Submit', exact: true }).nth(0)
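To make the role, exact-name, and nth steps concrete, the sketch below applies them to a plain in-memory list instead of a live page. It is a simplified model, not Playwright: the El and RefData types and resolveRef are hypothetical names, accessible names are compared with plain string equality, and the no-nth case mirrors Playwright's strict mode (an action on a locator that matches several elements fails) by returning undefined instead of throwing.

```typescript
// Hypothetical, simplified model of resolving one ref against a list of elements.
interface El { id: string; role: string; name: string; }
interface RefData { role: string; name: string; nth?: number; }

function resolveRef(elements: El[], ref: RefData): El | undefined {
  // Like getByRole(role, { name, exact: true }): same role, identical name
  const matches = elements.filter(e => e.role === ref.role && e.name === ref.name);
  // Like .nth(n): choose the n-th match, counting from 0
  if (ref.nth !== undefined) return matches[ref.nth];
  // No nth: strict mode rejects ambiguous locators, so require exactly one match
  return matches.length === 1 ? matches[0] : undefined;
}

const pageElements: El[] = [
  { id: 'login-submit', role: 'button', name: 'Submit' },
  { id: 'search', role: 'textbox', name: 'Search' },
  { id: 'newsletter-submit', role: 'button', name: 'Submit' },
];

console.log(resolveRef(pageElements, { role: 'button', name: 'Submit', nth: 0 })?.id); // login-submit
console.log(resolveRef(pageElements, { role: 'button', name: 'Submit', nth: 1 })?.id); // newsletter-submit
console.log(resolveRef(pageElements, { role: 'button', name: 'Submit' })?.id);         // undefined
```

This is why the ref map stores nth: two buttons named "Submit" are indistinguishable by role and name alone.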

Ref Scoping

Refs are scoped to a single snapshot. After navigation or page changes, get a fresh snapshot:
agent-browser snapshot -i      # Snapshot 1: refs e1, e2, e3
agent-browser click @e1        # Use ref from snapshot 1
agent-browser snapshot -i      # Snapshot 2: NEW refs e1, e2, e3
agent-browser fill @e2 "text" # Use ref from snapshot 2
Using stale refs may fail or interact with the wrong element. See Snapshot Refs for details.
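A hypothetical illustration of why a stale ref can hit the wrong element: if each snapshot numbers elements by their position on the page (an assumption made for this sketch; SnapshotItem and assignRefs are illustrative names), one new element near the top shifts every ref after it.

```typescript
// Hypothetical model: each snapshot numbers elements e1, e2, ... in page order.
interface SnapshotItem { role: string; name: string; }

function assignRefs(items: SnapshotItem[]): Map<string, SnapshotItem> {
  const refs = new Map<string, SnapshotItem>();
  items.forEach((item, i) => refs.set(`e${i + 1}`, item));
  return refs;
}

// Snapshot 1
const snapshot1 = assignRefs([
  { role: 'button', name: 'Submit' },
  { role: 'textbox', name: 'Email' },
]);

// Snapshot 2, taken after a "Home" link appeared at the top of the page
const snapshot2 = assignRefs([
  { role: 'link', name: 'Home' },
  { role: 'button', name: 'Submit' },
  { role: 'textbox', name: 'Email' },
]);

console.log(snapshot1.get('e1')?.name); // Submit
console.log(snapshot2.get('e1')?.name); // Home: reusing @e1 from snapshot 1 now targets the link
```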

CSS Selectors

Basic CSS

Standard CSS selector syntax:
# ID selector
agent-browser click "#submit-button"

# Class selector
agent-browser click ".btn-primary"

# Attribute selector
agent-browser click "[data-testid='login-button']"

# Combinator
agent-browser click "form > button"

# Pseudo-class
agent-browser click "button:first-child"

When to Use CSS

Use when:
Stable IDs: Elements have unique, stable IDs
Test IDs: Elements carry data-testid attributes
Simple queries: Writing one-off scripts
Avoid when:
Dynamic classes: Class names change frequently
AI workflows: LLMs find complex CSS hard to generate correctly

Performance

CSS selectors are fast (direct DOM query), but may be brittle:
# Fast but brittle
agent-browser click ".submit-btn.primary.large"

# Better - use stable attributes
agent-browser click "[data-testid='submit-button']"

Semantic Locators

Overview

Find elements by their semantic meaning instead of DOM structure:
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "user@example.com"
agent-browser find text "Sign In" click

Role Locators

Find by ARIA role and optional name:
# By role only
agent-browser find role button click

# By role and name
agent-browser find role button click --name "Submit"

# Exact match
agent-browser find role link click --name "Home" --exact
Supported roles:
| Role | Element Examples |
|------|------------------|
| button | <button>, <input type="button">, role="button" |
| link | <a href="..."> |
| textbox | <input type="text">, <textarea> |
| checkbox | <input type="checkbox"> |
| radio | <input type="radio"> |
| combobox | <select>, ARIA comboboxes |
| heading | <h1>, <h2>, etc. |
See the ARIA roles spec for the full list.

Text Locators

Find by visible text content:
# Contains text
agent-browser find text "Sign In" click

# Exact text match
agent-browser find text "Submit" click --exact
By default, text matching is case-insensitive and matches substrings, with whitespace normalized. Use --exact for a case-sensitive, whole-string match.

Label Locators

Find inputs by their associated label:
agent-browser find label "Email" fill "user@example.com"
agent-browser find label "Password" fill "secret123"
This works for:
  • <label for="email">Email</label><input id="email">
  • <label>Email <input></label>
  • aria-label="Email" attributes
  • aria-labelledby references

Placeholder Locators

Find inputs by placeholder text:
agent-browser find placeholder "Enter your email" fill "user@example.com"

Alt Text Locators

Find images by alt text:
agent-browser find alt "Company logo" click

Title Locators

Find elements by title attribute:
agent-browser find title "Click to expand" click

Test ID Locators

Find by data-testid attribute:
agent-browser find testid "login-button" click

Positional Locators

Select specific instances when multiple elements match:
# First match
agent-browser find first "button" click

# Last match
agent-browser find last "button" click

# Nth match (0-indexed)
agent-browser find nth 2 "button" click  # Third button
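The positional forms behave like indexing into the ordered list of matches. Assuming they correspond to Playwright's locator.first(), locator.last(), and locator.nth(n) (an assumption, not confirmed here), the plain-array sketch below shows the 0-based indexing; pickFirst, pickLast, and pickNth are hypothetical names.

```typescript
// Hypothetical helpers over a list of matches in document order.
function pickFirst<T>(matches: T[]): T | undefined {
  return matches[0];
}

function pickLast<T>(matches: T[]): T | undefined {
  return matches[matches.length - 1];
}

// nth is 0-indexed: n = 2 selects the third match.
function pickNth<T>(matches: T[], n: number): T | undefined {
  return matches[n];
}

const buttonLabels = ['Cancel', 'Save draft', 'Submit', 'Help'];
console.log(pickFirst(buttonLabels));  // Cancel
console.log(pickLast(buttonLabels));   // Help
console.log(pickNth(buttonLabels, 2)); // Submit (the third button)
console.log(pickNth(buttonLabels, 9)); // undefined: there is no tenth button
```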

Actions

Semantic locators support these actions:
| Action | Description | Example |
|--------|-------------|---------|
| click | Click element | find role button click |
| fill | Fill input | find label "Email" fill "user@example.com" |
| type | Type into input | find label "Search" type "query" |
| hover | Hover element | find role link hover |
| focus | Focus element | find role textbox focus |
| check | Check checkbox | find role checkbox check |
| uncheck | Uncheck checkbox | find role checkbox uncheck |
| text | Get text content | find role heading text |

When to Use Semantic Locators

Strengths:
Human-readable: Easy to understand what they select
Stable: Less affected by DOM changes
Accessible: Based on ARIA/semantic HTML
Drawbacks:
Verbose: Longer than refs
Slower: Requires a DOM query on each use

Text Selectors

Basic Text Match

agent-browser click "text=Submit"          # Case-insensitive substring match
agent-browser click "text='Submit Form'"   # Quoted: exact match

Regex Match

agent-browser click "text=/Submit/"  # Regex: matches "Submit" anywhere in the text

Case-Insensitive

agent-browser click "text=/submit/i"  # Case-insensitive
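The three text= forms differ in how they compare text. The sketch below is a simplified model of Playwright's text= engine, assuming agent-browser passes text= selectors through to Playwright unchanged: a bare body is a case-insensitive substring match, a quoted body is an exact match, and /pattern/flags is a regular expression. textMatcher and normalizeWhitespace are hypothetical names, and the model only compares one string; it does not decide which element on the page matches.

```typescript
// Simplified model of Playwright's text= matching rules (not Playwright's code).
type TextMatcher = (text: string) => boolean;

// Collapse runs of whitespace and trim the ends.
const normalizeWhitespace = (s: string): string => s.replace(/\s+/g, ' ').trim();

function textMatcher(body: string): TextMatcher {
  // /pattern/flags: regular expression (g and y excluded so test() stays stateless)
  const regexParts = body.match(/^\/(.*)\/([imsu]*)$/);
  if (regexParts) {
    const regex = new RegExp(regexParts[1], regexParts[2]);
    return text => regex.test(text);
  }
  // 'text' or "text": exact, case-sensitive match of the whole normalized text
  const quoted = body.match(/^(["'])(.*)\1$/);
  if (quoted) {
    const exact = normalizeWhitespace(quoted[2]);
    return text => normalizeWhitespace(text) === exact;
  }
  // Bare text: case-insensitive substring match
  const needle = normalizeWhitespace(body).toLowerCase();
  return text => normalizeWhitespace(text).toLowerCase().includes(needle);
}

const buttonText = '  Submit   Form ';
console.log(textMatcher('Submit')(buttonText));        // true
console.log(textMatcher("'Submit Form'")(buttonText)); // true
console.log(textMatcher("'Submit'")(buttonText));      // false
console.log(textMatcher('/submit/i')(buttonText));     // true
console.log(textMatcher('/submit/')(buttonText));      // false
```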

XPath Selectors

Basic XPath

# Absolute path
agent-browser click "xpath=/html/body/div/button"

# Relative path
agent-browser click "xpath=//button[@id='submit']"

# By text
agent-browser click "xpath=//button[text()='Submit']"

# By attribute
agent-browser click "xpath=//button[@data-testid='login']"

When to Use XPath

Strengths:
Complex queries: When CSS can’t express the logic
Text-based selection: XPath has better text functions than CSS
Drawbacks:
Readability: Hard to read and maintain
Performance: Generally slower than CSS

Selector Precedence

When a selector could match multiple strategies, Agent Browser checks in this order:
  1. Ref: @e1, ref=e1, e1 (if matches /^e\d+$/)
  2. Explicit prefix: text=, xpath=
  3. CSS: Anything else
# Interpreted as ref
agent-browser click @e1

# Interpreted as text selector
agent-browser click "text=Submit"

# Interpreted as XPath
agent-browser click "xpath=//button"

# Interpreted as CSS
agent-browser click "#submit"
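The precedence order can be expressed as one classification function. This sketch follows the three rules listed above; classifySelector and the SelectorKind type are hypothetical names, not agent-browser's actual implementation.

```typescript
// Sketch of the precedence rules: ref, then explicit prefix, then CSS.
type SelectorKind =
  | { kind: 'ref'; ref: string }
  | { kind: 'text'; body: string }
  | { kind: 'xpath'; body: string }
  | { kind: 'css'; body: string };

function classifySelector(selector: string): SelectorKind {
  // 1. Refs: @e1, ref=e1, or bare e1
  if (selector.startsWith('@')) return { kind: 'ref', ref: selector.slice(1) };
  if (selector.startsWith('ref=')) return { kind: 'ref', ref: selector.slice(4) };
  if (/^e\d+$/.test(selector)) return { kind: 'ref', ref: selector };
  // 2. Explicit engine prefixes
  if (selector.startsWith('text=')) return { kind: 'text', body: selector.slice(5) };
  if (selector.startsWith('xpath=')) return { kind: 'xpath', body: selector.slice(6) };
  // 3. Everything else is treated as CSS
  return { kind: 'css', body: selector };
}

for (const s of ['@e1', 'e12', 'text=Submit', 'xpath=//button', '#submit', 'em']) {
  console.log(s, '->', classifySelector(s).kind);
}
// @e1 -> ref
// e12 -> ref
// text=Submit -> text
// xpath=//button -> xpath
// #submit -> css
// em -> css
```

Order matters: because refs are checked first, a string like e12 is always a ref, while em or e1-widget fall through to CSS.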

Selector Composition

Combine selectors for more precise targeting:

CSS + Pseudo-Selectors

# First button in a form
agent-browser click "form button:first-child"

# Last item in a list
agent-browser click "ul li:last-child"

# Nth child
agent-browser click "table tr:nth-child(3)"

Narrowing Find Commands

# Click the button whose accessible name is "Submit"
agent-browser find role "button" click --name "Submit"

# Fill the input whose label is "Email"
agent-browser find label "Email" fill "user@example.com"

Special Selectors

Visible Elements Only

Before acting, Playwright waits for the target element to be visible (part of its actionability checks):
# Waits until the button is visible, then clicks it
agent-browser click "button"
To skip these actionability checks, use --force:
agent-browser click "button" --force

Detached Elements

Playwright waits for elements to be attached to the DOM:
# Waits for button to exist before clicking
agent-browser click "button"
Timeout is 25 seconds by default (configurable via AGENT_BROWSER_DEFAULT_TIMEOUT).

Selector Best Practices

For AI Agents

Use refs:
# Good - deterministic and fast
agent-browser snapshot -i
agent-browser click @e1

# Avoid - brittle and hard for AI to generate
agent-browser click "div.container > form#login > div.actions > button:nth-child(2)"

For Manual Scripting

Use semantic locators or stable CSS:
# Good - semantic and readable
agent-browser find role button click --name "Submit"

# Good - stable test ID
agent-browser click "[data-testid='submit-button']"

# Avoid - brittle classes
agent-browser click ".btn.btn-primary.btn-lg.submit"

For Testing

Use data-testid attributes:
<button data-testid="login-button">Log In</button>
agent-browser click "[data-testid='login-button']"
Test IDs stay stable across styling and layout changes because they exist only for testing.
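When test IDs come from variables rather than literals, it helps to build the attribute selector in one place. testIdSelector is a hypothetical helper, not part of agent-browser; it produces the single-quoted form used above and backslash-escapes backslashes and single quotes so the value cannot end the CSS string early.

```typescript
// Hypothetical helper: build [data-testid='...'] for a given test ID.
// Backslashes and single quotes are escaped per CSS string rules.
function testIdSelector(testId: string): string {
  const escaped = testId.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return `[data-testid='${escaped}']`;
}

console.log(testIdSelector('login-button')); // [data-testid='login-button']
console.log(testIdSelector("user's-menu"));  // [data-testid='user\'s-menu']
```

Control characters such as newlines would need further escaping; the sketch does not handle them.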

Performance Comparison

Selector TypeSpeedStabilityAI-Friendly
Refs⚡⚡⚡ (cached)⭐⭐⭐⭐⭐⭐
CSS (ID)⚡⚡⚡⭐⭐
CSS (class)⚡⚡⚡
CSS (data-testid)⚡⚡⚡⭐⭐⭐⭐⭐
Semantic (role)⚡⚡⭐⭐⭐⭐⭐
Text⚡⚡⭐⭐⭐⭐
XPath

Debugging Selectors

Highlight Elements

Highlight an element to verify your selector:
agent-browser highlight "#submit-button"
The element will be outlined in red in the browser.

Count Matches

Check how many elements match a selector:
agent-browser get count "button"
# Output: 5
If count > 1, your selector is ambiguous and may click the wrong element.

Snapshot Preview

Use snapshots to see what elements are available:
agent-browser snapshot -i
This shows all interactive elements with their refs and roles.

Advanced Techniques

Shadow DOM

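Penetrate shadow DOM boundaries:
# Playwright auto-pierces open shadow roots
agent-browser click "button"  # Works even inside shadow roots
This applies to CSS and text selectors; XPath does not pierce shadow roots, and closed-mode shadow roots are not supported.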

Iframes

Switch to iframe before selecting:
# Switch to iframe
agent-browser frame "#payment-iframe"

# Select inside iframe
agent-browser click "#submit"

# Switch back to main frame
agent-browser frame main

Dynamic Content

Wait for elements to appear:
# Wait for element
agent-browser wait "#results"

# Then interact
agent-browser click "#results button"
Or use find, which auto-waits:
# Auto-waits for button to appear
agent-browser find role button click --name "Load More"
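Both approaches amount to retrying a lookup until it succeeds or a timeout expires. The sketch below shows that contract with a generic polling loop; waitFor, its 100 ms default interval, and the simulated probe are hypothetical, and Playwright's own waiting logic is more involved than fixed-interval polling.

```typescript
// Generic polling sketch: retry `probe` until it returns a value or time runs out.
// `probe` stands in for a DOM lookup; every name here is hypothetical.
async function waitFor<T>(
  probe: () => T | undefined,
  timeoutMs: number,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = probe();
    if (found !== undefined) return found;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    // Sleep before the next attempt
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}

// Simulate a "#results" element that renders 300 ms after waiting starts.
let renderedResults: string | undefined;
setTimeout(() => { renderedResults = '#results'; }, 300);

waitFor(() => renderedResults, 2000).then(found => console.log('found', found));        // found #results
waitFor<string>(() => undefined, 200).catch((err: Error) => console.log(err.message)); // Timed out after 200 ms
```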

Next Steps

  • Architecture - Understand the Rust CLI + Node.js daemon design
  • Snapshot Refs - Deep dive into the ref system
  • Sessions - Learn about session isolation and persistence
