Overview
Snapshot refs provide deterministic element selection for AI agents and automation scripts. Instead of writing brittle CSS selectors or XPath queries, you:
- Get an accessibility tree snapshot with numbered refs (@e1, @e2, etc.)
- Use those refs to interact with elements
- Get a new snapshot when the page changes
This workflow suits AI agents because it separates perception (snapshot) from action (click, fill, etc.).
The Problem with Traditional Selectors
Traditional selectors have issues for automation:
# Brittle - breaks when classes change
agent-browser click ".btn-primary.submit-form"
# Ambiguous - which button if there are multiple?
agent-browser click "button"
# Verbose - hard for AI to generate correctly
agent-browser click "div.container > form#login-form > div.actions > button:nth-child(2)"
Refs solve these problems by giving each element a unique, stable identifier within a snapshot.
How Refs Work
1. Get a Snapshot
Run agent-browser snapshot to print the current page's accessibility tree with refs attached:
- heading "Example Domain" [ref=e1] [level=1]
- paragraph: This domain is for use in illustrative examples
- link "More information..." [ref=e2]
The snapshot shows:
- ARIA roles (heading, link, button, textbox, etc.)
- Accessible names (the text shown to screen readers)
- Refs (@e1, @e2) for interactive or named elements
- Attributes (level, checked, etc.)
2. Interact Using Refs
# Click the link
agent-browser click @e2
# Get text from the heading
agent-browser get text @e1
# Output: Example Domain
Refs point to the exact element from the snapshot, so there’s no ambiguity.
3. Get a New Snapshot After Changes
When the page changes (navigation, dynamic content), get a fresh snapshot:
agent-browser click @e2 # Navigate to new page
agent-browser snapshot # Get new refs for new page
Refs are scoped to a single snapshot. After navigation or DOM changes, you need a new snapshot with new refs.
Accessibility Tree Source
Snapshots are built from the browser’s accessibility tree - the same structure used by screen readers:
// From snapshot.ts:274
const ariaTree = await locator.ariaSnapshot();
Playwright’s ariaSnapshot() returns a text representation like:
- heading "Products" [level=1]
- list:
  - listitem:
    - link "Headphones"
  - listitem:
    - link "Speakers"
- button "Add to Cart"
This is then enhanced with refs and filtered based on options.
Ref Assignment Rules
Interactive Elements (Always Get Refs)
Elements with interactive ARIA roles automatically get refs:
// From snapshot.ts:70-88
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem'
]);
Example:
- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
- checkbox "Remember me" [ref=e3]
- link "Forgot password?" [ref=e4]
Content Elements (Get Refs If Named)
Elements that provide context get refs only if they have a name:
// From snapshot.ts:93-104
const CONTENT_ROLES = new Set([
  'heading', 'cell', 'gridcell', 'columnheader', 'rowheader',
  'listitem', 'article', 'region', 'main', 'navigation'
]);
Example:
- heading "Welcome" [ref=e1] [level=1] # Named → gets ref
- article "Blog Post Title" [ref=e2] # Named → gets ref
- list: # Unnamed → no ref
  - listitem: First item # Unnamed → no ref
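The two rules above (interactive roles always get a ref, content roles only when named) can be combined into one predicate. This is a minimal sketch rather than the code in snapshot.ts: the helper name `shouldAssignRef` is hypothetical, and the role sets are copied from the excerpts above.

```typescript
// Role sets copied from the excerpts above.
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem',
]);
const CONTENT_ROLES = new Set([
  'heading', 'cell', 'gridcell', 'columnheader', 'rowheader',
  'listitem', 'article', 'region', 'main', 'navigation',
]);

// Hypothetical helper (not from snapshot.ts): decides whether an element gets a ref.
function shouldAssignRef(role: string, name?: string): boolean {
  if (INTERACTIVE_ROLES.has(role)) return true;      // interactive: always
  if (CONTENT_ROLES.has(role)) return Boolean(name); // content: only when named
  return false;                                      // structural roles: never
}

console.log(shouldAssignRef('button'));             // true
console.log(shouldAssignRef('heading', 'Welcome')); // true
console.log(shouldAssignRef('listitem'));           // false
console.log(shouldAssignRef('list', 'Items'));      // false
```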
Cursor-Interactive Elements (With -C Flag)
The -C (--cursor) flag finds elements that lack proper ARIA roles but are visually interactive:
agent-browser snapshot -C
This finds elements with:
- cursor: pointer CSS property
- onclick event handlers
- tabindex attribute (except -1)
// From snapshot.ts:225-232
const hasCursorPointer = computedStyle.cursor === 'pointer';
const hasOnClick = el.hasAttribute('onclick') || el.onclick !== null;
const tabIndex = el.getAttribute('tabindex');
const hasTabIndex = tabIndex !== null && tabIndex !== '-1';
These get pseudo-roles:
- clickable "Menu" [ref=e5] [cursor:pointer, onclick]
- focusable "Search" [ref=e6] [tabindex]
This is useful for modern web apps that use <div> and <span> as buttons instead of semantic HTML.
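To make the mapping from signals to pseudo-roles concrete, here is a small sketch that works on plain values instead of live DOM nodes. The `CursorSignals` shape and the `pseudoRole` function are hypothetical. The rule that click signals yield `clickable` and a bare tabindex yields `focusable` is inferred from the sample output above, not confirmed from the source.

```typescript
// Hypothetical input shape: the three signals from the excerpt above,
// already read from the DOM by the caller.
interface CursorSignals {
  cursor: string;          // computed CSS cursor value
  hasOnClick: boolean;     // onclick attribute or handler present
  tabIndex: string | null; // raw tabindex attribute, null if absent
}

// Assumption: click signals win over tabindex when both are present.
function pseudoRole(s: CursorSignals): 'clickable' | 'focusable' | null {
  const hasCursorPointer = s.cursor === 'pointer';
  const hasTabIndex = s.tabIndex !== null && s.tabIndex !== '-1';
  if (hasCursorPointer || s.hasOnClick) return 'clickable';
  if (hasTabIndex) return 'focusable';
  return null; // not cursor-interactive
}

console.log(pseudoRole({ cursor: 'pointer', hasOnClick: true, tabIndex: null })); // clickable
console.log(pseudoRole({ cursor: 'auto', hasOnClick: false, tabIndex: '0' }));    // focusable
console.log(pseudoRole({ cursor: 'auto', hasOnClick: false, tabIndex: '-1' }));   // null
```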
Refs are stored in a map that tracks how to locate each element:
// From snapshot.ts:22-30
export interface RefMap {
  [ref: string]: {
    selector: string; // How to locate the element
    role: string;     // ARIA role
    name: string;     // Accessible name
    nth?: number;     // Disambiguation index
  };
}
Example:
const refs: RefMap = {
  "e1": {
    selector: "getByRole('button', { name: \"Submit\", exact: true })",
    role: "button",
    name: "Submit",
    nth: 0 // First "Submit" button
  },
  "e2": {
    selector: "getByRole('button', { name: \"Submit\", exact: true })",
    role: "button",
    name: "Submit",
    nth: 1 // Second "Submit" button
  }
};
Duplicate Handling
When multiple elements have the same role and name, refs include an nth index:
- button "Delete" [ref=e1] [nth=0]
- button "Delete" [ref=e2] [nth=1]
- button "Delete" [ref=e3] [nth=2]
The nth field tells Playwright which instance to select:
// From browser.ts:229-237
let locator: Locator = page.getByRole(refData.role, {
  name: refData.name,
  exact: true,
});
if (refData.nth !== undefined) {
  locator = locator.nth(refData.nth);
}
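One way the nth indexes could be computed is a two-pass count over role and name pairs. This is only a sketch: the `RefEntry` type and `assignNth` helper are hypothetical. It also assumes that unique elements get no nth at all, which matches the JSON output shown later on this page but is not confirmed from the source.

```typescript
interface RefEntry {
  role: string;
  name: string;
  nth?: number;
}

// Hypothetical helper: adds nth only to entries whose role+name pair repeats.
function assignNth(entries: RefEntry[]): RefEntry[] {
  const keyOf = (e: RefEntry): string => JSON.stringify([e.role, e.name]);

  // Pass 1: count how often each role+name pair occurs.
  const totals = new Map<string, number>();
  for (const e of entries) {
    totals.set(keyOf(e), (totals.get(keyOf(e)) ?? 0) + 1);
  }

  // Pass 2: number the duplicates in document order, starting at 0.
  const seen = new Map<string, number>();
  return entries.map((e) => {
    const key = keyOf(e);
    const index = seen.get(key) ?? 0;
    seen.set(key, index + 1);
    return (totals.get(key) ?? 0) > 1
      ? { role: e.role, name: e.name, nth: index }
      : { role: e.role, name: e.name };
  });
}

const numbered = assignNth([
  { role: 'button', name: 'Delete' },
  { role: 'button', name: 'Save' },
  { role: 'button', name: 'Delete' },
]);
console.log(numbered.map((e) => e.nth)); // [ 0, undefined, 1 ]
```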
Snapshot Filtering Options
Interactive Only (-i)
Show only interactive elements (buttons, links, inputs):
agent-browser snapshot -i
- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- link "Forgot password?" [ref=e4]
This is the recommended mode for AI agents - it reduces noise by hiding structural elements.
Cursor-Interactive (-C)
Include elements with cursor:pointer or click handlers:
agent-browser snapshot -i -C
- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
# Cursor-interactive elements:
- clickable "Menu" [ref=e3] [cursor:pointer, onclick]
- clickable "Close" [ref=e4] [cursor:pointer]
Use this when targeting apps with custom clickable <div> elements.
Compact (-c)
Remove empty structural elements:
agent-browser snapshot -c
Structural roles (generic, group, list) without content are hidden:
# Before compact:
- group:
  - list:
    - listitem:
      - button "Item" [ref=e1]
# After compact:
- button "Item" [ref=e1]
Depth Limit (-d N)
Limit tree depth to N levels:
agent-browser snapshot -d 3
Useful for large pages where you only need top-level structure.
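As an illustration of what a depth limit does to the snapshot text, the sketch below filters lines by indentation. It rests on two assumptions that are not confirmed from the source: nested lines are indented by two spaces per level, and a limit of N keeps the top N levels (depths 0 to N-1). The `limitDepth` helper is hypothetical.

```typescript
// Hypothetical helper: keep only lines in the top `maxDepth` levels.
// Assumption: each nesting level adds two spaces of indentation.
function limitDepth(snapshot: string, maxDepth: number): string {
  return snapshot
    .split('\n')
    .filter((line) => {
      const indent = (line.match(/^ */) ?? [''])[0].length;
      return Math.floor(indent / 2) < maxDepth;
    })
    .join('\n');
}

const tree = [
  '- group:',
  '  - list:',
  '    - listitem:',
  '      - button "Item" [ref=e1]',
].join('\n');

console.log(limitDepth(tree, 2));
// - group:
//   - list:
```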
Scoped Snapshots (-s SELECTOR)
Limit snapshot to a CSS selector:
agent-browser snapshot -s "#main"
Only elements inside #main appear in the snapshot.
Using Refs in Commands
Refs work anywhere a selector is expected:
# Click
agent-browser click @e1
# Fill
agent-browser fill @e2 "user@example.com"
# Get text
agent-browser get text @e1
# Hover
agent-browser hover @e3
# Check state
agent-browser is visible @e4
Refs are parsed in three formats:
- @e1: recommended format
- ref=e1: alternative format
- e1: bare format (only if it matches /^e\d+$/)
// From snapshot.ts:605-615
export function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);
  if (arg.startsWith('ref=')) return arg.slice(4);
  if (/^e\d+$/.test(arg)) return arg;
  return null;
}
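The sketch below shows how a command might use parseRef to decide whether an argument is a ref or an ordinary selector. parseRef is repeated from the excerpt above so the block runs on its own. The `Target` type and `classifyTarget` function are hypothetical and only illustrate the dispatch.

```typescript
// parseRef as shown in the excerpt above (export dropped so it runs as a script).
function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);
  if (arg.startsWith('ref=')) return arg.slice(4);
  if (/^e\d+$/.test(arg)) return arg;
  return null;
}

// Hypothetical result type: either a ref ID or a selector passed through unchanged.
type Target =
  | { kind: 'ref'; ref: string }
  | { kind: 'selector'; selector: string };

// Hypothetical dispatcher: treat the argument as a ref if it parses, else as a selector.
function classifyTarget(arg: string): Target {
  const ref = parseRef(arg);
  return ref !== null ? { kind: 'ref', ref } : { kind: 'selector', selector: arg };
}

console.log(classifyTarget('@e1'));    // { kind: 'ref', ref: 'e1' }
console.log(classifyTarget('ref=e2')); // { kind: 'ref', ref: 'e2' }
console.log(classifyTarget('e3'));     // { kind: 'ref', ref: 'e3' }
console.log(classifyTarget('#main'));  // { kind: 'selector', selector: '#main' }
```

As written, parseRef validates only the bare form. Anything after `@` or `ref=` is returned as-is, so an argument like `@foo` still parses as a ref named `foo`.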
Ref Lifecycle
Creation
Refs are generated sequentially (e1, e2, e3, …) during snapshot generation:
// From snapshot.ts:50-64
let refCounter = 0;

function resetRefs(): void {
  refCounter = 0;
}

function nextRef(): string {
  return `e${++refCounter}`;
}
Validity
Refs are valid until:
- The page navigates to a new URL
- The DOM changes significantly
- You explicitly get a new snapshot
Using a stale ref won't crash: Playwright retries the locator, but the command may fail or select the wrong element if the page has changed.
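Because refs are numbered sequentially, the same ref string can name different elements in different snapshots. The sketch below illustrates this with the counter functions from the Creation section. The `snapshotNames` helper is hypothetical, and the sketch assumes the counter is reset at the start of every snapshot.

```typescript
// Counter functions as shown in the Creation section.
let refCounter = 0;

function resetRefs(): void {
  refCounter = 0;
}

function nextRef(): string {
  return `e${++refCounter}`;
}

// Hypothetical stand-in for a snapshot: assigns a ref to each element name.
function snapshotNames(names: string[]): Record<string, string> {
  resetRefs(); // assumption: every snapshot starts numbering again at e1
  const refs: Record<string, string> = {};
  for (const name of names) {
    refs[nextRef()] = name;
  }
  return refs;
}

const page1 = snapshotNames(['Login', 'Sign up']);
const page2 = snapshotNames(['Email', 'Password', 'Submit']);

console.log(page1['e1']); // Login
console.log(page2['e1']); // Email
```

Under that assumption, an agent that keeps using @e1 from the first snapshot after the page has changed would be addressing a different element.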
Best Practice
Get a fresh snapshot after:
- Navigation (agent-browser open ...)
- Clicking links/buttons that change the page
- Waiting for dynamic content to load
# Good workflow:
agent-browser open example.com
agent-browser snapshot -i # Get refs for page 1
agent-browser click @e1 # Click a link
agent-browser snapshot -i # Get new refs for page 2
agent-browser fill @e2 "input" # Use new refs
Annotated Screenshots
Annotated screenshots overlay a numbered label on each element that has a ref:
agent-browser screenshot --annotate
Screenshot saved to /tmp/screenshot-2026-03-02T10-30-00-abc123.png
[1] @e1 button "Submit"
[2] @e2 textbox "Email"
[3] @e3 link "Forgot password?"
The screenshot has numbered labels [1], [2], [3] overlaid on each element. The label numbers match the ref numbers (@e1 → [1]).
This is useful for:
- Multimodal AI models that reason about visual layout
- Debugging selector issues
- Documenting UI flows
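An agent that receives the legend printed by the annotate command can turn it into structured data. The sketch below assumes each legend line has exactly the shape shown in the sample output above (`[1] @e1 button "Submit"`). The `Annotation` type and `parseLegend` function are hypothetical.

```typescript
interface Annotation {
  label: number; // number drawn on the screenshot
  ref: string;   // ref ID without the leading @
  role: string;
  name: string;
}

// Hypothetical parser. Assumes legend lines look like: [1] @e1 button "Submit"
function parseLegend(output: string): Annotation[] {
  const pattern = /^\[(\d+)\]\s+@(e\d+)\s+(\S+)\s+"(.*)"$/;
  const result: Annotation[] = [];
  for (const line of output.split('\n')) {
    const m = pattern.exec(line.trim());
    if (m) {
      result.push({ label: Number(m[1]), ref: m[2], role: m[3], name: m[4] });
    }
  }
  return result; // lines that do not match (such as the file path) are skipped
}

const legend = [
  'Screenshot saved to /tmp/screenshot-2026-03-02T10-30-00-abc123.png',
  '[1] @e1 button "Submit"',
  '[2] @e2 textbox "Email"',
  '[3] @e3 link "Forgot password?"',
].join('\n');

console.log(parseLegend(legend).length); // 3
```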
JSON Mode
Get snapshots in machine-readable format:
agent-browser snapshot -i --json
{
  "success": true,
  "data": {
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]",
    "refs": {
      "e1": {
        "selector": "getByRole('button', { name: \"Submit\", exact: true })",
        "role": "button",
        "name": "Submit"
      },
      "e2": {
        "selector": "getByRole('textbox', { name: \"Email\", exact: true })",
        "role": "textbox",
        "name": "Email"
      }
    }
  }
}
AI agents can parse this JSON to extract the snapshot tree and ref mappings.
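As a rough sketch of that parsing step, the code below looks up refs by role and, optionally, by name. The interfaces mirror only the fields visible in the sample response above, and the real output may carry more. The `findRefs` helper is hypothetical.

```typescript
// Shapes inferred from the sample response above; nth is taken from RefMap.
interface RefData {
  selector: string;
  role: string;
  name: string;
  nth?: number;
}

interface SnapshotResponse {
  success: boolean;
  data: { snapshot: string; refs: Record<string, RefData> };
}

// Hypothetical helper: returns refs (in @eN form) matching a role and optional name.
function findRefs(raw: string, role: string, name?: string): string[] {
  const parsed = JSON.parse(raw) as SnapshotResponse;
  if (!parsed.success) return [];
  const refs = parsed.data.refs;
  return Object.keys(refs)
    .filter((id) => refs[id].role === role && (name === undefined || refs[id].name === name))
    .map((id) => `@${id}`);
}

// Sample payload with the same structure as the response above (selectors shortened).
const raw = JSON.stringify({
  success: true,
  data: {
    snapshot: '- button "Submit" [ref=e1]\n- textbox "Email" [ref=e2]',
    refs: {
      e1: { selector: 'placeholder', role: 'button', name: 'Submit' },
      e2: { selector: 'placeholder', role: 'textbox', name: 'Email' },
    },
  },
});

console.log(findRefs(raw, 'textbox'));          // [ '@e2' ]
console.log(findRefs(raw, 'button', 'Submit')); // [ '@e1' ]
```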
Snapshot generation is fast:
- Small page (10-20 elements): ~50-100ms
- Medium page (100-200 elements): ~200-400ms
- Large page (500+ elements): ~500-1000ms
Interactive-only mode (-i) is faster because it filters earlier in the pipeline:
// From snapshot.ts:394-428
if (options.interactive) {
  // Only process interactive elements, skip the rest
  for (const line of lines) {
    // `role` (the line's ARIA role) is computed in code omitted from this excerpt
    if (INTERACTIVE_ROLES.has(role)) {
      result.push(line);
    }
  }
}
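The excerpt above does not show how the role is obtained for each line. The following self-contained sketch fills that step in under a simplifying assumption: the role is the first word after the list marker. The `roleOf` and `filterInteractive` helpers are hypothetical, and the real parser may handle more cases.

```typescript
// Role set copied from the Ref Assignment Rules section.
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem',
]);

// Hypothetical helper. Assumption: the role is the first word after "- ".
function roleOf(line: string): string | null {
  const m = /^\s*-\s+([A-Za-z]+)/.exec(line);
  return m ? m[1] : null;
}

// Hypothetical helper: keeps only lines whose role is interactive.
function filterInteractive(lines: string[]): string[] {
  return lines.filter((line) => {
    const role = roleOf(line);
    return role !== null && INTERACTIVE_ROLES.has(role);
  });
}

const kept = filterInteractive([
  '- heading "Login" [ref=e1] [level=1]',
  '- textbox "Email" [ref=e2]',
  '- paragraph: Enter your details',
  '- button "Submit" [ref=e3]',
]);
console.log(kept);
// [ '- textbox "Email" [ref=e2]', '- button "Submit" [ref=e3]' ]
```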
Comparison to Traditional Selectors
| Aspect | CSS/XPath | Refs |
|---|---|---|
| Stability | Breaks when DOM changes | Stable within snapshot |
| AI-Friendliness | Hard to generate correctly | Easy - just use @e1 |
| Determinism | Ambiguous if multiple matches | Always unique |
| Accessibility | Ignores semantic meaning | Based on ARIA/accessibility tree |
| Speed | Fast (direct DOM query) | Fast (cached locator) |
Next Steps
- Architecture - Understand the Rust CLI + Node.js daemon design
- Sessions - Learn about session isolation and persistence
- Selectors - Master all selector types (CSS, refs, semantic)