
Overview

Snapshot refs provide deterministic element selection for AI agents and automation scripts. Instead of writing brittle CSS selectors or XPath queries, you:
  1. Get an accessibility tree snapshot with numbered refs (@e1, @e2, etc.)
  2. Use those refs to interact with elements
  3. Get a new snapshot when the page changes
This workflow suits AI agents well because it separates perception (snapshot) from action (click, fill, etc.).

The Problem with Traditional Selectors

Traditional selectors cause several problems for automation:
# Brittle - breaks when classes change
agent-browser click ".btn-primary.submit-form"

# Ambiguous - which button if there are multiple?
agent-browser click "button"

# Verbose - hard for AI to generate correctly
agent-browser click "div.container > form#login-form > div.actions > button:nth-child(2)"
Refs solve these problems by giving each element a unique, stable identifier within a snapshot.

How Refs Work

1. Get a Snapshot

agent-browser snapshot
- heading "Example Domain" [ref=e1] [level=1]
- paragraph: This domain is for use in illustrative examples
- link "More information..." [ref=e2]
The snapshot shows:
  • ARIA roles (heading, link, button, textbox, etc.)
  • Accessible names (the text shown to screen readers)
  • Refs (@e1, @e2) for interactive or named elements
  • Attributes (level, checked, etc.)
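To make the line format above concrete, the following sketch splits one snapshot line into its parts. `SnapshotNode` and `parseSnapshotLine` are hypothetical helpers, not part of agent-browser, and the regex assumes the exact layout shown in the example output (two-space indentation, quoted name, bracketed attributes, optional `: text`).

```typescript
// Hypothetical shape of one parsed snapshot line (not an agent-browser type).
interface SnapshotNode {
  depth: number;    // nesting level, assuming 2 spaces of indent per level
  role: string;     // ARIA role, e.g. 'heading'
  name?: string;    // accessible name in double quotes (escapes are not unescaped)
  ref?: string;     // e.g. 'e1', taken from [ref=e1]
  attrs: string[];  // other bracketed attributes, e.g. ['level=1']
  text?: string;    // inline text after a trailing colon
}

function parseSnapshotLine(line: string): SnapshotNode | null {
  // Assumed layout: <indent>- role "name" [attr] [attr]: text
  const m = line.match(/^(\s*)- (\w+)(?: "((?:[^"\\]|\\.)*)")?((?: \[[^\]]+\])*)(?::\s*(.*))?$/);
  if (!m) return null;
  const [, indent, role, name, bracketed, text] = m;
  const attrs: string[] = [];
  let ref: string | undefined;
  // Split the bracketed section into individual attributes.
  for (const [, attr] of bracketed.matchAll(/\[([^\]]+)\]/g)) {
    if (attr.startsWith('ref=')) ref = attr.slice(4);
    else attrs.push(attr);
  }
  return { depth: indent.length / 2, role, name, ref, attrs, text: text || undefined };
}
```

For example, `- heading "Example Domain" [ref=e1] [level=1]` yields role `heading`, name `Example Domain`, ref `e1`, and attributes `['level=1']`.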

2. Interact Using Refs

# Click the link
agent-browser click @e2

# Get text from the heading
agent-browser get text @e1
# Output: Example Domain
Refs point to the exact element from the snapshot, so there’s no ambiguity.
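An agent usually has to pick the ref for a specific control out of the snapshot text before issuing a command. A minimal sketch, assuming the `- role "name" [ref=eN]` layout shown above; `findRef` is a hypothetical helper, not an agent-browser API:

```typescript
// Return the '@eN' ref of the first element with the given role and exact
// accessible name, or null if the snapshot contains no such element.
function findRef(snapshot: string, role: string, name: string): string | null {
  for (const line of snapshot.split('\n')) {
    const m = line.match(/^\s*- (\w+) "((?:[^"\\]|\\.)*)".*\[ref=(e\d+)\]/);
    if (m && m[1] === role && m[2] === name) return `@${m[3]}`;
  }
  return null;
}

const snapshot = [
  '- heading "Example Domain" [ref=e1] [level=1]',
  '- paragraph: This domain is for use in illustrative examples',
  '- link "More information..." [ref=e2]',
].join('\n');

// The result can be passed to a command such as `agent-browser click @e2`.
console.log(findRef(snapshot, 'link', 'More information...')); // prints "@e2"
```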

3. Get a New Snapshot After Changes

When the page changes (navigation, dynamic content), get a fresh snapshot:
agent-browser click @e2        # Navigate to new page
agent-browser snapshot         # Get new refs for new page
Refs are scoped to a single snapshot. After navigation or DOM changes, you need a new snapshot with new refs.

Accessibility Tree Source

Snapshots are built from the browser’s accessibility tree - the same structure used by screen readers:
// From snapshot.ts:274
const ariaTree = await locator.ariaSnapshot();
Playwright’s ariaSnapshot() returns a text representation like:
- heading "Products" [level=1]
- list:
  - listitem:
    - link "Headphones"
  - listitem:
    - link "Speakers"
- button "Add to Cart"
This is then enhanced with refs and filtered based on options.

Ref Assignment Rules

Interactive Elements (Always Get Refs)

Elements with interactive ARIA roles automatically get refs:
// From snapshot.ts:70-88
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem'
]);
Example:
- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
- checkbox "Remember me" [ref=e3]
- link "Forgot password?" [ref=e4]

Content Elements (Get Refs If Named)

Elements that provide context get refs only if they have a name:
// From snapshot.ts:93-104
const CONTENT_ROLES = new Set([
  'heading', 'cell', 'gridcell', 'columnheader', 'rowheader',
  'listitem', 'article', 'region', 'main', 'navigation'
]);
Example:
- heading "Welcome" [ref=e1] [level=1]     # Named → gets ref
- article "Blog Post Title" [ref=e2]       # Named → gets ref
- list:                                     # Unnamed → no ref
  - listitem: First item                   # Unnamed → no ref
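The two rules above can be combined into one decision per line. The sketch below is an illustration under stated assumptions, not the code in snapshot.ts: it reuses the role sets listed above, treats a quoted string after the role as the accessible name, and appends `[ref=eN]` to qualifying lines of a Playwright `ariaSnapshot()` tree. `shouldAssignRef` and `addRefs` are hypothetical names.

```typescript
// Role sets copied from the lists above.
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem',
]);
const CONTENT_ROLES = new Set([
  'heading', 'cell', 'gridcell', 'columnheader', 'rowheader',
  'listitem', 'article', 'region', 'main', 'navigation',
]);

// Interactive roles always get a ref; content roles only when they have a name.
function shouldAssignRef(role: string, name: string | undefined): boolean {
  if (INTERACTIVE_ROLES.has(role)) return true;
  return CONTENT_ROLES.has(role) && !!name;
}

// Append sequential refs (e1, e2, ...) to qualifying lines of an ARIA tree.
function addRefs(ariaTree: string): string {
  let counter = 0;
  return ariaTree
    .split('\n')
    .map((line) => {
      const m = line.match(/^(\s*- (\w+)(?: "((?:[^"\\]|\\.)*)")?)(.*)$/);
      if (!m || !shouldAssignRef(m[2], m[3])) return line;
      // Place the ref right after role and name, before any other attributes.
      return `${m[1]} [ref=e${++counter}]${m[4]}`;
    })
    .join('\n');
}
```

Applied to the `Products` tree from the previous section, this would tag the named heading, both links, and the button, while the unnamed `list` and `listitem` lines stay untouched.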

Cursor-Interactive Elements (With -C Flag)

The -C (--cursor) flag finds elements that don't have interactive ARIA roles but behave like interactive controls:
agent-browser snapshot -C
This finds elements with:
  • cursor: pointer CSS property
  • onclick event handlers
  • tabindex attribute (except -1)
// From snapshot.ts:225-232
const hasCursorPointer = computedStyle.cursor === 'pointer';
const hasOnClick = el.hasAttribute('onclick') || el.onclick !== null;
const tabIndex = el.getAttribute('tabindex');
const hasTabIndex = tabIndex !== null && tabIndex !== '-1';
These get pseudo-roles:
- clickable "Menu" [ref=e5] [cursor:pointer, onclick]
- focusable "Search" [ref=e6] [tabindex]
This is useful for modern web apps that use <div> and <span> as buttons instead of semantic HTML.
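The detection rules above can be summarized as a pure function over a few element properties. This is a hypothetical sketch: `CursorCandidate`, `CursorMatch`, and `classifyCursorElement` are illustrative names, and the rule that a pointer cursor or click handler yields `clickable` while `tabindex` alone yields `focusable` is inferred from the example output rather than confirmed by the source.

```typescript
// Hypothetical summary of the properties read from a DOM element.
interface CursorCandidate {
  cursor: string;          // computed CSS cursor value
  hasOnClick: boolean;     // onclick attribute or property present
  tabindex: string | null; // raw tabindex attribute, or null if absent
}

interface CursorMatch {
  pseudoRole: 'clickable' | 'focusable';
  hints: string[];         // reasons shown in brackets, e.g. ['cursor:pointer', 'onclick']
}

function classifyCursorElement(el: CursorCandidate): CursorMatch | null {
  const hints: string[] = [];
  if (el.cursor === 'pointer') hints.push('cursor:pointer');
  if (el.hasOnClick) hints.push('onclick');
  // tabindex="-1" removes the element from the tab order, so it does not count.
  const hasTabIndex = el.tabindex !== null && el.tabindex !== '-1';
  if (hasTabIndex) hints.push('tabindex');
  if (hints.length === 0) return null;
  // Assumption: pointer cursor or click handler => 'clickable'; tabindex only => 'focusable'.
  const clickable = el.cursor === 'pointer' || el.hasOnClick;
  return { pseudoRole: clickable ? 'clickable' : 'focusable', hints };
}
```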

Ref Storage Format

Refs are stored in a map that tracks how to locate each element:
// From snapshot.ts:22-30
export interface RefMap {
  [ref: string]: {
    selector: string;  // How to locate the element
    role: string;      // ARIA role
    name: string;      // Accessible name
    nth?: number;      // Disambiguation index
  };
}
Example:
const refs: RefMap = {
  "e1": {
    selector: "getByRole('button', { name: \"Submit\", exact: true })",
    role: "button",
    name: "Submit",
    nth: 0  // First "Submit" button
  },
  "e2": {
    selector: "getByRole('button', { name: \"Submit\", exact: true })",
    role: "button",
    name: "Submit",
    nth: 1  // Second "Submit" button
  }
};

Duplicate Handling

When multiple elements have the same role and name, refs include an nth index:
- button "Delete" [ref=e1] [nth=0]
- button "Delete" [ref=e2] [nth=1]
- button "Delete" [ref=e3] [nth=2]
The nth field tells Playwright which instance to select:
// From browser.ts:229-237
let locator: Locator = page.getByRole(refData.role, {
  name: refData.name,
  exact: true,
});

if (refData.nth !== undefined) {
  locator = locator.nth(refData.nth);
}
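The `RefMap` example and the duplicate rule fit together roughly as follows. A minimal sketch, assuming `nth` is attached only when a role/name pair occurs more than once (the JSON output later on this page shows no `nth` for unique elements); `buildRefMap` and `ElementInfo` are hypothetical, while the `RefMap` shape and selector string format follow the examples above.

```typescript
// RefMap shape as shown in snapshot.ts above.
interface RefMap {
  [ref: string]: { selector: string; role: string; name: string; nth?: number };
}

// Hypothetical input: elements that qualified for a ref, in snapshot order.
interface ElementInfo {
  role: string;
  name: string;
}

function buildRefMap(elements: ElementInfo[]): RefMap {
  // Roles never contain ':', so this key is unambiguous.
  const key = (e: ElementInfo) => `${e.role}:${e.name}`;

  // First pass: count how often each role/name pair occurs.
  const totals = new Map<string, number>();
  for (const e of elements) totals.set(key(e), (totals.get(key(e)) ?? 0) + 1);

  // Second pass: assign sequential refs, adding nth only for duplicates.
  const seen = new Map<string, number>();
  const refs: RefMap = {};
  elements.forEach((e, i) => {
    const k = key(e);
    const index = seen.get(k) ?? 0;
    seen.set(k, index + 1);
    refs[`e${i + 1}`] = {
      selector: `getByRole('${e.role}', { name: ${JSON.stringify(e.name)}, exact: true })`,
      role: e.role,
      name: e.name,
      ...((totals.get(k) ?? 0) > 1 ? { nth: index } : {}),
    };
  });
  return refs;
}
```

Omitting `nth` for unique elements keeps the resulting locator a plain `getByRole` call, so `locator.nth()` is only applied where it is actually needed.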

Snapshot Filtering Options

Interactive Only (-i)

Show only interactive elements (buttons, links, inputs):
agent-browser snapshot -i
- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- link "Forgot password?" [ref=e4]
This is the recommended mode for AI agents - it reduces noise by hiding structural elements.

Cursor-Interactive (-C)

Include elements with cursor:pointer or click handlers:
agent-browser snapshot -i -C
- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
# Cursor-interactive elements:
- clickable "Menu" [ref=e3] [cursor:pointer, onclick]
- clickable "Close" [ref=e4] [cursor:pointer]
Use this when targeting apps with custom clickable <div> elements.

Compact (-c)

Remove empty structural elements:
agent-browser snapshot -c
Structural roles (generic, group, list) without content are hidden:
# Before compact:
- group:
  - list:
    - listitem:
      - button "Item" [ref=e1]

# After compact:
- button "Item" [ref=e1]

Depth Limit (-d N)

Limit tree depth to N levels:
agent-browser snapshot -d 3
Useful for large pages where you only need top-level structure.
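One way to picture `-d N` is as an indentation cutoff on the snapshot text. The sketch below assumes two spaces per nesting level and that depth 1 means top-level lines only; how agent-browser actually counts levels is not specified here, and `limitDepth` is a hypothetical helper.

```typescript
// Keep only lines nested fewer than maxDepth levels deep (2 spaces per level assumed).
function limitDepth(snapshot: string, maxDepth: number): string {
  return snapshot
    .split('\n')
    .filter((line) => {
      const indent = line.length - line.trimStart().length;
      return Math.floor(indent / 2) < maxDepth;
    })
    .join('\n');
}
```

Under these assumptions, `limitDepth(tree, 3)` would approximate `agent-browser snapshot -d 3`.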

Scoped Snapshots (-s SELECTOR)

Limit snapshot to a CSS selector:
agent-browser snapshot -s "#main"
Only elements inside #main appear in the snapshot.

Using Refs in Commands

Refs work anywhere a selector is expected:
# Click
agent-browser click @e1

# Fill
agent-browser fill @e2 "user@example.com"

# Get text
agent-browser get text @e1

# Hover
agent-browser hover @e3

# Check state
agent-browser is visible @e4
Refs are parsed in three formats:
  • @e1 - Recommended format
  • ref=e1 - Alternative format
  • e1 - Bare format (if it matches /^e\d+$/)
// From snapshot.ts:605-615
export function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);
  if (arg.startsWith('ref=')) return arg.slice(4);
  if (/^e\d+$/.test(arg)) return arg;
  return null;
}
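Because `parseRef` returns `null` for anything that is not a ref, a command can try it first and fall back to treating the argument as an ordinary selector. This sketch copies `parseRef` from above and pairs it with a `RefMap`-style lookup; `resolveTarget`, `RefEntry`, and the `Target` union are hypothetical names used only for illustration, and the real dispatch in browser.ts may differ.

```typescript
// Copied from snapshot.ts above.
function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);
  if (arg.startsWith('ref=')) return arg.slice(4);
  if (/^e\d+$/.test(arg)) return arg;
  return null;
}

interface RefEntry { role: string; name: string; nth?: number }

type Target =
  | { kind: 'ref'; ref: string; entry: RefEntry }
  | { kind: 'selector'; selector: string };

function resolveTarget(arg: string, refs: Record<string, RefEntry>): Target {
  const ref = parseRef(arg);
  // Not a ref: treat the argument as a regular selector (e.g. CSS).
  if (ref === null) return { kind: 'selector', selector: arg };
  const entry = refs[ref];
  // Hypothetical error: a ref missing from the current map likely came from an older snapshot.
  if (!entry) throw new Error(`Unknown ref ${ref}; take a new snapshot`);
  return { kind: 'ref', ref, entry };
}
```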

Ref Lifecycle

Creation

Refs are assigned sequentially (e1, e2, e3, …) while the snapshot is built:
// From snapshot.ts:50-64
let refCounter = 0;

function resetRefs(): void {
  refCounter = 0;
}

function nextRef(): string {
  return `e${++refCounter}`;
}

Validity

Refs are valid until:
  • The page navigates to a new URL
  • The DOM changes significantly
  • You explicitly get a new snapshot
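Using a stale ref won't crash: the ref is resolved again from its stored role, name, and nth when the command runs, and Playwright retries that locator as usual. If the page has changed, however, the lookup may time out or match a different element than the one in the snapshot.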
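Since the ref counter restarts for each snapshot, `e1` from an old snapshot and `e1` from a new one can point to different elements. A client can guard against this by remembering which snapshot its refs came from. The following is a hypothetical client-side wrapper, not an agent-browser feature: `RefSession` is an illustrative name, and it assumes the caller records the page URL at snapshot time. DOM changes without navigation cannot be detected this way.

```typescript
// Hypothetical client-side bookkeeping for ref freshness.
class RefSession {
  private generation = 0;
  private snapshotUrl: string | null = null;

  // Call after each `agent-browser snapshot`; returns the new generation id.
  recordSnapshot(url: string): number {
    this.snapshotUrl = url;
    return ++this.generation;
  }

  // Refs are usable only if they come from the latest snapshot of the current URL.
  isFresh(refGeneration: number, currentUrl: string): boolean {
    return refGeneration === this.generation && currentUrl === this.snapshotUrl;
  }
}
```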

Best Practice

Get a fresh snapshot after:
  • Navigation (agent-browser open ...)
  • Clicking links/buttons that change the page
  • Waiting for dynamic content to load
# Good workflow:
agent-browser open example.com
agent-browser snapshot -i          # Get refs for page 1
agent-browser click @e1            # Click a link
agent-browser snapshot -i          # Get new refs for page 2
agent-browser fill @e2 "input"    # Use new refs

Annotated Screenshots

Visual representation of refs with numbered labels:
agent-browser screenshot --annotate
Screenshot saved to /tmp/screenshot-2026-03-02T10-30-00-abc123.png
[1] @e1 button "Submit"
[2] @e2 textbox "Email"
[3] @e3 link "Forgot password?"
The screenshot has numbered labels [1], [2], [3] overlaid on each element, and each label number matches its ref number (@e1 → [1]). This is useful for:
  • Multimodal AI models that reason about visual layout
  • Debugging selector issues
  • Documenting UI flows
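The legend printed alongside the screenshot pairs each label with a ref, so a multimodal agent can translate a label it sees in the image back into a ref for the next command. A minimal sketch that parses legend lines in the format shown above; `AnnotationEntry` and `parseAnnotationLegend` are hypothetical helpers.

```typescript
interface AnnotationEntry {
  label: number;  // number drawn on the screenshot
  ref: string;    // e.g. '@e2'
  role: string;
  name: string;
}

function parseAnnotationLegend(output: string): Map<number, AnnotationEntry> {
  const entries = new Map<number, AnnotationEntry>();
  for (const line of output.split('\n')) {
    // Assumed layout: [N] @eN role "name"; other lines (e.g. the save path) are skipped.
    const m = line.match(/^\[(\d+)\] (@e\d+) (\w+) "(.*)"$/);
    if (m) {
      const label = Number(m[1]);
      entries.set(label, { label, ref: m[2], role: m[3], name: m[4] });
    }
  }
  return entries;
}
```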

JSON Mode

Get snapshots in machine-readable format:
agent-browser snapshot -i --json
{
  "success": true,
  "data": {
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]",
    "refs": {
      "e1": {
        "selector": "getByRole('button', { name: \"Submit\", exact: true })",
        "role": "button",
        "name": "Submit"
      },
      "e2": {
        "selector": "getByRole('textbox', { name: \"Email\", exact: true })",
        "role": "textbox",
        "name": "Email"
      }
    }
  }
}
AI agents can parse this JSON to extract the snapshot tree and ref mappings.
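For agents written in TypeScript, the JSON envelope can be typed and checked before use. The interface below is a sketch inferred from the sample output above, not an official schema; fields beyond those shown (such as an error message when `success` is false) are not covered, and `readSnapshot` is a hypothetical helper.

```typescript
// Shape inferred from the sample `snapshot -i --json` output above.
interface SnapshotResponse {
  success: boolean;
  data?: {
    snapshot: string;
    refs: Record<string, { selector: string; role: string; name: string; nth?: number }>;
  };
}

type SnapshotData = NonNullable<SnapshotResponse['data']>;

// Parse the CLI's stdout and return the snapshot data, failing loudly otherwise.
function readSnapshot(stdout: string): SnapshotData {
  const parsed = JSON.parse(stdout) as SnapshotResponse;
  if (!parsed.success || !parsed.data) throw new Error('snapshot command failed');
  return parsed.data;
}
```

A natural split is to show the `snapshot` string to the model and keep `refs` on the agent side for lookups.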

Performance Characteristics

Snapshot generation is fast:
  • Small page (10-20 elements): ~50-100ms
  • Medium page (100-200 elements): ~200-400ms
  • Large page (500+ elements): ~500-1000ms
Interactive-only mode (-i) is faster because it filters earlier in the pipeline:
// From snapshot.ts:394-428 (simplified)
if (options.interactive) {
  // Keep only lines whose role is interactive; skip the rest
  for (const line of lines) {
    const role = line.match(/^\s*- (\w+)/)?.[1];
    if (role && INTERACTIVE_ROLES.has(role)) {
      result.push(line);
    }
  }
}

Comparison to Traditional Selectors

| Aspect | CSS/XPath | Refs |
|---|---|---|
| Stability | Breaks when DOM changes | Stable within snapshot |
| AI-Friendliness | Hard to generate correctly | Easy - just use @e1 |
| Determinism | Ambiguous if multiple matches | Always unique |
| Accessibility | Ignores semantic meaning | Based on ARIA/accessibility tree |
| Speed | Fast (direct DOM query) | Fast (cached locator) |

Next Steps

  • Architecture - Understand the Rust CLI + Node.js daemon design
  • Sessions - Learn about session isolation and persistence
  • Selectors - Master all selector types (CSS, refs, semantic)
