Snapshot Refs

Overview

Snapshot refs provide deterministic element selection for AI agents and automation scripts. Instead of writing brittle CSS selectors or XPath queries, you:

1. Get an accessibility tree snapshot with numbered refs (@e1, @e2, etc.)
2. Use those refs to interact with elements
3. Get a new snapshot when the page changes

This workflow suits AI agents because it separates perception (snapshot) from action (click/fill/etc.).

The Problem with Traditional Selectors

Traditional selectors have issues for automation:

# Brittle - breaks when classes change
agent-browser click ".btn-primary.submit-form"

# Ambiguous - which button if there are multiple?
agent-browser click "button"

# Verbose - hard for AI to generate correctly
agent-browser click "div.container > form#login-form > div.actions > button:nth-child(2)"

Refs solve these problems by giving each element a unique, stable identifier within a snapshot.

How Refs Work

1. Get a Snapshot

agent-browser snapshot

- heading "Example Domain" [ref=e1] [level=1]
- paragraph: This domain is for use in illustrative examples
- link "More information..." [ref=e2]

The snapshot shows:

ARIA roles (heading, link, button, textbox, etc.)
Accessible names (the text shown to screen readers)
Refs (@e1, @e2) for interactive or named elements
Attributes (level, checked, etc.)
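
An agent that consumes the text snapshot may want those four pieces as structured data. The following is a minimal sketch, assuming the line format shown in the example output above. `SnapshotNode` and `parseSnapshotLine` are hypothetical names for illustration and are not part of agent-browser.

```typescript
// Hypothetical helper, not part of agent-browser.
interface SnapshotNode {
  role: string;
  name?: string;
  ref?: string;
  attrs: Record<string, string>;
}

function parseSnapshotLine(line: string): SnapshotNode | null {
  // Assumed shape: - role "optional name" [key=value] [key=value]
  const m = line.match(/^\s*-\s+([A-Za-z]+)(?:\s+"((?:[^"\\]|\\.)*)")?(.*)$/);
  if (!m) return null;

  const node: SnapshotNode = { role: m[1], attrs: {} };
  if (m[2] !== undefined) node.name = m[2];

  // Collect bracketed key=value pairs; `ref` is stored separately.
  const attrPattern = /\[([\w-]+)=([^\]]*)\]/g;
  let a: RegExpExecArray | null;
  while ((a = attrPattern.exec(m[3])) !== null) {
    if (a[1] === 'ref') node.ref = a[2];
    else node.attrs[a[1]] = a[2];
  }
  return node;
}

const heading = parseSnapshotLine('- heading "Example Domain" [ref=e1] [level=1]');
// heading: role "heading", name "Example Domain", ref "e1", attrs { level: "1" }
```

Lines that carry inline text instead of a quoted name (such as the paragraph line) parse to a node with a role only.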

2. Interact Using Refs

# Click the link
agent-browser click @e2

# Get text from the heading
agent-browser get text @e1
# Output: Example Domain

Refs point to the exact element from the snapshot, so there’s no ambiguity.

3. Get a New Snapshot After Changes

When the page changes (navigation, dynamic content), get a fresh snapshot:

agent-browser click @e2        # Navigate to new page
agent-browser snapshot         # Get new refs for new page

Refs are scoped to a single snapshot. After navigation or DOM changes, you need a new snapshot with new refs.

Accessibility Tree Source

Snapshots are built from the browser’s accessibility tree - the same structure used by screen readers:

// From snapshot.ts:274
const ariaTree = await locator.ariaSnapshot();

Playwright’s ariaSnapshot() returns a text representation like:

- heading "Products" [level=1]
- list:
  - listitem:
    - link "Headphones"
  - listitem:
    - link "Speakers"
- button "Add to Cart"

This is then enhanced with refs and filtered based on options.

Ref Assignment Rules

Interactive Elements (Always Get Refs)

Elements with interactive ARIA roles automatically get refs:

// From snapshot.ts:70-88
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem'
]);

Example:

- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
- checkbox "Remember me" [ref=e3]
- link "Forgot password?" [ref=e4]

Content Elements (Get Refs If Named)

Elements that provide context get refs only if they have a name:

// From snapshot.ts:93-104
const CONTENT_ROLES = new Set([
  'heading', 'cell', 'gridcell', 'columnheader', 'rowheader',
  'listitem', 'article', 'region', 'main', 'navigation'
]);

Example:

- heading "Welcome" [ref=e1] [level=1]     # Named → gets ref
- article "Blog Post Title" [ref=e2]       # Named → gets ref
- list:                                     # Unnamed → no ref
  - listitem: First item                   # Unnamed → no ref
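
The two rules above can be combined into a single predicate. This is a sketch only: the role sets are copied from the excerpts above, while `shouldGetRef` is a hypothetical name and the real implementation may be structured differently.

```typescript
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'spinbutton', 'switch', 'tab', 'treeitem'
]);

const CONTENT_ROLES = new Set([
  'heading', 'cell', 'gridcell', 'columnheader', 'rowheader',
  'listitem', 'article', 'region', 'main', 'navigation'
]);

// Hypothetical helper illustrating the assignment rules.
function shouldGetRef(role: string, name?: string): boolean {
  if (INTERACTIVE_ROLES.has(role)) return true;        // always gets a ref
  if (CONTENT_ROLES.has(role)) return Boolean(name);   // only when named
  return false;                                        // structural roles: no ref
}

shouldGetRef('button');              // true
shouldGetRef('heading', 'Welcome');  // true
shouldGetRef('listitem');            // false (unnamed)
shouldGetRef('list', 'Items');       // false (not in either set)
```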

Cursor-Interactive Elements (With `-C` Flag)

The -C (--cursor) flag finds elements that lack proper ARIA roles but are visually interactive:

agent-browser snapshot -C

This finds elements with:

cursor: pointer CSS property
onclick event handlers
tabindex attribute (except -1)

// From snapshot.ts:225-232
const hasCursorPointer = computedStyle.cursor === 'pointer';
const hasOnClick = el.hasAttribute('onclick') || el.onclick !== null;
const tabIndex = el.getAttribute('tabindex');
const hasTabIndex = tabIndex !== null && tabIndex !== '-1';

These get pseudo-roles:

- clickable "Menu" [ref=e5] [cursor:pointer, onclick]
- focusable "Search" [ref=e6] [tabindex]

This is useful for modern web apps that use <div> and <span> as buttons instead of semantic HTML.
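
A sketch of how the three signals might map to the pseudo-roles and bracketed hints shown above. The mapping is an assumption inferred from the example output (pointer or click handler gives `clickable`, tabindex alone gives `focusable`); `CursorSignals` and `classifyCursorElement` are hypothetical names.

```typescript
// Hypothetical types and helper; the mapping is inferred from the sample output.
interface CursorSignals {
  hasCursorPointer: boolean;
  hasOnClick: boolean;
  hasTabIndex: boolean;
}

interface CursorElement {
  role: 'clickable' | 'focusable';
  hints: string[];
}

function classifyCursorElement(s: CursorSignals): CursorElement | null {
  // No signal at all: the element is not treated as cursor-interactive.
  if (!s.hasCursorPointer && !s.hasOnClick && !s.hasTabIndex) return null;

  const hints: string[] = [];
  if (s.hasCursorPointer) hints.push('cursor:pointer');
  if (s.hasOnClick) hints.push('onclick');
  if (s.hasTabIndex) hints.push('tabindex');

  const role = s.hasCursorPointer || s.hasOnClick ? 'clickable' : 'focusable';
  return { role, hints };
}

const menu = classifyCursorElement({ hasCursorPointer: true, hasOnClick: true, hasTabIndex: false });
// menu: role "clickable", hints ["cursor:pointer", "onclick"]
```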

Ref Storage Format

Refs are stored in a map that tracks how to locate each element:

// From snapshot.ts:22-30
export interface RefMap {
  [ref: string]: {
    selector: string;  // How to locate the element
    role: string;      // ARIA role
    name: string;      // Accessible name
    nth?: number;      // Disambiguation index
  };
}

Example:

const refs: RefMap = {
  "e1": {
    selector: "getByRole('button', { name: \"Submit\", exact: true })",
    role: "button",
    name: "Submit",
    nth: 0  // First "Submit" button
  },
  "e2": {
    selector: "getByRole('button', { name: \"Submit\", exact: true })",
    role: "button",
    name: "Submit",
    nth: 1  // Second "Submit" button
  }
};

Duplicate Handling

When multiple elements have the same role and name, refs include an nth index:

- button "Delete" [ref=e1] [nth=0]
- button "Delete" [ref=e2] [nth=1]
- button "Delete" [ref=e3] [nth=2]

The nth field tells Playwright which instance to select:

// From browser.ts:229-237
let locator: Locator = page.getByRole(refData.role, {
  name: refData.name,
  exact: true,
});

if (refData.nth !== undefined) {
  locator = locator.nth(refData.nth);
}
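
One way the nth values could be computed is a two-pass count over role and name pairs. This is a sketch under the assumption (suggested by the JSON example, where unique elements carry no `nth`) that the index is recorded only when a pair occurs more than once. `Entry` and `assignNth` are hypothetical names.

```typescript
// Hypothetical sketch of duplicate numbering; not the actual implementation.
interface Entry {
  role: string;
  name: string;
  nth?: number;
}

function assignNth(entries: Entry[]): Entry[] {
  const keyOf = (e: Entry): string => JSON.stringify([e.role, e.name]);

  // Pass 1: count how often each role/name pair occurs.
  const totals = new Map<string, number>();
  for (const e of entries) {
    totals.set(keyOf(e), (totals.get(keyOf(e)) ?? 0) + 1);
  }

  // Pass 2: number duplicates in document order, starting at 0.
  const seen = new Map<string, number>();
  return entries.map((e) => {
    const key = keyOf(e);
    const index = seen.get(key) ?? 0;
    seen.set(key, index + 1);
    return (totals.get(key) ?? 0) > 1 ? { ...e, nth: index } : { ...e };
  });
}

const numbered = assignNth([
  { role: 'button', name: 'Delete' },
  { role: 'button', name: 'Delete' },
  { role: 'button', name: 'Save' },
]);
// numbered[0].nth === 0, numbered[1].nth === 1, numbered[2].nth === undefined
```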

Snapshot Filtering Options

Interactive Only (`-i`)

Show only interactive elements (buttons, links, inputs):

agent-browser snapshot -i

- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- link "Forgot password?" [ref=e4]

This is the recommended mode for AI agents - it reduces noise by hiding structural elements.

Cursor-Interactive (`-C`)

Include elements with cursor:pointer or click handlers:

agent-browser snapshot -i -C

- button "Submit" [ref=e1]
- textbox "Email" [ref=e2]
# Cursor-interactive elements:
- clickable "Menu" [ref=e3] [cursor:pointer, onclick]
- clickable "Close" [ref=e4] [cursor:pointer]

Use this when targeting apps with custom clickable <div> elements.

Compact (`-c`)

Remove empty structural elements:

agent-browser snapshot -c

Structural roles (generic, group, list) without content are hidden:

# Before compact:
- group:
  - list:
    - listitem:
      - button "Item" [ref=e1]

# After compact:
- button "Item" [ref=e1]

Depth Limit (`-d N`)

Limit tree depth to N levels:

agent-browser snapshot -d 3

Useful for large pages where you only need top-level structure.
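
To illustrate the idea, depth limiting on the text form of a snapshot can be sketched as an indentation filter. This assumes two spaces of indentation per level (as in the sample tree earlier) and that top-level lines count as depth 1; both are assumptions, and `limitDepth` is a hypothetical helper rather than the CLI's implementation.

```typescript
// Hypothetical helper: keeps only lines nested shallower than maxDepth levels.
function limitDepth(snapshot: string, maxDepth: number, indentWidth = 2): string {
  return snapshot
    .split('\n')
    .filter((line) => {
      // /^ */ always matches, possibly with an empty string.
      const indent = (line.match(/^ */) as RegExpMatchArray)[0].length;
      return Math.floor(indent / indentWidth) < maxDepth;
    })
    .join('\n');
}

const tree = [
  '- heading "Products" [level=1]',
  '- list:',
  '  - listitem:',
  '    - link "Headphones"',
  '- button "Add to Cart"',
].join('\n');

// With maxDepth 2, the link line (third level) is dropped; the rest is kept.
const shallow = limitDepth(tree, 2);
```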

Scoped Snapshots (`-s SELECTOR`)

Limit snapshot to a CSS selector:

agent-browser snapshot -s "#main"

Only elements inside #main appear in the snapshot.

Using Refs in Commands

Refs work anywhere a selector is expected:

# Click
agent-browser click @e1

# Fill
agent-browser fill @e2 "user@example.com"

# Get text
agent-browser get text @e1

# Hover
agent-browser hover @e3

# Check state
agent-browser is visible @e4

Refs are parsed in three formats:

@e1 - Recommended format
ref=e1 - Alternative format
e1 - Bare format (if it matches /^e\d+$/)

// From snapshot.ts:605-615
export function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);
  if (arg.startsWith('ref=')) return arg.slice(4);
  if (/^e\d+$/.test(arg)) return arg;
  return null;
}
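
A short usage example for the excerpt above. The function body is copied from that excerpt so the block runs on its own; the input strings are illustrative.

```typescript
// Copied from the excerpt above so this example is self-contained.
function parseRef(arg: string): string | null {
  if (arg.startsWith('@')) return arg.slice(1);
  if (arg.startsWith('ref=')) return arg.slice(4);
  if (/^e\d+$/.test(arg)) return arg;
  return null;
}

const inputs = ['@e1', 'ref=e2', 'e3', '#main', 'email'];
const parsed = inputs.map((arg) => parseRef(arg));
// parsed: ['e1', 'e2', 'e3', null, null]
```

A `null` result means the argument is not a ref and can be handled as an ordinary selector. Note that, as written in the excerpt, the `@` and `ref=` branches strip the prefix without checking that the remainder matches `e\d+`.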

Ref Lifecycle

Creation

Refs are generated sequentially (e1, e2, e3, …) during snapshot generation:

// From snapshot.ts:50-64
let refCounter = 0;

function resetRefs(): void {
  refCounter = 0;
}

function nextRef(): string {
  return `e${++refCounter}`;
}

Validity

Refs are valid until:

The page navigates to a new URL
The DOM changes significantly
You explicitly get a new snapshot

Using a stale ref won’t crash - Playwright will retry the locator - but it may fail or select the wrong element if the page changed.
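
Because a stale ref may silently resolve to the wrong element, an agent can guard against it on its own side. The following is a sketch of such bookkeeping; `RefTracker` is a hypothetical class and not a feature of agent-browser.

```typescript
// Hypothetical agent-side bookkeeping; not part of agent-browser.
class RefTracker {
  private known = new Set<string>();

  // Call after every snapshot with the refs it returned.
  recordSnapshot(refs: string[]): void {
    this.known = new Set(refs);
  }

  // Call after navigation or any action expected to change the DOM.
  invalidate(): void {
    this.known.clear();
  }

  // True only if the ref came from the latest, still-valid snapshot.
  isUsable(ref: string): boolean {
    return this.known.has(ref);
  }
}

const tracker = new RefTracker();
tracker.recordSnapshot(['e1', 'e2']);
tracker.isUsable('e1');   // true
tracker.invalidate();     // e.g. after clicking a link
tracker.isUsable('e1');   // false: take a new snapshot first
```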

Best Practice

Get a fresh snapshot after:

Navigation (agent-browser open ...)
Clicking links/buttons that change the page
Waiting for dynamic content to load

# Good workflow:
agent-browser open example.com
agent-browser snapshot -i          # Get refs for page 1
agent-browser click @e1            # Click a link
agent-browser snapshot -i          # Get new refs for page 2
agent-browser fill @e2 "input"    # Use new refs

Annotated Screenshots

Annotated screenshots overlay numbered labels on the elements that have refs:

agent-browser screenshot --annotate

Screenshot saved to /tmp/screenshot-2026-03-02T10-30-00-abc123.png
[1] @e1 button "Submit"
[2] @e2 textbox "Email"
[3] @e3 link "Forgot password?"

The screenshot has numbered labels [1], [2], [3] overlaid on each element. The label numbers match the ref numbers (@e1 → [1]). This is useful for:

Multimodal AI models that reason about visual layout
Debugging selector issues
Documenting UI flows
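
The legend printed alongside the screenshot can be turned into a lookup from label number to ref. A minimal sketch, assuming the legend line format shown in the sample output above; `LegendEntry` and `parseLegendLine` are hypothetical names.

```typescript
// Hypothetical helper for the legend lines shown in the sample output.
interface LegendEntry {
  label: number;
  ref: string;
  role: string;
  name: string;
}

function parseLegendLine(line: string): LegendEntry | null {
  // Assumed shape: [N] @eN role "name"
  const m = line.match(/^\[(\d+)\]\s+@(e\d+)\s+(\S+)\s+"(.*)"$/);
  if (!m) return null;
  return { label: Number(m[1]), ref: m[2], role: m[3], name: m[4] };
}

const entry = parseLegendLine('[2] @e2 textbox "Email"');
// entry: { label: 2, ref: "e2", role: "textbox", name: "Email" }
```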

JSON Mode

Get snapshots in machine-readable format:

agent-browser snapshot -i --json

{
  "success": true,
  "data": {
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]",
    "refs": {
      "e1": {
        "selector": "getByRole('button', { name: \"Submit\", exact: true })",
        "role": "button",
        "name": "Submit"
      },
      "e2": {
        "selector": "getByRole('textbox', { name: \"Email\", exact: true })",
        "role": "textbox",
        "name": "Email"
      }
    }
  }
}

AI agents can parse this JSON to extract the snapshot tree and ref mappings.
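
As an illustration, the response above can be typed and searched by role and name. The interfaces mirror the sample JSON; `SnapshotResponse` and `findRef` are hypothetical names, and the field set is assumed from the sample rather than from a published schema.

```typescript
// Types assumed from the sample response above.
interface RefEntry {
  selector: string;
  role: string;
  name: string;
  nth?: number;
}

interface SnapshotResponse {
  success: boolean;
  data: { snapshot: string; refs: Record<string, RefEntry> };
}

// Hypothetical helper: returns the first matching ref in "@eN" form, or null.
function findRef(response: SnapshotResponse, role: string, name: string): string | null {
  if (!response.success) return null;
  for (const ref of Object.keys(response.data.refs)) {
    const entry = response.data.refs[ref];
    if (entry.role === role && entry.name === name) return `@${ref}`;
  }
  return null;
}

// String.raw keeps the JSON escapes (\" and \n) intact for JSON.parse.
const raw = String.raw`{
  "success": true,
  "data": {
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]",
    "refs": {
      "e1": { "selector": "getByRole('button', { name: \"Submit\", exact: true })", "role": "button", "name": "Submit" },
      "e2": { "selector": "getByRole('textbox', { name: \"Email\", exact: true })", "role": "textbox", "name": "Email" }
    }
  }
}`;

const response = JSON.parse(raw) as SnapshotResponse;
const emailRef = findRef(response, 'textbox', 'Email');
// emailRef: "@e2"
```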

Performance Characteristics

Snapshot generation is fast. Approximate timings by page size:

Small page (10-20 elements): ~50-100ms
Medium page (100-200 elements): ~200-400ms
Large page (500+ elements): ~500-1000ms

Interactive-only mode (-i) is faster because it filters earlier in the pipeline:

// From snapshot.ts:394-428
if (options.interactive) {
  // Only process interactive elements, skip the rest
  for (const line of lines) {
    // `role` is the ARIA role parsed from `line` (parsing omitted in this excerpt)
    if (INTERACTIVE_ROLES.has(role)) {
      result.push(line);
    }
  }
}

Comparison to Traditional Selectors

| Aspect | CSS/XPath | Refs |
| --- | --- | --- |
| Stability | Breaks when DOM changes | Stable within snapshot |
| AI-Friendliness | Hard to generate correctly | Easy - just use `@e1` |
| Determinism | Ambiguous if multiple matches | Always unique |
| Accessibility | Ignores semantic meaning | Based on ARIA/accessibility tree |
| Speed | Fast (direct DOM query) | Fast (cached locator) |

Next Steps

Architecture - Understand the Rust CLI + Node.js daemon design
Sessions - Learn about session isolation and persistence
Selectors - Master all selector types (CSS, refs, semantic)
