Overview
Snapshot refs provide deterministic element selection for AI agents and automation scripts. Instead of writing brittle CSS selectors or XPath queries, you:
- Get an accessibility tree snapshot with numbered refs (@e1, @e2, etc.)
- Use those refs to interact with elements
- Get a new snapshot when the page changes
The Problem with Traditional Selectors
Traditional selectors have issues for automation:
How Refs Work
1. Get a Snapshot
- ARIA roles (heading, link, button, textbox, etc.)
- Accessible names (the text shown to screen readers)
- Refs (@e1, @e2) for interactive or named elements
- Attributes (level, checked, etc.)
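To make the four pieces above concrete, here is a minimal sketch that parses one snapshot line into role, name, ref, and attributes. The line format (`- role "name" [ref=e1] [level=1]`), the `SnapshotNode` shape, and the `parseSnapshotLine` helper are assumptions for illustration, not the tool's documented output grammar.

```typescript
// Hypothetical shape for one parsed snapshot line (not the tool's real types).
interface SnapshotNode {
  role: string;
  name?: string;
  ref?: string;
  attributes: Record<string, string>;
}

// Assumed line format: - role "accessible name" [ref=e1] [level=1]
function parseSnapshotLine(line: string): SnapshotNode | null {
  const pattern = /^\s*-\s+([a-z]+)(?:\s+"([^"]*)")?((?:\s+\[[^\]]+\])*)\s*:?\s*$/;
  const match = pattern.exec(line);
  if (!match) return null;

  const node: SnapshotNode = { role: match[1] ?? "", attributes: {} };
  const name = match[2];
  if (name !== undefined) node.name = name;

  // Walk every [key=value] or [flag] group that follows the name.
  const bracketPattern = /\[([^\]=]+)(?:=([^\]]*))?\]/g;
  const brackets = match[3] ?? "";
  let m: RegExpExecArray | null;
  while ((m = bracketPattern.exec(brackets)) !== null) {
    const key = m[1] ?? "";
    const value = m[2] ?? "true"; // bare flags such as [checked] become "true"
    if (key === "ref") node.ref = value;
    else node.attributes[key] = value;
  }
  return node;
}
```

Lines that do not match the assumed format return `null` rather than a partial node, so a caller can skip them safely.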
2. Interact Using Refs
3. Get a New Snapshot After Changes
When the page changes (navigation, dynamic content), get a fresh snapshot:
Accessibility Tree Source
Snapshots are built from the browser’s accessibility tree, the same structure used by screen readers:
ariaSnapshot() returns a text representation like:
Ref Assignment Rules
Interactive Elements (Always Get Refs)
Elements with interactive ARIA roles automatically get refs:
Content Elements (Get Refs If Named)
Elements that provide context get refs only if they have a name:
Cursor-Interactive Elements (With -C Flag)
The --cursor flag finds elements that don’t have proper ARIA roles but are visually interactive:
- cursor: pointer CSS property
- onclick event handlers
- tabindex attribute (except -1)
This helps on pages that use <div> and <span> as buttons instead of semantic HTML.
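The three signals listed above can be read as a simple predicate. The sketch below is only an illustration: the `ElementFacts` shape and the `looksCursorInteractive` name are hypothetical, and it assumes the DOM facts have already been collected from the page.

```typescript
// Hypothetical summary of the DOM facts the heuristic looks at.
interface ElementFacts {
  cursor: string;          // computed CSS `cursor` value
  hasOnclick: boolean;     // element has an onclick handler
  tabindex: number | null; // parsed tabindex attribute, or null if absent
}

// True if any one of the three signals is present.
function looksCursorInteractive(el: ElementFacts): boolean {
  if (el.cursor === "pointer") return true;
  if (el.hasOnclick) return true;
  // tabindex="-1" removes an element from the tab order, so it does not count.
  return el.tabindex !== null && el.tabindex !== -1;
}
```

A `<div>` styled with `cursor: pointer` passes the check even without a role, which is the case the flag is meant to catch.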
Ref Storage Format
Refs are stored in a map that tracks how to locate each element:
Duplicate Handling
When multiple elements have the same role and name, refs include an nth index:
The nth field tells Playwright which instance to select:
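A minimal sketch of duplicate handling, assuming each map entry holds a role, a name, and an optional nth. The `RefEntry` shape and the `buildRefMap` helper are hypothetical, the real storage format may carry more fields, and treating nth as zero-based is also an assumption.

```typescript
// Hypothetical entry shape; the real map may store additional fields.
interface RefEntry {
  role: string;
  name: string;
  nth?: number; // present only when role + name is ambiguous
}

function buildRefMap(
  elements: Array<{ role: string; name: string }>
): Record<string, RefEntry> {
  const keyOf = (el: { role: string; name: string }) =>
    JSON.stringify([el.role, el.name]);

  // First pass: count how often each role + name pair occurs.
  const totals: Record<string, number> = {};
  for (const el of elements) {
    const key = keyOf(el);
    totals[key] = (totals[key] ?? 0) + 1;
  }

  // Second pass: assign refs in order, adding nth only for duplicated pairs.
  const seen: Record<string, number> = {};
  const refs: Record<string, RefEntry> = {};
  elements.forEach((el, i) => {
    const key = keyOf(el);
    const index = seen[key] ?? 0;
    seen[key] = index + 1;
    const entry: RefEntry = { role: el.role, name: el.name };
    if ((totals[key] ?? 0) > 1) entry.nth = index;
    refs[`e${i + 1}`] = entry;
  });
  return refs;
}
```

With two "Delete" buttons and one "Home" link, the two buttons would get nth 0 and nth 1, while the link entry carries no nth at all.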
Snapshot Filtering Options
Interactive Only (-i)
Show only interactive elements (buttons, links, inputs):
Cursor-Interactive (-C)
Include elements with cursor:pointer or click handlers:
This is useful for pages with clickable <div> elements.
Compact (-c)
Remove empty structural elements:
Depth Limit (-d N)
Limit tree depth to N levels:
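As a rough sketch of what depth limiting means for a tree, the function below drops every node deeper than N levels. The `TreeNode` shape and `limitDepth` name are hypothetical, and counting the root as level 1 is an assumption that may differ from the tool's behavior.

```typescript
// Hypothetical in-memory form of an accessibility tree node.
interface TreeNode {
  role: string;
  name?: string;
  children: TreeNode[];
}

// Returns a copy of the tree with everything below maxDepth removed.
// The root is treated as depth 1 (an assumption of this sketch).
function limitDepth(node: TreeNode, maxDepth: number, depth = 1): TreeNode {
  const children =
    depth >= maxDepth
      ? []
      : node.children.map((child) => limitDepth(child, maxDepth, depth + 1));
  return { ...node, children };
}
```

The input tree is left untouched, so the same snapshot data can be rendered at several depths.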
Scoped Snapshots (-s SELECTOR)
Limit snapshot to a CSS selector:
Only elements inside #main appear in the snapshot.
Using Refs in Commands
Refs work anywhere a selector is expected:
- @e1 - Recommended format
- ref=e1 - Alternative format
- e1 - Bare format (if it matches /^e\d+$/)
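The three accepted spellings can be normalized with a small helper. This is a sketch of one way to do it, not necessarily how the tool does it: the `parseRef` name is hypothetical, and returning `null` for anything that is not a ref (so it can fall through to ordinary selector handling) is an assumption.

```typescript
// Returns the bare ref id ("e1") for any accepted spelling, or null if the
// input is not a ref and should be treated as an ordinary selector.
function parseRef(input: string): string | null {
  if (input.startsWith("@")) return input.slice(1);      // @e1
  if (input.startsWith("ref=")) return input.slice(4);   // ref=e1
  if (/^e\d+$/.test(input)) return input;                // bare e1
  return null;
}
```

The bare form is the only one that needs the regular expression check, because a plain word such as `email` could otherwise be mistaken for a ref.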
Ref Lifecycle
Creation
Refs are generated sequentially (e1, e2, e3, …) during snapshot generation:
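Sequential assignment, combined with the rules from Ref Assignment Rules, can be sketched as follows. The role lists here are a small illustrative subset drawn from the roles named earlier, and `assignRefs` is a hypothetical helper, not the tool's implementation.

```typescript
// Illustrative subsets only; the full role lists are not reproduced here.
const INTERACTIVE_ROLES = ["button", "link", "textbox"];
const CONTENT_ROLES = ["heading"];

interface RawNode {
  role: string;
  name?: string;
}

// Walks nodes in document order and hands out e1, e2, e3, ... to those
// that qualify: interactive roles always, content roles only when named.
function assignRefs(nodes: RawNode[]): Array<RawNode & { ref?: string }> {
  let counter = 0;
  return nodes.map((node) => {
    const interactive = INTERACTIVE_ROLES.indexOf(node.role) !== -1;
    const namedContent = CONTENT_ROLES.indexOf(node.role) !== -1 && !!node.name;
    if (!interactive && !namedContent) return { ...node };
    counter += 1;
    return { ...node, ref: `e${counter}` };
  });
}
```

Nodes that do not qualify are kept in the output without a ref, so the counter only advances for elements that can actually be targeted.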
Validity
Refs are valid until:
- The page navigates to a new URL
- The DOM changes significantly
- You explicitly get a new snapshot
Best Practice
Get a fresh snapshot after:
- Navigation (agent-browser open ...)
- Clicking links/buttons that change the page
- Waiting for dynamic content to load
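One way to enforce this lifecycle in a script is to keep refs in a store that is replaced on every snapshot and cleared on navigation. The `RefStore` class below is a hypothetical caller-side sketch under those assumptions; it does not describe how the tool tracks refs internally.

```typescript
// Hypothetical locator data kept per ref.
interface StoredRef {
  role: string;
  name: string;
}

class RefStore {
  private refs = new Map<string, StoredRef>();

  // Call with the ref map of every new snapshot; older refs are dropped.
  replace(next: Record<string, StoredRef>): void {
    this.refs = new Map(Object.keys(next).map((k) => [k, next[k]] as [string, StoredRef]));
  }

  // Call on navigation: nothing from the previous page can be trusted.
  clear(): void {
    this.refs = new Map();
  }

  // Fails loudly instead of acting on a ref from an older snapshot.
  resolve(ref: string): StoredRef {
    const entry = this.refs.get(ref);
    if (!entry) {
      throw new Error(`Unknown or stale ref "${ref}": take a new snapshot first`);
    }
    return entry;
  }
}
```

Failing on an unknown ref makes a missed re-snapshot visible immediately, rather than letting a command act on the wrong element.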
Annotated Screenshots
Visual representation of refs with numbered labels:
Labels [1], [2], [3] are overlaid on each element. The label numbers match the ref numbers (@e1 → [1]).
This is useful for:
- Multimodal AI models that reason about visual layout
- Debugging selector issues
- Documenting UI flows
JSON Mode
Get snapshots in machine-readable format:
Performance Characteristics
Snapshot generation is fast:
- Small page (10-20 elements): ~50-100ms
- Medium page (100-200 elements): ~200-400ms
- Large page (500+ elements): ~500-1000ms
Interactive-only mode (-i) is faster because it filters earlier in the pipeline:
Comparison to Traditional Selectors
| Aspect | CSS/XPath | Refs |
|---|---|---|
| Stability | Breaks when DOM changes | Stable within snapshot |
| AI-Friendliness | Hard to generate correctly | Easy - just use @e1 |
| Determinism | Ambiguous if multiple matches | Always unique |
| Accessibility | Ignores semantic meaning | Based on ARIA/accessibility tree |
| Speed | Fast (direct DOM query) | Fast (cached locator) |
Next Steps
- Architecture - Understand the Rust CLI + Node.js daemon design
- Sessions - Learn about session isolation and persistence
- Selectors - Master all selector types (CSS, refs, semantic)