Overview
Playwriter’s visual labels are a Vimium-style overlay that assigns short alphanumeric codes to interactive elements on the page, so AI agents (and humans) can reference specific elements without writing complex selectors.
How visual labels work
When you call screenshotWithAccessibilityLabels(), Playwriter:
- Analyzes the page’s accessibility tree
- Identifies all interactive elements (buttons, links, inputs, etc.)
- Generates short, unique labels (e.g., “e5”, “e12”, “e23”)
- Overlays these labels on a screenshot
- Returns both the labeled screenshot and a text snapshot with aria-ref selectors
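The steps above can be sketched on a toy accessibility tree. This is only an illustration, not Playwriter's actual implementation: the node shape (`role`, `name`, `children`), the `INTERACTIVE_ROLES` set, and the idea that every node gets a sequential ref while only interactive nodes get a visible label are all assumptions made for the example.

```javascript
// Hypothetical set of roles treated as interactive (an assumption, not Playwriter's list).
const INTERACTIVE_ROLES = new Set(['link', 'button', 'textbox', 'checkbox', 'slider', 'menuitem', 'tab'])

// Walk a toy accessibility tree depth-first, give every node a sequential ref,
// and collect only the interactive nodes as labels to overlay.
function collectLabels(root) {
  const labels = []
  let counter = 0
  const visit = (node) => {
    counter += 1
    const ref = `e${counter}`
    if (INTERACTIVE_ROLES.has(node.role)) {
      labels.push({ ref, role: node.role, name: node.name ?? '' })
    }
    for (const child of node.children ?? []) visit(child)
  }
  visit(root)
  return labels
}

// Toy tree: only the link, the textbox, and the button are interactive.
const tree = {
  role: 'main',
  children: [
    { role: 'heading', name: 'Welcome' },
    { role: 'link', name: 'Docs' },
    {
      role: 'form',
      children: [
        { role: 'textbox', name: 'Email' },
        { role: 'button', name: 'Sign In' },
      ],
    },
  ],
}

console.log(collectLabels(tree))
```

For this tree the function returns labels e3, e5, and e6. Under the assumption that non-interactive nodes also consume a number, the labels you see in a screenshot would not be consecutive.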
Taking a labeled screenshot
// Navigate to page
state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
await state.page.goto('https://example.com', { waitUntil: 'domcontentloaded' })
// Get screenshot with labels
const result = await screenshotWithAccessibilityLabels({ page: state.page })
// Returns:
// - screenshot (Buffer): PNG image with labels overlaid
// - snapshot (string): Text representation with aria-ref selectors
The function returns both a visual screenshot with labels AND a text snapshot containing the same aria-ref identifiers.
Using aria-ref selectors
Once you have labels, use them with the aria-ref selector:
// Take labeled screenshot to see labels
await screenshotWithAccessibilityLabels({ page: state.page })
// Returns: ... button "Sign In" [aria-ref=e5] ...
// Click the element with label e5
await state.page.locator('aria-ref=e5').click()
Color coding system
Labels are color-coded by element type to provide visual context:
| Color | Element Type | Example |
|---|---|---|
| Yellow | Links | Navigation links, anchor tags |
| Orange | Buttons | Submit buttons, action buttons |
| Coral | Text inputs | Email, password, search fields |
| Pink | Checkboxes | Selection boxes, toggles |
| Peach | Sliders | Range inputs, volume controls |
| Salmon | Menus | Dropdowns, context menus |
| Amber | Tabs | Tab navigation, tab panels |
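If you post-process labels in your own tooling, the table can be mirrored as a lookup. The color names come from the table above. The ARIA role keys are assumptions made for this sketch and may not match the identifiers Playwriter uses internally.

```javascript
// Color names are taken from the table; the role keys are assumed ARIA role names.
const LABEL_COLORS = new Map([
  ['link', 'yellow'],
  ['button', 'orange'],
  ['textbox', 'coral'],
  ['checkbox', 'pink'],
  ['slider', 'peach'],
  ['menu', 'salmon'],
  ['tab', 'amber'],
])

// Return the label color for a role, or null when the table has no entry for it.
function colorForRole(role) {
  return LABEL_COLORS.get(role) ?? null
}

console.log(colorForRole('button')) // orange
```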
Complete workflow example
Navigate to page
state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
await state.page.goto('https://github.com/login', { waitUntil: 'domcontentloaded' })
Take labeled screenshot
const result = await screenshotWithAccessibilityLabels({ page: state.page })
// Visual screenshot shows:
// - "Username" input with coral label [e5]
// - "Password" input with coral label [e7]
// - "Sign in" button with orange label [e12]
Use labels to interact
// Fill username (coral label e5)
await state.page.locator('aria-ref=e5').fill('myusername')
// Fill password (coral label e7)
await state.page.locator('aria-ref=e7').fill('mypassword')
// Click sign in (orange label e12)
await state.page.locator('aria-ref=e12').click()
Verify result
await state.page.waitForLoadState('domcontentloaded')
console.log('Current URL:', state.page.url())
await snapshot({ page: state.page })
When to use visual labels
Use visual labels when:
- Visual layout matters — need to understand spatial relationships
- Complex UIs — many similar elements that are hard to distinguish by text alone
- Debugging interactions — want to verify you’re clicking the right element
- Visual verification needed — need to confirm button colors, positions, visibility
// Good use case: distinguishing between multiple buttons
await screenshotWithAccessibilityLabels({ page: state.page })
// Shows three "Delete" buttons with different positions and labels
// Can now click the specific one by its label
await state.page.locator('aria-ref=e15').click()
Use text snapshots when:
- Reading content — extracting text, checking if elements exist
- Fast feedback — no need to analyze images
- Token efficiency — text snapshots are 5-20KB vs screenshots at 100KB+
- Element finding — searching for specific text or roles
// Good use case: checking if login was successful
await snapshot({ page: state.page, search: /welcome|dashboard/i })
// Fast, cheap, no image tokens needed
Always try snapshot() first before using screenshotWithAccessibilityLabels(). Screenshots are more expensive in both time and tokens.
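The snapshot-first advice can be wrapped in a small helper. This is a minimal sketch, assuming that `snapshot()` resolves to a string; the helper name `inspectCheaplyFirst` and the `tools` parameter are hypothetical, introduced so the two Playwriter functions can be passed in (or replaced with stand-ins when testing).

```javascript
// Try the cheap text snapshot first and fall back to a labeled screenshot only
// when the text does not contain what we are looking for.
// `tools` is a hypothetical parameter carrying snapshot and
// screenshotWithAccessibilityLabels.
async function inspectCheaplyFirst({ page, search, tools }) {
  const text = await tools.snapshot({ page, search })
  // Assumption: snapshot() resolves to a string. Coerce defensively.
  const haystack = String(text ?? '')
  // Drop the global flag so repeated test() calls are not stateful.
  const matcher = new RegExp(search.source, search.flags.replace('g', ''))
  if (matcher.test(haystack)) {
    return { source: 'snapshot', text: haystack }
  }
  const labeled = await tools.screenshotWithAccessibilityLabels({ page })
  return { source: 'screenshot', labeled }
}
```

In a Playwriter session you might call it as `inspectCheaplyFirst({ page: state.page, search: /welcome|dashboard/i, tools: { snapshot, screenshotWithAccessibilityLabels } })`, so the screenshot cost is only paid when the text snapshot comes up empty.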
Programmatic label management
You can also manage labels programmatically without screenshots:
Show labels on page
// Add labels to page (without taking screenshot)
await showAriaRefLabels({ page: state.page })
// Labels now visible in the browser
// Useful for debugging while watching the browser
Hide labels
// Remove labels from page
await hideAriaRefLabels({ page: state.page })
Get snapshot with aria-ref
Get just the text snapshot with aria-ref selectors (no screenshot):
const result = await getAriaSnapshot(state.page)
console.log(result.snapshot) // Text with aria-ref selectors
// Find element by parsing the snapshot text
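Parsing the snapshot text could look like the sketch below. It assumes each element sits on its own line in the form shown earlier (`button "Sign In" [aria-ref=e5]`): a role, an optional quoted name, then the ref in brackets. The real snapshot format may differ, and `parseSnapshot` and `findRef` are hypothetical helper names, not part of Playwriter.

```javascript
// Matches lines such as: - button "Sign In" [aria-ref=e5]
// Assumed shape: role, optional quoted name, then the ref in brackets on the same line.
const REF_LINE = /^\s*-?\s*([a-z]+)(?:\s+"([^"]*)")?[^\[]*\[aria-ref=(e\d+)\]/

// Turn snapshot text into a list of { role, name, ref } entries.
function parseSnapshot(snapshotText) {
  const entries = []
  for (const line of snapshotText.split('\n')) {
    const match = REF_LINE.exec(line)
    if (match) entries.push({ role: match[1], name: match[2] ?? '', ref: match[3] })
  }
  return entries
}

// Look up the ref for an element by role and exact accessible name.
function findRef(snapshotText, role, name) {
  const hit = parseSnapshot(snapshotText).find((e) => e.role === role && e.name === name)
  return hit ? hit.ref : null
}

const sample = [
  '- heading "Welcome"',
  '- textbox "Email" [aria-ref=e5]',
  '- button "Sign In" [aria-ref=e12]',
].join('\n')

console.log(findRef(sample, 'button', 'Sign In')) // e12
```

With a real snapshot you would pass `result.snapshot` instead of `sample`, then hand the returned ref to `state.page.locator()` as an `aria-ref=` selector.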
Advanced usage
Scoped screenshots
Take a labeled screenshot of a specific region:
// Take a labeled screenshot of a specific area
const modal = state.page.locator('.modal-dialog')
const result = await screenshotWithAccessibilityLabels({
  page: state.page,
  locator: modal
})
// Shows only labels within the modal
Labels in iframes
Visual labels work in iframes too:
const frame = state.page.frames().find((f) => f.url().includes('widget.example.com'))
if (frame) {
  // Get labeled screenshot of iframe content
  const result = await screenshotWithAccessibilityLabels({
    page: state.page,
    frame
  })
  // Use aria-ref within the frame
  await frame.locator('aria-ref=e8').click()
}
Combining with search
Pair a labeled screenshot with a search-filtered text snapshot to focus on specific elements:
// Take a labeled screenshot of the whole page
const result = await screenshotWithAccessibilityLabels({ page: state.page })
// Also get a text snapshot filtered by a search pattern
const filtered = await snapshot({
  page: state.page,
  search: /submit|send|post/i
})
// The filtered snapshot shows only elements matching the search pattern
Label lifecycle
Labels are stable within a single page state but change when:
- Page navigation — labels reset on new page load
- DOM changes — dynamic content may get new labels
- Re-taking screenshot — labels may be reassigned
If the page has changed, always take a fresh labeled screenshot before clicking. Stale labels may point to the wrong elements.
// ❌ Bad: using old labels after page change
await screenshotWithAccessibilityLabels({ page: state.page })
// ... page content changes ...
await state.page.locator('aria-ref=e5').click() // May click wrong element!
// ✅ Good: fresh screenshot after changes
await screenshotWithAccessibilityLabels({ page: state.page })
await state.page.locator('aria-ref=e5').click()
// ... page changes ...
await screenshotWithAccessibilityLabels({ page: state.page }) // Fresh labels
await state.page.locator('aria-ref=e12').click() // Correct element
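One way to make the "fresh labels" rule harder to forget is a small guard around label lookups. This is a minimal sketch and `createLabelGuard` is a hypothetical helper, not part of Playwriter. It only detects navigation by comparing URLs, so DOM changes on the same URL still require you to retake the screenshot yourself.

```javascript
// Remembers the URL at which labels were captured and refuses to build a
// locator once the page has navigated. Same-URL DOM changes are NOT detected.
function createLabelGuard(page) {
  let capturedUrl = null
  return {
    // Call right after screenshotWithAccessibilityLabels({ page }).
    markFresh() {
      capturedUrl = page.url()
    },
    // Returns a locator for the label, or throws if labels are missing or stale.
    locator(ref) {
      if (capturedUrl === null) {
        throw new Error('No labeled screenshot taken yet')
      }
      if (page.url() !== capturedUrl) {
        throw new Error(`Labels captured on ${capturedUrl} are stale; retake the screenshot`)
      }
      return page.locator(`aria-ref=${ref}`)
    },
  }
}

// Usage inside a Playwriter session (not run here):
//   const guard = createLabelGuard(state.page)
//   await screenshotWithAccessibilityLabels({ page: state.page })
//   guard.markFresh()
//   await guard.locator('e5').click()
```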
Common use cases
Distinguishing similar elements
// Page has multiple "Delete" buttons
await screenshotWithAccessibilityLabels({ page: state.page })
// Screenshot shows:
// - "Delete" [e5] - next to item 1
// - "Delete" [e8] - next to item 2
// - "Delete" [e11] - next to item 3
// Click the specific one
await state.page.locator('aria-ref=e8').click()
Verifying element visibility
// Check if sidebar toggle is visible
await screenshotWithAccessibilityLabels({ page: state.page })
// Look at screenshot to verify button is visible and accessible
// If label appears, element is visible and interactive
Filling large forms
// Large form with many inputs
await screenshotWithAccessibilityLabels({ page: state.page })
// Screenshot shows all fields with labels
// Fill form using labels:
await state.page.locator('aria-ref=e5').fill('John')
await state.page.locator('aria-ref=e7').fill('Doe')
await state.page.locator('aria-ref=e9').fill('john@example.com')
await state.page.locator('aria-ref=e15').click() // Submit
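For longer forms, the repeated locator calls can be driven from a plain object. This is a sketch rather than a Playwriter API: `fillFormByLabels` is a hypothetical helper, and the labels you pass in must come from a fresh labeled screenshot of the current page.

```javascript
// Fill each field by its label, in insertion order, then click the submit label.
// `fields` maps aria-ref labels (e.g. 'e5') to the text to type.
async function fillFormByLabels(page, fields, submitRef) {
  for (const [ref, value] of Object.entries(fields)) {
    await page.locator(`aria-ref=${ref}`).fill(value)
  }
  if (submitRef) {
    await page.locator(`aria-ref=${submitRef}`).click()
  }
}

// Usage inside a Playwriter session (not run here):
//   await fillFormByLabels(
//     state.page,
//     { e5: 'John', e7: 'Doe', e9: 'john@example.com' },
//     'e15',
//   )
```

Keeping the label-to-value mapping in one object makes it easy to rebuild after retaking the screenshot, since only the keys change.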
Best practices
- Try text snapshot first: cheaper and faster than screenshots
- Use for visual verification: when layout and positioning matter
- Retake after page changes: don’t reuse stale labels
- Color coding helps: use colors to identify element types quickly
- Combine with text snapshots: use the screenshot for layout, text for content
- Clean up afterward: hide labels with hideAriaRefLabels() if shown programmatically
- Handle missing labels: elements may appear or disappear dynamically
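The last point can be handled with a defensive click. This is a minimal sketch, assuming the aria-ref selector resolves to zero elements once a label no longer exists. That behavior is an assumption worth verifying in your setup, and `clickIfPresent` is a hypothetical helper name.

```javascript
// Click the element behind a label only if the label currently resolves to
// exactly one element. Returns true when the click happened, false otherwise.
async function clickIfPresent(page, ref) {
  const target = page.locator(`aria-ref=${ref}`)
  if ((await target.count()) !== 1) {
    return false
  }
  await target.click()
  return true
}

// Usage inside a Playwriter session (not run here):
//   if (!(await clickIfPresent(state.page, 'e12'))) {
//     // Label is gone: retake the labeled screenshot and look again.
//     await screenshotWithAccessibilityLabels({ page: state.page })
//   }
```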