The accessibility tree is a semantic representation of a page’s content: it captures the roles, labels, and relationships that screen readers and other assistive technologies use. In Handstage, page.snapshot() captures this tree as a structured result that is well suited to AI agents, because the tree encodes each interactive node with a stable ID that maps directly to an XPath selector and, for link nodes, to a URL.

Taking a snapshot

Call page.snapshot() after navigation to capture the current accessibility tree. The method is async and returns a SnapshotResult object.
import { V3 } from "@handstage/core"

const browser = await V3.connectLocal()
const context = browser.context
const page = await context.newPage()

await page.goto("https://example.com", { waitUntil: "domcontentloaded" })

const snapshot = await page.snapshot()

console.log(snapshot.formattedTree)
console.log(snapshot.xpathMap)
console.log(snapshot.urlMap)

await browser.close()

The SnapshotResult type

page.snapshot() resolves to a SnapshotResult:
type SnapshotResult = {
  formattedTree: string
  xpathMap: Record<string, string>
  urlMap: Record<string, string>
}
| Field | Description |
| --- | --- |
| formattedTree | A human-readable text representation of the accessibility tree. Each node is annotated with an encoded node ID. |
| xpathMap | Maps encoded node IDs to absolute XPath expressions that target the corresponding DOM element. |
| urlMap | Maps encoded node IDs to the href URL of link nodes. Useful for extracting navigation targets without clicking. |
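
Because both maps are keyed by the same encoded node IDs, a single lookup can gather everything the snapshot knows about a node. The following is a minimal sketch rather than part of Handstage: describeNode is a hypothetical helper, and the SnapshotResult type is repeated only so the block stands alone.

```typescript
// Shape of the snapshot result, repeated from the type shown above.
type SnapshotResult = {
  formattedTree: string
  xpathMap: Record<string, string>
  urlMap: Record<string, string>
}

// Hypothetical helper (not a Handstage API): collect what the snapshot
// records for one node ID. Returns undefined when the ID is not in xpathMap.
function describeNode(
  snapshot: SnapshotResult,
  nodeId: string,
): { nodeId: string; xpath: string; url: string | undefined } | undefined {
  const xpath = snapshot.xpathMap[nodeId]
  if (xpath === undefined) return undefined
  // Only link nodes have a urlMap entry, so url is undefined for other nodes.
  return { nodeId, xpath, url: snapshot.urlMap[nodeId] }
}
```

Checking for undefined before using the XPath is useful when the node ID comes from a model, which may return an ID that is not present in the snapshot.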

Reading the formatted tree

formattedTree is a plain-text representation of the page’s semantic structure. It shows roles (button, heading, link, textbox), accessible names, and hierarchical relationships. Each interactive or named node has an encoded ID that appears inline in the tree.
document
  heading "Welcome to Example" [node-0]
  navigation
    link "Home" [node-1]
    link "About" [node-2]
    link "Contact" [node-3]
  main
    button "Sign in" [node-4]
    textbox "Email address" [node-5]
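
If you need the annotated nodes as data, for example to list only buttons before prompting a model, the tree can be parsed line by line. This sketch assumes every annotated line has exactly the shape of the sample above (two-space indentation, a role, a quoted name, and a bracketed ID); the real output may differ, and parseFormattedTree is a hypothetical helper, not a Handstage function.

```typescript
type TreeNode = { depth: number; role: string; name: string; nodeId: string }

// Assumed line shape, taken from the sample above:
//   <indent><role> "<name>" [<id>]
// Lines without a bracketed ID (such as "navigation") are skipped.
function parseFormattedTree(formattedTree: string): TreeNode[] {
  const linePattern = /^(\s*)(\S+) "(.*)" \[([^\]]+)\]\s*$/
  const nodes: TreeNode[] = []
  for (const line of formattedTree.split("\n")) {
    const match = linePattern.exec(line)
    if (!match) continue
    const indent = match[1] ?? ""
    const role = match[2] ?? ""
    const name = match[3] ?? ""
    const nodeId = match[4] ?? ""
    // The sample indents two spaces per nesting level.
    nodes.push({ depth: indent.length / 2, role, name, nodeId })
  }
  return nodes
}
```

For instance, parseFormattedTree(snapshot.formattedTree).filter((n) => n.role === "button") would keep only the button nodes under these assumptions.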

Including iframes

By default, snapshot() captures only the main frame. Pass { includeIframes: true } to include the accessibility subtrees of all iframes on the page.
const snapshot = await page.snapshot({ includeIframes: true })
Including iframes increases the size of formattedTree and the number of entries in xpathMap and urlMap. Only enable it when you need to interact with iframe content.
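
One way to follow that advice is to try the main-frame snapshot first and request iframes only when the content you are looking for is missing. The sketch below is an illustration under assumptions: SnapshotSource is a minimal structural stand-in for the page object that covers only the snapshot() call and includeIframes option described here, and snapshotContaining is a hypothetical helper.

```typescript
type SnapshotResult = {
  formattedTree: string
  xpathMap: Record<string, string>
  urlMap: Record<string, string>
}

// Minimal structural view of a page: only snapshot() with the
// includeIframes option from this section is assumed.
interface SnapshotSource {
  snapshot(options?: { includeIframes?: boolean }): Promise<SnapshotResult>
}

// Hypothetical helper: take the smaller main-frame snapshot first and fall
// back to the iframe-inclusive one only when `needle` does not appear in it.
async function snapshotContaining(
  page: SnapshotSource,
  needle: string,
): Promise<SnapshotResult> {
  const mainOnly = await page.snapshot()
  if (mainOnly.formattedTree.includes(needle)) return mainOnly
  return page.snapshot({ includeIframes: true })
}
```

The fallback costs a second snapshot when the content lives in an iframe, so it pays off mainly on pages where most targets are in the main frame.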

How AI agents use snapshots

AI agents typically operate in a loop: observe the page state, decide what to do, execute an action, and repeat. Snapshots are the observation step. The formatted tree gives the model a compact, semantic view of the page without raw HTML, and the node IDs provide a reliable mechanism for translating model decisions into concrete actions. A typical agent loop looks like this:
1. Take a snapshot

Call page.snapshot() to capture the current page state.
const snapshot = await page.snapshot()

2. Send the tree to the model

Pass snapshot.formattedTree to your language model along with the user’s goal. The model returns the ID of the node to interact with. Here, model.decide stands in for your own model call.
const decision = await model.decide(snapshot.formattedTree, userGoal)
// decision = { action: "click", nodeId: "node-4" }

3. Look up the XPath

Use snapshot.xpathMap to translate the node ID from the model’s response into a selector.
const xpath = snapshot.xpathMap[decision.nodeId]
// e.g. xpath = "/html[1]/body[1]/main[1]/button[1]"

4. Target the element with a locator

Pass the XPath to page.locator() and perform the action.
await page.locator(xpath).click()
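
The four steps above can be combined into a loop. This is a sketch under stated assumptions, not a Handstage feature: AgentPage mirrors only the page.snapshot() and page.locator(...).click()/fill() calls used in this guide, while AgentModel, Decision, and runAgent are hypothetical names introduced for illustration.

```typescript
type SnapshotResult = {
  formattedTree: string
  xpathMap: Record<string, string>
  urlMap: Record<string, string>
}

// Structural stand-in for the page: only the calls used in this guide.
interface AgentPage {
  snapshot(): Promise<SnapshotResult>
  locator(selector: string): {
    click(): Promise<void>
    fill(value: string): Promise<void>
  }
}

// Hypothetical decision format returned by your own model wrapper.
type Decision =
  | { action: "click"; nodeId: string }
  | { action: "fill"; nodeId: string; value: string }
  | { action: "done" }

interface AgentModel {
  decide(formattedTree: string, goal: string): Promise<Decision>
}

// Observe, decide, act, repeat. Returns the number of actions performed.
async function runAgent(
  page: AgentPage,
  model: AgentModel,
  goal: string,
  maxSteps = 10,
): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    // Take a fresh snapshot each iteration: node IDs and XPaths describe
    // the page as it was when the snapshot was taken.
    const snapshot = await page.snapshot()
    const decision = await model.decide(snapshot.formattedTree, goal)
    if (decision.action === "done") return step

    const xpath = snapshot.xpathMap[decision.nodeId]
    if (xpath === undefined) {
      // The model may return an ID that is not in this snapshot.
      throw new Error(`Unknown node ID: ${decision.nodeId}`)
    }
    const target = page.locator(xpath)
    if (decision.action === "click") await target.click()
    else await target.fill(decision.value)
  }
  return maxSteps
}
```

The maxSteps cap keeps a model that never answers "done" from looping forever; whether to throw or retry on an unknown node ID is a design choice left to the caller.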

Using XPath selectors from the snapshot

XPaths from xpathMap are absolute: they start from the document root and reflect the node’s position in the DOM at snapshot time. Pass them directly to page.locator().
const snapshot = await page.snapshot()

// Click the "Sign in" button identified by the model
const signInXpath = snapshot.xpathMap["node-4"]
await page.locator(signInXpath).click()

// Fill the email field
const emailXpath = snapshot.xpathMap["node-5"]
await page.locator(emailXpath).fill("user@example.com")
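XPath selectors from the snapshot do not depend on class names, which often change on dynamic pages, so they are frequently more stable than CSS selectors derived from class names. Because they are computed from the DOM at snapshot time, take a fresh snapshot after the page changes before reusing them.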

Using the URL map

urlMap gives you the destination URL of each link node without navigating. This is useful for filtering navigation targets, pre-fetching content, or deciding which link to follow based on its destination rather than its label.
const snapshot = await page.snapshot()

// Find a link whose URL matches your target
const targetNodeId = Object.entries(snapshot.urlMap).find(
  ([, url]) => url.includes("/checkout")
)?.[0]

if (targetNodeId) {
  const xpath = snapshot.xpathMap[targetNodeId]
  await page.locator(xpath).click()
}
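
Filtering by destination can also be done structurally instead of by substring. The sketch below uses the standard URL constructor to find links that leave the current origin; externalLinkIds is a hypothetical helper, and the handling of relative values is an assumption, since this page does not state whether urlMap entries are always absolute.

```typescript
// Hypothetical helper: return the node IDs of http(s) links whose
// destination is on a different origin than pageUrl.
function externalLinkIds(
  urlMap: Record<string, string>,
  pageUrl: string,
): string[] {
  const pageOrigin = new URL(pageUrl).origin
  const ids: string[] = []
  for (const [nodeId, href] of Object.entries(urlMap)) {
    let target: URL
    try {
      // Resolves relative values against pageUrl; absolute values pass through.
      target = new URL(href, pageUrl)
    } catch {
      continue // skip values that cannot be parsed as URLs
    }
    // Ignore non-web schemes such as mailto: or javascript:.
    if (target.protocol !== "http:" && target.protocol !== "https:") continue
    if (target.origin !== pageOrigin) ids.push(nodeId)
  }
  return ids
}
```

An agent could use a list like this to avoid following links that leave the site it was asked to operate on.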

Complete example

import { V3 } from "@handstage/core"

const browser = await V3.connectLocal()
const context = browser.context
const page = await context.newPage()

await page.goto("https://example.com/login", { waitUntil: "domcontentloaded" })

const snapshot = await page.snapshot()

// The model identifies the email field node
const emailNodeId = "node-5"
const emailXpath = snapshot.xpathMap[emailNodeId]
await page.locator(emailXpath).fill("user@example.com")

// The model identifies the password field node
const passwordNodeId = "node-6"
const passwordXpath = snapshot.xpathMap[passwordNodeId]
await page.locator(passwordXpath).fill("s3cr3t")

// The model identifies the submit button node
const submitNodeId = "node-7"
const submitXpath = snapshot.xpathMap[submitNodeId]
await page.locator(submitXpath).click()

await page.waitForLoadState("domcontentloaded")

// Take a new snapshot to observe the result
const resultSnapshot = await page.snapshot()
console.log(resultSnapshot.formattedTree)

await browser.close()
