The accessibility tree is a semantic representation of a page’s content: it captures the roles, labels, and relationships that screen readers and other assistive technologies use. In Handstage, page.snapshot() captures this tree as a structured result that is especially well suited to AI agents. The tree annotates each interactive node with a stable ID that maps directly to an XPath selector or a URL.
Taking a snapshot
Call page.snapshot() after navigation to capture the current accessibility tree. The method is async and returns a SnapshotResult object.
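As a minimal sketch of the call described above, the following wraps page.snapshot() in a helper that logs a short summary. The SnapshotResult and SnapshotPage interfaces here are local stand-ins written for this example, not Handstage's published types; in particular, representing xpathMap and urlMap as plain string-keyed objects is an assumption drawn from the field descriptions in this guide.

```typescript
// Assumed shape of the result, inferred from the field descriptions in this guide.
interface SnapshotResult {
  formattedTree: string;
  xpathMap: Record<string, string>;
  urlMap: Record<string, string>;
}

// Hypothetical structural type: only the part of the page object used here.
interface SnapshotPage {
  snapshot(): Promise<SnapshotResult>;
}

// Build a one-line summary of how much a snapshot contains.
function summarizeSnapshot(snapshot: SnapshotResult): string {
  const nodeCount = Object.keys(snapshot.xpathMap).length;
  const linkCount = Object.keys(snapshot.urlMap).length;
  return `${nodeCount} addressable nodes, ${linkCount} links, ` +
    `${snapshot.formattedTree.length} characters of tree text`;
}

// Capture the current accessibility tree and log the summary.
async function captureSnapshot(page: SnapshotPage): Promise<SnapshotResult> {
  const snapshot = await page.snapshot();
  console.log(summarizeSnapshot(snapshot));
  return snapshot;
}
```

Call captureSnapshot(page) once navigation has finished, so the tree reflects the loaded page.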
The SnapshotResult type
page.snapshot() resolves to a SnapshotResult:
| Field | Description |
|---|---|
| formattedTree | A human-readable text representation of the accessibility tree. Each node is annotated with an encoded node ID. |
| xpathMap | Maps encoded node IDs to absolute XPath expressions that target the corresponding DOM element. |
| urlMap | Maps encoded node IDs to the href URL of link nodes. Useful for extracting navigation targets without clicking. |
Reading the formatted tree
formattedTree is a plain-text representation of the page’s semantic structure. It shows roles (button, heading, link, textbox), accessible names, and hierarchical relationships. Each interactive or named node has an encoded ID that appears inline in the tree.
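Because node IDs appear inline in the tree, you can find the tree line that describes a given ID with a plain text search. The helper below is a sketch under two assumptions that this guide does not confirm: that formattedTree puts one node per line, and that an ID is delimited by characters other than letters, digits, underscores, or hyphens. The name lineForNode is made up for this example.

```typescript
// Return the first line of the formatted tree that mentions the given node ID,
// or undefined when the ID does not appear.
// Assumption: one node per line, with the encoded ID written inline.
function lineForNode(formattedTree: string, nodeId: string): string | undefined {
  // Escape regex metacharacters so the ID is matched literally.
  const escaped = nodeId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Require a non-ID character (or line boundary) on both sides, so that
  // a short ID does not match inside a longer one.
  const pattern = new RegExp(`(^|[^\\w-])${escaped}($|[^\\w-])`);
  return formattedTree.split("\n").find((line) => pattern.test(line));
}
```

This is handy for logging what the model chose: look up the returned node ID and print the matching line next to the action you are about to take.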
Including iframes
By default, snapshot() captures only the main frame. Pass { includeIframes: true } to include the accessibility subtrees of all iframes on the page.
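The sketch below takes one snapshot with the default options and one with { includeIframes: true }, then reports how much larger the second is. The interfaces are local stand-ins for this example rather than Handstage's own types, and measureIframeCost is a hypothetical helper name.

```typescript
// Assumed result shape, inferred from the field descriptions in this guide.
interface SnapshotResult {
  formattedTree: string;
  xpathMap: Record<string, string>;
  urlMap: Record<string, string>;
}

// Only the option documented in this section is modeled.
interface SnapshotOptions {
  includeIframes?: boolean;
}

// Hypothetical structural type for the page object.
interface SnapshotPage {
  snapshot(options?: SnapshotOptions): Promise<SnapshotResult>;
}

interface IframeCost {
  mainFrameNodes: number;
  allFramesNodes: number;
  extraTreeChars: number;
}

// Compare a main-frame snapshot with one that also covers iframes.
async function measureIframeCost(page: SnapshotPage): Promise<IframeCost> {
  const mainOnly = await page.snapshot();
  const withIframes = await page.snapshot({ includeIframes: true });
  return {
    mainFrameNodes: Object.keys(mainOnly.xpathMap).length,
    allFramesNodes: Object.keys(withIframes.xpathMap).length,
    extraTreeChars: withIframes.formattedTree.length - mainOnly.formattedTree.length,
  };
}
```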
Including iframes increases the size of formattedTree and the number of entries in xpathMap and urlMap. Enable it only when you need to interact with iframe content.
How AI agents use snapshots
AI agents typically operate in a loop: observe the page state, decide what to do, execute an action, and repeat. Snapshots are the observation step. The formatted tree gives the model a compact, semantic view of the page without raw HTML, and the node IDs provide a reliable way to translate model decisions into concrete actions. A typical agent loop looks like this:

1. Send the tree to the model. Pass snapshot.formattedTree to your language model along with the user’s goal. The model identifies which node ID to interact with.
2. Look up the XPath. Use snapshot.xpathMap to translate the node ID from the model’s response into a selector.

Using XPath selectors from the snapshot
XPaths from xpathMap are absolute: they start from the document root and are computed from the DOM position of the node at snapshot time. Pass them directly to page.locator().
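One iteration of the agent loop, ending in an XPath lookup and an action, might be sketched as follows. Several names here are assumptions made for illustration: the chooseNode callback stands in for your language model call, AgentPage and ClickableLocator are local structural types, and the click() method on the locator is assumed rather than taken from this guide, which only states that the XPath is passed to page.locator().

```typescript
// Assumed result shape, inferred from the field descriptions in this guide.
interface SnapshotResult {
  formattedTree: string;
  xpathMap: Record<string, string>;
  urlMap: Record<string, string>;
}

// Assumption: the locator exposes an async click(); not confirmed by this guide.
interface ClickableLocator {
  click(): Promise<void>;
}

// Hypothetical structural type: only the page methods used in this sketch.
interface AgentPage {
  snapshot(): Promise<SnapshotResult>;
  locator(selector: string): ClickableLocator;
}

// Stand-in for the model call: given the tree and a goal, return a node ID.
type ChooseNode = (formattedTree: string, goal: string) => Promise<string>;

// Translate a node ID into its absolute XPath, failing loudly on unknown IDs.
function resolveXPath(snapshot: SnapshotResult, nodeId: string): string {
  if (!Object.prototype.hasOwnProperty.call(snapshot.xpathMap, nodeId)) {
    throw new Error(`Node ID "${nodeId}" is not present in this snapshot`);
  }
  return snapshot.xpathMap[nodeId];
}

// Observe, decide, look up, act. Returns the XPath that was acted on.
async function runAgentStep(
  page: AgentPage,
  goal: string,
  chooseNode: ChooseNode,
): Promise<string> {
  const snapshot = await page.snapshot();                         // observe
  const nodeId = await chooseNode(snapshot.formattedTree, goal);  // decide
  const xpath = resolveXPath(snapshot, nodeId);                   // look up
  await page.locator(xpath).click();                              // act (assumed API)
  return xpath;
}
```

The sketch takes a fresh snapshot on every step. Since the XPaths describe DOM positions at snapshot time, reusing an old map after the page has changed could plausibly point at the wrong element.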
Using the URL map
urlMap gives you the destination URL of each link node without navigating. This is useful for filtering navigation targets, pre-fetching content, or deciding which link to follow based on its destination rather than its label.
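As an illustration of filtering by destination, the helper below keeps only the links that point at a given host. It is a sketch that assumes urlMap is a plain object of node ID to href string and that the hrefs are absolute URLs; values that do not parse as absolute URLs are skipped. The name linksOnHost is hypothetical.

```typescript
interface LinkTarget {
  nodeId: string;
  url: string;
}

// Return the link nodes whose destination is on the given host.
// Assumption: hrefs in urlMap are absolute; anything else is ignored.
function linksOnHost(urlMap: Record<string, string>, host: string): LinkTarget[] {
  const matches: LinkTarget[] = [];
  for (const [nodeId, href] of Object.entries(urlMap)) {
    let parsed: URL;
    try {
      parsed = new URL(href);
    } catch (_err) {
      continue; // not an absolute URL, so the host cannot be compared
    }
    if (parsed.hostname === host) {
      matches.push({ nodeId, url: href });
    }
  }
  return matches;
}
```

The returned node IDs can then be looked up in xpathMap when you decide to follow one of the links by clicking it.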