The @handstage/agent package ships a ready-made ToolSet that you can pass directly to the Vercel AI SDK. Each tool carries a description, a Zod input schema, and a Zod output schema — everything the AI SDK needs to present the tool to a language model and validate results. You supply the execution layer separately by implementing HandstagesAgentToolHandlers.

Exports

handstagesAgentTools

A ToolSet object (from the Vercel AI SDK’s ai package) that contains all 17 browser tools. Import it and pass it straight to generateText or streamText.
import { handstagesAgentTools } from "@handstage/agent"

createHandstagesAgentToolDefinitions()

A function that returns a ToolSet with the same 17 tools as handstagesAgentTools. Use this form when you need a fresh reference on each call or prefer a function-based API.
import { createHandstagesAgentToolDefinitions } from "@handstage/agent"

const tools = createHandstagesAgentToolDefinitions()

Usage with the Vercel AI SDK

Pass handstagesAgentTools to generateText or streamText. Wire execution by providing each tool’s execute function via your HandstagesAgentToolHandlers implementation. See the HandstagesAgentToolHandlers reference for the full wiring pattern.
import { generateText } from "ai"
import { openai } from "@ai-sdk/openai"
import { handstagesAgentTools } from "@handstage/agent"

const result = await generateText({
  model: openai("gpt-4o"),
  // Schema-only tools: the model can request calls, but nothing runs until
  // execute functions are attached via HandstagesAgentToolHandlers.
  tools: handstagesAgentTools,
  // Allow up to 20 sequential model calls (steps).
  maxSteps: 20,
  prompt: "Go to https://example.com and tell me the page title.",
})

For streaming:
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"
import { handstagesAgentTools } from "@handstage/agent"

const stream = streamText({
  model: openai("gpt-4o"),
  tools: handstagesAgentTools,
  maxSteps: 20,
  prompt: "Search for TypeScript tutorials on https://example.com",
})

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk)
}

handstagesAgentTools contains only the schema and description for each tool, with no execution logic. You must attach execution via HandstagesAgentToolHandlers. See the handlers reference for a complete wiring example.
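As a rough illustration of what attaching execution means, the sketch below merges a schema-only tool map with a map of handler functions keyed by tool name. The types SchemaOnlyTool, Handler, and ExecutableTool and the helper attachHandlers are hypothetical stand-ins written for this example; they are not the actual HandstagesAgentToolHandlers interface or the AI SDK's types, so treat the handlers reference as the source of truth.

```typescript
// Hypothetical, simplified stand-ins for illustration only. The real ToolSet
// and HandstagesAgentToolHandlers types come from "ai" and "@handstage/agent".
type SchemaOnlyTool = { description: string }
type Handler = (input: unknown) => Promise<unknown>
type ExecutableTool = SchemaOnlyTool & { execute: Handler }

// Returns a new tool map in which every tool gains an execute function.
// Throws early if a tool has no matching handler, so a missing handler
// surfaces at startup rather than in the middle of an agent run.
function attachHandlers(
  tools: Record<string, SchemaOnlyTool>,
  handlers: Record<string, Handler>,
): Record<string, ExecutableTool> {
  const wired: Record<string, ExecutableTool> = {}
  for (const name of Object.keys(tools)) {
    const execute = handlers[name]
    if (!execute) {
      throw new Error(`No handler provided for tool "${name}"`)
    }
    wired[name] = { ...tools[name], execute }
  }
  return wired
}
```

Failing fast on a missing handler is a design choice of this sketch, not documented behavior of the package.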

Tool reference

All 17 tools are listed below. Each tool entry shows its input parameters and output shape.
pages
Returns all currently open tabs in the browser context.
Input: none. This tool takes no parameters.
Output
  pages (object[], required): Array of open tab entries.

newPage
Opens a new tab. Returns the new tab’s pageId. If you provide a URL, the tab navigates to it immediately; otherwise it opens about:blank.
Input
  url (string, optional): Initial URL for the new tab. Defaults to "about:blank" when omitted.
Output
  pageId (string, required): Unique identifier for the newly opened tab.

Brings a tab to the foreground by its pageId. Use this after newPage or any time you switch context between tabs.
Input
  pageId (string, required): Target tab ID returned by pages or newPage.
Output (success)
  ok (true): Returned when the tab was successfully focused.
Output (failure)
  ok (false): Returned when the operation failed.
  error (string): Human-readable error message.

Navigates a tab to the given URL. You can control the wait condition and a timeout.
Input
  pageId (string, required): Target tab ID.
  url (string, required): URL to navigate to. Must be non-empty.
  waitUntil ("load" | "domcontentloaded" | "networkidle", optional): Lifecycle event to wait for before the tool returns. Defaults to the browser default when omitted.
  timeoutMs (number, optional): Maximum time to wait in milliseconds. Must be a positive number.
Output (success)
  ok (true): Navigation succeeded.
  url (string): Final URL after navigation (may differ from input due to redirects).
Output (failure)
  ok (false): Navigation failed.
  error (string): Error description.

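Because the ok field is either the literal true or the literal false, the navigation output can be modeled as a TypeScript discriminated union. The following is a minimal sketch; the NavigationResult type is transcribed from the fields listed above, and summarizeNavigation is a hypothetical helper, not part of @handstage/agent.

```typescript
// Result shape as described for the navigation tool: ok discriminates the union.
type NavigationResult =
  | { ok: true; url: string }
  | { ok: false; error: string }

// Hypothetical helper: turns a navigation result into a one-line summary.
function summarizeNavigation(requestedUrl: string, result: NavigationResult): string {
  if (result.ok === false) {
    // Narrowed to the failure branch, so error is available here.
    return `Navigation to ${requestedUrl} failed: ${result.error}`
  }
  // Narrowed to the success branch. url is the final URL, which may differ
  // from the requested one after redirects.
  return result.url === requestedUrl
    ? `Loaded ${result.url}`
    : `Loaded ${result.url} (redirected from ${requestedUrl})`
}
```

The same pattern applies to every tool below whose output has a success and a failure shape.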
Reloads the document in a tab. You can optionally bypass the cache.
Input
  pageId (string, required): Target tab ID.
  waitUntil ("load" | "domcontentloaded" | "networkidle", optional): Lifecycle event to wait for before the tool returns.
  timeoutMs (number, optional): Maximum wait time in milliseconds.
  ignoreCache (boolean, optional): When true, performs a hard reload that bypasses the browser cache.
Output (success)
  ok (true): Reload succeeded.
  url (string): URL of the page after reload.
Output (failure)
  ok (false): Reload failed.
  error (string): Error description.

goBack
Navigates back in the tab’s session history, if a previous entry exists. The navigated field in the response tells you whether a back navigation actually occurred.
Input
  pageId (string, required): Target tab ID.
  waitUntil ("load" | "domcontentloaded" | "networkidle", optional): Lifecycle event to wait for after navigation.
  timeoutMs (number, optional): Maximum wait time in milliseconds.
Output (success)
  ok (true): Operation completed (the tab may or may not have navigated).
  navigated (boolean): true if history navigation occurred; false if there was no previous entry.
  url (string): Current URL after the operation.
Output (failure)
  ok (false): Operation failed.
  error (string): Error description.

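Note that ok: true here does not imply that the page changed, so a caller usually has three outcomes to distinguish. A small sketch of that distinction follows; HistoryResult mirrors the fields above, while HistoryOutcome and classifyHistoryResult are hypothetical names introduced for illustration.

```typescript
// Output shape as described for goBack.
type HistoryResult =
  | { ok: true; navigated: boolean; url: string }
  | { ok: false; error: string }

// Hypothetical three-way classification of a history navigation result.
type HistoryOutcome = "moved" | "no-entry" | "failed"

function classifyHistoryResult(result: HistoryResult): HistoryOutcome {
  if (result.ok === false) return "failed"
  // ok is true, but navigation only happened if a history entry existed.
  return result.navigated ? "moved" : "no-entry"
}
```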
Navigates forward in the tab’s session history, if a forward entry exists. Identical input and output shape to goBack.
Input
  pageId (string, required): Target tab ID.
  waitUntil ("load" | "domcontentloaded" | "networkidle", optional): Lifecycle event to wait for after navigation.
  timeoutMs (number, optional): Maximum wait time in milliseconds.
Output (success)
  ok (true): Operation completed.
  navigated (boolean): true if a forward entry was available and navigation occurred.
  url (string): Current URL after the operation.
Output (failure)
  ok (false): Operation failed.
  error (string): Error description.

snapshot
Returns the accessibility tree for a tab as a structured text string. Call this before interacting with a page so the model can understand what is on screen. The response includes an xpathMap that maps encoded node IDs in the tree to their XPath selectors, and a urlMap for link URLs.
Input
  pageId (string, required): Target tab ID.
  includeIframes (boolean, optional): When true, includes accessibility nodes from embedded iframes. Defaults to false.
Output (success)
  ok (true): Snapshot captured successfully.
  tree (string): Multiline accessibility tree text with encoded node IDs.
  xpathMap (Record<string, string>): Maps encoded node IDs from tree to their XPath selectors. Use these selectors with click_on, fill_on, type_on, or hover_on.
  urlMap (Record<string, string>): Maps encoded node IDs to their link href values where applicable.
Output (failure)
  ok (false): Snapshot failed.
  error (string): Error description.

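To show how xpathMap connects a snapshot to the selector-based tools, here is a minimal sketch that looks up a node ID and builds the { pageId, select } input those tools expect. The SnapshotSuccess type is transcribed from the output fields above; selectorInputForNode is a hypothetical helper, and the node ID format is left opaque because this page does not specify it.

```typescript
// Success shape of the snapshot output, as described above.
type SnapshotSuccess = {
  ok: true
  tree: string
  xpathMap: Record<string, string>
  urlMap: Record<string, string>
}

// Hypothetical helper: resolves an encoded node ID to the input shared by
// click_on and hover_on. Returns undefined when the ID is not in xpathMap,
// for example because the snapshot is stale.
function selectorInputForNode(
  pageId: string,
  snapshot: SnapshotSuccess,
  nodeId: string,
): { pageId: string; select: string } | undefined {
  if (!Object.prototype.hasOwnProperty.call(snapshot.xpathMap, nodeId)) {
    return undefined
  }
  return { pageId, select: snapshot.xpathMap[nodeId] }
}
```

For fill_on and type_on, the same select value would be combined with a value or text field respectively.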
Returns the current URL and document title for a tab. Useful for verifying navigation results without taking a full snapshot.
Input
  pageId (string, required): Target tab ID.
Output (success)
  ok (true): Info retrieved successfully.
  url (string): Current URL of the tab.
  title (string): Current document title of the tab.
Output (failure)
  ok (false): Operation failed.
  error (string): Error description.

Dispatches a mouse click at the given CSS pixel coordinates. The target element must already be visible in the viewport; this tool does not scroll. Use snapshot to retrieve coordinates from the accessibility tree, or use click_on to target by selector.
Input
  pageId (string, required): Target tab ID.
  x (number, required): Horizontal coordinate in CSS pixels from the left edge of the viewport.
  y (number, required): Vertical coordinate in CSS pixels from the top edge of the viewport.
  button ("left" | "right" | "middle", optional): Mouse button to click. Defaults to "left".
  clickCount (number, optional): Number of clicks (e.g., 2 for double-click). Must be a positive integer.
Output (success)
  ok (true): Click dispatched.
  xpathAtPoint (string): XPath of the element at the clicked coordinates, if available.
Output (failure)
  ok (false): Click failed.
  error (string): Error description.

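The constraint on clickCount can be checked before the tool is ever called. The sketch below builds a click input and rejects counts that are not positive integers; ClickInput mirrors the parameters above, and clickInput is a hypothetical helper rather than anything exported by the package.

```typescript
// Input shape as described for the coordinate click tool.
type MouseButton = "left" | "right" | "middle"
type ClickInput = {
  pageId: string
  x: number
  y: number
  button?: MouseButton
  clickCount?: number
}

// Hypothetical helper: builds a click input, defaulting to a single left
// click. Throws if clickCount is not a positive integer.
function clickInput(
  pageId: string,
  x: number,
  y: number,
  clickCount: number = 1,
  button: MouseButton = "left",
): ClickInput {
  if (!Number.isInteger(clickCount) || clickCount < 1) {
    throw new RangeError(`clickCount must be a positive integer, got ${clickCount}`)
  }
  return { pageId, x, y, button, clickCount }
}
```

For example, clickInput("tab-1", 120, 48, 2) describes a double-click with the left button.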
Moves the mouse pointer to the given CSS pixel coordinates without clicking. Useful for triggering hover states, tooltips, or dropdown menus.
Input
  pageId (string, required): Target tab ID.
  x (number, required): Horizontal coordinate in CSS pixels.
  y (number, required): Vertical coordinate in CSS pixels.
Output (success)
  ok (true): Pointer moved.
  xpathAtPoint (string): XPath of the element at the pointer position, if available.
Output (failure)
  ok (false): Operation failed.
  error (string): Error description.

Dispatches a mouse wheel (scroll) event at the given coordinates. Use positive deltaY to scroll down and negative deltaY to scroll up.
Input
  pageId (string, required): Target tab ID.
  x (number, required): Horizontal coordinate of the wheel event in CSS pixels.
  y (number, required): Vertical coordinate of the wheel event in CSS pixels.
  deltaX (number, required): Horizontal scroll delta in pixels. Positive scrolls right.
  deltaY (number, required): Vertical scroll delta in pixels. Positive scrolls down.
Output (success)
  ok (true): Scroll event dispatched.
  xpathAtPoint (string): XPath of the element at the scroll position, if available.
Output (failure)
  ok (false): Operation failed.
  error (string): Error description.

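The sign convention for deltaX and deltaY is easy to get backwards, so it can help to wrap it in a direction-based helper. This is a sketch under the convention stated above (positive deltaY scrolls down, positive deltaX scrolls right); WheelInput mirrors the parameters, and ScrollDirection and wheelInputFor are hypothetical names.

```typescript
// Input shape as described for the wheel tool. Both deltas are required.
type WheelInput = { pageId: string; x: number; y: number; deltaX: number; deltaY: number }
type ScrollDirection = "up" | "down" | "left" | "right"

// Hypothetical helper: converts a direction and a distance in pixels into
// signed deltas. The sign of `pixels` is ignored; direction decides it.
function wheelInputFor(
  pageId: string,
  x: number,
  y: number,
  direction: ScrollDirection,
  pixels: number,
): WheelInput {
  const distance = Math.abs(pixels)
  switch (direction) {
    case "down":
      return { pageId, x, y, deltaX: 0, deltaY: distance }
    case "up":
      return { pageId, x, y, deltaX: 0, deltaY: -distance }
    case "right":
      return { pageId, x, y, deltaX: distance, deltaY: 0 }
    case "left":
      return { pageId, x, y, deltaX: -distance, deltaY: 0 }
  }
}
```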
Types text into the currently focused element using key events. Focus a target input first (e.g., with click_on) before calling this tool. For typing directly into a selector, use type_on instead.
Input
  pageId (string, required): Target tab ID.
  text (string, required): Text to type.
  delay (number, optional): Delay in milliseconds between keystrokes. Must be non-negative. Omit for no delay.
  withMistakes (boolean, optional): When true, simulates realistic typing with occasional mistakes and corrections.
Output (success)
  ok (true): Text typed successfully.
Output (failure)
  ok (false): Typing failed.
  error (string): Error description.

click_on
Clicks the first element in the page’s main frame that matches a CSS selector or XPath expression. This is the preferred way to click elements identified from a snapshot.
Input
  pageId (string, required): Target tab ID.
  select (string, required): CSS selector or XPath expression (e.g., //button[@id='submit']). Must be non-empty.
Output (success)
  ok (true): Element found and clicked.
Output (failure)
  ok (false): Element not found or click failed.
  error (string): Error description.

fill_on
Clears the current value of an input element matched by a CSS selector or XPath, then sets it to the given value. Prefer this over type_on when you want to replace the entire field value atomically.
Input
  pageId (string, required): Target tab ID.
  select (string, required): CSS selector or XPath expression targeting the input element.
  value (string, required): New value to set on the input.
Output (success)
  ok (true): Input found and filled.
Output (failure)
  ok (false): Element not found or fill failed.
  error (string): Error description.

type_on
Focuses the element matched by the selector, then types text into it using key events. Unlike fill_on, this preserves any existing value and dispatches real key events, making it suitable for inputs that respond to keystroke-level events.
Input
  pageId (string, required): Target tab ID.
  select (string, required): CSS selector or XPath expression targeting the element.
  text (string, required): Text to type into the element.
  delay (number, optional): Delay in milliseconds between keystrokes. Must be non-negative.
Output (success)
  ok (true): Element found and text typed.
Output (failure)
  ok (false): Element not found or typing failed.
  error (string): Error description.

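The choice between fill_on and type_on comes down to whether the existing value should be replaced. The sketch below encodes that choice as a function returning the tool name and its input; the input shapes follow the parameters documented for the two tools, while TextEntryCall and textEntryCall are hypothetical names introduced only for this example.

```typescript
// A tool name paired with its input, for the two selector-based text tools.
type TextEntryCall =
  | { tool: "fill_on"; input: { pageId: string; select: string; value: string } }
  | { tool: "type_on"; input: { pageId: string; select: string; text: string } }

// Hypothetical helper: picks fill_on when the field should be replaced
// wholesale, and type_on when the existing value should be kept and real
// key events are wanted.
function textEntryCall(
  pageId: string,
  select: string,
  text: string,
  replaceExisting: boolean,
): TextEntryCall {
  return replaceExisting
    ? { tool: "fill_on", input: { pageId, select, value: text } }
    : { tool: "type_on", input: { pageId, select, text } }
}
```

Note that the text goes in the value field for fill_on but in the text field for type_on.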
hover_on
Moves the pointer to the first element matched by a CSS selector or XPath in the page’s main frame. Use this to trigger hover states, open tooltips, or reveal dropdown menus.
Input
  pageId (string, required): Target tab ID.
  select (string, required): CSS selector or XPath expression targeting the element.
Output (success)
  ok (true): Pointer moved to element.
Output (failure)
  ok (false): Element not found or hover failed.
  error (string): Error description.