The @handstage/agent package provides a ready-made set of browser-control tools designed for LLM function-calling. You supply the tool definitions to your AI SDK, implement the handlers that talk to a real browser, and the model decides when and how to call each tool. This pattern keeps the AI layer and the browser layer loosely coupled, so you can swap out either side independently.

Installation

npm install @handstage/agent @handstage/core

Key exports

Export | Description
handstagesAgentTools | Pre-built ToolSet compatible with the Vercel AI SDK. Pass directly to generateText or streamText.
createHandstagesAgentToolDefinitions() | Function that returns the same ToolSet. Useful when you need to call it lazily.
HandstagesAgentToolHandlers | TypeScript interface listing one method per tool. Implement this to connect tool calls to a real browser.
HandstagesAgentContext | Interface describing the browser context your handlers operate on (pages, activePage, setActivePage, newPage).
HandstagesAgent | Namespace of inferred input/output types for every tool (e.g. HandstagesAgent.GotoInput).

Available tools

Returns every open tab with its pageId, url, title, and whether it is the active (foreground) tab.
Input: none
Output: { pages: Array<{ pageId, url, title, activated }> }

Opens a new browser tab and returns its pageId. You can optionally pass a starting URL.
Input: { url?: string } (defaults to "about:blank")
Output: { pageId: string }

Brings a tab to the foreground by pageId. Use after newPage or when the model needs to switch context.
Input: { pageId: string }
Output: { ok: true } | { ok: false; error: string }

Navigates a tab to the given URL and optionally waits for a lifecycle event.
Input: { pageId, url, waitUntil?, timeoutMs? }. waitUntil accepts "load", "domcontentloaded", or "networkidle".
Output: { ok: true; url: string } | { ok: false; error: string }

Reloads the page in the given tab.
Input: { pageId, waitUntil?, timeoutMs?, ignoreCache? }
Output: { ok: true; url: string } | { ok: false; error: string }

Goes back one step in the tab’s navigation history if possible.
Input: { pageId, waitUntil?, timeoutMs? }
Output: { ok: true; navigated: boolean; url: string } | { ok: false; error: string }

Goes forward one step in the tab’s navigation history if possible.
Input: { pageId, waitUntil?, timeoutMs? }
Output: { ok: true; navigated: boolean; url: string } | { ok: false; error: string }

Returns a multi-line accessibility tree for the tab. Call this before interacting to understand what is on screen. The output includes an xpathMap (node id → XPath) and a urlMap (node id → href) for use with selector-based tools.
Input: { pageId, includeIframes? }
Output: { ok: true; tree: string; xpathMap: Record<string, string>; urlMap: Record<string, string> } | { ok: false; error: string }

Returns the current URL and document title for a tab without rendering the full accessibility tree.
Input: { pageId }
Output: { ok: true; url: string; title: string } | { ok: false; error: string }

Clicks at a specific point in CSS pixels. The target element must already be visible in the viewport.
Input: { pageId, x, y, button?, clickCount? }. button accepts "left", "right", or "middle".
Output: { ok: true; xpathAtPoint?: string } | { ok: false; error: string }

Moves the mouse cursor to specific viewport coordinates without clicking.
Input: { pageId, x, y }
Output: { ok: true; xpathAtPoint?: string } | { ok: false; error: string }

Dispatches a mouse wheel event at the given viewport coordinates with deltaX and deltaY in pixels.
Input: { pageId, x, y, deltaX, deltaY }
Output: { ok: true; xpathAtPoint?: string } | { ok: false; error: string }

Types text using key events at whatever element currently has focus. Focus an input first (e.g. with click_on) before calling this.
Input: { pageId, text, delay?, withMistakes? }
Output: { ok: true } | { ok: false; error: string }

Clicks the first element in the page’s main frame that matches a CSS selector or XPath expression.
Input: { pageId, select }, where select is a CSS selector or XPath (e.g. //button[@id='submit']).
Output: { ok: true } | { ok: false; error: string }

Clears and fills an input element matched by a CSS selector or XPath in the main frame.
Input: { pageId, select, value }
Output: { ok: true } | { ok: false; error: string }

Types text into an element matched by a CSS selector or XPath, focusing it first.
Input: { pageId, select, text, delay? }
Output: { ok: true } | { ok: false; error: string }

Moves the pointer over the first element in the main frame that matches a CSS selector or XPath.
Input: { pageId, select }
Output: { ok: true } | { ok: false; error: string }
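
Most of the tools above return either a success object or { ok: false; error: string }, so calling code needs to check ok before reading any other field. The following is a minimal sketch of that check, applied to the accessibility-tree output: it looks up a node's XPath in xpathMap so it can be passed as the select input of a selector-based tool. SnapshotResult, SelectorLookup, and selectorForNode are illustrative names defined locally rather than package exports, and the node id "0-12" is made up, since the real id format is not described here.

```typescript
// Local stand-in for the accessibility-tree tool's documented output shape.
type SnapshotResult =
  | { ok: true; tree: string; xpathMap: Record<string, string>; urlMap: Record<string, string> }
  | { ok: false; error: string }

// Either an XPath usable as `select`, or the reason the lookup failed.
type SelectorLookup = { ok: true; select: string } | { ok: false; error: string }

function selectorForNode(snapshot: SnapshotResult, nodeId: string): SelectorLookup {
  // Narrow on `ok` first: `xpathMap` only exists on the success branch.
  if (!snapshot.ok) return { ok: false, error: snapshot.error }
  const xpath = snapshot.xpathMap[nodeId]
  if (xpath === undefined) return { ok: false, error: `No XPath for node ${nodeId}` }
  return { ok: true, select: xpath }
}

// Hand-written sample data; the node id format is hypothetical.
const sample: SnapshotResult = {
  ok: true,
  tree: "[0-12] button: Submit",
  xpathMap: { "0-12": "//button[@id='submit']" },
  urlMap: {},
}
const lookup = selectorForNode(sample, "0-12")
```

Returning the failure in the same { ok, error } shape lets a handler pass it straight back to the model, which can then request a fresh tree instead of clicking a stale node.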

Wiring up with the Vercel AI SDK

1. Connect to Chrome

Start a browser with V3.connectLocal().
import { V3 } from "@handstage/core"

const handstage = await V3.connectLocal({ localBrowserLaunchOptions: { headless: true } })
const context = handstage.context

2. Implement HandstagesAgentToolHandlers

Create an object that satisfies the HandstagesAgentToolHandlers interface. Each method receives the tool input from the AI model and must return the typed output.
import type {
  HandstagesAgentToolHandlers,
  HandstagesAgentContext,
  HandstagesAgent,
} from "@handstage/agent"

function createHandlers(ctx: HandstagesAgentContext): HandstagesAgentToolHandlers {
  return {
    async pages(_input) {
      const allPages = ctx.pages()
      const active = ctx.activePage()
      return {
        pages: allPages.map((p) => ({
          pageId: p.targetId(),
          url: p.url(),
          title: p.title(),
          activated: p === active,
        })),
      }
    },

    async newPage(input) {
      const page = await ctx.newPage(input.url)
      return { pageId: page.targetId() }
    },

    async setActivePage(input) {
      const page = ctx.pages().find((p) => p.targetId() === input.pageId)
      if (!page) return { ok: false, error: `Page ${input.pageId} not found` }
      ctx.setActivePage(page)
      return { ok: true }
    },

    async goto(input) {
      const page = ctx.pages().find((p) => p.targetId() === input.pageId)
      if (!page) return { ok: false, error: `Page ${input.pageId} not found` }
      try {
        await page.goto(input.url)
        return { ok: true, url: page.url() }
      } catch (err) {
        return { ok: false, error: String(err) }
      }
    },

    // ... implement remaining handlers
  }
}
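
The setActivePage and goto handlers above repeat the same page lookup and error conversion, and the remaining page-scoped handlers would likely need it too. One way to factor that out is sketched below. PageLike, ContextLike, and withPage are hypothetical names defined locally; they model only the targetId(), url(), and pages() members used in the example above and are not exports of @handstage/agent.

```typescript
// Minimal local stand-ins; only the members used below are modelled.
interface PageLike {
  targetId(): string
  url(): string
}
interface ContextLike<P extends PageLike> {
  pages(): P[]
}

type Failure = { ok: false; error: string }

// Looks up a page by id, runs `action` on it, and converts a missing page
// or a thrown error into the `{ ok: false, error }` shape the tools return.
async function withPage<P extends PageLike, T>(
  ctx: ContextLike<P>,
  pageId: string,
  action: (page: P) => Promise<T>,
): Promise<T | Failure> {
  const page = ctx.pages().find((p) => p.targetId() === pageId)
  if (!page) return { ok: false, error: `Page ${pageId} not found` }
  try {
    return await action(page)
  } catch (err) {
    return { ok: false, error: String(err) }
  }
}
```

With a helper like this, each page-scoped handler only supplies the action callback, and every handler reports a missing page or a thrown error in the same way.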

3. Pass tools to the AI SDK

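Import handstagesAgentTools, attach an execute function to each tool that forwards to your handlers, and pass the result to generateText or streamText.
import { generateText } from "ai"
import { openai } from "@ai-sdk/openai"
import { handstagesAgentTools } from "@handstage/agent"

const handlers = createHandlers(context)

// generateText has no top-level `execute` option; the AI SDK runs a tool only
// when the tool itself has an `execute` function. Attach one per tool here,
// assuming each tool name matches a method name on the handlers object.
const tools = Object.fromEntries(
  Object.entries(handstagesAgentTools).map(([name, tool]) => [
    name,
    {
      ...tool,
      execute: async (input: unknown) => {
        const handler = handlers[name as keyof typeof handlers]
        return handler(input as never)
      },
    },
  ]),
)

const result = await generateText({
  model: openai("gpt-4o"),
  tools,
  toolChoice: "auto",
  maxSteps: 20, // lets the model call tools and read their results over several steps
  system: "You are a browser agent. Use the tools to complete tasks on the web.",
  prompt: "Go to example.com and tell me what the main heading says.",
})

console.log(result.text)
await handstage.close()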
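
However the handlers end up being invoked, the central step is routing a tool name and its input to the matching handler. The sketch below shows that routing on its own, with a guard for unknown tool names and for thrown errors. Handler, createDispatcher, and the stub goto handler are hypothetical names for illustration only; they are not part of @handstage/agent or the AI SDK.

```typescript
// A handler takes the model-supplied input and resolves to a serializable result.
type Handler = (input: unknown) => Promise<unknown>

// Builds a dispatcher over a map of handlers keyed by tool name.
// Unknown names and thrown errors are reported as `{ ok: false, error }`
// so the model receives a readable result instead of an exception.
function createDispatcher(handlers: Record<string, Handler>) {
  return async function dispatch(toolName: string, input: unknown): Promise<unknown> {
    // Own-property check avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(handlers, toolName)) {
      return { ok: false, error: `Unknown tool: ${toolName}` }
    }
    try {
      return await handlers[toolName](input)
    } catch (err) {
      return { ok: false, error: String(err) }
    }
  }
}

// Usage with a stub handler standing in for a real browser-backed one.
const dispatch = createDispatcher({
  goto: async (input) => ({ ok: true, url: (input as { url: string }).url }),
})
```

Returning a failure object rather than throwing keeps a multi-step run going, since the model can read the error text and try a different tool or input.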

Using the function form

createHandstagesAgentToolDefinitions() returns the same tool set as handstagesAgentTools. Use it when you need to call a function rather than reference a constant:
import { createHandstagesAgentToolDefinitions } from "@handstage/agent"

const tools = createHandstagesAgentToolDefinitions()
// tools is identical to handstagesAgentTools
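
Calling the function lazily could look like the sketch below, which defers creation until first use and then reuses the result. The lazy helper and the stub factory are assumptions for illustration; in an application the factory passed in would be createHandstagesAgentToolDefinitions.

```typescript
// Wraps a factory so it runs at most once, on first access.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined
  return () => {
    if (cached === undefined) cached = { value: factory() }
    return cached.value
  }
}

// Stand-in factory that counts its calls; replace it with
// createHandstagesAgentToolDefinitions from @handstage/agent in real code.
let factoryCalls = 0
const getTools = lazy(() => {
  factoryCalls += 1
  return { goto: { description: "stub tool" } }
})
```

Code paths that never need the tools then never pay for building them, and every caller of getTools() receives the same object.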

Type-safe tool inputs and outputs

The HandstagesAgent namespace exports inferred input and output types for every tool. Use these when implementing handlers or when processing tool results in your own code:
import type { HandstagesAgent } from "@handstage/agent"

function handleGoto(input: HandstagesAgent.GotoInput): Promise<HandstagesAgent.GotoOutput> {
  // input.pageId, input.url, input.waitUntil, input.timeoutMs are all typed
  return Promise.resolve({ ok: true, url: input.url })
}
Available types follow the pattern HandstagesAgent.<ToolName>Input and HandstagesAgent.<ToolName>Output for every tool listed in the available tools section above.
