Build AI browser agents with Handstage

The @handstage/agent package provides a ready-made set of browser-control tools designed for LLM function-calling. You supply the tool definitions to your AI SDK, implement the handlers that talk to a real browser, and the model decides when and how to call each tool. This pattern keeps the AI layer and the browser layer loosely coupled — you can swap out either side independently.

Installation

npm install @handstage/agent @handstage/core

Key exports

Export	Description
`handstagesAgentTools`	Pre-built `ToolSet` compatible with the Vercel AI SDK. Pass directly to `generateText` or `streamText`.
`createHandstagesAgentToolDefinitions()`	Function that returns the same `ToolSet`. Useful when the tool set should be built lazily, at the point of use.
`HandstagesAgentToolHandlers`	TypeScript interface listing one method per tool. Implement this to connect tool calls to a real browser.
`HandstagesAgentContext`	Interface describing the browser context your handlers operate on (`pages`, `activePage`, `setActivePage`, `newPage`).
`HandstagesAgent`	Namespace of inferred input/output types for every tool (e.g. `HandstagesAgent.GotoInput`).

Available tools

pages — list open tabs

Returns every open tab with its pageId, url, title, and whether it is the active (foreground) tab.

Input: none

Output: { pages: Array<{ pageId, url, title, activated }> }

newPage — open a new tab

Opens a new browser tab and returns its pageId. You can optionally pass a starting URL.

Input: { url?: string } (url defaults to "about:blank")

Output: { pageId: string }

setActivePage — focus a tab

Brings a tab to the foreground by pageId. Use after newPage or when the model needs to switch context.

Input: { pageId: string }

Output: { ok: true } | { ok: false; error: string }

goto — navigate to a URL

Navigates a tab to the given URL and optionally waits for a lifecycle event.

Input: { pageId, url, waitUntil?, timeoutMs? }

waitUntil accepts "load", "domcontentloaded", or "networkidle".

Output: { ok: true; url: string } | { ok: false; error: string }

reload — reload the current document

Reloads the page in the given tab.

Input: { pageId, waitUntil?, timeoutMs?, ignoreCache? }

Output: { ok: true; url: string } | { ok: false; error: string }

goBack — navigate back in history

Goes back one step in the tab’s navigation history if possible.

Input: { pageId, waitUntil?, timeoutMs? }

Output: { ok: true; navigated: boolean; url: string } | { ok: false; error: string }

goForward — navigate forward in history

Goes forward one step in the tab’s navigation history if possible.

Input: { pageId, waitUntil?, timeoutMs? }

Output: { ok: true; navigated: boolean; url: string } | { ok: false; error: string }

snapshot — accessibility tree

Returns a multi-line accessibility tree for the tab. Call this before interacting to understand what is on screen. The output includes an xpathMap (node id → XPath) and a urlMap (node id → href) for use with selector-based tools.

Input: { pageId, includeIframes? }

Output: { ok: true; tree: string; xpathMap: Record<string, string>; urlMap: Record<string, string> } | { ok: false; error: string }
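As a rough illustration of how the xpathMap might feed the selector-based tools, the sketch below looks up a node id in a snapshot result and builds an input for click_on. The SnapshotResult type is re-declared locally from the output shape above; the helper name clickInputForNode, the node id format, and the sample tree text are all hypothetical.

```typescript
// Local copy of the snapshot output shape documented above.
type SnapshotResult =
  | { ok: true; tree: string; xpathMap: Record<string, string>; urlMap: Record<string, string> }
  | { ok: false; error: string }

// Hypothetical helper: turn a node id from the tree into a click_on input.
// Returns undefined when the snapshot failed or the id is unknown.
function clickInputForNode(
  pageId: string,
  snapshot: SnapshotResult,
  nodeId: string,
): { pageId: string; select: string } | undefined {
  if (snapshot.ok !== true) return undefined
  const xpath = snapshot.xpathMap[nodeId]
  return xpath ? { pageId, select: xpath } : undefined
}

// Made-up sample data, for illustration only.
const sample: SnapshotResult = {
  ok: true,
  tree: "[0-1] button: Submit",
  xpathMap: { "0-1": "/html[1]/body[1]/button[1]" },
  urlMap: {},
}

console.log(clickInputForNode("page-1", sample, "0-1"))
```

Returning undefined for an unknown id lets the caller take a fresh snapshot instead of sending a stale selector to the browser.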

pageInfo — current URL and title

Returns the current URL and document title for a tab without rendering the full accessibility tree.

Input: { pageId }

Output: { ok: true; url: string; title: string } | { ok: false; error: string }

click — click at viewport coordinates

Clicks at a specific point in CSS pixels. The target element must already be visible in the viewport.

Input: { pageId, x, y, button?, clickCount? }

button accepts "left", "right", or "middle".

Output: { ok: true; xpathAtPoint?: string } | { ok: false; error: string }
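Because click takes viewport coordinates and requires the target to be visible, a caller that knows an element's bounding box may want to compute a click point and check it first. The following is a minimal sketch under that assumption; Rect, Viewport, and clickPointFor are hypothetical names and not part of @handstage/agent.

```typescript
// Bounding box and viewport size, both in CSS pixels (hypothetical shapes).
interface Rect { x: number; y: number; width: number; height: number }
interface Viewport { width: number; height: number }

// Returns the center of the rect, or undefined when that point lies outside the viewport.
function clickPointFor(rect: Rect, viewport: Viewport): { x: number; y: number } | undefined {
  const x = rect.x + rect.width / 2
  const y = rect.y + rect.height / 2
  const visible = x >= 0 && y >= 0 && x < viewport.width && y < viewport.height
  return visible ? { x, y } : undefined
}

const point = clickPointFor({ x: 100, y: 200, width: 80, height: 40 }, { width: 1280, height: 720 })
console.log(point) // { x: 140, y: 220 }
```

When the helper returns undefined, scrolling the element into view before clicking is the natural next step.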

hover — move the pointer

Moves the mouse cursor to specific viewport coordinates without clicking.

Input: { pageId, x, y }

Output: { ok: true; xpathAtPoint?: string } | { ok: false; error: string }

scroll — dispatch a wheel event

Dispatches a mouse wheel event at the given viewport coordinates with deltaX and deltaY in pixels.

Input: { pageId, x, y, deltaX, deltaY }

Output: { ok: true; xpathAtPoint?: string } | { ok: false; error: string }

type — type text at the current focus

Types text using key events at whatever element currently has focus. Focus an input first (e.g. with click_on) before calling this.

Input: { pageId, text, delay?, withMistakes? }

Output: { ok: true } | { ok: false; error: string }

click_on — click by selector

Clicks the first element in the page’s main frame that matches a CSS selector or XPath expression.

Input: { pageId, select } where select is a CSS selector or XPath (e.g. //button[@id='submit']).

Output: { ok: true } | { ok: false; error: string }

fill_on — fill an input by selector

Clears and fills an input element matched by a CSS selector or XPath in the main frame.

Input: { pageId, select, value }

Output: { ok: true } | { ok: false; error: string }

type_on — type into an element by selector

Types text into an element matched by a CSS selector or XPath, focusing it first.

Input: { pageId, select, text, delay? }

Output: { ok: true } | { ok: false; error: string }

hover_on — hover over an element by selector

Moves the pointer over the first element in the main frame that matches a CSS selector or XPath.

Input: { pageId, select }

Output: { ok: true } | { ok: false; error: string }
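The selector-based tools accept either a CSS selector or an XPath expression in the same select field, so a handler has to decide which kind it received. How Handstage resolves this internally is not covered here; the sketch below shows one possible heuristic. The function name classifySelector and the "xpath=" prefix convention are assumptions made for illustration.

```typescript
type SelectorKind = "xpath" | "css"

// Heuristic (assumption): treat strings that start with "/", "./", "(" or an
// explicit "xpath=" prefix as XPath, and everything else as CSS.
function classifySelector(select: string): SelectorKind {
  const s = select.trim()
  if (s.startsWith("xpath=")) return "xpath"
  if (s.startsWith("/") || s.startsWith("./") || s.startsWith("(")) return "xpath"
  return "css"
}

console.log(classifySelector("//button[@id='submit']")) // xpath
console.log(classifySelector("#submit")) // css
```

None of the XPath prefixes checked here can begin a valid CSS selector, which keeps the two cases from colliding.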

Wiring up with the Vercel AI SDK

Connect to Chrome

Start a local Chrome instance with V3.connectLocal(), then read the browser context from the returned object.

import { V3 } from "@handstage/core"

const handstage = await V3.connectLocal({ localBrowserLaunchOptions: { headless: true } })
const context = handstage.context

Implement HandstagesAgentToolHandlers

Create an object that satisfies the HandstagesAgentToolHandlers interface. Each method receives the tool input from the AI model and must return the typed output.

import type {
  HandstagesAgentToolHandlers,
  HandstagesAgentContext,
  HandstagesAgent,
} from "@handstage/agent"

function createHandlers(ctx: HandstagesAgentContext): HandstagesAgentToolHandlers {
  return {
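    async pages(_input) {
      const allPages = ctx.pages()
      const active = ctx.activePage()
      return {
        // Assumption: title() may return a promise; await is harmless if it does not.
        pages: await Promise.all(
          allPages.map(async (p) => ({
            pageId: p.targetId(),
            url: p.url(),
            title: await p.title(),
            activated: p === active,
          })),
        ),
      }
    },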

    async newPage(input) {
      const page = await ctx.newPage(input.url)
      return { pageId: page.targetId() }
    },

    async setActivePage(input) {
      const page = ctx.pages().find((p) => p.targetId() === input.pageId)
      if (!page) return { ok: false, error: `Page ${input.pageId} not found` }
      ctx.setActivePage(page)
      return { ok: true }
    },

    async goto(input) {
      const page = ctx.pages().find((p) => p.targetId() === input.pageId)
      if (!page) return { ok: false, error: `Page ${input.pageId} not found` }
      try {
        await page.goto(input.url)
        return { ok: true, url: page.url() }
      } catch (err) {
        return { ok: false, error: String(err) }
      }
    },

    // ... implement remaining handlers
  }
}
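Most of the remaining handlers repeat the same two steps as setActivePage and goto: find the page by pageId, then turn a thrown error into an { ok: false } result. One way to factor that out is sketched below. PageLike and ContextLike are minimal stand-ins declared only for this example, and withPage is a hypothetical helper, not an export of @handstage/agent.

```typescript
// Minimal stand-ins for the real page and context types (assumptions for this sketch).
interface PageLike {
  targetId(): string
}
interface ContextLike<P extends PageLike> {
  pages(): P[]
}

type Failure = { ok: false; error: string }

// Hypothetical helper: look up the page, run fn, and convert thrown errors into a failure result.
async function withPage<P extends PageLike, R>(
  ctx: ContextLike<P>,
  pageId: string,
  fn: (page: P) => Promise<R>,
): Promise<R | Failure> {
  const page = ctx.pages().find((p) => p.targetId() === pageId)
  if (!page) return { ok: false, error: `Page ${pageId} not found` }
  try {
    return await fn(page)
  } catch (err) {
    return { ok: false, error: String(err) }
  }
}

// Usage with a fake context holding one fake page.
const fakeCtx: ContextLike<PageLike> = {
  pages: () => [{ targetId: () => "page-1" }],
}

withPage(fakeCtx, "page-2", async () => ({ ok: true as const })).then((result) => {
  console.log(result) // { ok: false, error: 'Page page-2 not found' }
})
```

With a helper like this, each handler body shrinks to the browser call itself, and the not-found and error paths stay consistent across tools.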

Pass tools to the AI SDK

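Import handstagesAgentTools, attach each handler as the matching tool's execute function, and pass the result to generateText or streamText. The AI SDK runs a tool by calling execute on the tool itself; generateText has no top-level execute option.

import { generateText } from "ai"
import { openai } from "@ai-sdk/openai"
import { handstagesAgentTools } from "@handstage/agent"

const handlers = createHandlers(context)

// Attach the handler with the same name to each tool definition.
const tools = Object.fromEntries(
  Object.entries(handstagesAgentTools).map(([name, definition]) => [
    name,
    {
      ...definition,
      execute: async (input: unknown) => {
        const handler = handlers[name as keyof typeof handlers]
        return handler(input as never)
      },
    },
  ]),
) as typeof handstagesAgentTools

const result = await generateText({
  model: openai("gpt-4o"),
  tools,
  toolChoice: "auto",
  maxSteps: 20, // AI SDK 4.x option; 5.x uses stopWhen: stepCountIs(20) instead
  system: "You are a browser agent. Use the tools to complete tasks on the web.",
  prompt: "Go to example.com and tell me what the main heading says.",
})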

console.log(result.text)
await handstage.close()

Using the function form

createHandstagesAgentToolDefinitions() returns the same tool set as handstagesAgentTools. Use it when you need to call a function rather than reference a constant:

import { createHandstagesAgentToolDefinitions } from "@handstage/agent"

const tools = createHandstagesAgentToolDefinitions()
// tools is identical to handstagesAgentTools

Type-safe tool inputs and outputs

The HandstagesAgent namespace exports inferred input and output types for every tool. Use these when implementing handlers or when processing tool results in your own code:

import type { HandstagesAgent } from "@handstage/agent"

function handleGoto(input: HandstagesAgent.GotoInput): Promise<HandstagesAgent.GotoOutput> {
  // input.pageId, input.url, input.waitUntil, input.timeoutMs are all typed
  return Promise.resolve({ ok: true, url: input.url })
}

Available types follow the pattern HandstagesAgent.<ToolName>Input and HandstagesAgent.<ToolName>Output for every tool listed in the available tools section above.
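Most outputs are unions discriminated by the ok field, so code that processes results can narrow on it before reading url or error. The sketch below re-declares the goto output shape locally as a stand-in for HandstagesAgent.GotoOutput so that it runs without the package; describeGoto is a hypothetical function name.

```typescript
// Local stand-in for HandstagesAgent.GotoOutput, copied from the goto tool description.
type GotoOutput = { ok: true; url: string } | { ok: false; error: string }

// Narrow on the ok discriminant before touching url or error.
function describeGoto(result: GotoOutput): string {
  if (result.ok === true) {
    return `Navigated to ${result.url}`
  }
  return `Navigation failed: ${result.error}`
}

console.log(describeGoto({ ok: true, url: "https://example.com/" }))
console.log(describeGoto({ ok: false, error: "timeout" }))
```

The same narrowing applies to every tool whose output includes the { ok: false; error: string } branch.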
