## Documentation Index

Fetch the complete documentation index at: https://mintlify.com/browserbase/stagehand/llms.txt
Use this file to discover all available pages before exploring further.
Stagehand provides native support for Computer Use APIs from major AI providers. These APIs enable AI agents to interact with web browsers using visual understanding and coordinate-based actions.
## Overview
Computer Use APIs allow AI models to:
- See screenshots of web pages
- Click at specific coordinates
- Type text into fields
- Scroll, drag, and perform other mouse/keyboard actions
- Navigate between pages
Stagehand supports three CUA implementations:
- Anthropic - Claude’s Computer Use API
- Google - Gemini’s Computer Use API
- OpenAI - GPT’s Computer Use API (preview)
## Creating a CUA Agent
```typescript
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 2,
});
await stagehand.init();

const page = stagehand.context.pages()[0];

// Create a Computer Use Agent
const agent = stagehand.agent({
  mode: "cua",
  model: {
    modelName: "google/gemini-3-flash-preview",
    apiKey: process.env.GEMINI_API_KEY,
  },
  systemPrompt: `You are a helpful assistant that can use a web browser.
You are currently on: ${page.url()}.
Today's date is ${new Date().toLocaleDateString()}.`,
});

// Execute a task
await page.goto("https://www.example.com");
const result = await agent.execute({
  instruction: "Fill out the contact form with test data",
  maxSteps: 20,
});
```
## Provider-Specific Implementations
### Anthropic CUA Client

Location: `packages/core/lib/v3/agent/AnthropicCUAClient.ts`
Key Features:
- Uses Anthropic’s Messages API with the `computer_20251124` tool
- Supports Claude 4.5+ models with extended thinking budgets
- Handles image compression in conversation history
- Converts between Anthropic’s coordinate system and Playwright actions
Configuration:
```typescript
const agent = stagehand.agent({
  mode: "cua",
  model: {
    modelName: "anthropic/claude-sonnet-4-5-20250929",
    apiKey: process.env.ANTHROPIC_API_KEY,
    thinkingBudget: 5000, // Optional: extended thinking tokens
  },
});
```
Supported Actions:
screenshot - Capture current page state
click - Click at x,y coordinates
type - Type text
keypress - Press keyboard keys
scroll - Scroll in a direction
move - Move mouse cursor
drag - Drag between coordinates
doubleClick - Double-click at coordinates
Action Conversion:
The client converts Anthropic’s tool calls to Playwright actions:
```typescript
// Anthropic returns:
{
  "name": "computer",
  "input": {
    "action": "left_click",
    "coordinate": [500, 300]
  }
}

// Converted to:
{
  type: "click",
  x: 500,
  y: 300,
  button: "left"
}
```
### Google CUA Client

Location: `packages/core/lib/v3/agent/GoogleCUAClient.ts`
Key Features:
- Uses Google’s `computerUse` tool with Gemini models
- Normalizes coordinates from 0-1000 range to viewport dimensions
- Supports both browser and desktop environments
- Handles safety confirmations for sensitive actions
Configuration:
```typescript
const agent = stagehand.agent({
  mode: "cua",
  model: {
    modelName: "google/gemini-2-5-flash-preview",
    apiKey: process.env.GEMINI_API_KEY,
    environment: "ENVIRONMENT_BROWSER", // or "ENVIRONMENT_DESKTOP"
  },
});
```
Supported Function Calls:
open_web_browser - Open browser
click_at - Click at coordinates
type_text_at - Click and type at location
key_combination - Press key combinations
scroll_document - Scroll page up/down
scroll_at - Scroll at specific location
navigate - Go to URL
go_back / go_forward - Browser navigation
hover_at - Hover at coordinates
drag_and_drop - Drag between points
wait_5_seconds - Wait for page updates
Coordinate Normalization:
```typescript
private normalizeCoordinates(x: number, y: number) {
  // Google uses 0-1000 range, convert to actual viewport pixels
  const clampedX = Math.min(999, Math.max(0, x));
  const clampedY = Math.min(999, Math.max(0, y));
  return {
    x: Math.floor((clampedX / 1000) * this.currentViewport.width),
    y: Math.floor((clampedY / 1000) * this.currentViewport.height)
  };
}
```
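For example, with the recommended 1288×711 viewport, a model coordinate of (500, 500) lands near the center of the page. A standalone version of the same clamp-and-scale math:

```typescript
// Standalone sketch of the 0-1000 -> viewport-pixel normalization above.
function normalize(x: number, y: number, viewport: { width: number; height: number }) {
  const clampedX = Math.min(999, Math.max(0, x));
  const clampedY = Math.min(999, Math.max(0, y));
  return {
    x: Math.floor((clampedX / 1000) * viewport.width),
    y: Math.floor((clampedY / 1000) * viewport.height),
  };
}

const viewport = { width: 1288, height: 711 };
console.log(normalize(500, 500, viewport)); // roughly the center of the page
console.log(normalize(1200, -50, viewport)); // out-of-range values are clamped first
```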
Safety Confirmations:
Google CUA may request safety confirmations for sensitive actions:
```typescript
const agent = stagehand.agent({
  mode: "cua",
  model: { /* ... */ },
  safetyConfirmationHandler: async (safetyChecks) => {
    console.log("Safety checks:", safetyChecks);
    return { acknowledged: true };
  },
});
```
### OpenAI CUA Client

Location: `packages/core/lib/v3/agent/OpenAICUAClient.ts`
Key Features:
- Uses OpenAI’s Responses API for computer use (preview)
- Tracks reasoning items across conversation
- Supports function calls alongside computer actions
- Maintains response history with `previous_response_id`
Configuration:
```typescript
const agent = stagehand.agent({
  mode: "cua",
  model: {
    modelName: "openai/gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
    environment: "browser", // "browser", "mac", "windows", or "ubuntu"
  },
});
```
Response Types:
- `computer_call` - Computer action request
- `function_call` - Custom tool invocation
- `reasoning` - Model’s internal reasoning
- `message` - Text response to user
Computer Call Flow:
```typescript
// 1. Model returns computer_call
{
  type: "computer_call",
  call_id: "call_123",
  action: {
    type: "click",
    x: 100,
    y: 200
  }
}

// 2. Execute action and capture screenshot
// 3. Return computer_call_output
{
  type: "computer_call_output",
  call_id: "call_123",
  output: {
    type: "input_image",
    image_url: "data:image/png;base64,...",
    current_url: "https://example.com"
  }
}
```
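Abstracting away the OpenAI client, the flow above amounts to a loop: take each `computer_call`, execute it, screenshot the result, and send back a `computer_call_output`. A dependency-injected sketch (all names here are hypothetical, chosen to make the control flow visible in isolation):

```typescript
// Hypothetical sketch of the computer_call loop. The model and browser are
// injected as callbacks so the loop itself is self-contained.
type ComputerCall = {
  type: "computer_call";
  call_id: string;
  action: { type: string; x?: number; y?: number };
};
type CallOutput = {
  type: "computer_call_output";
  call_id: string;
  output: { type: "input_image"; image_url: string; current_url: string };
};

async function runComputerCalls(
  nextCall: () => Promise<ComputerCall | null>, // stands in for the Responses API
  execute: (action: ComputerCall["action"]) => Promise<void>,
  screenshot: () => Promise<{ imageUrl: string; currentUrl: string }>,
  send: (output: CallOutput) => Promise<void>,
  maxSteps = 20,
): Promise<number> {
  let steps = 0;
  for (; steps < maxSteps; steps++) {
    const call = await nextCall();
    if (!call) break; // model is done (e.g. returned a final message instead)
    await execute(call.action); // 2. execute the requested action
    const shot = await screenshot(); // capture the new page state
    await send({
      // 3. return computer_call_output with the screenshot
      type: "computer_call_output",
      call_id: call.call_id,
      output: { type: "input_image", image_url: shot.imageUrl, current_url: shot.currentUrl },
    });
  }
  return steps;
}
```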
## Browser Configuration

IMPORTANT: Computer Use requires specific browser dimensions. Configure them in `stagehand.config.ts`:
```typescript
export default {
  browserOptions: {
    headless: false,
    defaultViewport: {
      width: 1288,
      height: 711,
    },
  },
};
```
Or set at runtime:
```typescript
const stagehand = new Stagehand({
  env: "LOCAL",
  browserOptions: {
    defaultViewport: { width: 1288, height: 711 },
  },
});
```
## Action Handlers
CUA clients use action handlers to execute browser actions:
```typescript
// Set in AgentContext (packages/core/lib/v3/agent/AgentContext.ts)
this.cuaClient.setActionHandler(async (action: AgentAction) => {
  switch (action.type) {
    case "click":
      await page.mouse.click(action.x, action.y);
      break;
    case "type":
      await page.keyboard.type(action.text);
      break;
    case "scroll":
      await page.mouse.wheel(action.scroll_x, action.scroll_y);
      break;
    // ... other actions
  }
});
```
## Screenshot Providers
All CUA clients require a screenshot provider:
```typescript
this.cuaClient.setScreenshotProvider(async () => {
  const page = await this.v3.context.awaitActivePage();
  const screenshot = await page.screenshot();
  return screenshot.toString("base64");
});
```
## Image Compression
To reduce token usage, Stagehand compresses images in conversation history:
- Anthropic: `compressConversationImages()` keeps the first 2 images and compresses the rest to 25% quality
- Google: `compressGoogleConversationImages()` applies a similar strategy to Google’s message format
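The keep-first-N strategy is simple enough to sketch in isolation: walk the screenshots in history order, leave the first N untouched, and mark the rest for recompression. All names below are hypothetical; the real implementations live inside the CUA clients:

```typescript
// Hypothetical sketch of the keep-first-N image compression strategy.
type HistoryImage = { data: string; quality: number };

function planCompression(
  images: HistoryImage[],
  keepFirst = 2, // earliest screenshots stay at full quality
  quality = 0.25, // later screenshots get recompressed to save tokens
): HistoryImage[] {
  return images.map((img, i) => (i < keepFirst ? img : { ...img, quality }));
}
```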
## Custom Tools with CUA
You can combine Computer Use with custom tools:
```typescript
import { tool } from "ai";
import { z } from "zod";

const getWeather = tool({
  description: "Get weather for a location",
  inputSchema: z.object({
    location: z.string(),
  }),
  execute: async ({ location }) => {
    // Your API call here
    return { temp: 70, conditions: "sunny" };
  },
});

const agent = stagehand.agent({
  mode: "cua",
  model: { /* ... */ },
  tools: { getWeather },
});
```
See `agent-custom-tools.ts` for a complete example.
## Best Practices
- Set an appropriate `maxSteps`: CUA tasks typically need 10-20 steps
- Use specific system prompts: Include context about the current page and date
- Handle errors gracefully: CUA actions can fail; implement retry logic
- Monitor token usage: Screenshots consume many tokens; use compression
- Test viewport dimensions: Ensure coordinates map correctly to your viewport
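"Handle errors gracefully" can be as simple as wrapping `agent.execute` (or individual actions) in a small retry helper. The `retryable` wrapper below is a sketch, not a Stagehand API:

```typescript
// Generic retry-with-exponential-backoff sketch for flaky CUA actions.
async function retryable<T>(
  fn: () => Promise<T>,
  attempts = 3,
  backoffMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // wait 1s, 2s, 4s, ... between attempts
        await new Promise((r) => setTimeout(r, backoffMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Usage would look like `const result = await retryable(() => agent.execute({ instruction, maxSteps: 20 }));`.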
## Example: Complete CUA Workflow
```typescript
import { Stagehand } from "@browserbasehq/stagehand";
import chalk from "chalk";

const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 2,
  browserOptions: {
    defaultViewport: { width: 1288, height: 711 },
  },
});
await stagehand.init();

const page = stagehand.context.pages()[0];

const agent = stagehand.agent({
  mode: "cua",
  model: {
    modelName: "anthropic/claude-sonnet-4-5",
    apiKey: process.env.ANTHROPIC_API_KEY,
  },
  systemPrompt: `You are a helpful assistant.
Current page: ${page.url()}
Date: ${new Date().toLocaleDateString()}`,
});

await page.goto("https://www.browserbase.com/careers");
const result = await agent.execute({
  instruction: "Apply for the first engineer position with test data. Don't submit.",
  maxSteps: 20,
});

console.log(chalk.green("✓"), "Complete:", result.message);
console.log("Actions performed:", result.actions.length);
console.log("Token usage:", result.usage);

await stagehand.close();
```
## References
- Anthropic CUA: `packages/core/lib/v3/agent/AnthropicCUAClient.ts`
- Google CUA: `packages/core/lib/v3/agent/GoogleCUAClient.ts`
- OpenAI CUA: `packages/core/lib/v3/agent/OpenAICUAClient.ts`
- Example: `packages/core/examples/cua-example.ts`
- Custom Tools Example: `packages/core/examples/agent-custom-tools.ts`