Overview
The agent() method creates an autonomous agent that can perform multi-step browser automation tasks. Agents can navigate websites, interact with elements, extract data, and make decisions to complete complex workflows.
Method Signature
agent(config?: AgentConfig): AgentInstance
Parameters
Optional configuration for the agent.
model
string | AgentModelConfig
The model to use for agent reasoning. Defaults to the Stagehand instance's model.
model: "anthropic/claude-sonnet-4-5-20250929"
// or
model: { modelName: "openai/gpt-4o" }
executionModel
string | AgentModelConfig
Model for tool execution (observe/act calls). If not specified, inherits from the main model.
executionModel: "google/gemini-2.0-flash"
mode
'dom' | 'hybrid' | 'cua'
default: "dom"
Tool mode determining available agent capabilities:
dom - DOM-based tools (act, fillForm) for structured interactions
hybrid - Coordinate-based tools (click, type, dragAndDrop) for visual interactions
cua - Computer Use Agent providers (Anthropic, OpenAI, Google) for screenshot-based automation
stream
boolean
Enable streaming mode. When true, execute() returns an AgentStreamResult for incremental output.
systemPrompt
string
Custom system prompt to override the default agent instructions.
integrations
MCP (Model Context Protocol) integrations for extended capabilities.
integrations: [mcpClient, "mcp://another-server"]
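As a sketch of wiring an integration in (the `mcpClient` object and the server URL below are placeholders, not real endpoints):

```typescript
// `mcpClient` is assumed to be an already-connected MCP client instance;
// string entries are treated as MCP server URLs (placeholder shown).
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  integrations: [mcpClient, "mcp://another-server"],
});
```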
Agent Instance Methods
execute()
Executes the agent with a given instruction.
execute(
  instructionOrOptions: string | AgentExecuteOptions
): Promise<AgentResult>
instructionOrOptions
string | AgentExecuteOptions
required
The task instruction (string) or a full options object.
AgentExecuteOptions properties:
instruction
string
Natural language instruction describing the task to complete.
maxSteps
number
Maximum number of steps the agent can take before stopping.
page
Page
Specific page for the agent to operate on.
highlightCursor
boolean
Show cursor movements (defaults to true in hybrid mode).
messages
ModelMessage[]
Previous conversation messages to continue from a prior execution.
// Continue conversation
const result1 = await agent.execute("First task");
const result2 = await agent.execute({
  instruction: "Now do this",
  messages: result1.messages,
});
signal
AbortSignal
Abort signal to cancel agent execution.
const controller = new AbortController();
setTimeout(() => controller.abort(), 30000); // 30s timeout
await agent.execute({
  instruction: "...",
  signal: controller.signal,
});
excludeTools
string[]
Tools to exclude from this execution. See available tools by mode below.
excludeTools: ["screenshot", "extract"]
output
Zod schema defining custom output data to return when the task completes.
import { z } from "zod";

output: z.object({
  price: z.string().describe("Product price"),
  name: z.string().describe("Product name"),
})
variables
Variables the agent can use when filling forms or typing; values may be plain strings or objects with a value and description.
variables: {
  username: {
    value: "user@example.com",
    description: "Login email",
  },
  password: "secret123",
}
callbacks
Callbacks for monitoring agent execution: a pre-step hook called before each step to modify settings, and onStepFinish, called when each LLM step finishes.
onSafetyConfirmation
SafetyConfirmationHandler
Handle safety checks (CUA mode only).
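Putting several of these options together in one call (a sketch; it assumes at least two tabs are open so a specific page can be targeted):

```typescript
// Target the second open tab explicitly (assumes it exists)
const [, secondPage] = stagehand.context.pages();

const result = await agent.execute({
  instruction: "Fill out the signup form and submit it",
  maxSteps: 20,          // cap the number of agent steps
  page: secondPage,      // operate on a specific page
  highlightCursor: true, // visualize cursor movement in hybrid mode
});
```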
Return Value
Returns a Promise<AgentResult>:
interface AgentResult {
  success: boolean;                 // Whether the task completed successfully
  message: string;                  // Agent's final message
  actions: AgentAction[];           // Actions taken by the agent
  completed: boolean;               // Whether the agent called the done tool
  messages?: ModelMessage[];        // Conversation messages (for continuation)
  output?: Record<string, unknown>; // Custom output data (if schema provided)
  usage?: {                         // Token usage statistics
    input_tokens: number;
    output_tokens: number;
    reasoning_tokens?: number;
    cached_input_tokens?: number;
    inference_time_ms: number;
  };
}
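Because `output` comes back as `Record<string, unknown>`, it is worth narrowing it before use. A minimal sketch with plain `typeof` guards (the `ProductOutput` shape is illustrative, not part of the API):

```typescript
// Illustrative shape for the agent's custom output
interface ProductOutput {
  name: string;
  price: string;
}

// Narrow the untyped record; return null when the shape doesn't match
function toProductOutput(raw: Record<string, unknown>): ProductOutput | null {
  const { name, price } = raw;
  if (typeof name === "string" && typeof price === "string") {
    return { name, price };
  }
  return null;
}
```

If the Zod schema used in the request is still in scope, re-parsing with `schema.parse(result.output)` achieves the same narrowing with less code.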
Usage Examples
Basic Agent Task
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
});
await stagehand.init();

const page = stagehand.context.pages()[0];
await page.goto("https://news.ycombinator.com");
// Create and execute agent
const agent = stagehand.agent();
const result = await agent.execute(
  "Find the top story and click on it"
);

if (result.success) {
  console.log("Task completed:", result.message);
  console.log("Actions taken:", result.actions.length);
} else {
  console.error("Task failed:", result.message);
}
With Custom Model
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  executionModel: "google/gemini-2.0-flash", // Fast model for tool execution
});

const result = await agent.execute({
  instruction: "Search for 'web scraping' and extract the first 5 results",
  maxSteps: 15,
});
Streaming Mode
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true, // Enable streaming
});

const agentRun = await agent.execute(
  "Go to Amazon and search for 'laptop'"
);

// Stream text output
for await (const delta of agentRun.textStream) {
  process.stdout.write(delta);
}

// Wait for final result
const result = await agentRun.result;
console.log("\nFinal result:", result);
With Custom Output Schema
import { z } from "zod";

const agent = stagehand.agent();
const result = await agent.execute({
  instruction: "Find the cheapest laptop on this page",
  output: z.object({
    name: z.string().describe("Product name"),
    price: z.string().describe("Product price"),
    rating: z.number().describe("Product rating out of 5"),
  }),
});

if (result.output) {
  console.log(`Found: ${result.output.name}`);
  console.log(`Price: ${result.output.price}`);
  console.log(`Rating: ${result.output.rating}/5`);
}
Conversation Continuation
const agent = stagehand.agent();

// First task
const result1 = await agent.execute(
  "Go to GitHub and search for 'stagehand'"
);

// Continue the conversation
const result2 = await agent.execute({
  instruction: "Now click on the first repository",
  messages: result1.messages, // Continue from previous state
});

// Another continuation
const result3 = await agent.execute({
  instruction: "Read the README and summarize it",
  messages: result2.messages,
});
With Variables
const agent = stagehand.agent();
await page.goto("https://example.com/login");

const result = await agent.execute({
  instruction: "Log in using the provided credentials",
  variables: {
    username: {
      value: process.env.USERNAME,
      description: "User's email address",
    },
    password: {
      value: process.env.PASSWORD,
      description: "User's password",
    },
  },
});
With Tool Exclusions
const agent = stagehand.agent();
const result = await agent.execute({
  instruction: "Navigate to the product page and click buy",
  excludeTools: ["screenshot", "extract"], // Faster execution
});
With Abort Signal
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 60000); // 1 minute

try {
  const result = await agent.execute({
    instruction: "Complete the checkout process",
    signal: controller.signal,
  });
  clearTimeout(timeoutId);
} catch (error) {
  if (error instanceof AgentAbortError) {
    console.log("Agent was aborted");
  }
}
Hybrid Mode (Coordinate-Based)
const agent = stagehand.agent({
  mode: "hybrid", // Use coordinate-based tools
  model: "google/gemini-2.0-flash",
});

await page.goto("https://example.com");
const result = await agent.execute({
  instruction: "Click on the blue button in the top right",
  highlightCursor: true, // Show cursor movements
});
CUA Mode (Computer Use Agent)
const agent = stagehand.agent({
  mode: "cua",
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const result = await agent.execute(
  "Navigate to the settings page and enable dark mode"
);
With Callbacks
const agent = stagehand.agent();
const result = await agent.execute({
  instruction: "Search for products and add to cart",
  callbacks: {
    onStepFinish: async (step) => {
      console.log("Step completed:", step.finishReason);
      if (step.toolCalls) {
        step.toolCalls.forEach((call) => {
          console.log(`Tool: ${call.toolName}`);
        });
      }
    },
  },
});
Agent Modes
DOM Mode (Default)
Best for structured page interactions.
Available tools:
act - Semantic actions (click, type)
fillForm - Fill form fields
ariaTree - Get accessibility tree
extract - Extract data
goto - Navigate to URL
scroll - Scroll with semantic directions
keys - Press keyboard keys
navback - Navigate back
screenshot - Take screenshot
think - Agent reasoning
wait - Wait for time/condition
done - Mark task complete
search - Web search (requires BRAVE_API_KEY)
Hybrid Mode
Best for visual/screenshot-based interactions.
Available tools:
click - Click at coordinates
type - Type at coordinates
dragAndDrop - Drag between points
clickAndHold - Click and hold
fillFormVision - Fill forms using vision
Plus all DOM mode tools
CUA Mode
Uses provider’s native computer use capabilities.
Supported models:
openai/computer-use-preview
anthropic/claude-sonnet-4-5-20250929
google/gemini-2.5-computer-use-preview-10-2025
And more - see documentation
Best Practices
Clear instructions - Be specific about the goal
// Good
await agent.execute(
  "Find the product with the lowest price and add it to cart"
);

// Too vague
await agent.execute("buy something");
Set appropriate maxSteps - Prevent runaway executions
await agent.execute({
  instruction: "...",
  maxSteps: 10, // Simple task
});
Use output schemas - Get structured data
await agent.execute({
  instruction: "...",
  output: z.object({ ... }),
});
Handle errors gracefully
const result = await agent.execute(instruction);
if (!result.success) {
  console.error("Failed:", result.message);
  // Retry or handle error
}
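One way to act on that `success` flag is a small retry wrapper. This is a sketch, not a Stagehand helper: `runWithRetry` and `TaskResult` are illustrative names, and `run` stands in for any call such as `() => agent.execute(instruction)`.

```typescript
// Minimal result shape matching AgentResult's success/message fields
interface TaskResult {
  success: boolean;
  message: string;
}

// Re-run the task until it succeeds or attempts run out (illustrative helper)
async function runWithRetry(
  run: () => Promise<TaskResult>,
  maxAttempts = 3,
): Promise<TaskResult> {
  let last: TaskResult = { success: false, message: "never ran" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await run();
    if (last.success) return last;
    console.error(`Attempt ${attempt} failed: ${last.message}`);
  }
  return last;
}
```

Because agent runs are stateful browser sessions, retries are best kept coarse-grained (re-issue the whole instruction) rather than resuming mid-task.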
Use variables for sensitive data
await agent.execute({
  instruction: "Log in with credentials",
  variables: {
    username: process.env.USER,
    password: process.env.PASS,
  },
});
Monitor with callbacks
await agent.execute({
  instruction: "...",
  callbacks: {
    onStepFinish: (step) => logStep(step),
  },
});
Error Handling
try {
  const result = await agent.execute(instruction);
  if (!result.success) {
    console.error("Agent failed:", result.message);
  }
} catch (error) {
  if (error instanceof AgentAbortError) {
    console.log("Agent was aborted");
  } else if (error instanceof StreamingCallbacksInNonStreamingModeError) {
    console.error("Invalid callback usage");
  } else {
    console.error("Unexpected error:", error);
  }
}
Performance Tips
Use faster models for execution
agent({
  model: "anthropic/claude-sonnet-4-5-20250929", // Reasoning
  executionModel: "google/gemini-2.0-flash", // Fast tools
})
Exclude unnecessary tools
execute({
  instruction: "...",
  excludeTools: ["screenshot", "extract"],
})
Set reasonable maxSteps
execute({ instruction: "...", maxSteps: 10 })
Use conversation continuation - Reuse context
const result1 = await agent.execute("First task");
const result2 = await agent.execute({
  instruction: "Next task",
  messages: result1.messages,
});
Related Methods