Overview
The agent() method creates an autonomous agent that can perform multi-step browser automation tasks. Agents can navigate websites, interact with elements, extract data, and make decisions to complete complex workflows.
Method Signature
agent(config?: AgentConfig): AgentInstance
Parameters
Optional configuration for the agent.
model
string | AgentModelConfig
The model to use for agent reasoning. Defaults to the Stagehand instance's model.
model: "anthropic/claude-sonnet-4-5-20250929"
// or
model: { modelName: "openai/gpt-4o" }
executionModel
string | AgentModelConfig
Model for tool execution (observe/act calls). If not specified, inherits from the main model.
executionModel: "google/gemini-2.0-flash"
mode
'dom' | 'hybrid' | 'cua'
default: "dom"
Tool mode determining available agent capabilities:
dom - DOM-based tools (act, fillForm) for structured interactions
hybrid - Coordinate-based tools (click, type, dragAndDrop) for visual interactions
cua - Computer Use Agent providers (Anthropic, OpenAI, Google) for screenshot-based automation
stream
boolean
Enable streaming mode. When true, execute() returns an AgentStreamResult for incremental output.
systemPrompt
string
Custom system prompt to override the default agent instructions.
integrations
MCP (Model Context Protocol) integrations for extended capabilities.
integrations: [mcpClient, "mcp://another-server"]
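As a sketch of wiring an integration in (the `mcpClient` object and the server URL below are placeholders, not real endpoints):

```typescript
// `mcpClient` is assumed to be an already-connected MCP client instance;
// string entries are treated as MCP server URLs (placeholder shown).
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  integrations: [mcpClient, "mcp://another-server"],
});
```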
Agent Instance Methods
execute()
Executes the agent with a given instruction.
execute(
  instructionOrOptions: string | AgentExecuteOptions
): Promise<AgentResult>
instructionOrOptions
string | AgentExecuteOptions
required
The task instruction (string) or a full options object.
AgentExecuteOptions properties:
instruction
string
Natural language instruction describing the task to complete.
maxSteps
number
Maximum number of steps the agent can take before stopping.
page
Page
Specific page for the agent to operate on.
highlightCursor
boolean
Show cursor movements (defaults to true in hybrid mode).
messages
ModelMessage[]
Previous conversation messages to continue from a prior execution.
// Continue conversation
const result1 = await agent.execute("First task");
const result2 = await agent.execute({
  instruction: "Now do this",
  messages: result1.messages,
});
signal
AbortSignal
Abort signal to cancel agent execution.
const controller = new AbortController();
setTimeout(() => controller.abort(), 30000); // 30s timeout
await agent.execute({
  instruction: "...",
  signal: controller.signal,
});
excludeTools
string[]
Tools to exclude from this execution. See available tools by mode below.
excludeTools: ["screenshot", "extract"]
output
Zod schema defining custom output data to return when the task completes.
import { z } from "zod";

output: z.object({
  price: z.string().describe("Product price"),
  name: z.string().describe("Product name"),
})
variables
Variables the agent can use when filling forms or typing; values may be plain strings or objects with a value and description.
variables: {
  username: {
    value: "user@example.com",
    description: "Login email",
  },
  password: "secret123",
}
callbacks
Callbacks for monitoring agent execution: a pre-step hook called before each step to modify settings, and onStepFinish, called when each LLM step finishes.
onSafetyConfirmation
SafetyConfirmationHandler
Handle safety checks (CUA mode only).
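Putting several of these options together in one call (a sketch; it assumes at least two tabs are open so a specific page can be targeted):

```typescript
// Target the second open tab explicitly (assumes it exists)
const [, secondPage] = stagehand.context.pages();

const result = await agent.execute({
  instruction: "Fill out the signup form and submit it",
  maxSteps: 20,          // cap the number of agent steps
  page: secondPage,      // operate on a specific page
  highlightCursor: true, // visualize cursor movement in hybrid mode
});
```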
Return Value
Returns a Promise<AgentResult>:
interface AgentResult {
  success: boolean;                 // Whether the task completed successfully
  message: string;                  // Agent's final message
  actions: AgentAction[];           // Actions taken by the agent
  completed: boolean;               // Whether the agent called the done tool
  messages?: ModelMessage[];        // Conversation messages (for continuation)
  output?: Record<string, unknown>; // Custom output data (if schema provided)
  usage?: {                         // Token usage statistics
    input_tokens: number;
    output_tokens: number;
    reasoning_tokens?: number;
    cached_input_tokens?: number;
    inference_time_ms: number;
  };
}
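Because `output` comes back as `Record<string, unknown>`, it is worth narrowing it before use. A minimal sketch with plain `typeof` guards (the `ProductOutput` shape is illustrative, not part of the API):

```typescript
// Illustrative shape for the agent's custom output
interface ProductOutput {
  name: string;
  price: string;
}

// Narrow the untyped record; return null when the shape doesn't match
function toProductOutput(raw: Record<string, unknown>): ProductOutput | null {
  const { name, price } = raw;
  if (typeof name === "string" && typeof price === "string") {
    return { name, price };
  }
  return null;
}
```

If the Zod schema used in the request is still in scope, re-parsing with `schema.parse(result.output)` achieves the same narrowing with less code.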
Usage Examples
Basic Agent Task
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
});
await stagehand.init();

const page = stagehand.context.pages()[0];
await page.goto("https://news.ycombinator.com");
// Create and execute agent
const agent = stagehand.agent();
const result = await agent.execute(
  "Find the top story and click on it"
);

if (result.success) {
  console.log("Task completed:", result.message);
  console.log("Actions taken:", result.actions.length);
} else {
  console.error("Task failed:", result.message);
}
With Custom Model
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  executionModel: "google/gemini-2.0-flash", // Fast model for tool execution
});

const result = await agent.execute({
  instruction: "Search for 'web scraping' and extract the first 5 results",
  maxSteps: 15,
});
Streaming Mode
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true, // Enable streaming
});

const agentRun = await agent.execute(
  "Go to Amazon and search for 'laptop'"
);

// Stream text output
for await (const delta of agentRun.textStream) {
  process.stdout.write(delta);
}

// Wait for final result
const result = await agentRun.result;
console.log("\nFinal result:", result);
With Custom Output Schema
import { z } from "zod";

const agent = stagehand.agent();
const result = await agent.execute({
  instruction: "Find the cheapest laptop on this page",
  output: z.object({
    name: z.string().describe("Product name"),
    price: z.string().describe("Product price"),
    rating: z.number().describe("Product rating out of 5"),
  }),
});

if (result.output) {
  console.log(`Found: ${result.output.name}`);
  console.log(`Price: ${result.output.price}`);
  console.log(`Rating: ${result.output.rating}/5`);
}
Conversation Continuation
const agent = stagehand.agent();

// First task
const result1 = await agent.execute(
  "Go to GitHub and search for 'stagehand'"
);

// Continue the conversation
const result2 = await agent.execute({
  instruction: "Now click on the first repository",
  messages: result1.messages, // Continue from previous state
});

// Another continuation
const result3 = await agent.execute({
  instruction: "Read the README and summarize it",
  messages: result2.messages,
});
With Variables
const agent = stagehand.agent();
await page.goto("https://example.com/login");

const result = await agent.execute({
  instruction: "Log in using the provided credentials",
  variables: {
    username: {
      value: process.env.USERNAME,
      description: "User's email address",
    },
    password: {
      value: process.env.PASSWORD,
      description: "User's password",
    },
  },
});
With Tool Exclusions
const agent = stagehand.agent();
const result = await agent.execute({
  instruction: "Navigate to the product page and click buy",
  excludeTools: ["screenshot", "extract"], // Faster execution
});
With Abort Signal
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 60000); // 1 minute

try {
  const result = await agent.execute({
    instruction: "Complete the checkout process",
    signal: controller.signal,
  });
  clearTimeout(timeoutId);
} catch (error) {
  if (error instanceof AgentAbortError) {
    console.log("Agent was aborted");
  }
}
Hybrid Mode (Coordinate-Based)
const agent = stagehand.agent({
  mode: "hybrid", // Use coordinate-based tools
  model: "google/gemini-2.0-flash",
});

await page.goto("https://example.com");
const result = await agent.execute({
  instruction: "Click on the blue button in the top right",
  highlightCursor: true, // Show cursor movements
});
CUA Mode (Computer Use Agent)
const agent = stagehand.agent({
  mode: "cua",
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const result = await agent.execute(
  "Navigate to the settings page and enable dark mode"
);
With Callbacks
const agent = stagehand.agent();
const result = await agent.execute({
  instruction: "Search for products and add to cart",
  callbacks: {
    onStepFinish: async (step) => {
      console.log("Step completed:", step.finishReason);
      if (step.toolCalls) {
        step.toolCalls.forEach((call) => {
          console.log(`Tool: ${call.toolName}`);
        });
      }
    },
  },
});
Agent Modes
DOM Mode (Default)
Best for structured page interactions.
Available tools:
act - Semantic actions (click, type)
fillForm - Fill form fields
ariaTree - Get accessibility tree
extract - Extract data
goto - Navigate to URL
scroll - Scroll with semantic directions
keys - Press keyboard keys
navback - Navigate back
screenshot - Take screenshot
think - Agent reasoning
wait - Wait for time/condition
done - Mark task complete
search - Web search (requires BRAVE_API_KEY)
Hybrid Mode
Best for visual/screenshot-based interactions.
Available tools:
click - Click at coordinates
type - Type at coordinates
dragAndDrop - Drag between points
clickAndHold - Click and hold
fillFormVision - Fill forms using vision
Plus all DOM mode tools
CUA Mode
Uses provider’s native computer use capabilities.
Supported models:
openai/computer-use-preview
anthropic/claude-sonnet-4-5-20250929
google/gemini-2.5-computer-use-preview-10-2025
And more - see documentation
Best Practices
Clear instructions - Be specific about the goal
// Good
await agent.execute(
  "Find the product with the lowest price and add it to cart"
);

// Too vague
await agent.execute("buy something");
Set appropriate maxSteps - Prevent runaway executions
await agent.execute({
  instruction: "...",
  maxSteps: 10, // Simple task
});
Use output schemas - Get structured data
await agent.execute({
  instruction: "...",
  output: z.object({ ... }),
});
Handle errors gracefully
const result = await agent.execute(instruction);
if (!result.success) {
  console.error("Failed:", result.message);
  // Retry or handle error
}
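One way to act on that `success` flag is a small retry wrapper. This is a sketch, not a Stagehand helper: `runWithRetry` and `TaskResult` are illustrative names, and `run` stands in for any call such as `() => agent.execute(instruction)`.

```typescript
// Minimal result shape matching AgentResult's success/message fields
interface TaskResult {
  success: boolean;
  message: string;
}

// Re-run the task until it succeeds or attempts run out (illustrative helper)
async function runWithRetry(
  run: () => Promise<TaskResult>,
  maxAttempts = 3,
): Promise<TaskResult> {
  let last: TaskResult = { success: false, message: "never ran" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await run();
    if (last.success) return last;
    console.error(`Attempt ${attempt} failed: ${last.message}`);
  }
  return last;
}
```

Because agent runs are stateful browser sessions, retries are best kept coarse-grained (re-issue the whole instruction) rather than resuming mid-task.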
Use variables for sensitive data
await agent.execute({
  instruction: "Log in with credentials",
  variables: {
    username: process.env.USER,
    password: process.env.PASS,
  },
});
Monitor with callbacks
await agent.execute({
  instruction: "...",
  callbacks: {
    onStepFinish: (step) => logStep(step),
  },
});
Error Handling
try {
  const result = await agent.execute(instruction);
  if (!result.success) {
    console.error("Agent failed:", result.message);
  }
} catch (error) {
  if (error instanceof AgentAbortError) {
    console.log("Agent was aborted");
  } else if (error instanceof StreamingCallbacksInNonStreamingModeError) {
    console.error("Invalid callback usage");
  } else {
    console.error("Unexpected error:", error);
  }
}
Performance Tips
Use faster models for execution
agent({
  model: "anthropic/claude-sonnet-4-5-20250929", // Reasoning
  executionModel: "google/gemini-2.0-flash", // Fast tools
})
Exclude unnecessary tools
execute({
  instruction: "...",
  excludeTools: ["screenshot", "extract"],
})
Set reasonable maxSteps
execute({ instruction: "...", maxSteps: 10 })
Use conversation continuation - Reuse context
const result1 = await agent.execute("First task");
const result2 = await agent.execute({
  instruction: "Next task",
  messages: result1.messages,
});
Related Methods