agent-browser is a command-line tool that AI coding assistants and agents can call to automate browser tasks.

Quick Start

The simplest approach is to just tell your agent to use it:
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
The --help output is comprehensive, and most agents can work out the commands from there.
For richer context, add the agent-browser skill to your AI coding assistant:
npx skills add vercel-labs/agent-browser
This works with:
  • Claude Code
  • Codex
  • Cursor
  • Gemini CLI
  • GitHub Copilot
  • Goose
  • OpenCode
  • Windsurf
The skill is fetched from the repository, so it stays up to date automatically. Do not copy SKILL.md from node_modules as it will become stale.

Claude Code Setup

1. Install the skill

In your project directory:
npx skills add vercel-labs/agent-browser
This adds the skill to .claude/skills/agent-browser/SKILL.md.
2. Use with Claude Code

The skill teaches Claude Code the full agent-browser workflow, including:
  • Snapshot-ref interaction pattern
  • Session management
  • Timeout handling
  • Security best practices
Just ask Claude Code to automate browser tasks:
Test the signup flow on https://example.com/signup

AGENTS.md / CLAUDE.md Instructions

For more consistent results, add agent-browser instructions to your project or global instructions file:
## Browser Automation

Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.

Core workflow:
1. `agent-browser open <url>` - Navigate to page
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes

Core Workflow for Agents

Every browser automation follows this pattern:
1. Navigate

agent-browser open <url>
2. Get snapshot with refs

agent-browser snapshot -i
Output includes refs like @e1, @e2, @e3 for each interactive element.
3. Interact using refs

agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser click @e3
4. Re-snapshot after changes

agent-browser snapshot -i  # Get fresh refs after navigation

Essential Commands for AI Agents

Navigation

agent-browser open <url>              # Navigate to URL
agent-browser close                   # Close browser

Snapshot with Refs

agent-browser snapshot -i             # Interactive elements with refs
agent-browser snapshot -i --json      # JSON output for parsing
agent-browser snapshot -i -C          # Include cursor-interactive elements (e.g. divs with onclick)

Interaction

agent-browser click @e1               # Click element by ref
agent-browser fill @e2 "text"         # Fill input by ref
agent-browser type @e3 "text"         # Type without clearing
agent-browser press Enter             # Press key
agent-browser select @e4 "option"     # Select dropdown option

Getting Information

agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

Waiting

agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

Screenshots

agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --annotate   # With numbered labels
agent-browser screenshot --full       # Full page

JSON Output Mode

Use --json for machine-readable output:
agent-browser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{...}}}}

agent-browser get text @e1 --json
agent-browser is visible @e2 --json
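
As a rough illustration of consuming the envelope shown above without extra tooling, the sketch below checks the success field with a plain string match. It assumes the compact formatting from the sample output (no whitespace inside "success":true), and ab_json_ok is a hypothetical helper name, not part of agent-browser. A real JSON parser such as jq is the more robust choice for anything beyond a quick check.

```shell
# ab_json_ok JSON: return 0 if the JSON envelope string reports success.
# Assumption: output is compact, as in the sample above, so the literal
# substring "success":true is present whenever the command succeeded.
ab_json_ok() {
  case "$1" in
    *'"success":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires agent-browser on PATH):
#   out=$(agent-browser get text @e1 --json)
#   ab_json_ok "$out" && printf '%s\n' "$out"
```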

Command Chaining

Commands can be chained with && for efficiency:
# Open, wait, and snapshot in one call
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
Use && when you don’t need intermediate output. Run separately when you need to parse output first.
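
The first chain above can also be wrapped in a small shell function so that an agent issues one call per page load. This is only a sketch: open_and_snapshot is a hypothetical name, and it uses nothing beyond the open, wait, and snapshot commands already shown.

```shell
# open_and_snapshot URL: open a page, wait for the network to settle, then
# print a snapshot of interactive elements. The && chain stops at the first
# command that fails and returns its non-zero status.
open_and_snapshot() {
  agent-browser open "$1" &&
    agent-browser wait --load networkidle &&
    agent-browser snapshot -i
}

# Usage (requires agent-browser on PATH):
#   open_and_snapshot https://example.com
```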

Example: Form Automation

# Navigate and get snapshot
agent-browser open https://example.com/form
agent-browser snapshot -i

# Output:
# - textbox "Name" [ref=e1]
# - textbox "Email" [ref=e2]
# - combobox "Country" [ref=e3]
# - checkbox "Subscribe" [ref=e4]
# - button "Submit" [ref=e5]

# Fill form using refs
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4
agent-browser click @e5

# Wait for redirect and verify
agent-browser wait --url "**/success"
agent-browser snapshot -i

Example: Data Extraction

# Navigate and snapshot
agent-browser open https://example.com/products
agent-browser snapshot -i --json > snapshot.json

# Extract specific element text
agent-browser get text @e1 --json
agent-browser get text @e2 --json

# Clean up
agent-browser close
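
When several refs need extracting, the repeated get text calls above could be folded into a loop. The helper below is a sketch under that assumption; get_texts is a hypothetical name, and the refs passed to it must come from a current snapshot.

```shell
# get_texts REF...: print the JSON text result for each ref, one call per ref.
# Stops at the first ref that fails and returns a non-zero status.
get_texts() {
  for ref in "$@"; do
    agent-browser get text "$ref" --json || return 1
  done
}

# Usage (requires agent-browser on PATH):
#   get_texts @e1 @e2 > texts.txt
```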

Example: Authenticated Workflow

# Save credentials once (encrypted)
echo "password" | agent-browser auth save myapp \
  --url https://app.example.com/login \
  --username user@example.com \
  --password-stdin

# Login using saved credentials
agent-browser auth login myapp

# Now authenticated - continue workflow
agent-browser open https://app.example.com/dashboard
agent-browser snapshot -i

Important: Ref Lifecycle

Refs are invalidated when the page changes. Always re-snapshot after:
  • Clicking links or buttons that navigate
  • Form submissions
  • Dynamic content loading
agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs

Session Management

Use named sessions for parallel agent operations:
# Agent 1
agent-browser --session agent1 open site-a.com
agent-browser --session agent1 snapshot -i

# Agent 2 (parallel)
agent-browser --session agent2 open site-b.com
agent-browser --session agent2 snapshot -i

# Clean up
agent-browser --session agent1 close
agent-browser --session agent2 close
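
To actually run the two sessions concurrently from one script, the commands can be started as background jobs and joined with wait. The following is a minimal sketch; the function names and the output file names are illustrative assumptions, while the --session flag and subcommands are the ones shown above.

```shell
# snapshot_in_session NAME URL: open URL in the named session and snapshot it.
snapshot_in_session() {
  agent-browser --session "$1" open "$2" &&
    agent-browser --session "$1" snapshot -i
}

# run_two_sessions URL_A URL_B: run both sessions as background jobs, wait for
# each, then close them. Returns non-zero if either job failed.
run_two_sessions() {
  snapshot_in_session agent1 "$1" > agent1.snapshot.txt & pid1=$!
  snapshot_in_session agent2 "$2" > agent2.snapshot.txt & pid2=$!
  status=0
  wait "$pid1" || status=1
  wait "$pid2" || status=1
  agent-browser --session agent1 close
  agent-browser --session agent2 close
  return "$status"
}

# Usage (requires agent-browser on PATH):
#   run_two_sessions site-a.com site-b.com
```

Each job writes to its own file, so the two snapshots never interleave.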

Timeouts and Slow Pages

For slow websites, use explicit waits:
# Wait for network to settle
agent-browser wait --load networkidle

# Wait for specific element
agent-browser wait @e1

# Wait for URL pattern
agent-browser wait --url "**/dashboard"

Error Handling

agent-browser returns a non-zero exit code when a command fails, so you can check success in the shell:
if agent-browser click @e1; then
  echo "Click successful"
else
  echo "Click failed"
fi
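
Because failures surface as exit codes, a generic retry wrapper is one way to handle transient errors. The sketch below is plain shell, not an agent-browser feature; retry is a hypothetical helper and the one-second pause is an arbitrary choice. Retrying suits transient failures such as a wait on a slow page; a failure caused by stale refs is better handled by taking a fresh snapshot.

```shell
# retry MAX CMD...: run CMD up to MAX times, pausing one second between
# attempts, and return 0 as soon as one attempt succeeds.
retry() {
  max=$1; shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    "$@" && return 0
    echo "attempt $attempt/$max failed: $*" >&2
    attempt=$((attempt + 1))
    [ "$attempt" -le "$max" ] && sleep 1
  done
  return 1
}

# Usage (requires agent-browser on PATH):
#   retry 3 agent-browser wait @e1
```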

Security for AI Agents

Enable security features for production deployments:
# Content boundaries (helps LLMs distinguish page content)
export AGENT_BROWSER_CONTENT_BOUNDARIES=1

# Domain allowlist (restrict navigation)
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"

# Output limits (prevent context flooding)
export AGENT_BROWSER_MAX_OUTPUT=50000

# Action policy (gate destructive actions)
export AGENT_BROWSER_ACTION_POLICY=./policy.json
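
A deployment script might verify these settings before launching an agent. The check below is a sketch under the assumption that all four variables are required in your environment; that policy, and the check_security_env name, are illustrative rather than anything agent-browser enforces.

```shell
# check_security_env: warn about each security variable listed above that is
# unset or empty. Returns 0 only if all four are set.
check_security_env() {
  missing=0
  for name in AGENT_BROWSER_CONTENT_BOUNDARIES AGENT_BROWSER_ALLOWED_DOMAINS \
              AGENT_BROWSER_MAX_OUTPUT AGENT_BROWSER_ACTION_POLICY; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "warning: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_security_env || exit 1
```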

Best Practices

Always close sessions

Clean up browser sessions when done to avoid leaked processes:
agent-browser close
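
In scripts, one way to make sure the close always runs, even when an earlier step fails, is a shell EXIT trap. This is a sketch of that general shell technique; run_and_close is a hypothetical helper, and it closes the default session only.

```shell
# run_and_close CMD...: run CMD in a subshell and close the browser when the
# subshell exits, whether CMD succeeded or failed. The subshell body keeps the
# trap from leaking into the calling shell, and CMD's exit status is preserved.
run_and_close() (
  trap 'agent-browser close' EXIT
  "$@"
)

# Usage (requires agent-browser on PATH):
#   run_and_close agent-browser open https://example.com
```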

Use refs, not CSS selectors

Refs from snapshots are more reliable than CSS selectors for AI agents.

Re-snapshot after navigation

Always take a fresh snapshot after the page changes.

Use --json for parsing

JSON output is easier for agents to parse programmatically.

Chain commands when possible

Use && to chain commands that don’t need intermediate parsing.

Full Reference

For the complete command reference and advanced features, see the full agent-browser documentation; agent-browser --help also lists all commands.
