Agent-browser integrates seamlessly with AI coding assistants and agents to automate browser tasks.
Quick Start
The simplest approach is to just tell your agent to use it:
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
The --help output is comprehensive, and most agents can work out the rest from there.
AI Coding Assistant Integration (Recommended)
Add the agent-browser skill to your AI coding assistant for richer context:
npx skills add vercel-labs/agent-browser
This works with:
Claude Code
Codex
Cursor
Gemini CLI
GitHub Copilot
Goose
OpenCode
Windsurf
The skill is fetched from the repository, so it stays up to date automatically. Do not copy SKILL.md from node_modules as it will become stale.
Claude Code Setup
Install the skill
In your project directory:
npx skills add vercel-labs/agent-browser
This adds the skill to .claude/skills/agent-browser/SKILL.md.
Use with Claude Code
The skill teaches Claude Code the full agent-browser workflow, including:
Snapshot-ref interaction pattern
Session management
Timeout handling
Security best practices
Just ask Claude Code to automate browser tasks:
Test the signup flow on https://example.com/signup
AGENTS.md / CLAUDE.md Instructions
For more consistent results, add agent-browser instructions to your project or global instructions file:
## Browser Automation
Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
Core workflow:
1. `agent-browser open <url>` - Navigate to page
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes
Core Workflow for Agents
Every browser automation follows this pattern:
Get snapshot with refs
agent-browser snapshot -i
Output includes refs like @e1, @e2, @e3 for each interactive element.
Interact using refs
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser click @e3
Re-snapshot after changes
agent-browser snapshot -i # Get fresh refs after navigation
Essential Commands for AI Agents
Navigation
agent-browser open <url> # Navigate to URL
agent-browser close # Close browser
Snapshot with Refs
agent-browser snapshot -i # Interactive elements with refs
agent-browser snapshot -i --json # JSON output for parsing
agent-browser snapshot -i -C # Also include cursor-interactive elements (e.g. divs with onclick)
Interaction
agent-browser click @e1 # Click element by ref
agent-browser fill @e2 "text" # Fill input by ref
agent-browser type @e3 "text" # Type without clearing
agent-browser press Enter # Press key
agent-browser select @e4 "option" # Select dropdown option
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
Waiting
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
Screenshots
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --annotate # With numbered labels
agent-browser screenshot --full # Full page
JSON Output Mode
Use --json for machine-readable output:
agent-browser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{...}}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
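A minimal sketch of how a script might act on the `success` field of that JSON envelope. It assumes the output is shaped like the sample above and uses a plain `grep` pattern rather than a real JSON parser, so it would also match a nested `"success": true`; the function name `json_succeeded` is hypothetical, and a dedicated JSON tool is the more robust choice where one is available.

```shell
# Hypothetical helper: exit 0 when the JSON on stdin contains "success": true.
# This is a text pattern match, not a real JSON parser.
json_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Example with a literal payload shaped like the sample output above.
if printf '%s\n' '{"success":true,"data":{"snapshot":"..."}}' | json_succeeded; then
  echo "command succeeded"
else
  echo "command failed"
fi
```

In a real run the payload would come from a command such as `agent-browser get text @e1 --json` instead of the literal string.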
Command Chaining
Commands can be chained with && for efficiency:
# Open, wait, and snapshot in one call
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Chain interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
Use && when you don’t need intermediate output. Run separately when you need to parse output first.
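When a later command depends on earlier output, the commands have to run separately with a parsing step in between. The sketch below assumes the text snapshot uses the `- role "Name" [ref=eN]` line format that appears in the examples on this page; `ref_for` is a hypothetical helper, not part of agent-browser.

```shell
# Hypothetical helper: read snapshot text on stdin and print the ref (as @eN)
# of the first line matching the given role and accessible name.
# Assumes lines look like:  - button "Submit" [ref=e5]
ref_for() {
  role=$1
  name=$2
  # -F treats the role and name as literal text; sed then extracts the ref id.
  grep -F -e "- $role \"$name\" [ref=" |
    sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/@\1/p' |
    head -n 1
}

# Usage sketch (requires agent-browser and an open page):
#   snapshot=$(agent-browser snapshot -i)
#   submit=$(printf '%s\n' "$snapshot" | ref_for button "Submit")
#   [ -n "$submit" ] && agent-browser click "$submit"
```

Looking a ref up by role and name on every run avoids hard-coding `@e5`, which may point at a different element after the page changes.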
Example: Form Submission
# Navigate and get snapshot
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output:
# - textbox "Name" [ref=e1]
# - textbox "Email" [ref=e2]
# - combobox "Country" [ref=e3]
# - checkbox "Subscribe" [ref=e4]
# - button "Submit" [ref=e5]
# Fill form using refs
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4
agent-browser click @e5
# Wait for redirect and verify
agent-browser wait --url "**/success"
agent-browser snapshot -i
Example: Data Extraction
# Navigate and snapshot
agent-browser open https://example.com/products
agent-browser snapshot -i --json > snapshot.json
# Extract specific element text
agent-browser get text @e1 --json
agent-browser get text @e2 --json
# Clean up
agent-browser close
Example: Authenticated Workflow
# Save credentials once (encrypted)
echo "password" | agent-browser auth save myapp \
--url https://app.example.com/login \
--username user@example.com \
--password-stdin
# Login using saved credentials
agent-browser auth login myapp
# Now authenticated - continue workflow
agent-browser open https://app.example.com/dashboard
agent-browser snapshot -i
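The `auth save` step above puts a literal password on the command line. A possible variation, sketched below, reads it from an environment variable instead so it stays out of shell history. The function name `save_login` and the variable `APP_PASSWORD` are assumptions made for this example; the flags are the ones shown above.

```shell
# Hypothetical helper: save credentials without typing the password inline.
# Reads the password from $APP_PASSWORD (an assumed variable name).
# Usage: save_login <profile> <login-url> <username>
save_login() {
  profile=$1
  url=$2
  username=$3
  if [ -z "${APP_PASSWORD:-}" ]; then
    echo "APP_PASSWORD is not set" >&2
    return 1
  fi
  # Same flags as the example above; the password arrives on stdin.
  printf '%s\n' "$APP_PASSWORD" | agent-browser auth save "$profile" \
    --url "$url" \
    --username "$username" \
    --password-stdin
}

# Usage sketch (requires agent-browser):
#   save_login myapp https://app.example.com/login user@example.com
#   agent-browser auth login myapp
```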
Important: Ref Lifecycle
Refs are invalidated when the page changes. Always re-snapshot after:
Clicking links or buttons that navigate
Form submissions
Dynamic content loading
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
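One way to make the re-snapshot rule hard to forget is to bundle the click, a wait, and the fresh snapshot into a single helper. This is a sketch only: `click_and_resnapshot` is a hypothetical name, and waiting for network idle is an assumption about what "page has settled" means for a given site.

```shell
# Hypothetical helper: click a ref, wait for the page to settle, then print a
# fresh snapshot so the caller never reuses refs from the previous page state.
click_and_resnapshot() {
  agent-browser click "$1" || return 1
  agent-browser wait --load networkidle || return 1
  agent-browser snapshot -i
}

# Usage sketch (requires agent-browser and an open page):
#   click_and_resnapshot @e5   # prints the new snapshot; read new refs from it
```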
Session Management
Use named sessions for parallel agent operations:
# Agent 1
agent-browser --session agent1 open site-a.com
agent-browser --session agent1 snapshot -i
# Agent 2 (parallel)
agent-browser --session agent2 open site-b.com
agent-browser --session agent2 snapshot -i
# Clean up
agent-browser --session agent1 close
agent-browser --session agent2 close
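The pattern above can be wrapped so that each session is closed even when a step fails. The sketch below uses only the `--session`, `open`, `snapshot -i`, and `close` commands shown above; the function name `run_in_session` is hypothetical.

```shell
# Hypothetical helper: open a URL and snapshot it in a named session, then
# always close that session, even if a step failed.
# Usage: run_in_session <session-name> <url>
run_in_session() {
  session=$1
  url=$2
  rc=0
  agent-browser --session "$session" open "$url" &&
    agent-browser --session "$session" snapshot -i ||
    rc=$?
  agent-browser --session "$session" close
  return "$rc"
}

# Usage sketch: two sessions in parallel, then wait for both to finish.
#   run_in_session agent1 site-a.com > a.txt &
#   run_in_session agent2 site-b.com > b.txt &
#   wait
```

Each background job runs in its own subshell, so the two calls do not share the helper's variables.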
Timeouts and Slow Pages
For slow websites, use explicit waits:
# Wait for network to settle
agent-browser wait --load networkidle
# Wait for specific element
agent-browser wait @e1
# Wait for URL pattern
agent-browser wait --url "**/dashboard"
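When a wait does not succeed, a screenshot of the current page can help explain why. The sketch below assumes that a wait which times out exits non-zero like other errors; `wait_or_screenshot` is a hypothetical helper name.

```shell
# Hypothetical helper: run `agent-browser wait` with the given arguments; if
# it fails, save a screenshot for debugging and report failure.
# Assumption: a wait that times out exits non-zero like other errors.
wait_or_screenshot() {
  if agent-browser wait "$@"; then
    return 0
  fi
  echo "wait $* failed; capturing a screenshot" >&2
  agent-browser screenshot
  return 1
}

# Usage sketch (requires agent-browser and an open page):
#   wait_or_screenshot --url "**/dashboard" || exit 1
```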
Error Handling
Agent-browser returns non-zero exit codes on errors. Check command success:
if agent-browser click @e1; then
  echo "Click successful"
else
  echo "Click failed"
fi
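Because failures surface as exit codes, a generic retry wrapper can be layered on top. The following is a sketch rather than a built-in feature: `retry` is a hypothetical helper, and retrying is only safe for actions that can be repeated without side effects (a second click on a submit button may not be).

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  while [ "$try" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $try/$attempts failed: $*" >&2
    if [ "$try" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    try=$((try + 1))
  done
  return 1
}

# Usage sketch (requires agent-browser and an open page):
#   retry 3 2 agent-browser wait @e1 || echo "Element never appeared"
```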
Security for AI Agents
Enable security features for production deployments:
# Content boundaries (helps LLMs distinguish page content)
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
# Domain allowlist (restrict navigation)
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
# Output limits (prevent context flooding)
export AGENT_BROWSER_MAX_OUTPUT=50000
# Action policy (gate destructive actions)
export AGENT_BROWSER_ACTION_POLICY=./policy.json
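A deployment script could verify that these variables are actually exported before starting an agent run. The sketch below checks only the four variable names listed above; `check_security_env` is a hypothetical helper, and whether all four should be mandatory is a policy decision for each project.

```shell
# Hypothetical preflight check: report each security variable from the list
# above that is unset or empty, and fail if any are missing.
check_security_env() {
  missing=0
  for var in AGENT_BROWSER_CONTENT_BOUNDARIES AGENT_BROWSER_ALLOWED_DOMAINS \
             AGENT_BROWSER_MAX_OUTPUT AGENT_BROWSER_ACTION_POLICY; do
    # printenv only sees exported variables, which is what child processes get.
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch: refuse to start an agent run without the settings in place.
#   check_security_env || exit 1
```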
Best Practices
Always close sessions: Clean up browser sessions with `agent-browser close` when done to avoid leaked processes.
Use refs, not CSS selectors: Refs from snapshots are more reliable than CSS selectors for AI agents.
Re-snapshot after navigation: Always take a fresh snapshot after the page changes.
Use --json for parsing: JSON output is easier for agents to parse programmatically.
Chain commands when possible: Use && to chain commands that don’t need intermediate parsing.
Full Reference
For complete command reference and advanced features: