Overview
Streaming enables real-time browser viewport preview via WebSocket. This supports “pair browsing” where a human can watch and interact alongside an AI agent, or simply view the browser state as commands are executed.
Quick Start
# Enable streaming on port 9223
export AGENT_BROWSER_STREAM_PORT=9223
agent-browser open example.com
# Connect to ws://localhost:9223 to receive frames
The stream server starts automatically when AGENT_BROWSER_STREAM_PORT is set and remains active for the lifetime of the daemon.
WebSocket Protocol
Connect to ws://localhost:<port> to receive viewport frames and send input events.
Receive Frames
The server sends JPEG frames as JSON messages:
{
"type": "frame",
"data": "<base64-encoded-jpeg>",
"metadata": {
"deviceWidth": 1280,
"deviceHeight": 720,
"pageScaleFactor": 1,
"offsetTop": 0,
"scrollOffsetX": 0,
"scrollOffsetY": 0
}
}
Frames are broadcast to all connected clients. Decode the base64 data to display the image.
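As a minimal sketch of the decoding step, the helper below turns one raw WebSocket message into JPEG bytes. The name decodeFrame is hypothetical and not part of agent-browser; it assumes only the message shape shown above and a runtime that provides the standard atob function (browsers and recent Node.js versions).

```typescript
// Hypothetical helper (not part of agent-browser): returns the JPEG bytes of a
// frame message, or null when the message is not a frame.
function decodeFrame(raw: string): Uint8Array | null {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) return null;

  const message = parsed as { type?: unknown; data?: unknown };
  if (message.type !== "frame" || typeof message.data !== "string") return null;

  // atob yields a binary string with one character per byte.
  const binary = atob(message.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

The resulting bytes can be written to a .jpg file or wrapped in a Blob for display. In a browser viewer, assigning a data: URL to an img element avoids the explicit decode.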
Send Mouse Events
{
"type": "input_mouse",
"eventType": "mousePressed",
"x": 100,
"y": 200,
"button": "left",
"clickCount": 1
}
Event types: mousePressed, mouseReleased, mouseMoved, mouseWheel
Buttons: left, right, middle, none
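A click is a mousePressed event followed by a mouseReleased event at the same coordinates. The sketch below builds that pair; the MouseMessage type and the clickMessages helper are illustrative names, and the fields mirror the JSON above and the viewer example later in this page.

```typescript
type MouseButton = "left" | "right" | "middle" | "none";

// Field names follow the input_mouse message documented above.
interface MouseMessage {
  type: "input_mouse";
  eventType: "mousePressed" | "mouseReleased" | "mouseMoved" | "mouseWheel";
  x: number;
  y: number;
  button: MouseButton;
  clickCount?: number;
}

// Hypothetical helper: the two messages that make up a single click.
function clickMessages(x: number, y: number, button: MouseButton = "left"): MouseMessage[] {
  return [
    { type: "input_mouse", eventType: "mousePressed", x, y, button, clickCount: 1 },
    { type: "input_mouse", eventType: "mouseReleased", x, y, button },
  ];
}
```

Each message would then be serialized with JSON.stringify and sent over the open WebSocket in order.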
Send Keyboard Events
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "Enter",
"code": "Enter"
}
Event types: keyDown, keyUp, char
Send Touch Events
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [{ "x": 100, "y": 200 }]
}
Event types: touchStart, touchEnd, touchMove, touchCancel
Request Status
Response:
{
"type": "status",
"connected": true,
"screencasting": true,
"viewportWidth": 1280,
"viewportHeight": 720
}
Error Messages
{
"type": "error",
"message": "Browser not launched"
}
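Because every server message carries a type field, a client can dispatch on it. The following is a sketch under the assumption that frame, status, and error are the only message types, which is all this page documents; ServerMessage and describeMessage are illustrative names.

```typescript
// Message shapes copied from the JSON examples above.
type ServerMessage =
  | { type: "frame"; data: string; metadata: { deviceWidth: number; deviceHeight: number } }
  | { type: "status"; connected: boolean; screencasting: boolean; viewportWidth: number; viewportHeight: number }
  | { type: "error"; message: string };

// Hypothetical helper: one-line summary of a raw server message.
function describeMessage(raw: string): string {
  const message = JSON.parse(raw) as ServerMessage;
  switch (message.type) {
    case "frame":
      return `frame ${message.metadata.deviceWidth}x${message.metadata.deviceHeight}`;
    case "status":
      return `status connected=${message.connected} screencasting=${message.screencasting}`;
    case "error":
      return `error: ${message.message}`;
    default:
      // Unknown types are reported rather than thrown so a viewer keeps running.
      return "unknown message";
  }
}
```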
Implementation Details
Automatic Screencast
When the first client connects, the stream server automatically starts screencasting (see src/stream-server.ts:203-209):
if (this.clients.size === 1 && !this.isScreencasting) {
this.startScreencast().catch((error) => {
console.error('[StreamServer] Failed to start screencast:', error);
this.sendError(ws, error.message);
});
}
When the last client disconnects, screencasting stops automatically to conserve resources (see src/stream-server.ts:226-231).
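The start-on-first, stop-on-last behavior can be sketched in isolation as follows. ScreencastGate is a hypothetical class written for illustration and is not the actual StreamServer code; the start and stop callbacks stand in for whatever begins and ends the screencast.

```typescript
// Hypothetical illustration of the connect/disconnect bookkeeping described above.
class ScreencastGate {
  private readonly clients = new Set<string>();
  private screencasting = false;
  private readonly start: () => void;
  private readonly stop: () => void;

  constructor(start: () => void, stop: () => void) {
    this.start = start;
    this.stop = stop;
  }

  connect(clientId: string): void {
    this.clients.add(clientId);
    // Only the first client triggers a start.
    if (this.clients.size === 1 && !this.screencasting) {
      this.screencasting = true;
      this.start();
    }
  }

  disconnect(clientId: string): void {
    this.clients.delete(clientId);
    // Only the last client leaving triggers a stop.
    if (this.clients.size === 0 && this.screencasting) {
      this.screencasting = false;
      this.stop();
    }
  }
}
```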
Frame Quality
Default screencast options from src/stream-server.ts:364-370:
await this.browser.startScreencast((frame) => this.broadcastFrame(frame), {
format: 'jpeg',
quality: 80,
maxWidth: 1280,
maxHeight: 720,
everyNthFrame: 1,
});
- Format: JPEG (smaller size, faster transmission)
- Quality: 80 (good balance between quality and size)
- Max dimensions: 1280x720 (scales down larger viewports)
- Frame rate: Every frame (set everyNthFrame: 2 to sample every other frame)
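To estimate the frame size a client should expect for a given viewport, the sketch below fits a viewport inside the maximum dimensions. It assumes the aspect ratio is preserved and that smaller viewports are never scaled up; the exact rounding the browser applies is not documented here, so treat the result as an approximation. fitWithin is a hypothetical helper.

```typescript
// Hypothetical helper: scale (width, height) down to fit within the given bounds.
function fitWithin(
  width: number,
  height: number,
  maxWidth: number,
  maxHeight: number,
): { width: number; height: number } {
  // Never scale above 1, so viewports already within bounds stay unchanged.
  const scale = Math.min(1, maxWidth / width, maxHeight / height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

Under these assumptions a 1920x1080 viewport yields 1280x720 frames, and an 800x600 viewport is sent unscaled.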
Input Injection
Input events are injected via CDP (Chrome DevTools Protocol). From src/stream-server.ts:244-275:
switch (message.type) {
case 'input_mouse':
await this.browser.injectMouseEvent({
type: message.eventType,
x: message.x,
y: message.y,
button: message.button,
clickCount: message.clickCount,
deltaX: message.deltaX,
deltaY: message.deltaY,
modifiers: message.modifiers,
});
break;
// ... keyboard and touch events
}
See src/browser.ts:2164-2228 for the CDP input injection implementation.
Security
Localhost Binding
The stream server binds to 127.0.0.1 only to prevent network exposure (from src/stream-server.ts:120-122):
this.wss = new WebSocketServer({
port: this.port,
host: '127.0.0.1', // Localhost only
// ...
});
Critical: The stream server accepts direct input injection (mouse, keyboard, touch), so exposing it to the network would let any remote client control the browser.
Origin Validation
The stream server rejects cross-origin WebSocket connections from untrusted origins. From src/stream-server.ts:10-30:
export function isAllowedOrigin(origin: string | undefined): boolean {
// Allow connections with no origin (CLI tools)
if (!origin) {
return true;
}
// Allow file:// origins (local HTML files)
if (origin.startsWith('file://')) {
return true;
}
// Allow localhost/loopback origins (browser-based stream viewers)
const url = new URL(origin);
const host = url.hostname;
if (host === 'localhost' || host === '127.0.0.1' || host === '::1' || host === '[::1]') {
return true;
}
return false;
}
Allowed origins:
- No origin (CLI tools, WebSocket libraries)
- file:// (local HTML viewers)
- localhost, 127.0.0.1, ::1 (local web servers)
Rejected origins:
- All other origins (prevents malicious web pages from connecting)
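To see how these rules apply to concrete Origin values, here is a standalone sketch of the same checks. It is not the source function: the name isAllowedOriginSketch is hypothetical, and the try/catch that rejects unparsable origins is an added assumption.

```typescript
// Sketch of the origin rules above; not the actual src/stream-server.ts code.
function isAllowedOriginSketch(origin: string | undefined): boolean {
  if (!origin) return true; // no Origin header (CLI tools)
  if (origin.startsWith("file://")) return true; // local HTML files

  let host: string;
  try {
    host = new URL(origin).hostname;
  } catch {
    // Assumption: an origin that does not parse as a URL (e.g. "null") is untrusted.
    return false;
  }
  return host === "localhost" || host === "127.0.0.1" || host === "::1" || host === "[::1]";
}

// Sample origins and whether the sketch accepts them.
const originChecks: Array<[string | undefined, boolean]> = [
  [undefined, true],
  ["file://", true],
  ["http://localhost:3000", true],
  ["http://[::1]:8080", true],
  ["https://example.com", false],
  ["http://localhost.evil.com", false], // hostname is compared exactly, not by prefix
];
```

The last sample shows why the check compares the parsed hostname rather than matching on the origin string.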
Stream Port Discovery
The daemon writes the stream port to a file for clients to discover (from src/daemon.ts:365-366):
const streamPortFile = getStreamPortFile();
fs.writeFileSync(streamPortFile, streamPort.toString());
Clients can read ~/.agent-browser/default.stream (or <session>.stream for named sessions) to discover the port.
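A client-side sketch of the discovery step might look like this. Both helpers are hypothetical, and the assumption that the file holds only the decimal port number follows from the toString() call above; actually reading the file (for example with Node's fs module) is left to the caller.

```typescript
// Hypothetical helper: file name inside ~/.agent-browser for a session.
function streamPortFileName(session: string = "default"): string {
  return `${session}.stream`;
}

// Hypothetical helper: parse the file contents into a valid TCP port, or null.
function parseStreamPort(contents: string): number | null {
  const trimmed = contents.trim();
  if (!/^\d+$/.test(trimmed)) return null;
  const port = Number(trimmed);
  return port >= 1 && port <= 65535 ? port : null;
}
```

A null result is a reasonable signal that streaming is not enabled for that session or that the file is stale.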
Programmatic API
For advanced use, control streaming directly via the BrowserManager API:
import { BrowserManager } from 'agent-browser';
const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Start screencast
await browser.startScreencast((frame) => {
// frame.data is base64-encoded JPEG
console.log('Frame:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
format: 'jpeg',
quality: 80,
maxWidth: 1280,
maxHeight: 720,
});
// Inject mouse events
await browser.injectMouseEvent({
type: 'mousePressed',
x: 100,
y: 200,
button: 'left',
});
// Inject keyboard events
await browser.injectKeyboardEvent({
type: 'keyDown',
key: 'Enter',
code: 'Enter',
});
// Stop when done
await browser.stopScreencast();
See the TypeScript definitions in src/browser.ts:50-72 for full API details.
Example: Simple Viewer
Here’s a minimal HTML viewer for the stream:
<!DOCTYPE html>
<html>
<head>
<title>Browser Stream Viewer</title>
<style>
body { margin: 0; background: #000; }
#viewport { max-width: 100vw; max-height: 100vh; }
</style>
</head>
<body>
<img id="viewport" />
<script>
const ws = new WebSocket('ws://localhost:9223');
const img = document.getElementById('viewport');
// Metadata of the most recent frame, needed to map clicks to page coordinates
let lastMetadata = null;
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.type === 'frame') {
img.src = 'data:image/jpeg;base64,' + message.data;
lastMetadata = message.metadata;
}
};
// Send mouse clicks to the browser
img.onclick = (event) => {
// Ignore clicks until the first frame has arrived
if (!lastMetadata) return;
const rect = img.getBoundingClientRect();
const scaleX = lastMetadata.deviceWidth / rect.width;
const scaleY = lastMetadata.deviceHeight / rect.height;
const x = (event.clientX - rect.left) * scaleX;
const y = (event.clientY - rect.top) * scaleY;
ws.send(JSON.stringify({
type: 'input_mouse',
eventType: 'mousePressed',
x: x,
y: y,
button: 'left',
clickCount: 1,
}));
ws.send(JSON.stringify({
type: 'input_mouse',
eventType: 'mouseReleased',
x: x,
y: y,
button: 'left',
}));
};
</script>
</body>
</html>
Open this HTML file in a browser to view and interact with the stream.
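The coordinate mapping in the click handler can be isolated as a pure function, which makes it easier to check. This sketch uses hypothetical names (toDeviceCoords, Rect, FrameSize) and assumes the image shows the whole frame with no letterboxing.

```typescript
interface Rect { left: number; top: number; width: number; height: number; }
interface FrameSize { deviceWidth: number; deviceHeight: number; }

// Hypothetical helper: map a click on the scaled image to frame coordinates.
function toDeviceCoords(
  clientX: number,
  clientY: number,
  rect: Rect,
  frame: FrameSize,
): { x: number; y: number } {
  // Ratio between the frame size and the size the image is displayed at.
  const scaleX = frame.deviceWidth / rect.width;
  const scaleY = frame.deviceHeight / rect.height;
  return {
    x: (clientX - rect.left) * scaleX,
    y: (clientY - rect.top) * scaleY,
  };
}
```

For example, with a 1280x720 frame displayed at 640x360 and offset by (10, 20), a click at client position (330, 200) maps to (640, 360), the center of the frame.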
Use Cases
AI Agent Monitoring
Watch an AI agent in real-time as it navigates and interacts with pages:
# Start agent with streaming enabled
export AGENT_BROWSER_STREAM_PORT=9223
agent-browser open example.com
# AI agent runs commands
agent-browser snapshot
agent-browser click @e1
agent-browser fill @e2 "text"
# Human watches via stream viewer
open viewer.html
Pair Browsing
Human and AI collaborate - AI automates, human observes and corrects:
# AI agent
agent-browser --headed open app.com
agent-browser click @e1
# Human watches stream and can take over if needed
# (close AI agent, use headed mode manually)
Debugging
Debug automation issues by watching the actual browser state:
export AGENT_BROWSER_STREAM_PORT=9223
export AGENT_BROWSER_HEADED=0 # Headless mode
# Run automation
./my-automation-script.sh
# Watch stream to see what's happening
open viewer.html
Recording
Record browser sessions by capturing stream frames:
const ws = new WebSocket('ws://localhost:9223');
const frames = [];
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.type === 'frame') {
frames.push(message.data);
}
};
// Later: encode frames as video using ffmpeg or similar
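Frames arrive at an uneven rate, so a recording needs per-frame timing to play back faithfully. One possible approach, sketched below, is to save each frame as a numbered JPEG, note when it arrived, and generate a list for ffmpeg's concat demuxer. The helper names, the file naming scheme, and the use of arrival time as the frame timestamp are all assumptions for illustration.

```typescript
interface RecordedFrame {
  fileName: string;     // where the decoded JPEG was saved
  receivedAtMs: number; // arrival time, e.g. from Date.now()
}

// Hypothetical helper: zero-padded file name for the frame at a given index.
function frameFileName(index: number): string {
  return `frame-${String(index).padStart(5, "0")}.jpg`;
}

// Hypothetical helper: build an ffconcat list with per-frame durations.
function buildConcatList(frames: RecordedFrame[]): string {
  const lines: string[] = ["ffconcat version 1.0"];
  frames.forEach((frame, i) => {
    lines.push(`file '${frame.fileName}'`);
    const next = frames[i + 1];
    if (next) {
      // Each frame is shown until the next one arrived.
      const seconds = (next.receivedAtMs - frame.receivedAtMs) / 1000;
      lines.push(`duration ${seconds.toFixed(3)}`);
    }
  });
  return lines.join("\n") + "\n";
}
```

The resulting text can be written to a file and passed to ffmpeg with the concat input format (-f concat). The final frame has no duration entry because no later arrival time is known for it.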
Limitations
iOS Not Supported
Streaming requires CDP (Chrome DevTools Protocol), which is not available for iOS Safari automation. From src/ios-actions.ts:259:
return errorResponse(id, 'Screencast is not supported on iOS (requires CDP).');
Performance Overhead
Streaming adds overhead:
- Each frame is JPEG-encoded and base64-encoded
- Network transmission of potentially large frames
- Multiple clients multiply the bandwidth
For production automation, disable streaming unless you need live preview.
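A rough estimate of the bandwidth cost follows from the points above: base64 expands every 3 bytes into 4 characters, and each connected client receives its own copy of each frame. The sketch below ignores the JSON envelope and WebSocket framing, and the JPEG size and rendered frame rate passed in are whatever you measure or assume; both function names are hypothetical.

```typescript
// Length of the base64 encoding (with padding) of byteLength bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical estimate of payload bytes per second across all clients.
function estimateBytesPerSecond(
  jpegBytes: number,     // average JPEG size of one frame
  renderedFps: number,   // frames the browser renders per second
  everyNthFrame: number, // screencast sampling option
  clients: number,       // connected viewers
): number {
  const sentFps = renderedFps / everyNthFrame;
  return base64Length(jpegBytes) * sentFps * clients;
}
```

With 30,000-byte frames at 30 rendered frames per second and two clients, this gives 2,400,000 bytes per second; setting everyNthFrame to 2 halves that to 1,200,000.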
Frame Rate
The stream captures every rendered frame by default. For slower connections, reduce frame rate:
await browser.startScreencast((frame) => { ... }, {
everyNthFrame: 2, // Capture every 2nd frame (halves frame rate)
});
Troubleshooting
Connection Refused
WebSocket connection to 'ws://localhost:9223/' failed
Solutions:
- Verify AGENT_BROWSER_STREAM_PORT is set before launching the daemon
- Check if the daemon is running: pgrep -f agent-browser
- Verify the port is listening: lsof -i :9223 (macOS/Linux) or netstat -ano | findstr :9223 (Windows)
No Frames Received
Possible causes:
- Browser not launched yet: send a command first, for example agent-browser open example.com
- Screencast failed to start: check for errors in daemon logs
- Client not connected: screencasting starts only when the first client connects, so confirm the WebSocket connection is actually open
Origin Blocked
[StreamServer] Rejected connection from origin: https://example.com
The stream server accepts connections only from clients that send no origin, from loopback origins (localhost, 127.0.0.1, ::1), and from file:// origins. Host your viewer on localhost or open it from a file:// URL.