Overview
Streaming enables real-time browser viewport preview via WebSocket. This supports “pair browsing” where a human can watch and interact alongside an AI agent, or simply view the browser state as commands are executed.Quick Start
AGENT_BROWSER_STREAM_PORT is set and remains active for the lifetime of the daemon.
WebSocket Protocol
Connect tows://localhost:<port> to receive viewport frames and send input events.
Receive Frames
The server sends JPEG frames as JSON messages:Send Mouse Events
mousePressed, mouseReleased, mouseMoved, mouseWheel
Buttons: left, right, middle, none
Send Keyboard Events
keyDown, keyUp, char
Send Touch Events
touchStart, touchEnd, touchMove, touchCancel
Request Status
Error Messages
Implementation Details
Automatic Screencast
When the first client connects, the stream server automatically starts screencasting (seesrc/stream-server.ts:203-209):
src/stream-server.ts:226-231).
Frame Quality
Default screencast options fromsrc/stream-server.ts:364-370:
- Format: JPEG (smaller size, faster transmission)
- Quality: 80 (good balance between quality and size)
- Max dimensions: 1280x720 (scales down larger viewports)
- Frame rate: Every frame (set
everyNthFrame: 2to sample every other frame)
Input Injection
Input events are injected via CDP (Chrome DevTools Protocol). Fromsrc/stream-server.ts:244-275:
src/browser.ts:2164-2228 for the CDP input injection implementation.
Security
Localhost Binding
The stream server binds to127.0.0.1 only to prevent network exposure (from src/stream-server.ts:120-122):
Origin Validation
The stream server rejects cross-origin WebSocket connections from untrusted origins. Fromsrc/stream-server.ts:10-30:
- No origin (CLI tools, WebSocket libraries)
file://(local HTML viewers)localhost,127.0.0.1,::1(local web servers)
- All other origins (prevents malicious web pages from connecting)
Stream Port Discovery
The daemon writes the stream port to a file for clients to discover (fromsrc/daemon.ts:365-366):
~/.agent-browser/default.stream (or <session>.stream for named sessions) to discover the port.
Programmatic API
For advanced use, control streaming directly via the BrowserManager API:src/browser.ts:50-72 for full API details.
Example: Simple Viewer
Here’s a minimal HTML viewer for the stream:Use Cases
AI Agent Monitoring
Watch an AI agent in real-time as it navigates and interacts with pages:Pair Browsing
Human and AI collaborate - AI automates, human observes and corrects:Debugging
Debug automation issues by watching the actual browser state:Recording
Record browser sessions by capturing stream frames:Limitations
iOS Not Supported
Streaming requires CDP (Chrome DevTools Protocol), which is not available for iOS Safari automation. Fromsrc/ios-actions.ts:259:
Performance
Streaming adds overhead:- Each frame is JPEG-encoded and base64-encoded
- Network transmission of potentially large frames
- Multiple clients multiply the bandwidth
Frame Rate
The stream captures every rendered frame by default. For slower connections, reduce frame rate:Troubleshooting
Connection Refused
- Verify
AGENT_BROWSER_STREAM_PORTis set before launching the daemon - Check if the daemon is running:
pgrep -f agent-browser - Verify the port is listening:
lsof -i :9223(macOS/Linux) ornetstat -ano | findstr :9223(Windows)
No Frames Received
Possible causes:- Browser not launched yet - send a command first:
agent-browser open example.com - Screencast failed to start - check for errors in daemon logs
- Multiple clients - frames are broadcast, ensure at least one client is connected