
Overview

Streaming provides a real-time preview of the browser viewport over WebSocket. This supports “pair browsing”, where a human can watch and interact alongside an AI agent, or simply view the browser state as commands are executed.

Quick Start

# Enable streaming on port 9223
export AGENT_BROWSER_STREAM_PORT=9223
agent-browser open example.com

# Connect to ws://localhost:9223 to receive frames
The stream server starts automatically when AGENT_BROWSER_STREAM_PORT is set and remains active for the lifetime of the daemon.

WebSocket Protocol

Connect to ws://localhost:<port> to receive viewport frames and send input events.

Receive Frames

The server sends JPEG frames as JSON messages:
{
  "type": "frame",
  "data": "<base64-encoded-jpeg>",
  "metadata": {
    "deviceWidth": 1280,
    "deviceHeight": 720,
    "pageScaleFactor": 1,
    "offsetTop": 0,
    "scrollOffsetX": 0,
    "scrollOffsetY": 0
  }
}
Frames are broadcast to all connected clients. Decode the base64 data to display the image.
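As a minimal sketch of client-side handling, the function below parses a raw message and turns a frame into a data URL that an img element can display. The names FrameMessage and frameToDataUrl are illustrative and not part of agent-browser; the field shapes are assumed from the example message above.

```typescript
// Shape of a frame message, assumed from the example above.
interface FrameMessage {
  type: 'frame';
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

// Hypothetical helper: returns a data URL for frame messages, or null for
// anything else (status, error, or malformed JSON).
function frameToDataUrl(raw: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const message = parsed as Partial<FrameMessage>;
  if (message.type !== 'frame' || typeof message.data !== 'string') return null;
  return 'data:image/jpeg;base64,' + message.data;
}
```

Returning null for non-frame messages lets the caller route status and error messages elsewhere without a second parse attempt failing loudly.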

Send Mouse Events

{
  "type": "input_mouse",
  "eventType": "mousePressed",
  "x": 100,
  "y": 200,
  "button": "left",
  "clickCount": 1
}
Event types: mousePressed, mouseReleased, mouseMoved, mouseWheel
Buttons: left, right, middle, none
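A click is typically sent as a press followed by a release at the same point. The sketch below builds those two messages, plus a wheel message. The helper names are hypothetical, and the optional deltaX and deltaY fields are an assumption based on the fields the server's input handler forwards; check the source before relying on them.

```typescript
type MouseButton = 'left' | 'right' | 'middle' | 'none';

// Message shape assumed from the example above.
interface MouseInputMessage {
  type: 'input_mouse';
  eventType: 'mousePressed' | 'mouseReleased' | 'mouseMoved' | 'mouseWheel';
  x: number;
  y: number;
  button?: MouseButton;
  clickCount?: number;
  deltaX?: number; // assumed field, used for mouseWheel
  deltaY?: number; // assumed field, used for mouseWheel
}

// Hypothetical helper: a press and a release at the same coordinates.
function clickMessages(x: number, y: number, button: MouseButton = 'left'): MouseInputMessage[] {
  return [
    { type: 'input_mouse', eventType: 'mousePressed', x, y, button, clickCount: 1 },
    { type: 'input_mouse', eventType: 'mouseReleased', x, y, button, clickCount: 1 },
  ];
}

// Hypothetical helper: a vertical scroll at the given position.
function wheelMessage(x: number, y: number, deltaY: number): MouseInputMessage {
  return { type: 'input_mouse', eventType: 'mouseWheel', x, y, button: 'none', deltaX: 0, deltaY };
}
```

Each message would then be serialized with JSON.stringify and sent over the open WebSocket in order.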

Send Keyboard Events

{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "Enter",
  "code": "Enter"
}
Event types: keyDown, keyUp, char
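A full key press can be modeled as a keyDown followed by a keyUp. The helper below is a hypothetical sketch built only from the fields shown above; whether char events need additional fields is not documented here, so they are left out.

```typescript
// Message shape assumed from the example above.
interface KeyboardInputMessage {
  type: 'input_keyboard';
  eventType: 'keyDown' | 'keyUp' | 'char';
  key: string;
  code: string;
}

// Hypothetical helper: keyDown then keyUp for the same key.
// code defaults to key, which holds for keys such as Enter or Tab.
function keyPressMessages(key: string, code: string = key): KeyboardInputMessage[] {
  return [
    { type: 'input_keyboard', eventType: 'keyDown', key, code },
    { type: 'input_keyboard', eventType: 'keyUp', key, code },
  ];
}
```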

Send Touch Events

{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [{ "x": 100, "y": 200 }]
}
Event types: touchStart, touchEnd, touchMove, touchCancel

Request Status

{
  "type": "status"
}
Response:
{
  "type": "status",
  "connected": true,
  "screencasting": true,
  "viewportWidth": 1280,
  "viewportHeight": 720
}

Error Messages

{
  "type": "error",
  "message": "Browser not launched"
}
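Because every server message carries a type field, a client can dispatch on it with a discriminated union. The sketch below is one way to do that; the ServerMessage type and describeMessage function are illustrative names, and the field lists are assumed from the examples above.

```typescript
// Union of the server messages documented above (fields trimmed to those used here).
type ServerMessage =
  | { type: 'frame'; data: string; metadata: { deviceWidth: number; deviceHeight: number } }
  | { type: 'status'; connected: boolean; screencasting: boolean; viewportWidth: number; viewportHeight: number }
  | { type: 'error'; message: string };

// Hypothetical helper: summarizes a raw message as a log-friendly string.
function describeMessage(raw: string): string {
  const message = JSON.parse(raw) as ServerMessage;
  switch (message.type) {
    case 'frame':
      return `frame ${message.metadata.deviceWidth}x${message.metadata.deviceHeight}`;
    case 'status':
      return `status screencasting=${message.screencasting} viewport=${message.viewportWidth}x${message.viewportHeight}`;
    case 'error':
      return `error: ${message.message}`;
    default:
      return 'unknown message'; // message types not covered by this sketch
  }
}
```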

Implementation Details

Automatic Screencast

When the first client connects, the stream server automatically starts screencasting (see src/stream-server.ts:203-209):
if (this.clients.size === 1 && !this.isScreencasting) {
  this.startScreencast().catch((error) => {
    console.error('[StreamServer] Failed to start screencast:', error);
    this.sendError(ws, error.message);
  });
}
When the last client disconnects, screencasting stops automatically to conserve resources (see src/stream-server.ts:226-231).
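The start-on-first, stop-on-last behavior can be modeled in isolation. The class below is a simplified illustration, not the actual stream server: ScreencastLifecycle is a hypothetical name, and a boolean flag stands in for the real startScreencast and stop calls.

```typescript
// Simplified model of the behavior described above.
class ScreencastLifecycle {
  private clients = new Set<string>();
  isScreencasting = false;

  addClient(id: string): void {
    this.clients.add(id);
    // Only the first client triggers a start.
    if (this.clients.size === 1 && !this.isScreencasting) {
      this.isScreencasting = true; // stands in for startScreencast()
    }
  }

  removeClient(id: string): void {
    this.clients.delete(id);
    // Only the last client leaving triggers a stop.
    if (this.clients.size === 0 && this.isScreencasting) {
      this.isScreencasting = false; // stands in for stopping the screencast
    }
  }
}
```

Tying the screencast to the client count means a daemon with streaming enabled but no viewers pays no encoding cost.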

Frame Quality

Default screencast options from src/stream-server.ts:364-370:
await this.browser.startScreencast((frame) => this.broadcastFrame(frame), {
  format: 'jpeg',
  quality: 80,
  maxWidth: 1280,
  maxHeight: 720,
  everyNthFrame: 1,
});
  • Format: JPEG (smaller size, faster transmission)
  • Quality: 80 (good balance between quality and size)
  • Max dimensions: 1280x720 (scales down larger viewports)
  • Frame rate: Every frame (set everyNthFrame: 2 to sample every other frame)

Input Injection

Input events are injected via CDP (Chrome DevTools Protocol). From src/stream-server.ts:244-275:
switch (message.type) {
  case 'input_mouse':
    await this.browser.injectMouseEvent({
      type: message.eventType,
      x: message.x,
      y: message.y,
      button: message.button,
      clickCount: message.clickCount,
      deltaX: message.deltaX,
      deltaY: message.deltaY,
      modifiers: message.modifiers,
    });
    break;
  // ... keyboard and touch events
}
See src/browser.ts:2164-2228 for the CDP input injection implementation.

Security

Localhost Binding

The stream server binds to 127.0.0.1 only to prevent network exposure (from src/stream-server.ts:120-122):
this.wss = new WebSocketServer({
  port: this.port,
  host: '127.0.0.1',  // Localhost only
  // ...
});
Critical: The stream server allows direct input injection (mouse, keyboard, touch), which would be a serious security risk if the server were exposed to the network.

Origin Validation

The stream server rejects cross-origin WebSocket connections from untrusted origins. From src/stream-server.ts:10-30:
export function isAllowedOrigin(origin: string | undefined): boolean {
  // Allow connections with no origin (CLI tools)
  if (!origin) {
    return true;
  }
  // Allow file:// origins (local HTML files)
  if (origin.startsWith('file://')) {
    return true;
  }
  // Allow localhost/loopback origins (browser-based stream viewers)
  const url = new URL(origin);
  const host = url.hostname;
  if (host === 'localhost' || host === '127.0.0.1' || host === '::1' || host === '[::1]') {
    return true;
  }
  return false;
}
Allowed origins:
  • No origin (CLI tools, WebSocket libraries)
  • file:// (local HTML viewers)
  • localhost, 127.0.0.1, ::1 (local web servers)
Rejected origins:
  • All other origins (prevents malicious web pages from connecting)
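To make the rules concrete, the sketch below copies the function from the excerpt above (without the export, so the example runs on its own) and applies it to a few sample origins. The sample list is illustrative, and behavior for malformed origin strings is not covered because the excerpt may omit error handling.

```typescript
// Copied from the excerpt above so this example is self-contained.
function isAllowedOrigin(origin: string | undefined): boolean {
  if (!origin) {
    return true;
  }
  if (origin.startsWith('file://')) {
    return true;
  }
  const url = new URL(origin);
  const host = url.hostname;
  if (host === 'localhost' || host === '127.0.0.1' || host === '::1' || host === '[::1]') {
    return true;
  }
  return false;
}

// Illustrative sample origins; undefined models a client that sends no Origin header.
const sampleOrigins: Array<string | undefined> = [
  undefined,
  'file://',
  'http://localhost:3000',
  'http://127.0.0.1:5173',
  'http://[::1]:8080',
  'https://example.com',
  'http://localhost.example.com',
];

for (const origin of sampleOrigins) {
  console.log(origin ?? '(no origin)', isAllowedOrigin(origin) ? 'allowed' : 'rejected');
}
```

The last sample shows why the check compares the parsed hostname instead of matching a prefix: a hostname that merely begins with localhost is still rejected.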

Stream Port Discovery

The daemon writes the stream port to a file for clients to discover (from src/daemon.ts:365-366):
const streamPortFile = getStreamPortFile();
fs.writeFileSync(streamPortFile, streamPort.toString());
Clients can read ~/.agent-browser/default.stream (or <session>.stream for named sessions) to discover the port.
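A client could discover the port along these lines. This is a sketch under stated assumptions: readStreamPort is a hypothetical helper, not an agent-browser API, and it assumes the file contains only the port number as text, as the excerpt above suggests.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper. baseDir defaults to the ~/.agent-browser directory
// mentioned above; it is a parameter so the lookup can be pointed elsewhere.
function readStreamPort(
  session: string = 'default',
  baseDir: string = path.join(os.homedir(), '.agent-browser'),
): number | null {
  const file = path.join(baseDir, `${session}.stream`);
  let text: string;
  try {
    text = fs.readFileSync(file, 'utf8');
  } catch {
    return null; // No port file: daemon not running or streaming not enabled
  }
  const port = Number.parseInt(text.trim(), 10);
  // Reject anything that is not a valid TCP port number.
  return Number.isInteger(port) && port > 0 && port <= 65535 ? port : null;
}
```

A viewer might call readStreamPort() at startup and build the ws://localhost:<port> URL from the result, falling back to a user-supplied port when it returns null.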

Programmatic API

For advanced use, control streaming directly via the BrowserManager API:
import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

// Start screencast
await browser.startScreencast((frame) => {
  // frame.data is base64-encoded JPEG
  console.log('Frame:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
  format: 'jpeg',
  quality: 80,
  maxWidth: 1280,
  maxHeight: 720,
});

// Inject mouse events
await browser.injectMouseEvent({
  type: 'mousePressed',
  x: 100,
  y: 200,
  button: 'left',
});

// Inject keyboard events
await browser.injectKeyboardEvent({
  type: 'keyDown',
  key: 'Enter',
  code: 'Enter',
});

// Stop when done
await browser.stopScreencast();
See the TypeScript definitions in src/browser.ts:50-72 for full API details.

Example: Simple Viewer

Here’s a minimal HTML viewer for the stream:
<!DOCTYPE html>
<html>
<head>
  <title>Browser Stream Viewer</title>
  <style>
    body { margin: 0; background: #000; }
    #viewport { max-width: 100vw; max-height: 100vh; }
  </style>
</head>
<body>
  <img id="viewport" />
  <script>
    const ws = new WebSocket('ws://localhost:9223');
    const img = document.getElementById('viewport');
    let lastMetadata = null; // Metadata from the most recent frame

    ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === 'frame') {
        img.src = 'data:image/jpeg;base64,' + message.data;
        lastMetadata = message.metadata;
      }
    };

    // Send mouse clicks to the browser
    img.onclick = (event) => {
      if (!lastMetadata) return; // No frame received yet
      const rect = img.getBoundingClientRect();
      // Map the click from displayed image size to browser viewport size
      const scaleX = lastMetadata.deviceWidth / rect.width;
      const scaleY = lastMetadata.deviceHeight / rect.height;
      const x = (event.clientX - rect.left) * scaleX;
      const y = (event.clientY - rect.top) * scaleY;

      ws.send(JSON.stringify({
        type: 'input_mouse',
        eventType: 'mousePressed',
        x: x,
        y: y,
        button: 'left',
        clickCount: 1,
      }));

      ws.send(JSON.stringify({
        type: 'input_mouse',
        eventType: 'mouseReleased',
        x: x,
        y: y,
        button: 'left',
        clickCount: 1,
      }));
    };
  </script>
</body>
</html>
Open this HTML file in a browser to view and interact with the stream.
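The coordinate mapping in the click handler can be isolated as a pure function, which makes it easy to check. The sketch below uses hypothetical names (Size, Point, toViewportPoint) and assumes the image is displayed without letterboxing, so the displayed rectangle covers the whole frame.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: maps a click position inside the displayed image
// (relative to its top-left corner) to browser viewport coordinates.
function toViewportPoint(click: Point, displayed: Size, device: Size): Point {
  return {
    x: click.x * (device.width / displayed.width),
    y: click.y * (device.height / displayed.height),
  };
}
```

For example, with a 1280x720 frame shown at 640x360, a click at (320, 180) in the image maps to (640, 360) in the viewport.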

Use Cases

AI Agent Monitoring

Watch an AI agent in real time as it navigates and interacts with pages:
# Start agent with streaming enabled
export AGENT_BROWSER_STREAM_PORT=9223
agent-browser open example.com

# AI agent runs commands
agent-browser snapshot
agent-browser click @e1
agent-browser fill @e2 "text"

# Human watches via stream viewer
open viewer.html

Pair Browsing

A human and an AI collaborate: the AI automates while the human observes and corrects:
# AI agent
agent-browser --headed open app.com
agent-browser click @e1

# Human watches stream and can take over if needed
# (close AI agent, use headed mode manually)

Debugging

Debug automation issues by watching the actual browser state:
export AGENT_BROWSER_STREAM_PORT=9223
export AGENT_BROWSER_HEADED=0  # Headless mode

# Run automation
./my-automation-script.sh

# Watch stream to see what's happening
open viewer.html

Recording

Record browser sessions by capturing stream frames:
const ws = new WebSocket('ws://localhost:9223');
const frames = [];

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'frame') {
    frames.push(message.data);
  }
};

// Later: encode frames as video using ffmpeg or similar
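One way to prepare captured frames for encoding is to write them to disk as numbered JPEG files. The sketch below assumes a Node.js environment; writeFrames and the frame-00000.jpg naming scheme are illustrative choices, not part of agent-browser.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical helper: decodes each base64 frame and writes it as
// frame-00000.jpg, frame-00001.jpg, ... Returns the written file paths.
function writeFrames(frames: string[], outDir: string): string[] {
  fs.mkdirSync(outDir, { recursive: true });
  return frames.map((data, index) => {
    const file = path.join(outDir, `frame-${String(index).padStart(5, '0')}.jpg`);
    fs.writeFileSync(file, Buffer.from(data, 'base64'));
    return file;
  });
}
```

The numbered files can then be passed to ffmpeg as an image sequence, for example ffmpeg -framerate 30 -i frames/frame-%05d.jpg session.mp4. The frame rate there is a guess: screencast frames arrive when the page renders, not at a fixed interval, so record arrival timestamps if playback timing matters.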

Limitations

iOS Not Supported

Streaming requires CDP (Chrome DevTools Protocol), which is not available for iOS Safari automation. From src/ios-actions.ts:259:
return errorResponse(id, 'Screencast is not supported on iOS (requires CDP).');

Performance

Streaming adds overhead:
  • Each frame is JPEG-encoded and base64-encoded
  • Network transmission of potentially large frames
  • Multiple clients multiply the bandwidth
For production automation, disable streaming unless you need live preview.
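A rough estimate shows how these costs combine. Base64 encodes every 3 bytes as 4 characters, so each frame grows by about a third before it is sent to each client. The function names and the sample numbers below are assumptions for illustration, and the estimate ignores the JSON envelope and WebSocket framing.

```typescript
// Base64 encodes every 3 bytes as 4 characters, padding the final group.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical estimate of outbound payload bytes per second across all clients.
function estimateBytesPerSecond(jpegBytes: number, framesPerSecond: number, clients: number): number {
  return base64Length(jpegBytes) * framesPerSecond * clients;
}

// Assumed example: 60,000-byte frames at 30 frames per second to 2 clients.
console.log(estimateBytesPerSecond(60_000, 30, 2)); // 4800000
```

Under those assumed numbers the server sends about 4.8 MB per second, which is why halving the frame rate or lowering JPEG quality has a visible effect on slower links.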

Frame Rate

The stream captures every rendered frame by default. For slower connections, reduce frame rate:
await browser.startScreencast((frame) => { /* handle frame */ }, {
  everyNthFrame: 2,  // Capture every 2nd frame (halves frame rate)
});

Troubleshooting

Connection Refused

WebSocket connection to 'ws://localhost:9223/' failed
Solutions:
  1. Verify AGENT_BROWSER_STREAM_PORT is set before launching the daemon
  2. Check if the daemon is running: pgrep -f agent-browser
  3. Verify the port is listening: lsof -i :9223 (macOS/Linux) or netstat -ano | findstr :9223 (Windows)

No Frames Received

Possible causes:
  1. Browser not launched yet - send a command first: agent-browser open example.com
  2. Screencast failed to start - check for errors in daemon logs
  3. No client connected - screencasting runs only while at least one client is connected, and frames are broadcast to every connected client

Origin Blocked

[StreamServer] Rejected connection from origin: https://example.com
The stream server accepts only connections that send no Origin header or that come from localhost or file:// origins. Host your viewer on localhost or use a file:// URL.
