@page-agent/mcp is a Node.js MCP (Model Context Protocol) server that bridges AI agent clients (such as Claude Desktop, GitHub Copilot, or Cursor) to your browser through the Page Agent extension. Once connected, your agent client can issue natural-language browser tasks and receive structured results without any additional coding.

Note: This feature is in Beta. The WebSocket protocol, environment variable names, and MCP tool signatures may change in future releases.

Prerequisites

Before starting, make sure you have:
  • Node.js ≥ 20 installed
  • The Page Agent Extension installed in Chrome
  • An OpenAI-compatible LLM API key (or the free testing endpoint for evaluation)

Architecture

┌──────────────┐  stdio   ┌──────────────────┐  WebSocket   ┌──────────────┐
│ Claude /     │◄────────►│ @page-agent/mcp  │◄────────────►│ Hub tab      │
│ Copilot      │  (MCP)   │ (Node.js)        │  (localhost) │ (extension)  │
└──────────────┘          └──────────────────┘              └──────┬───────┘
                                   │                               │
                                   │ HTTP                          │ useAgent
                                   ▼                               ▼
                          ┌──────────────────┐              ┌──────────────┐
                          │ Launcher page    │              │ MultiPage    │
                          │ (localhost:PORT) │              │ Agent        │
                          └──────────────────┘              └──────────────┘
The MCP server communicates with the agent client over stdio using the standard MCP protocol. It simultaneously runs an HTTP + WebSocket server on localhost that the Page Agent extension’s hub tab connects to. Tasks flow: agent client → MCP server → WebSocket → hub tab → MultiPage Agent → browser.

How It Works

1. Agent client starts the MCP server

   The client runs npx @page-agent/mcp via stdio when it needs to call a browser tool.

2. MCP server starts HTTP + WebSocket and opens the launcher

   The server binds to localhost:PORT (default 38401) and opens a launcher page (localhost:PORT) in your default browser.

3. Launcher page triggers the extension to open a hub tab

   The launcher page detects the Page Agent extension and asks it to open a hub tab (hub.html?ws=PORT). You will see a connection approval prompt in the browser.

4. Hub connects and tasks flow

   The hub tab establishes a WebSocket connection back to the MCP server. MCP tool calls are now proxied to the hub, which runs the MultiPage Agent in the browser.
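
The addresses involved in the steps above can be sketched as a small helper. This is only an illustration, not the package's code: bridgeEndpoints and its return shape are hypothetical names, and the ws:// URL is an assumption (the steps say only that the hub connects over WebSocket on localhost). The default port 38401, the launcher address localhost:PORT, and the hub.html?ws=PORT query come from the text.

```typescript
// Hypothetical helper: derives the addresses used in steps 2-4.
// Only the default port and the `hub.html?ws=PORT` query come from the docs;
// the function name, return shape, and ws:// scheme are assumptions.
const DEFAULT_BRIDGE_PORT = 38401;

interface BridgeEndpoints {
  launcherUrl: string; // page the MCP server opens in the default browser
  hubPath: string;     // extension page the launcher asks the extension to open
  wsUrl: string;       // assumed address the hub tab connects back to
}

function bridgeEndpoints(port: number = DEFAULT_BRIDGE_PORT): BridgeEndpoints {
  // Reject values that cannot be a TCP port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid port: ${port}`);
  }
  return {
    launcherUrl: `http://localhost:${port}`,
    hubPath: `hub.html?ws=${port}`,
    wsUrl: `ws://localhost:${port}`,
  };
}
```

Calling bridgeEndpoints() with no argument uses the default port; passing another port mirrors what setting the PORT environment variable would change.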

Configuration

Claude Desktop

Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent path on your platform:
{
    "mcpServers": {
        "page-agent": {
            "command": "npx",
            "args": ["-y", "@page-agent/mcp"],
            "env": {
                "LLM_BASE_URL": "https://dashscope.aliyuncs.com/compatible-mode/v1",
                "LLM_API_KEY": "sk-xxx",
                "LLM_MODEL_NAME": "qwen3.5-plus"
            }
        }
    }
}

Cursor / GitHub Copilot

Use the same JSON format in your client’s MCP settings panel.

Tip: You can use the free testing API (LLM_BASE_URL: https://page-ag-testing-ohftxirgbn.cn-shanghai.fcapp.run, LLM_MODEL_NAME: qwen3.5-plus) for initial evaluation. Switch to a production API key before handling any real data.
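
If you maintain the same entry for several clients, it may help to generate it rather than copy it by hand. The sketch below rebuilds the JSON shown for Claude Desktop; pageAgentEntry, LlmSettings, and McpServerEntry are hypothetical names introduced here, and the optional PORT entry assumes the server reads PORT from the same env block as the LLM_* variables.

```typescript
// Hypothetical generator for the `mcpServers` entry shown above.
interface LlmSettings {
  baseUrl: string;
  apiKey: string;
  modelName: string;
}

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function pageAgentEntry(llm: LlmSettings, port?: number): McpServerEntry {
  const env: Record<string, string> = {
    LLM_BASE_URL: llm.baseUrl,
    LLM_API_KEY: llm.apiKey,
    LLM_MODEL_NAME: llm.modelName,
  };
  // Assumption: PORT is passed through `env` like the LLM_* variables.
  if (port !== undefined) {
    env.PORT = String(port);
  }
  return { command: "npx", args: ["-y", "@page-agent/mcp"], env };
}

// Same values as the Claude Desktop example; "sk-xxx" is a placeholder key.
const config = {
  mcpServers: {
    "page-agent": pageAgentEntry({
      baseUrl: "https://dashscope.aliyuncs.com/compatible-mode/v1",
      apiKey: "sk-xxx",
      modelName: "qwen3.5-plus",
    }),
  },
};

// Text to paste into the client's MCP configuration file.
const configJson: string = JSON.stringify(config, null, 4);
```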

MCP Tools

| Tool | Input | Description |
| --- | --- | --- |
| execute_task | { task: string } | Execute a browser task described in natural language. Blocking: resolves when the task completes or fails. |
| get_status | (none) | Returns { connected: boolean, busy: boolean }, indicating whether the hub is connected and whether a task is running. |
| stop_task | (none) | Send a stop signal to the currently running task. |
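
Because execute_task blocks until the task finishes, a caller may want to inspect get_status before sending one. The sketch below types the shapes from the table and shows one possible decision rule. The names ExecuteTaskInput, HubStatus, nextAction, and isExecuteTaskInput are illustrative, and the rule itself is an assumption: the documentation does not say how the server reacts to execute_task while busy or disconnected.

```typescript
// Shapes copied from the tool table; type and function names are illustrative.
interface ExecuteTaskInput {
  task: string; // natural-language description of the browser task
}

interface HubStatus {
  connected: boolean; // whether the hub is connected
  busy: boolean;      // whether a task is running
}

type NextAction = "execute_task" | "wait_or_stop_task" | "connect_hub";

// Decide what a caller could do next, given a get_status result.
function nextAction(status: HubStatus): NextAction {
  if (!status.connected) {
    return "connect_hub";
  }
  return status.busy ? "wait_or_stop_task" : "execute_task";
}

// Runtime guard for an execute_task payload received as untyped JSON.
function isExecuteTaskInput(value: unknown): value is ExecuteTaskInput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { task?: unknown }).task === "string"
  );
}
```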

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| LLM_BASE_URL | (none) | OpenAI-compatible LLM API base URL |
| LLM_API_KEY | (none) | LLM API key |
| LLM_MODEL_NAME | (none) | Model name (e.g. qwen3.5-plus, gpt-5.2) |
| PORT | 38401 | HTTP + WebSocket port for the hub bridge |

All three LLM_* variables are required. If any is missing, execute_task calls will fail with a configuration error.
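
A pre-flight check along these lines can catch a missing variable before the first execute_task call. It is a sketch under stated assumptions, not the server's own validation: missingLlmVars and resolvePort are hypothetical helpers, and how the real server parses PORT or words its configuration error is not documented here.

```typescript
// Hypothetical pre-flight check; takes the environment as a plain object so it
// can be called with process.env or with a test fixture.
type Env = Record<string, string | undefined>;

const REQUIRED_LLM_VARS = ["LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL_NAME"] as const;

// Names of required LLM_* variables that are unset or empty.
function missingLlmVars(env: Env): string[] {
  return REQUIRED_LLM_VARS.filter((name) => !env[name]);
}

// PORT falls back to the documented default of 38401 when unset.
function resolvePort(env: Env): number {
  const raw = env.PORT;
  if (raw === undefined || raw === "") {
    return 38401;
  }
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`PORT must be an integer between 1 and 65535, got "${raw}"`);
  }
  return port;
}
```

In a Node.js script, both helpers could be called with process.env before writing the client configuration.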
