The Page Agent Chrome extension injects window.PAGE_AGENT_EXT into pages that have been authorized with a valid user token. This API lets your page JavaScript trigger multi-tab browser automation tasks without any server-side component: the extension handles navigation, tab management, and agent execution entirely in the browser.
Setup
1. Install the Extension
Install the Page Agent extension (Page Agent Ext) from the Chrome Web Store. For the latest pre-release builds, check GitHub Releases directly.
2. Install Type Definitions (Recommended)
3. Set the Auth Token
The extension requires an explicit authorization token before it injects window.PAGE_AGENT_EXT. This prevents untrusted pages from silently triggering broad browser automation.
- Open the extension side panel in Chrome.
- Copy your auth token from the panel.
- Set it in your page’s localStorage:
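The last step can be sketched as follows. This is only an illustration: the storage key name is not reproduced in this section, so `storageKey` is a placeholder parameter that you must fill with the exact key the extension expects. `saveAuthToken` and `TokenStorage` are hypothetical names introduced for the example. In a browser, `localStorage` would be passed as the first argument.

```typescript
// Minimal subset of the Web Storage API that this sketch relies on.
interface TokenStorage {
  setItem(key: string, value: string): void;
  getItem(key: string): string | null;
}

// Stores the trimmed token and reports whether it can be read back.
// ASSUMPTION: `storageKey` is a placeholder; pass the exact key name
// that the extension expects (it is not reproduced here).
function saveAuthToken(storage: TokenStorage, storageKey: string, token: string): boolean {
  const trimmed = token.trim();
  if (trimmed === "") {
    throw new Error("Auth token must not be empty");
  }
  storage.setItem(storageKey, trimmed);
  return storage.getItem(storageKey) === trimmed;
}
```

Reading the value back is a cheap way to notice storage that is disabled or full before waiting for an injection that will never happen.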
Checking for Injection
window.PAGE_AGENT_EXT is injected asynchronously after the page loads. Poll for it before use:
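One way to poll is sketched below. The helper name `waitForPageAgentExt`, the `ExtHost` type, and the default timeout and interval are assumptions made for this example, not part of the extension. The host object is a parameter so the helper can run outside a browser; on a real page you would pass `window`.

```typescript
// Shape of the object that may eventually carry PAGE_AGENT_EXT.
// In a browser this is `window`.
interface ExtHost<T> {
  PAGE_AGENT_EXT?: T;
}

// Resolves with the injected API once it appears, or rejects after `timeoutMs`.
// ASSUMPTION: the 5000 ms / 100 ms defaults are illustrative, not documented values.
function waitForPageAgentExt<T>(host: ExtHost<T>, timeoutMs = 5000, intervalMs = 100): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const check = (): void => {
      const ext = host.PAGE_AGENT_EXT;
      if (ext !== undefined) {
        resolve(ext);
      } else if (Date.now() >= deadline) {
        reject(new Error("PAGE_AGENT_EXT was not injected before the timeout"));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}
```

Because the extension injects nothing without a valid token, a timeout here usually points to a missing or incorrect token, or to the extension not being installed.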
window.PAGE_AGENT_EXT_VERSION
A string containing the currently installed extension version. Check this before calling the main API if your code depends on capabilities added in a specific version.
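A version gate might look like the sketch below. It assumes the version string is dotted and numeric (for example "1.4.0"), which this section does not state. `isVersionAtLeast` is a hypothetical helper, and any pre-release or build suffix is ignored.

```typescript
// Returns true when `current` is the same as or newer than `required`.
// ASSUMPTION: versions are dotted numeric strings; anything after "-" or "+"
// (pre-release or build metadata) is ignored.
function isVersionAtLeast(current: string, required: string): boolean {
  const parse = (v: string): number[] =>
    (v.split(/[-+]/)[0] ?? "")
      .split(".")
      .map((part) => Number.parseInt(part, 10) || 0);
  const a = parse(current);
  const b = parse(required);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // versions are equal
}
```

On a page, the first argument would be `window.PAGE_AGENT_EXT_VERSION`, checked for `undefined` first since it is absent when the extension is not installed.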
window.PAGE_AGENT_EXT.execute(task, config)
Starts a new agent task. Returns a Promise that resolves when the task completes or rejects on a fatal error.
Parameters
task: Natural language description of the task for the agent to complete. Be specific: include the exact steps, target elements, and any data the agent should retrieve or enter.
config: LLM settings, tab scope options, and event callbacks. See ExecuteConfig below.
Returns
Promise<ExecutionResult>
true if the agent finished the task successfully.
The agent’s final response text.
Full ordered list of all events (steps, observations, errors) that occurred during the task.
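A task can end in three ways: the promise resolves with success true, resolves with success false, or rejects on a fatal error. The sketch below separates them. Only the `success` field is modeled, because the other result field names are not reproduced in this section. `runTask`, `TaskRunner`, and `TaskOutcome` are hypothetical names for the example.

```typescript
// Only the result field named in this reference is modeled here.
interface TaskResult {
  success: boolean; // true if the agent finished the task successfully
}

interface TaskRunner<C, R extends TaskResult> {
  execute(task: string, config: C): Promise<R>;
}

type TaskOutcome<R> =
  | { kind: "completed"; result: R }
  | { kind: "unsuccessful"; result: R }
  | { kind: "failed"; error: unknown };

// Wraps execute() so callers handle every ending explicitly
// instead of mixing result checks with try/catch.
async function runTask<C, R extends TaskResult>(
  ext: TaskRunner<C, R>,
  task: string,
  config: C,
): Promise<TaskOutcome<R>> {
  try {
    const result = await ext.execute(task, config);
    return result.success
      ? { kind: "completed", result }
      : { kind: "unsuccessful", result };
  } catch (error) {
    return { kind: "failed", error };
  }
}
```

On a page, `ext` would be `window.PAGE_AGENT_EXT`, and the full result object is still available on the `completed` and `unsuccessful` branches.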
ExecuteConfig
Base URL of the OpenAI-compatible LLM API. See LLMConfig for provider examples.
Model identifier as accepted by the provider.
LLM API key. Omit for local runtimes or when using a proxy.
Global system-level instructions for the agent. Equivalent to AgentConfig.instructions.system. Applied to every step of the task.
When true, the tab where your page JavaScript is running is included in the agent’s tab scope. Set to false if the agent should operate only on newly opened tabs.
When true, the agent can see and interact with every unpinned tab in the window, rather than only the tabs it opens itself. Use carefully: the agent may navigate tabs you expect to remain untouched.
Called whenever the agent’s lifecycle status changes. Use this to update loading indicators or enable/disable UI controls.
Called for each ephemeral activity event — thinking, executing a tool, retrying, etc. Use this for real-time progress display.
Called after each step with the full history array. Use this to stream completed steps into a log or timeline UI.
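Because the history callback receives the full array on every call, a log or timeline UI typically needs only the entries added since the previous call. The sketch below shows one way to derive them. It assumes the history only grows during a task, which this section does not guarantee, and `createHistoryTracker` is a hypothetical helper name.

```typescript
// Returns a handler that, given the full history array on every call,
// reports only the events appended since the previous call.
// ASSUMPTION: history is append-only within a task; a shorter array is
// treated as the start of a new task.
function createHistoryTracker<E>(
  onNewEvents: (events: E[]) => void,
): (history: readonly E[]) => void {
  let seen = 0;
  return (history) => {
    if (history.length < seen) seen = 0; // history was reset
    const fresh = history.slice(seen);
    seen = history.length;
    if (fresh.length > 0) onNewEvents(fresh);
  };
}
```

The returned function can be passed as the history callback in the config, with `onNewEvents` appending rows to the UI.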
window.PAGE_AGENT_EXT.stop()
Sends a stop signal to the currently running task. The agent finishes its current tool call before halting; execute() resolves with success: false.
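A common use of stop() is enforcing a time budget. The sketch below sends the stop signal if the task has not settled within a deadline. `executeWithTimeout` and `StoppableRunner` are hypothetical names, and the `void` return type of `stop` is an assumption.

```typescript
interface StoppableRunner<C, R> {
  execute(task: string, config: C): Promise<R>;
  stop(): void; // ASSUMPTION: any return value is ignored here
}

// Runs a task and sends a stop signal if it has not settled within `timeoutMs`.
// The agent finishes its current tool call before halting, so the returned
// promise may settle some time after the deadline.
async function executeWithTimeout<C, R>(
  ext: StoppableRunner<C, R>,
  task: string,
  config: C,
  timeoutMs: number,
): Promise<R> {
  const timer = setTimeout(() => ext.stop(), timeoutMs);
  try {
    return await ext.execute(task, config);
  } finally {
    clearTimeout(timer); // never stop a later, unrelated task
  }
}
```

Clearing the timer in `finally` matters because stop() targets whichever task is currently running, not a specific one.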
Full Example
ExecuteConfig TypeScript Type
Window Type Declarations
If you are not importing @page-agent/core as a dependency, add these declarations to a .d.ts file in your project to get IDE autocomplete:
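As a rough sketch of what such declarations can look like, the block below declares only the members named in this reference. The config and result types are loose placeholders, the interface names are hypothetical, and the `void` return type of `stop` is an assumption.

```typescript
// Placeholder result type: only the field named in this reference.
interface PageAgentExtResult {
  success: boolean;
}

interface PageAgentExtApi {
  // ASSUMPTION: config is typed loosely because its field names
  // are not reproduced here.
  execute(task: string, config: Record<string, unknown>): Promise<PageAgentExtResult>;
  stop(): void;
}

// Both globals are optional: they exist only when the extension is
// installed and the page is authorized.
interface PageAgentExtGlobals {
  PAGE_AGENT_EXT?: PageAgentExtApi;
  PAGE_AGENT_EXT_VERSION?: string;
}

// In a global .d.ts file this merges the members into the DOM Window type.
// Inside a module, wrap it in `declare global { ... }` instead.
interface Window extends PageAgentExtGlobals {}

// Small accessor that keeps the undefined case visible to callers.
function getPageAgentExt(host: PageAgentExtGlobals): PageAgentExtApi | undefined {
  return host.PAGE_AGENT_EXT;
}
```

Marking the globals optional makes the compiler require a presence check before every use.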
Install @page-agent/core as a dev dependency to get the complete, maintained type definitions for AgentStatus, AgentActivity, ExecutionResult, and HistoricalEvent.