PageAgentCore is the core agent class exported from @page-agent/core. It contains the full ReAct agent loop — observe, think, act — but ships with no built-in UI panel. PageAgent (the top-level package) is simply a thin subclass that wires PageAgentCore together with PageController and the default floating Panel. When you need a different interface, a headless test harness, or integration inside a larger agent system, reach for PageAgentCore directly.
When to Use PageAgentCore
Custom UI
Replace the built-in Panel with your own chat widget, sidebar, or command palette.
Headless Automation
Run the agent without any visual overlay during automated testing or CI pipelines.
Non-Browser Environments
Provide a custom PageController implementation (e.g., Puppeteer or Playwright) for server-side or desktop automation.
Agent Composition
Embed Page Agent as a sub-agent inside a larger orchestration system where the parent manages the UI.
Basic Usage
Configuration Reference
PageAgentCoreConfig is defined as AgentConfig & { pageController: PageController }.
PageController
A PageController instance responsible for DOM extraction and element interaction. Create one with new PageController({ enableMask: true }).
LLM Config
Base URL of the LLM API endpoint (e.g., https://api.openai.com/v1).
Model name to use (e.g., gpt-5.2, anthropic/claude-4.5-haiku).
LLM API key. For production use, proxy the key through a backend instead of exposing it in client-side code.
Maximum number of retry attempts when an LLM API call fails.
Custom fetch function for injecting headers, credentials, or routing calls through a backend proxy. Use this to keep apiKey off the client.
When true, the agent always sends tool_choice: "required" instead of naming a specific tool. Useful for LLM providers that do not support the object form of tool_choice.
Transform the final request body before it is sent to the LLM. Useful for adding provider-specific cache hints or private request parameters.
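As an illustration of the custom fetch option described above, the sketch below forwards every LLM request to a backend endpoint and strips the Authorization header so the real key can be attached server-side. The helper name createProxyFetch, the /api/llm path, and the X-Session-Token header are assumptions made for this example, and the simplified FetchLike type only covers plain-object headers.

```typescript
// Simplified subset of the standard fetch signature (assumption: headers are
// passed as a plain object rather than a Headers instance).
type FetchInit = { method?: string; headers?: Record<string, string>; body?: string };
type FetchLike = (url: string, init?: FetchInit) => Promise<unknown>;

// Hypothetical helper: builds a fetch-like function that routes all calls
// through a backend proxy instead of the LLM provider.
function createProxyFetch(
  proxyUrl: string,
  getSessionToken: () => string,
  baseFetch: FetchLike,
): FetchLike {
  return (_url, init = {}) => {
    const headers = { ...(init.headers ?? {}) };
    // Drop any Authorization header; the backend is expected to add the real key.
    for (const name of Object.keys(headers)) {
      if (name.toLowerCase() === "authorization") delete headers[name];
    }
    // Hypothetical header the backend could use to identify the caller.
    headers["X-Session-Token"] = getSessionToken();
    // The original URL is ignored; every request goes to the proxy endpoint.
    return baseFetch(proxyUrl, { ...init, headers });
  };
}
```

The resulting function can be supplied wherever the configuration accepts a custom fetch. Taking baseFetch as a parameter keeps the wrapper easy to exercise with a fake in tests.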
Agent Config
Language for agent output and panel UI strings.
Maximum number of agent steps allowed per task. The agent emits a warning at 5 steps remaining and halts at this limit.
Seconds to wait between consecutive steps. Useful for rate-limited APIs or to allow the page to settle.
Extend or override the built-in tool set. Set a key to null to remove that tool entirely.
Instructions that guide agent behavior. See the InstructionsConfig type below.
Transform the simplified page HTML before it is sent to the LLM. Use this to mask sensitive data (e.g., phone numbers, credit card numbers) before they leave the browser.
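A minimal sketch of the kind of transform described above: a pure function that takes the simplified page HTML and returns it with card-like and phone-like numbers replaced by placeholders. The function name maskSensitive and both patterns are illustrative assumptions, and the patterns are deliberately crude (any 10 to 15 digit run is treated as a phone number).

```typescript
// Hypothetical masking helper; not part of the package.
function maskSensitive(html: string): string {
  return html
    // Card-like numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
    .replace(/\b\d(?:[ -]?\d){12,18}\b/g, "[CARD]")
    // Phone-like numbers: optional "+", then 10 to 15 digits with optional separators.
    .replace(/\+?\d(?:[ .-]?\d){9,14}\b/g, "[PHONE]");
}

const sample = "<p>Call +1 415-555-0132 or pay with 4111 1111 1111 1111.</p>";
console.log(maskSensitive(sample));
// prints "<p>Call [PHONE] or pay with [CARD].</p>"
```

Card numbers are replaced first so that the looser phone pattern never sees their digits.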
Completely replace the default system prompt. Use with caution — an incorrect prompt can break agent reasoning entirely.
Enable the execute_javascript tool, which lets the agent run arbitrary JS on the page. Can cause unpredictable side effects and may bypass data-masking. Only enable when necessary.
Fetch /llms.txt from the current site origin and include it as additional context in every step prompt.
Lifecycle Hooks
Called once before the agent starts the task loop.
Called once after the task completes (success or failure).
Called before each individual step. stepCount is 0-indexed.
Called after each individual step with the current history snapshot.
Called synchronously when dispose() is invoked.
Properties
| Property | Type | Description |
|---|---|---|
| `status` | `'idle' \| 'running' \| 'completed' \| 'error' \| 'stopped'` | Current agent execution status. |
| `history` | `HistoricalEvent[]` | Persistent history array that forms the agent’s multi-step memory. |
| `task` | `string` | The task string passed to the current (or most recent) execute() call. |
| `taskId` | `string` | Unique identifier for the current (or most recent) task execution. Reset on each execute() call. |
| `lastResult` | `ExecutionResult \| null` | Result of the most recent run, or null before the first run completes. |
| `disposed` | `boolean` | true after dispose() has been called. |
| `id` | `string` | Unique identifier for this agent instance. |
| `config` | `PageAgentCoreConfig & { maxSteps: number }` | The resolved configuration object, with maxSteps filled in from the default. |
| `tools` | `Map<string, PageAgentTool>` | Live map of available tools (may be modified by customTools). |
| `pageController` | `PageController` | The bound PageController instance. |
| `onAskUser` | `(question: string, options?: { signal: AbortSignal }) => Promise<string>` | Assign this callback to enable the ask_user tool. The promise should reject when options.signal aborts. |
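The onAskUser contract above (resolve with the answer, reject when the signal aborts) can be sketched as follows. The ShowPrompt type and createAskUserHandler are hypothetical names standing in for whatever prompt UI the host application provides; only the callback signature comes from the table.

```typescript
// Hypothetical UI hook: displays the question, calls submit() with the user's
// answer, and returns a function that closes the prompt.
type ShowPrompt = (question: string, submit: (answer: string) => void) => () => void;

// Builds a callback matching the documented onAskUser signature.
function createAskUserHandler(showPrompt: ShowPrompt) {
  return (question: string, options?: { signal: AbortSignal }): Promise<string> =>
    new Promise<string>((resolve, reject) => {
      const signal = options?.signal;
      if (signal?.aborted) {
        // Already aborted: reject without ever showing the prompt.
        reject(new Error("ask_user aborted"));
        return;
      }
      const onAbort = () => {
        close(); // remove the prompt from the UI
        reject(new Error("ask_user aborted"));
      };
      const close = showPrompt(question, (answer) => {
        signal?.removeEventListener("abort", onAbort);
        resolve(answer);
      });
      signal?.addEventListener("abort", onAbort, { once: true });
    });
}
```

Assigning the returned function to agent.onAskUser would enable the ask_user tool as described in the table.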
Methods
| Method | Returns | Description |
|---|---|---|
| `execute(task)` | `Promise<ExecutionResult>` | Start a task. Throws if a task is already running; concurrent execution is not supported. |
| `stop()` | `Promise<void>` | Gracefully stop the running task and wait for the loop to settle. The agent remains reusable afterwards. |
| `dispose()` | `void` | Permanently destroy the agent and clean up all resources including the PageController. |
| `pushObservation(content)` | `void` | Inject an observation string into history before the next step. Useful for testing or external monitoring hooks. Marked @experimental in source; the API may change. |
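Because execute() throws while another task is running, callers that may submit tasks from several places could queue them. The sketch below chains calls on a promise so that only one runs at a time; ExecutorLike and createSerialRunner are illustrative names, and the interface is a structural stand-in for the single method used here.

```typescript
// Stand-in for the part of the agent this sketch relies on (assumption: only execute()).
interface ExecutorLike<R> {
  execute(task: string): Promise<R>;
}

// Hypothetical helper: returns a function that runs tasks strictly one after another.
function createSerialRunner<R>(agent: ExecutorLike<R>) {
  let tail: Promise<unknown> = Promise.resolve();
  return (task: string): Promise<R> => {
    // Start only after every previously queued task has settled.
    const run = tail.then(() => agent.execute(task));
    // Swallow failures on the internal chain so one failed task does not block the queue;
    // the caller still receives the rejection through the returned promise.
    tail = run.catch(() => undefined);
    return run;
  };
}
```

Each caller still gets its own result or rejection, while the agent never sees two overlapping execute() calls.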
Events
PageAgentCore extends EventTarget. All standard addEventListener / removeEventListener patterns apply.
| Event | Type | Description |
|---|---|---|
| `statuschange` | `Event` | Fired on every status transition: idle → running → completed / error / stopped. Read agent.status inside the handler. |
| `historychange` | `Event` | Fired when agent.history is updated. Read the full array from agent.history. |
| `activity` | `CustomEvent<AgentActivity>` | Transient real-time feedback for UI components (thinking, executing, executed, retrying, error). Not persisted in history. |
| `dispose` | `Event` | Fired when dispose() is called. Use this to clean up UI components. |
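A short sketch of how a custom UI might subscribe to these events. It is written against a structural stand-in (StatusSource) rather than the real class so that it stays self-contained; watchStatus is a hypothetical helper name. The event names and the practice of reading agent.status inside the handler come from the table above.

```typescript
type AgentStatus = "idle" | "running" | "completed" | "error" | "stopped";

// Stand-in for the documented surface: an EventTarget with a readable status.
interface StatusSource extends EventTarget {
  readonly status: AgentStatus;
}

// Hypothetical helper: reports every status transition and detaches itself
// when the agent is disposed. Returns a function for manual cleanup.
function watchStatus(agent: StatusSource, onChange: (status: AgentStatus) => void): () => void {
  // The event carries no payload, so the current value is read from the agent.
  const handler = () => onChange(agent.status);
  const cleanup = () => {
    agent.removeEventListener("statuschange", handler);
    agent.removeEventListener("dispose", cleanup);
  };
  agent.addEventListener("statuschange", handler);
  agent.addEventListener("dispose", cleanup, { once: true });
  return cleanup;
}
```

The same pattern applies to historychange, with the handler reading agent.history instead.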