PageAgentCore is the core agent class exported from @page-agent/core. It contains the full ReAct agent loop (observe, think, act) but ships with no built-in UI panel. PageAgent, from the top-level package, is a thin subclass that wires PageAgentCore together with PageController and the default floating Panel. When you need a different interface, a headless test harness, or integration inside a larger agent system, use PageAgentCore directly.

When to Use PageAgentCore

Custom UI

Replace the built-in Panel with your own chat widget, sidebar, or command palette.

Headless Automation

Run the agent without any visual overlay during automated testing or CI pipelines.

Non-Browser Environments

Provide a custom PageController implementation (e.g., Puppeteer or Playwright) for server-side or desktop automation.

Agent Composition

Embed Page Agent as a sub-agent inside a larger orchestration system where the parent manages the UI.

Basic Usage

import { PageAgentCore } from '@page-agent/core'
import { PageController } from '@page-agent/page-controller'

const agent = new PageAgentCore({
  pageController: new PageController({ enableMask: true }),
  baseURL: 'https://api.openai.com/v1',
  apiKey: 'your-api-key',
  model: 'gpt-5.2',
})

// Listen to events for UI display
agent.addEventListener('statuschange', () => {
  console.log('Status:', agent.status)
})

agent.addEventListener('activity', (e) => {
  const activity = (e as CustomEvent).detail
  console.log('Activity:', activity.type)
})

// Execute task
const result = await agent.execute('Fill in the form with test data')
console.log(result.success, result.data)

Configuration Reference

PageAgentCoreConfig is defined as AgentConfig & { pageController: PageController }.

PageController

pageController
PageController
required
A PageController instance responsible for DOM extraction and element interaction. Create one with new PageController({ enableMask: true }).

LLM Config

baseURL
string
required
Base URL of the LLM API endpoint (e.g., https://api.openai.com/v1).
model
string
required
Model name to use (e.g., gpt-5.2, anthropic/claude-4.5-haiku).
apiKey
string
LLM API key. For production use, proxy the key through a backend instead of exposing it in client-side code.
maxRetries
number
default: 3
Maximum number of retry attempts when an LLM API call fails.
customFetch
typeof fetch
Custom fetch function for injecting headers, credentials, or routing calls through a backend proxy. Use this to keep apiKey off the client.
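
A minimal sketch of routing LLM calls through a backend with customFetch. It assumes customFetch is invoked with the same arguments as the standard fetch. The helper name createProxyFetch, the /api/llm-proxy route, and the X-Upstream-Url header are all hypothetical and not part of the library.

```typescript
/**
 * Build a fetch-compatible function that forwards every request to a
 * backend proxy, which is expected to attach the real API key server-side.
 * (Hypothetical helper: the proxy route and header name are assumptions.)
 */
function createProxyFetch(proxyUrl: string, baseFetch: typeof fetch = fetch): typeof fetch {
  return (input, init) => {
    const headers = new Headers(init?.headers)
    // Never forward a client-side key, even if one was configured by mistake.
    headers.delete('Authorization')
    // Tell the proxy which upstream endpoint the agent wanted to reach.
    const upstream = input instanceof Request ? input.url : String(input)
    headers.set('X-Upstream-Url', upstream)
    // Send session cookies so the backend can authenticate the user.
    return baseFetch(proxyUrl, { ...init, headers, credentials: 'include' })
  }
}
```

The returned function could then be passed as customFetch, for example customFetch: createProxyFetch('/api/llm-proxy'), with apiKey left unset in the browser.
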
disableNamedToolChoice
boolean
default: false
When true, the agent always sends tool_choice: "required" instead of naming a specific tool. Useful for LLM providers that do not support the object form of tool_choice.
transformRequestBody
(requestBody) => Record<string, unknown> | undefined
Transform the final request body before it is sent to the LLM. Useful for adding provider-specific cache hints or private request parameters.
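
As an illustration of transformRequestBody, the sketch below adds one extra field to the outgoing body. The field name x_cache_hint is hypothetical; substitute whatever your provider documents. Because the effect of returning undefined is not described here, the sketch always returns an object.

```typescript
type RequestBody = Record<string, unknown>

// Hypothetical provider-specific field name, used only for illustration.
const CACHE_HINT_FIELD = 'x_cache_hint'

/** Return a copy of the request body with a cache hint attached. */
function addCacheHint(requestBody: RequestBody): RequestBody {
  // Copy instead of mutating, so the caller's object is left untouched.
  return { ...requestBody, [CACHE_HINT_FIELD]: 'page-agent-session' }
}
```

It would be wired in as transformRequestBody: addCacheHint.
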

Agent Config

language
'en-US' | 'zh-CN'
default: 'en-US'
Language for agent output and panel UI strings.
maxSteps
number
default: 40
Maximum number of agent steps allowed per task. The agent emits a warning when 5 steps remain and halts when the limit is reached.
stepDelay
number
default: 0.4
Seconds to wait between consecutive steps. Useful for rate-limited APIs or to allow the page to settle.
customTools
Record<string, PageAgentTool | null>
Extend or override the built-in tool set. Set a key to null to remove that tool entirely.
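import { z } from 'zod/v4'
import { PageAgentCore } from '@page-agent/core'
import { tool } from 'page-agent'

const agent = new PageAgentCore({
  // ... pageController, baseURL, model, and other options
  customTools: {
    // Setting a built-in tool's key to null removes it
    ask_user: null,
    // Any other key registers a new tool under that name
    my_tool: tool({
      description: 'Do something custom',
      inputSchema: z.object({ value: z.string() }),
      execute: async function (input) {
        return `Done: ${input.value}`
      },
    }),
  },
})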
instructions
InstructionsConfig
Instructions that guide agent behavior. See the InstructionsConfig type below.
transformPageContent
(content: string) => string | Promise<string>
Transform the simplified page HTML before it is sent to the LLM. Use this to mask sensitive data (e.g., phone numbers, credit card numbers) before they leave the browser.
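
A rough sketch of a masking function that could be passed as transformPageContent. The two regular expressions are illustrative assumptions (card-like digit runs and US-style phone numbers), not a complete data-protection solution, and the placeholder tokens are arbitrary.

```typescript
/** Replace card-like and phone-like numbers before the content leaves the browser. */
function maskSensitive(content: string): string {
  return (
    content
      // 13 to 19 digits, optionally separated by single spaces or hyphens
      .replace(/\b\d(?:[ -]?\d){12,18}\b/g, '[CARD]')
      // US-style phone numbers such as 555-123-4567 or 555.123.4567
      .replace(/\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g, '[PHONE]')
  )
}
```

Card numbers are masked first so that a long digit run is not partially rewritten by the phone pattern.
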
customSystemPrompt
string
Completely replace the default system prompt. Use with caution — an incorrect prompt can break agent reasoning entirely.
experimentalScriptExecutionTool
boolean
default: false
Enable the execute_javascript tool, which lets the agent run arbitrary JS on the page. Can cause unpredictable side effects and may bypass data-masking. Only enable when necessary.
experimentalLlmsTxt
boolean
default: false
Fetch /llms.txt from the current site origin and include it as additional context in every step prompt.

Lifecycle Hooks

Lifecycle hooks are highly experimental and their signatures may change in future releases. Errors thrown from hooks propagate out of execute() as external errors. Catch exceptions inside the hook itself if the task should not be interrupted.
onBeforeTask
(agent: PageAgentCore) => void | Promise<void>
Called once before the agent starts the task loop.
onAfterTask
(agent: PageAgentCore, result: ExecutionResult) => void | Promise<void>
Called once after the task completes (success or failure).
onBeforeStep
(agent: PageAgentCore, stepCount: number) => void | Promise<void>
Called before each individual step. stepCount is 0-indexed.
onAfterStep
(agent: PageAgentCore, history: HistoricalEvent[]) => void | Promise<void>
Called after each individual step with the current history snapshot.
onDispose
(agent: PageAgentCore, reason?: string) => void
Called synchronously when dispose() is invoked.
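
Since an error thrown from a hook propagates out of execute(), one option is to wrap asynchronous hooks so that failures are reported instead of rethrown. The safeHook helper below is hypothetical and only a sketch. It returns an async function, so it is not suitable for onDispose, which is called synchronously.

```typescript
type Hook<Args extends unknown[]> = (...args: Args) => void | Promise<void>

/**
 * Wrap a lifecycle hook so that any exception is passed to onError
 * instead of interrupting the running task. (Hypothetical helper.)
 */
function safeHook<Args extends unknown[]>(
  name: string,
  hook: Hook<Args>,
  onError: (name: string, error: unknown) => void = (n, e) =>
    console.warn(`[${n}] hook failed`, e),
): Hook<Args> {
  return async (...args) => {
    try {
      await hook(...args)
    } catch (error) {
      onError(name, error)
    }
  }
}
```

A config entry might then look like onBeforeStep: safeHook('onBeforeStep', async (agent, stepCount) => { /* ... */ }).
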

Properties

Each entry lists the property name, its type, and a description.
status
'idle' | 'running' | 'completed' | 'error' | 'stopped'
Current agent execution status.
history
HistoricalEvent[]
Persistent history array that forms the agent’s multi-step memory.
task
string
The task string passed to the current (or most recent) execute() call.
taskId
string
Unique identifier for the current (or most recent) task execution. Reset on each execute() call.
lastResult
ExecutionResult | null
Result of the most recent run, or null before the first run completes.
disposed
boolean
true after dispose() has been called.
id
string
Unique identifier for this agent instance.
config
PageAgentCoreConfig & { maxSteps: number }
The resolved configuration object, with maxSteps filled in from the default.
tools
Map<string, PageAgentTool>
Live map of available tools (may be modified by customTools).
pageController
PageController
The bound PageController instance.
onAskUser
(question: string, options?: { signal: AbortSignal }) => Promise<string>
Assign this callback to enable the ask_user tool. The promise should reject when options.signal aborts.
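
The onAskUser contract (resolve with the answer, reject when options.signal aborts) can be sketched as follows. createAskUser and its prompt parameter are hypothetical; prompt stands in for whatever UI you use to collect the answer.

```typescript
type AskUser = (question: string, options?: { signal: AbortSignal }) => Promise<string>

/**
 * Adapt any promise-based prompt function to the onAskUser signature,
 * rejecting as soon as the supplied signal aborts. (Hypothetical helper.)
 */
function createAskUser(prompt: (question: string) => Promise<string>): AskUser {
  return (question, options) =>
    new Promise<string>((resolve, reject) => {
      const signal = options?.signal
      if (signal?.aborted) {
        reject(new Error('ask_user aborted'))
        return
      }
      const onAbort = () => reject(new Error('ask_user aborted'))
      signal?.addEventListener('abort', onAbort, { once: true })
      prompt(question)
        .then(resolve, reject)
        // Remove the abort listener once the prompt has settled.
        .finally(() => signal?.removeEventListener('abort', onAbort))
    })
}
```

In a browser, a usage could be agent.onAskUser = createAskUser(async (q) => window.prompt(q) ?? '').
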

Methods

Each entry lists the method, its return type, and a description.
execute(task)
Promise<ExecutionResult>
Start a task. Throws if a task is already running; concurrent execution is not supported.
stop()
Promise<void>
Gracefully stop the running task and wait for the loop to settle. The agent remains reusable afterwards.
dispose()
void
Permanently destroy the agent and clean up all resources including the PageController.
pushObservation(content)
void
Inject an observation string into history before the next step. Useful for testing or external monitoring hooks. Marked @experimental in source; the API may change.
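
As a sketch of combining execute() and stop(), the helper below stops a task that runs longer than a time budget. It is written against a minimal AgentLike interface (an assumption standing in for PageAgentCore) and assumes that execute() settles once stop() has been called. Whether it resolves or rejects in that case is not specified here.

```typescript
interface ExecutionResult {
  success: boolean
  data: string
  history: unknown[]
}

// Minimal structural stand-in for the two PageAgentCore methods used here.
interface AgentLike {
  execute(task: string): Promise<ExecutionResult>
  stop(): Promise<void>
}

/** Run a task and request a graceful stop if it exceeds timeoutMs. (Hypothetical helper.) */
async function executeWithTimeout(
  agent: AgentLike,
  task: string,
  timeoutMs: number,
): Promise<ExecutionResult> {
  const timer = setTimeout(() => {
    // Ignore stop() failures; the execute() promise reports the outcome.
    agent.stop().catch(() => undefined)
  }, timeoutMs)
  try {
    return await agent.execute(task)
  } finally {
    clearTimeout(timer)
  }
}
```
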

Events

PageAgentCore extends EventTarget. All standard addEventListener / removeEventListener patterns apply.
Each entry lists the event name, the event type, and a description.
statuschange
Event
Fired on every status transition: idle → running → completed / error / stopped. Read agent.status inside the handler.
historychange
Event
Fired when agent.history is updated. Read the full array from agent.history.
activity
CustomEvent<AgentActivity>
Transient real-time feedback for UI components (thinking, executing, executed, retrying, error). Not persisted in history.
dispose
Event
Fired when dispose() is called. Use this to clean up UI components.
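
Because the agent is a plain EventTarget, listeners can be registered and removed as a group. The bindAgentEvents helper below is a hypothetical sketch that works with any EventTarget and returns a function that detaches everything it attached.

```typescript
type Handlers = Record<string, (event: Event) => void>

/** Attach several listeners at once and return a function that removes them. */
function bindAgentEvents(agent: EventTarget, handlers: Handlers): () => void {
  const entries = Object.entries(handlers)
  for (const [type, handler] of entries) {
    agent.addEventListener(type, handler)
  }
  return () => {
    for (const [type, handler] of entries) {
      agent.removeEventListener(type, handler)
    }
  }
}
```

A custom UI could call the returned function from its own dispose handler so that no listeners outlive the agent.
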

Type Definitions

ExecutionResult

interface ExecutionResult {
  success: boolean
  data: string
  history: HistoricalEvent[]
}

AgentActivity

type AgentActivity =
  | { type: 'thinking' }
  | { type: 'executing'; tool: string; input: unknown }
  | { type: 'executed'; tool: string; input: unknown; output: string; duration: number }
  | { type: 'retrying'; attempt: number; maxAttempts: number }
  | { type: 'error'; message: string }
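
Since AgentActivity is a discriminated union on type, a UI can turn each activity into a status line with an exhaustive switch. The sketch below repeats the type so that it stands alone. describeActivity and its wording are illustrative, and the unit of duration is not stated here, so the value is printed as-is.

```typescript
type AgentActivity =
  | { type: 'thinking' }
  | { type: 'executing'; tool: string; input: unknown }
  | { type: 'executed'; tool: string; input: unknown; output: string; duration: number }
  | { type: 'retrying'; attempt: number; maxAttempts: number }
  | { type: 'error'; message: string }

/** Map an activity to a short, human-readable status line. */
function describeActivity(activity: AgentActivity): string {
  switch (activity.type) {
    case 'thinking':
      return 'Thinking'
    case 'executing':
      return `Running ${activity.tool}`
    case 'executed':
      return `Finished ${activity.tool} (duration: ${activity.duration})`
    case 'retrying':
      return `Retrying (${activity.attempt}/${activity.maxAttempts})`
    case 'error':
      return `Error: ${activity.message}`
  }
}
```
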

InstructionsConfig

interface InstructionsConfig {
  /** Global system-level instructions applied to every task */
  system?: string

  /**
   * Dynamic per-page instructions callback.
   * Called before each step; return undefined or null to skip.
   */
  getPageInstructions?: (url: string) => string | undefined | null
}
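
A small example of an InstructionsConfig value. The paths and instruction strings are made up for illustration, and the sketch assumes the callback receives an absolute URL; anything that cannot be parsed simply gets no page-specific guidance.

```typescript
interface InstructionsConfig {
  system?: string
  getPageInstructions?: (url: string) => string | undefined | null
}

const instructions: InstructionsConfig = {
  system: 'Ask before submitting any form.',
  getPageInstructions: (url) => {
    let pathname: string
    try {
      pathname = new URL(url).pathname
    } catch {
      return undefined // not an absolute URL: add no page-specific guidance
    }
    if (pathname.startsWith('/checkout')) return 'Stop before the final "Place order" button.'
    if (pathname.startsWith('/admin')) return 'Read-only: do not change any settings.'
    return undefined
  },
}
```
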

HistoricalEvent

type HistoricalEvent =
  | { type: 'step'; stepIndex: number; reflection: Partial<AgentReflection>; action: { name: string; input: any; output: string } }
  | { type: 'observation'; content: string }
  | { type: 'user_takeover' }
  | { type: 'retry'; message: string; attempt: number; maxAttempts: number }
  | { type: 'error'; message: string }
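
To show how the history array might be consumed, for example in a log panel, the sketch below formats each event as one line. The local HistoricalEvent type is a simplified copy: reflection is typed as unknown because AgentReflection is not defined on this page. formatHistory is a hypothetical helper.

```typescript
type HistoricalEvent =
  | { type: 'step'; stepIndex: number; reflection: unknown; action: { name: string; input: unknown; output: string } }
  | { type: 'observation'; content: string }
  | { type: 'user_takeover' }
  | { type: 'retry'; message: string; attempt: number; maxAttempts: number }
  | { type: 'error'; message: string }

/** Render each history event as a single log line. */
function formatHistory(history: HistoricalEvent[]): string[] {
  return history.map((event): string => {
    switch (event.type) {
      case 'step':
        return `step ${event.stepIndex}: ${event.action.name} -> ${event.action.output}`
      case 'observation':
        return `observation: ${event.content}`
      case 'user_takeover':
        return 'user took over'
      case 'retry':
        return `retry ${event.attempt}/${event.maxAttempts}: ${event.message}`
      case 'error':
        return `error: ${event.message}`
    }
  })
}
```
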
