PageAgentCore from @page-agent/core is the headless base class that powers the entire Re-Act agent loop. It owns the LLM client, tool registry, and history stream, but imposes no UI — making it the right foundation for custom-panel implementations, testing harnesses, or server-side orchestration. PageAgent extends this class by adding a floating panel on top.

Import

import { PageAgentCore } from '@page-agent/core'

Constructor

new PageAgentCore(config: PageAgentCoreConfig)
PageAgentCoreConfig is an AgentConfig plus a pageController field:
type PageAgentCoreConfig = AgentConfig & {
  pageController: PageController
}
You are responsible for creating and providing a PageController instance. This separation means you can swap out, mock, or extend the DOM layer independently of the agent logic.

LLM Connection

baseURL
string
required
OpenAI-compatible LLM API base URL, e.g. https://api.openai.com/v1.
model
string
required
Model name to use, e.g. gpt-4o or qwen-max.
apiKey
string
API key for the LLM provider.

Agent Behaviour

pageController
PageController
required
A PageController instance that the agent will use for all DOM queries and browser actions.
language
'en-US' | 'zh-CN'
default: 'en-US'
Sets the default working language in the agent system prompt.
maxSteps
number
default: 40
Maximum number of Re-Act loop iterations. When the limit is reached the agent sets success: false and returns.
stepDelay
number
default: 0.4
Pause in seconds between consecutive steps to allow pages to stabilise.
customTools
Record<string, PageAgentTool | null>
Add, override, or remove built-in tools.
instructions
object
Instructions injected into the LLM prompt on every step.
transformPageContent
(content: string) => Promise<string> | string
Transform the simplified DOM string before it is sent to the LLM. Runs after DOM extraction.
customSystemPrompt
string
Completely replace the default system prompt. Incorrect prompts will break agent behaviour.
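To illustrate transformPageContent, here is a minimal sketch that masks email addresses in the simplified DOM string before it is sent to the LLM. The function name redactEmails, the placeholder text, and the deliberately simple regular expression are assumptions made for this example; they are not part of the library.

```typescript
// Hypothetical pattern: matches most common email addresses.
// It is intentionally simple and is an assumption of this sketch.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g

// Matches the documented signature: (content: string) => Promise<string> | string
function redactEmails(content: string): string {
  return content.replace(EMAIL_PATTERN, '[redacted-email]')
}
```

It would be passed in the config as transformPageContent: redactEmails. Because the transform runs after DOM extraction, it only sees the text the LLM would otherwise receive.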

Experimental Options

experimentalScriptExecutionTool
boolean
default: false
Activate the execute_javascript built-in tool. Can bypass safety guards and data-masking.
experimentalLlmsTxt
boolean
default: false
Fetch /llms.txt from the current origin once per task and include it in the LLM context.

Lifecycle Hooks

onBeforeTask
(agent: PageAgentCore) => Promise<void> | void
Called once before the task loop begins. Throwing here cancels the task.
onAfterTask
(agent: PageAgentCore, result: ExecutionResult) => Promise<void> | void
Called after the task loop ends regardless of outcome.
onBeforeStep
(agent: PageAgentCore, stepCount: number) => Promise<void> | void
Called before each step. stepCount is 0-indexed.
onAfterStep
(agent: PageAgentCore, history: HistoricalEvent[]) => Promise<void> | void
Called after each step with the current history snapshot.
onDispose
(agent: PageAgentCore, reason?: string) => void
Called synchronously during dispose().
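The hooks above can be combined into a small logging helper. The sketch below is illustrative only: createLoggingHooks, AgentLike, and ResultLike are hypothetical names, and the two interfaces declare just the members the example reads (taskId and success) so that it compiles without the package installed.

```typescript
// Minimal structural stand-ins; the real PageAgentCore and ExecutionResult
// types have more members than the ones declared here.
interface AgentLike { taskId: string }
interface ResultLike { success: boolean }

// Hypothetical helper: builds hooks that append one line per lifecycle event.
function createLoggingHooks(log: string[]) {
  return {
    onBeforeTask: (agent: AgentLike): void => {
      log.push(`[${agent.taskId}] task started`)
    },
    onBeforeStep: (agent: AgentLike, stepCount: number): void => {
      // stepCount is 0-indexed, so add 1 for human-readable output.
      log.push(`[${agent.taskId}] step ${stepCount + 1}`)
    },
    onAfterTask: (agent: AgentLike, result: ResultLike): void => {
      // Runs regardless of outcome, so log the success flag explicitly.
      log.push(`[${agent.taskId}] finished, success=${result.success}`)
    },
  }
}
```

The returned object could be spread into the constructor config next to the LLM and pageController fields. Prefixing every line with taskId keeps logs from separate execute() calls apart.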

Properties

id
string
Unique identifier for this agent instance, generated at construction.
config
PageAgentCoreConfig & { maxSteps: number }
The resolved configuration object with defaults applied (maxSteps guaranteed to be a number).
tools
Map<string, PageAgentTool>
Live registry of all active tools. Modified at construction time according to customTools and experimentalScriptExecutionTool. Do not mutate this map at runtime.
pageController
PageController
The PageController instance used for all DOM operations.
task
string
The task string passed to the most recent execute() call. Empty string before the first call.
taskId
string
A unique ID generated at the start of each execute() call. Useful for correlating logs across lifecycle hooks.
history
HistoricalEvent[]
The persistent event log for the current task. Reset to [] at the start of every execute() call. Populated during the run with step, observation, user_takeover, retry, and error events.
disposed
boolean
true after dispose() has been called.
onAskUser
(question: string, options?: { signal: AbortSignal }) => Promise<string>
Callback used by the ask_user built-in tool. The ask_user tool is disabled when this property is unset. The implementation must reject the promise when options.signal is aborted.
status
AgentStatus
Current lifecycle status (getter). One of: 'idle' | 'running' | 'completed' | 'error' | 'stopped'.
lastResult
ExecutionResult | null
Result of the most recently completed run (getter). null before the first run finishes.
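The relationship between customTools and the tools registry can be pictured with the sketch below. It is one plausible reading of the documented "add, override, or remove" semantics, not the library's actual code: resolveTools and ToolLike are hypothetical, and treating null as "remove this tool" is an assumption inferred from the Record<string, PageAgentTool | null> type.

```typescript
// Stand-in for PageAgentTool; the real type has more fields.
interface ToolLike { description: string }

// Hypothetical merge: a null entry removes a tool, any other entry
// adds a new tool or overrides a built-in with the same name.
function resolveTools(
  builtIns: Record<string, ToolLike>,
  customTools: Record<string, ToolLike | null> = {},
): Map<string, ToolLike> {
  const tools = new Map<string, ToolLike>(Object.entries(builtIns))
  for (const [name, tool] of Object.entries(customTools)) {
    if (tool === null) tools.delete(name)
    else tools.set(name, tool)
  }
  return tools
}
```

Under this reading, passing { ask_user: null } would leave the registry without an ask_user entry, and passing a tool under a new name would add it alongside the built-ins.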

Methods

execute(task)

agent.execute(task: string): Promise<ExecutionResult>
Runs the full Re-Act loop for the given task. Each call resets history, taskId, and all internal step state. Throws (not caught internally) when:
  • agent.disposed === true
  • agent.status === 'running'
  • task is an empty string
All other errors (LLM failures, tool errors, maxSteps exceeded) are caught, recorded as error history events, and returned as ExecutionResult with success: false.
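Since execute() throws only for the three precondition failures listed above, a caller can check them up front and treat everything else as an ordinary result. The following is a hedged sketch: safeExecute, ExecutableAgent, and ExecutionResultLike are hypothetical names, and the interfaces declare only the members used here.

```typescript
// Structural stand-ins for the parts of the agent this sketch touches.
interface ExecutionResultLike { success: boolean; data: unknown }
interface ExecutableAgent {
  disposed: boolean
  status: string
  execute(task: string): Promise<ExecutionResultLike>
}

// Hypothetical wrapper: returns null instead of letting execute() throw
// when one of the documented preconditions is not met.
async function safeExecute(
  agent: ExecutableAgent,
  task: string,
): Promise<ExecutionResultLike | null> {
  if (agent.disposed || agent.status === 'running' || task === '') {
    return null
  }
  // LLM failures, tool errors, and an exhausted step budget arrive here
  // as a result with success: false rather than as a rejection.
  return agent.execute(task)
}
```

A caller can then branch on null (the task was never started) versus success: false (the task ran and failed).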

stop()

agent.stop(): Promise<void>
Aborts the running task by triggering the internal AbortController, then waits until the run has fully settled (including all onAfterStep / onAfterTask hooks). A no-op if the agent is not running.
Do not await agent.stop() from inside a lifecycle hook. The hook runs as part of the step settlement that stop() is waiting for — awaiting it will deadlock.
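One way to stay clear of the deadlock is to trigger stop() from code that runs outside the lifecycle hooks, for example a timer. The sketch below assumes only the documented execute() and stop() signatures; executeWithTimeout and StoppableAgent are hypothetical names introduced for illustration.

```typescript
// Structural stand-in for the two methods this sketch relies on.
interface StoppableAgent {
  execute(task: string): Promise<{ success: boolean }>
  stop(): Promise<void>
}

// Hypothetical helper: runs a task and requests a stop after timeoutMs.
async function executeWithTimeout(
  agent: StoppableAgent,
  task: string,
  timeoutMs: number,
): Promise<{ success: boolean }> {
  // The timer callback is not part of any hook, and the stop() promise
  // is intentionally not awaited here.
  const timer = setTimeout(() => { void agent.stop() }, timeoutMs)
  try {
    return await agent.execute(task)
  } finally {
    // Avoid a stray stop() call if the task finished in time.
    clearTimeout(timer)
  }
}
```

The helper awaits execute() rather than stop(), so it resumes once the run itself has settled.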

dispose()

agent.dispose(): void
Marks the instance as disposed, aborts any running task, calls pageController.dispose(), and emits the 'dispose' event. After disposal the instance is inert and cannot be reused.

pushObservation(content)

agent.pushObservation(content: string): void
Queues a string to be inserted into <agent_history> as an observation event before the next step begins. The observation persists in the LLM context for all subsequent steps within the same task.
content
string
required
Observation text to make visible in the agent’s memory.
Observations queued via pushObservation are batched and emitted as a single historychange event at the start of the next step’s observe phase.
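As a usage illustration, application code can feed out-of-band information to the agent through pushObservation. In the sketch below, reportValidationErrors and ObservationSink are hypothetical names, and the message format is an arbitrary choice made for this example.

```typescript
// Anything with a pushObservation method fits, including a PageAgentCore instance.
interface ObservationSink { pushObservation(content: string): void }

// Hypothetical helper: summarizes form validation errors as one observation.
// Returns true when an observation was queued.
function reportValidationErrors(agent: ObservationSink, errors: string[]): boolean {
  if (errors.length === 0) return false
  agent.pushObservation(`Form validation failed: ${errors.join('; ')}`)
  return true
}
```

Joining the errors into a single string keeps the history compact, which matters because each observation stays in the LLM context for the rest of the task.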

Events

PageAgentCore extends EventTarget. Subscribe with agent.addEventListener(type, handler).
Each entry below lists the event name, its type, and a description.
statuschange
Event
status changed. Read agent.status in the handler.
historychange
Event
history was mutated (new event pushed).
activity
CustomEvent<AgentActivity>
Transient real-time feedback. NOT stored in history.
dispose
Event
Agent has been disposed.

AgentActivity shapes

type AgentActivity =
  | { type: 'thinking' }
  | { type: 'executing'; tool: string; input: unknown }
  | { type: 'executed'; tool: string; input: unknown; output: string; duration: number }
  | { type: 'retrying'; attempt: number; maxAttempts: number }
  | { type: 'error'; message: string }
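Because AgentActivity is a discriminated union on type, a handler can switch over it and let the compiler check that every shape is covered. The sketch below repeats the union from above so that it is self-contained; describeActivity and its label strings are hypothetical.

```typescript
type AgentActivity =
  | { type: 'thinking' }
  | { type: 'executing'; tool: string; input: unknown }
  | { type: 'executed'; tool: string; input: unknown; output: string; duration: number }
  | { type: 'retrying'; attempt: number; maxAttempts: number }
  | { type: 'error'; message: string }

// Hypothetical formatter: turns an activity into a short status-bar label.
function describeActivity(activity: AgentActivity): string {
  switch (activity.type) {
    case 'thinking':
      return 'Thinking...'
    case 'executing':
      return `Running ${activity.tool}`
    case 'executed':
      // The unit of duration is not stated above, so it is shown as-is.
      return `Finished ${activity.tool} (duration: ${activity.duration})`
    case 'retrying':
      return `Retry ${activity.attempt}/${activity.maxAttempts}`
    case 'error':
      return `Error: ${activity.message}`
  }
}
```

In an activity listener, the argument would be the event's detail, for example describeActivity((e as CustomEvent<AgentActivity>).detail).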

Re-Act Agent Loop

At each step the agent follows an observe → think → act cycle. The loop repeats until the done tool is called, the task is stopped, or maxSteps is exceeded.
for step in 0..maxSteps:
    observe:
        browserState = pageController.getBrowserState()
        flush pending observations → history

    think:
        messages = [systemPrompt, userPrompt(history, browserState)]
        llmResponse = llm.invoke(messages, tools)  // forced tool call: AgentOutput

    act:
        extract reflection (evaluation, memory, next_goal)
        extract action (toolName + toolInput)
        result = tools[toolName].execute(toolInput)
        history.push({ type:'step', reflection, action, result })

        if toolName === 'done':
            break
The AgentOutput macro-tool wraps all registered tools into a single structured output, enforcing the reflection-before-action mental model. The LLM must provide evaluation_previous_goal, memory, and next_goal before specifying which tool to call.
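The "extract reflection" and "extract action" steps of the loop can be sketched as follows. The three reflection field names come from the description above; everything else is an assumption made for illustration, in particular the action field and its single-key { toolName: toolInput } shape, as well as the names AgentOutputLike and splitAgentOutput. The real AgentOutput schema may differ.

```typescript
// Assumed shape of one structured LLM response.
interface AgentOutputLike {
  evaluation_previous_goal: string
  memory: string
  next_goal: string
  // Assumption: exactly one key, mapping the tool name to its input.
  action: Record<string, unknown>
}

// Hypothetical helper: separates the reflection fields from the tool call.
function splitAgentOutput(output: AgentOutputLike) {
  const { action, ...reflection } = output
  const entries = Object.entries(action)
  const first = entries[0]
  if (entries.length !== 1 || first === undefined) {
    throw new Error('expected exactly one tool call in action')
  }
  const [toolName, toolInput] = first
  return { reflection, toolName, toolInput }
}
```

Keeping reflection and action in one structured output means a response cannot name a tool without also stating its evaluation, memory, and next goal.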

Building a Custom UI

import { PageAgentCore } from '@page-agent/core'
import { PageController } from '@page-agent/page-controller'

const controller = new PageController({ enableMask: false })

const agent = new PageAgentCore({
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'gpt-4o',
  pageController: controller,
})

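// Wire ask_user to your own UI
agent.onAskUser = async (question, { signal } = {}) => {
  return new Promise((resolve, reject) => {
    // Reject right away if the task was already aborted
    if (signal?.aborted) return reject(signal.reason)
    signal?.addEventListener('abort', () => reject(signal.reason), { once: true })
    // window.prompt blocks until the user answers; swap in your own dialog as needed
    resolve(window.prompt(question) ?? '')
  })
}

// Render history on every change.
// renderHistory and updateStatusBar are your own UI functions, not library exports.
agent.addEventListener('historychange', () => renderHistory(agent.history))
agent.addEventListener('activity', (e) => updateStatusBar((e as CustomEvent).detail))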

const result = await agent.execute('Find the cheapest product on this page')
console.log(result.success, result.data)
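The example above leaves renderHistory to the application. As a starting point, the sketch below reduces a history array to a one-line summary by counting events per type. summarizeHistory and EventLike are hypothetical, and the only property assumed on a history event is its type field.

```typescript
// Minimal stand-in for HistoricalEvent; only the type field is used.
interface EventLike { type: string }

// Hypothetical helper: counts events per type, in order of first appearance.
function summarizeHistory(history: EventLike[]): string {
  const counts = new Map<string, number>()
  for (const event of history) {
    counts.set(event.type, (counts.get(event.type) ?? 0) + 1)
  }
  return Array.from(counts, ([type, n]) => `${type}: ${n}`).join(', ')
}
```

A custom panel could show this summary in a header and render the full event list below it.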

PageAgent

Ready-to-use subclass with a built-in floating panel — the quickest way to get started.

PageController

DOM extraction and browser interaction layer injected into PageAgentCore.
