PageController from @page-agent/page-controller manages all DOM interaction on behalf of the agent. It handles page-state extraction (building a simplified HTML snapshot for the LLM), interactive element index mapping, synthetic event dispatch, scrolling, and optional visual masking. The class is intentionally independent of the LLM: every public method is async for compatibility with potential remote-calling scenarios, but no network calls are made internally.

PageAgent creates and manages a PageController for you. Instantiate it directly only when building on top of PageAgentCore or when you need raw DOM control without an agent loop.

Import

import { PageController } from '@page-agent/page-controller'

Constructor

new PageController(config?: PageControllerConfig)
The config argument is optional. When enableMask is true the mask overlay is initialised asynchronously via dynamic import to avoid loading CSS in Node environments.

PageControllerConfig

enableMask
boolean
default: false
Show a visual overlay that blocks user pointer interaction while the agent is operating. PageAgent sets this to true by default; when using PageController directly it defaults to false.
viewportExpansion
number | -1
Controls how far beyond the visible viewport interactive elements are extracted. Pass -1 to capture the entire page regardless of scroll position.
includeAttributes
string[]
Extra HTML attribute names to include in the simplified DOM string (e.g. ['aria-label', 'placeholder']).
keepSemanticTags
boolean
Preserve semantic HTML tag names (e.g. <button>, <a>) in the simplified output instead of normalising everything to generic element descriptors.
interactiveBlacklist
Element[]
Elements to exclude from the interactive element index. Useful for hiding agent UI elements from the LLM’s view of the page.
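
As a rough illustration, the options above might be combined as follows. The interface is a local mirror written for this sketch (it is named PageControllerConfigSketch to make clear it is not the type exported by the package), and the element type is left generic so the snippet also runs outside a browser.

```typescript
// Local mirror of the documented options; an assumption for illustration,
// not the type exported by @page-agent/page-controller.
interface PageControllerConfigSketch<E = unknown> {
  enableMask?: boolean
  viewportExpansion?: number
  includeAttributes?: string[]
  keepSemanticTags?: boolean
  interactiveBlacklist?: E[]
}

// Capture the whole page, surface two extra attributes, keep semantic tags.
const config: PageControllerConfigSketch = {
  enableMask: false, // documented default when PageController is used directly
  viewportExpansion: -1, // -1 captures the entire page regardless of scroll position
  includeAttributes: ['aria-label', 'placeholder'],
  keepSemanticTags: true,
  interactiveBlacklist: [], // in a browser: DOM elements to keep out of the index
}
```

In a browser, an object literal of this shape would be passed to new PageController(config).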

The BrowserState Type

getBrowserState() returns a BrowserState object. All fields are plain strings formatted for direct LLM consumption.
url
string
Current page URL (window.location.href).
title
string
Current page title (document.title).
header
string
Multi-line string containing the page title link, viewport/page dimensions, total scroll depth, and an above-fold scroll hint. Example:
Current Page: [My App](https://example.com/app)
Page info: 1440x900px viewport, 1440x2400px total page size, 0.0 pages above, 1.7 pages below, 2.7 total pages, at 0% of page

Interactive elements from top layer of the current page inside the viewport:

[Start of page]
content
string
Simplified HTML of all indexed interactive elements, formatted for LLM consumption. Each interactive element is given a numeric index like [42] that maps to the internal selectorMap.
Below-fold scroll hint, or [End of page] when already at the bottom. Example:
... 1500 pixels below (1.7 pages) - scroll to see more ...
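
The numbers in the header and scroll-hint examples are consistent with simple viewport arithmetic. The sketch below reproduces them under the assumption that one "page" equals one viewport height and that values are rounded to one decimal place. The function name scrollInfo and its return shape are hypothetical, not part of the library.

```typescript
interface ScrollInfo {
  pixelsAbove: number
  pixelsBelow: number
  pagesAbove: string
  pagesBelow: string
  totalPages: string
}

// Assumption: one "page" is one viewport height.
function scrollInfo(viewportHeight: number, pageHeight: number, scrollY: number): ScrollInfo {
  const pixelsAbove = Math.max(0, scrollY)
  const pixelsBelow = Math.max(0, pageHeight - viewportHeight - scrollY)
  return {
    pixelsAbove,
    pixelsBelow,
    pagesAbove: (pixelsAbove / viewportHeight).toFixed(1),
    pagesBelow: (pixelsBelow / viewportHeight).toFixed(1),
    totalPages: (pageHeight / viewportHeight).toFixed(1),
  }
}

// Values from the header example: 900px viewport, 2400px page, scrolled to the top.
const info = scrollInfo(900, 2400, 0)
console.log(info) // pixelsBelow: 1500, pagesAbove: '0.0', pagesBelow: '1.7', totalPages: '2.7'
```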

Methods

getBrowserState()

controller.getBrowserState(): Promise<BrowserState>
Refreshes the DOM index (updateTree()) and returns the full BrowserState snapshot. This is the method the agent calls at the start of every observe phase. Elements blacklisted via data-page-agent-not-interactive or interactiveBlacklist are excluded from indexing.
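
A custom agent may want to check that an index proposed by the LLM actually appears in the latest snapshot before acting on it. The following is a minimal sketch that scans a content string for [n] markers. The sample string is hypothetical (the real simplified HTML may be laid out differently), and the pattern would also match bracketed numbers that happen to appear in page text.

```typescript
// Collect every numeric [index] marker found in a simplified-DOM string.
function extractIndices(content: string): number[] {
  const pattern = /\[(\d+)\]/g
  const indices: number[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(content)) !== null) {
    indices.push(Number(match[1]))
  }
  return indices
}

// Hypothetical content; in practice this would be state.content from getBrowserState().
const sample = '[0]<a>Home</a>\n[1]<button>Submit</button>\nPlain text'
const known = extractIndices(sample)

function isKnownIndex(index: number): boolean {
  return known.indexOf(index) !== -1
}
```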

clickElement(index)

controller.clickElement(index: number): Promise<ActionResult>
Simulates a full W3C-compliant click sequence on the element at index: pointerover → pointerenter → mouseover → mouseenter → pointerdown → mousedown → focus → pointerup → mouseup → click. Automatically scrolls the element into view first.
index
number
required
Zero-based interactive-element index from the most recent getBrowserState() call.
returns
ActionResult
{ success: boolean, message: string }. The message field contains a human-readable result including the element’s text description, or an error string on failure.
If the clicked element is an anchor with target="_blank", the returned message will note that the link opened in a new tab.
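
The event ordering described for clickElement can be sketched as a list of event names replayed in order. This only illustrates the sequence: the dispatch callback is a hypothetical stand-in, and in a browser it would construct and dispatch the matching PointerEvent or MouseEvent (or focus the element for the focus step). How the library does this internally is not shown here.

```typescript
const CLICK_SEQUENCE = [
  'pointerover', 'pointerenter', 'mouseover', 'mouseenter',
  'pointerdown', 'mousedown', 'focus', 'pointerup', 'mouseup', 'click',
] as const

type ClickEventName = (typeof CLICK_SEQUENCE)[number]

// Replays the sequence through a caller-supplied (hypothetical) dispatcher.
function simulateClick(dispatch: (type: ClickEventName) => void): void {
  for (const type of CLICK_SEQUENCE) {
    dispatch(type)
  }
}

// Record the order instead of touching a real element.
const log: string[] = []
simulateClick((type) => {
  log.push(type)
})
```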

inputText(index, text)

controller.inputText(index: number, text: string): Promise<ActionResult>
Clicks the element at index then sets its value to text, dispatching the appropriate synthetic input / change events. Supports <input>, <textarea>, and contenteditable elements (including React-controlled inputs and most rich-text editors).
index
number
required
Interactive-element index.
text
string
required
Text to type into the element. Replaces any existing value.
returns
ActionResult
{ success: boolean, message: string }.
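
Because each action resolves to an ActionResult, a caller chaining several actions (for example a click followed by inputText) may want to stop at the first failure. The helper below is one possible sketch. runSteps and the stand-in steps are hypothetical, and ActionResult is redeclared locally so the snippet runs without the package.

```typescript
interface ActionResult {
  success: boolean
  message: string
}

type Step = () => Promise<ActionResult>

// Run steps in order and stop as soon as one reports success: false.
async function runSteps(steps: Step[]): Promise<ActionResult[]> {
  const results: ActionResult[] = []
  for (const step of steps) {
    const result = await step()
    results.push(result)
    if (!result.success) break
  }
  return results
}

// Stand-in steps so the sketch runs without a browser. With a real controller
// these could be () => controller.clickElement(3), () => controller.inputText(5, 'Hello'), ...
const demoSteps: Step[] = [
  async () => ({ success: true, message: 'clicked' }),
  async () => ({ success: false, message: 'element not found' }),
  async () => ({ success: true, message: 'never reached' }),
]
```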

selectOption(index, text)

controller.selectOption(index: number, text: string): Promise<ActionResult>
Selects the <option> whose visible text matches text inside the <select> element at index, then dispatches a change event.
index
number
required
Interactive-element index of the <select> element.
text
string
required
Exact visible text of the option to select (whitespace-trimmed comparison).
returns
ActionResult
{ success: boolean, message: string }.
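
The whitespace-trimmed comparison can be pictured as follows. This is a sketch of the matching rule only, assuming the comparison is case-sensitive. findOptionIndex is a hypothetical helper operating on plain strings rather than on <option> elements.

```typescript
// Returns the position of the first option whose trimmed text equals the
// trimmed query, or -1 when nothing matches.
function findOptionIndex(optionTexts: string[], text: string): number {
  const wanted = text.trim()
  for (let i = 0; i < optionTexts.length; i++) {
    if (optionTexts[i].trim() === wanted) return i
  }
  return -1
}

const position = findOptionIndex(['  Option A ', 'Option B'], ' Option B')
```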

scroll(options)

controller.scroll(options: {
  down: boolean
  numPages: number
  pixels?: number
  index?: number
}): Promise<ActionResult>
Scrolls vertically. Without index it scrolls the page; with index it finds the nearest scrollable ancestor of the indexed element.
options
object
required
returns
ActionResult
{ success: boolean, message: string } with the actual pixels scrolled and whether a boundary (top/bottom) was reached.
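
To make the reported values concrete, the sketch below computes how far a scroll request would actually move and whether it hits a boundary. It rests on two assumptions that the text above does not confirm: that numPages is multiplied by the viewport height, and that an explicit pixels value takes precedence over numPages. planScroll and its parameters are hypothetical.

```typescript
interface ScrollOptions {
  down: boolean
  numPages: number
  pixels?: number
}

interface ScrollPlan {
  scrolled: number
  boundary: 'top' | 'bottom' | null
}

function planScroll(
  options: ScrollOptions,
  viewportHeight: number,
  scrollTop: number,
  maxScrollTop: number,
): ScrollPlan {
  // Assumption: an explicit pixel distance wins over numPages.
  const requested = options.pixels ?? options.numPages * viewportHeight
  const target = options.down ? scrollTop + requested : scrollTop - requested
  // Clamp to the scrollable range so the reported distance is what actually moved.
  const clamped = Math.min(maxScrollTop, Math.max(0, target))

  let boundary: ScrollPlan['boundary'] = null
  if (options.down && clamped >= maxScrollTop) boundary = 'bottom'
  if (!options.down && clamped <= 0) boundary = 'top'

  return { scrolled: Math.abs(clamped - scrollTop), boundary }
}

// Half a 900px viewport from the top of a page with 1500px of scrollable range.
const half = planScroll({ down: true, numPages: 0.5 }, 900, 0, 1500)
```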

scrollHorizontally(options)

controller.scrollHorizontally(options: {
  right: boolean
  pixels: number
  index?: number
}): Promise<ActionResult>
Scrolls horizontally. Without index it scrolls the page; with index it finds the nearest horizontally scrollable ancestor.
options
object
required
returns
ActionResult
{ success: boolean, message: string }.

executeJavascript(script, signal?)

controller.executeJavascript(script: string, signal?: AbortSignal): Promise<ActionResult>
Evaluates script in an async function scope on the current page. The signal is exposed to the script as a variable named signal, allowing cooperative cancellation of long-running async code.
script
string
required
JavaScript source string. Supports async/await. Use signal.throwIfAborted() or pass signal to fetch() to honour cancellation.
signal
AbortSignal
Optional abort signal forwarded into the script scope.
returns
ActionResult
{ success: true, message: '✅ Executed JavaScript. Result: <returnValue>' } or { success: false, message: '❌ Error executing JavaScript: <error>' }.
executeJavascript is disabled by default in PageAgentCore. Enable it via experimentalScriptExecutionTool: true. It can have unpredictable side effects and may bypass data-masking mechanisms.
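
One common way to evaluate a source string "in an async function scope" with a named variable in scope is the AsyncFunction constructor. The sketch below uses that technique to mimic the documented result shape. It is not the library's implementation: runScript is a hypothetical name, and converting the return value with String() is an assumption.

```typescript
interface ActionResult {
  success: boolean
  message: string
}

// AsyncFunction is not a global; it is reachable through the prototype of any async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as
  new (...args: string[]) => (...args: unknown[]) => Promise<unknown>

async function runScript(script: string, signal?: AbortSignal): Promise<ActionResult> {
  try {
    // The script body sees one parameter named `signal`.
    const fn = new AsyncFunction('signal', script)
    const value = await fn(signal)
    return { success: true, message: `✅ Executed JavaScript. Result: ${String(value)}` }
  } catch (error) {
    return { success: false, message: `❌ Error executing JavaScript: ${String(error)}` }
  }
}
```

With this shape, a script such as `signal.throwIfAborted(); return "done"` run with an already-aborted signal would land in the catch branch and report success: false.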

showMask()

controller.showMask(): Promise<void>
Displays the visual mask overlay, blocking pointer events from reaching the page. Waits for the mask to finish initialising before showing. No-op if enableMask was false at construction.

hideMask()

controller.hideMask(): Promise<void>
Hides the visual mask overlay and restores pointer events to the page.

cleanUpHighlights()

controller.cleanUpHighlights(): Promise<void>
Removes all element highlight overlays injected during the previous updateTree() call. Called automatically at the end of every agent task.

getLastUpdateTime()

controller.getLastUpdateTime(): Promise<number>
Returns the Unix timestamp (milliseconds) of the most recent updateTree() call. The wait built-in tool uses this to subtract LLM call time from the requested wait duration.
returns
Promise<number>
Date.now() value from the last DOM tree refresh, or 0 if the tree has never been updated.
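
The subtraction performed by the wait tool can be sketched as below. This is an illustration of the idea rather than the tool's actual code: remainingWaitMs is a hypothetical helper, and treating a timestamp of 0 as "nothing to subtract" is an assumption.

```typescript
// How long is still left to wait, given that time has already passed
// since the last DOM tree refresh (for example while the LLM was responding).
function remainingWaitMs(requestedMs: number, lastUpdateTime: number, now: number): number {
  // 0 means the tree was never updated, so there is no elapsed time to subtract.
  if (lastUpdateTime === 0) return requestedMs
  const elapsed = Math.max(0, now - lastUpdateTime)
  return Math.max(0, requestedMs - elapsed)
}

// 5 s requested, 2 s already spent since the last update: 3 s remain.
const remaining = remainingWaitMs(5000, 1000, 3000)
```

In real use, lastUpdateTime would come from await controller.getLastUpdateTime() and now from Date.now().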

dispose()

controller.dispose(): void
Cleans up all internal state: removes highlight overlays, clears the element index and selector maps, and disposes the mask overlay. Called automatically by PageAgentCore.dispose().

Events

PageController extends EventTarget.
Event          Type    When fired
beforeUpdate   Event   At the start of updateTree(), before DOM extraction begins.
afterUpdate    Event   At the end of updateTree(), after the selector map is rebuilt.
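
Since PageController extends EventTarget, listeners attach with the standard addEventListener API. The sketch below uses a plain EventTarget as a stand-in for a controller instance and dispatches the two events by hand. The event names beforeUpdate and afterUpdate are read from the table above and should be checked against the package.

```typescript
// Stand-in for a PageController instance; only the EventTarget behaviour is modelled.
const target = new EventTarget()
const seen: string[] = []

target.addEventListener('beforeUpdate', () => {
  seen.push('before')
})
target.addEventListener('afterUpdate', () => {
  seen.push('after')
})

// Simulate what a single updateTree() pass would emit.
target.dispatchEvent(new Event('beforeUpdate'))
target.dispatchEvent(new Event('afterUpdate'))
```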

Example

import { PageController } from '@page-agent/page-controller'

const controller = new PageController({ enableMask: true })

// Get a snapshot of the current page
const state = await controller.getBrowserState()
console.log(state.url, state.title)
console.log(state.content) // simplified HTML for LLM

// Interact with indexed elements
await controller.clickElement(3)
await controller.inputText(5, 'Hello, world!')
await controller.selectOption(7, 'Option B')

// Scroll down half a viewport
await controller.scroll({ down: true, numPages: 0.5 })

// Execute JavaScript (only safe in controlled environments)
await controller.executeJavascript(`
  const el = document.querySelector('#banner')
  if (el) el.remove()
  return 'removed'
`)

// Clean up
controller.dispose()
If you are integrating PageController with a custom agent, always call getBrowserState() (or updateTree()) before attempting any index-based action. Actions called before the DOM has been indexed throw an error with the message 'DOM tree not indexed yet.'

PageAgent

Full agent with built-in panel — manages a PageController automatically.

PageAgentCore

Headless agent base class that accepts a PageController in its constructor.
