PageController, from @page-agent/page-controller, manages all DOM interaction on behalf of the agent. It handles page-state extraction (building a simplified HTML snapshot for the LLM), interactive-element index mapping, synthetic event dispatch, scrolling, and optional visual masking. The class is intentionally independent of the LLM. Every public method is async so that it stays compatible with potential remote-calling scenarios, but the class makes no network calls internally.
PageAgent creates and manages a PageController for you. Instantiate it directly only when building on top of PageAgentCore or when you need raw DOM control without an agent loop.
Import
Constructor
The config argument is optional. When enableMask is true, the mask overlay is initialised asynchronously via dynamic import to avoid loading CSS in Node environments.
PageControllerConfig
Show a visual overlay that blocks user pointer interaction while the agent is operating. PageAgent sets this to true by default; when using PageController directly it defaults to false.
Controls how far beyond the visible viewport interactive elements are extracted. Pass -1 to capture the entire page regardless of scroll position.
Extra HTML attribute names to include in the simplified DOM string (e.g. ['aria-label', 'placeholder']).
Preserve semantic HTML tag names (e.g. <button>, <a>) in the simplified output instead of normalising everything to generic element descriptors.
Elements to exclude from the interactive element index. Useful for hiding agent UI elements from the LLM’s view of the page.
The BrowserState Type
getBrowserState() returns a BrowserState object. All fields are plain strings formatted for direct LLM consumption.
Current page URL (window.location.href).
Current page title (document.title).
Multi-line string containing the page title link, viewport/page dimensions, total scroll depth, and an above-fold scroll hint.
Simplified HTML of all indexed interactive elements, formatted for LLM consumption. Each interactive element is given a numeric index like [42] that maps to the internal selectorMap.
Below-fold scroll hint, or [End of page] when already at the bottom. Example: ... 1500 pixels below (1.7 pages) - scroll to see more ...
Methods
getBrowserState()
Refreshes the DOM tree (via updateTree()) and returns the full BrowserState snapshot. This is the method the agent calls at the start of every observe phase. Elements blacklisted via data-page-agent-not-interactive or interactiveBlacklist are excluded from indexing.
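The below-fold hint in the returned snapshot can be approximated with a small formatter. This is a minimal sketch rather than the library's code: the name footerHint and both parameters are hypothetical, and the output format is inferred only from the example string shown for BrowserState.

```typescript
// Hypothetical helper: formats the below-fold scroll hint of a page snapshot.
// pixelsBelow    = document height remaining below the viewport, in pixels
// viewportHeight = visible height of the viewport, in pixels
function footerHint(pixelsBelow: number, viewportHeight: number): string {
  if (pixelsBelow <= 0) {
    return "[End of page]";
  }
  // Express the remaining distance in viewport-sized "pages", one decimal.
  const pages = (pixelsBelow / viewportHeight).toFixed(1);
  return `... ${pixelsBelow} pixels below (${pages} pages) - scroll to see more ...`;
}

console.log(footerHint(1500, 882)); // ... 1500 pixels below (1.7 pages) - scroll to see more ...
console.log(footerHint(0, 882));    // [End of page]
```

Keeping the hint as a plain string matches the statement that every BrowserState field is formatted for direct LLM consumption.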
clickElement(index)
Simulates a click on the element at index by dispatching this event sequence: pointerover → pointerenter → mouseover → mouseenter → pointerdown → mousedown → focus → pointerup → mouseup → click. Automatically scrolls the element into view first.
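The ordering above can be illustrated with a plain EventTarget. This is a sketch under assumptions, not the controller's implementation: dispatchClickSequence is a hypothetical helper, and generic Event objects stand in for the PointerEvent, MouseEvent, and focus handling a real page would involve.

```typescript
// Event order taken from the clickElement description.
const CLICK_SEQUENCE = [
  "pointerover", "pointerenter", "mouseover", "mouseenter",
  "pointerdown", "mousedown", "focus",
  "pointerup", "mouseup", "click",
] as const;

// Hypothetical helper: dispatches each event type in order on the target.
// Generic Event objects keep the sketch runnable outside a browser.
function dispatchClickSequence(target: EventTarget): void {
  for (const type of CLICK_SEQUENCE) {
    target.dispatchEvent(new Event(type));
  }
}

// Usage: record what listeners on the target observe.
const seen: string[] = [];
const clickTarget = new EventTarget();
for (const type of CLICK_SEQUENCE) {
  clickTarget.addEventListener(type, (event) => seen.push(event.type));
}
dispatchClickSequence(clickTarget);
console.log(seen.join(" → "));
```

Dispatching the whole sequence, rather than click alone, matters for pages whose handlers listen on pointerdown or mousedown.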
Zero-based interactive-element index from the most recent getBrowserState() call.
{ success: boolean, message: string }. message contains a human-readable result including the element’s text description, or an error string on failure.
If the clicked element is an anchor with target="_blank", the returned message will note that the link opened in a new tab.
inputText(index, text)
Sets the value of the element at index to text, dispatching the appropriate synthetic input / change events. Supports <input>, <textarea>, and contenteditable elements (including React-controlled inputs and most rich-text editors).
Interactive-element index.
Text to type into the element. Replaces any existing value.
{ success: boolean, message: string }.
selectOption(index, text)
Selects the <option> whose visible text matches text inside the <select> element at index, then dispatches a change event.
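The matching step can be sketched as an exact comparison after trimming whitespace on both sides. The OptionLike shape and findOptionIndex are hypothetical names for illustration only; case-sensitive matching is an assumption, not something the source confirms.

```typescript
// Hypothetical minimal shape of an <option>; only the visible text matters here.
interface OptionLike {
  text: string;
}

// Returns the position of the first option whose trimmed text equals the
// trimmed query, or -1 when nothing matches. Matching is exact and
// case-sensitive in this sketch.
function findOptionIndex(options: OptionLike[], text: string): number {
  const wanted = text.trim();
  return options.findIndex((option) => option.text.trim() === wanted);
}

const sizeOptions: OptionLike[] = [{ text: "  Small " }, { text: "Medium" }, { text: "Large" }];
console.log(findOptionIndex(sizeOptions, "Small"));  // 0
console.log(findOptionIndex(sizeOptions, "medium")); // -1
```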
Interactive-element index of the <select> element.
Exact visible text of the option to select (whitespace-trimmed comparison).
{ success: boolean, message: string }.
scroll(options)
Without index it scrolls the page; with index it finds the nearest scrollable ancestor of the indexed element.
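The ancestor lookup can be sketched as a walk up the parent chain. NodeLike is a hypothetical stand-in for the handful of element properties involved, and the scrollability test shown here (overflow allows scrolling and content is taller than the box) is an assumed heuristic, not necessarily the one the controller uses.

```typescript
// Hypothetical stand-in for the few element properties the search needs.
interface NodeLike {
  name: string;
  parent: NodeLike | null;
  overflowY: "visible" | "hidden" | "auto" | "scroll";
  scrollHeight: number;
  clientHeight: number;
}

// A node counts as vertically scrollable when its overflow allows scrolling
// and its content is taller than its box.
function isScrollable(node: NodeLike): boolean {
  const allowsScroll = node.overflowY === "auto" || node.overflowY === "scroll";
  return allowsScroll && node.scrollHeight > node.clientHeight;
}

// Walks up from the element and returns the first scrollable ancestor,
// or null so the caller can fall back to scrolling the page.
function nearestScrollableAncestor(element: NodeLike): NodeLike | null {
  for (let node = element.parent; node !== null; node = node.parent) {
    if (isScrollable(node)) return node;
  }
  return null;
}

// Usage: body > panel (scrollable) > list > item
const body: NodeLike = { name: "body", parent: null, overflowY: "visible", scrollHeight: 900, clientHeight: 900 };
const panel: NodeLike = { name: "panel", parent: body, overflowY: "auto", scrollHeight: 2000, clientHeight: 400 };
const list: NodeLike = { name: "list", parent: panel, overflowY: "visible", scrollHeight: 2000, clientHeight: 2000 };
const item: NodeLike = { name: "item", parent: list, overflowY: "visible", scrollHeight: 40, clientHeight: 40 };
console.log(nearestScrollableAncestor(item)?.name); // panel
```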
{ success: boolean, message: string } with the actual pixels scrolled and whether a boundary (top/bottom) was reached.
scrollHorizontally(options)
Without index it scrolls the page; with index it finds the nearest horizontally scrollable ancestor.
{ success: boolean, message: string }.
executeJavascript(script, signal?)
Executes script in an async function scope on the current page. The signal is exposed to the script as a variable named signal, allowing cooperative cancellation of long-running async code.
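One way to run a source string in an async scope with a named signal variable is the AsyncFunction constructor. The sketch below assumes that approach; runScript is a hypothetical name, and how the real method stringifies the return value or error is not specified by the source.

```typescript
type ScriptResult = { success: boolean; message: string };

// AsyncFunction is not a global; obtain its constructor from an async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as
  new (...args: string[]) => (signal?: AbortSignal) => Promise<unknown>;

// Hypothetical sketch: compiles the script with one parameter named "signal",
// runs it, and wraps the outcome in a { success, message } result.
async function runScript(script: string, signal?: AbortSignal): Promise<ScriptResult> {
  try {
    const fn = new AsyncFunction("signal", script);
    const result = await fn(signal);
    return { success: true, message: `✅ Executed JavaScript. Result: ${String(result)}` };
  } catch (error) {
    // Covers syntax errors, thrown errors, and aborts raised by the script.
    return { success: false, message: `❌ Error executing JavaScript: ${String(error)}` };
  }
}

// Usage: the script checks the signal before doing any work.
const controller = new AbortController();
runScript("signal.throwIfAborted(); return 1 + 1;", controller.signal)
  .then((outcome) => console.log(outcome.message));
```

Cancellation here is cooperative: aborting the controller only has an effect if the script itself consults signal.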
JavaScript source string. Supports async/await. Use signal.throwIfAborted() or pass signal to fetch() to honour cancellation.
Optional abort signal forwarded into the script scope.
{ success: true, message: '✅ Executed JavaScript. Result: <returnValue>' } or { success: false, message: '❌ Error executing JavaScript: <error>' }.
showMask()
enableMask was false at construction.
hideMask()
cleanUpHighlights()
Removes the highlights left by the last updateTree() call. Called automatically at the end of every agent task.
getLastUpdateTime()
Returns the timestamp of the last updateTree() call. The wait built-in tool uses this to subtract LLM call time from the requested wait duration.
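The subtraction can be sketched as follows. remainingWaitMs is a hypothetical helper, not the wait tool's code, and treating a timestamp of 0 as "nothing to subtract" is an assumption made for this example.

```typescript
// Hypothetical helper: how long a wait tool still has to sleep once the time
// already spent since the last tree update (for example on the LLM call) is
// subtracted from the requested duration.
function remainingWaitMs(requestedMs: number, lastUpdateTime: number, now: number = Date.now()): number {
  // Assumption: 0 means the tree was never updated, so nothing is subtracted.
  if (lastUpdateTime === 0) return requestedMs;
  const elapsedMs = now - lastUpdateTime;
  // Never return a negative duration.
  return Math.max(0, requestedMs - elapsedMs);
}

console.log(remainingWaitMs(3000, 10000, 11200)); // 1800
console.log(remainingWaitMs(1000, 10000, 12500)); // 0
console.log(remainingWaitMs(1000, 0, 12500));     // 1000
```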
Date.now() value from the last DOM tree refresh, or 0 if the tree has never been updated.
dispose()
PageAgentCore.dispose().
Events
PageController extends EventTarget.
| Event | Type | When fired |
|---|---|---|
| beforeUpdate | Event | At the start of updateTree(), before DOM extraction begins. |
| afterUpdate | Event | At the end of updateTree(), after the selector map is rebuilt. |
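Because the controller is an EventTarget, both events can be observed with addEventListener. The sketch below uses a hypothetical FakeController that models only the event behaviour, so that it runs without a page; its updateTree() is a synchronous placeholder, not the real method.

```typescript
// Hypothetical stand-in for a controller: only the two events are modelled.
class FakeController extends EventTarget {
  updateTree(): void {
    this.dispatchEvent(new Event("beforeUpdate"));
    // DOM extraction and the selector-map rebuild would happen here.
    this.dispatchEvent(new Event("afterUpdate"));
  }
}

// Usage: subscribe to both events, then trigger a refresh.
const fake = new FakeController();
const updateLog: string[] = [];
fake.addEventListener("beforeUpdate", () => updateLog.push("before"));
fake.addEventListener("afterUpdate", () => updateLog.push("after"));
fake.updateTree();
console.log(updateLog.join(", ")); // before, after
```

Listening for afterUpdate is a natural hook for code that needs the rebuilt selector map, while beforeUpdate suits cleanup that must finish before extraction begins.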
Example
Related
PageAgent
Full agent with built-in panel — manages a PageController automatically.
PageAgentCore
Headless agent base class that accepts a PageController in its constructor.