PageController Class — DOM Operations API Reference

PageController, exported from @page-agent/page-controller, manages all DOM interaction on behalf of the agent. It handles page-state extraction (building a simplified HTML snapshot for the LLM), interactive element index mapping, synthetic event dispatch, scrolling, and optional visual masking. The class is intentionally independent of the LLM. Apart from dispose(), every public method is async for compatibility with potential remote-calling scenarios, but no network calls are made internally.

PageAgent creates and manages a PageController for you. Instantiate it directly only when building on top of PageAgentCore or when you need raw DOM control without an agent loop.

Import

import { PageController } from '@page-agent/page-controller'

Constructor

new PageController(config?: PageControllerConfig)

The config argument is optional. When enableMask is true, the mask overlay is initialised asynchronously via dynamic import to avoid loading CSS in Node environments.

`PageControllerConfig`

enableMask

boolean

default: false

Show a visual overlay that blocks user pointer interaction while the agent is operating. PageAgent sets this to true by default; when using PageController directly it defaults to false.

viewportExpansion

number | -1

Controls how far beyond the visible viewport interactive elements are extracted. Pass -1 to capture the entire page regardless of scroll position.

includeAttributes

string[]

Extra HTML attribute names to include in the simplified DOM string (e.g. ['aria-label', 'placeholder']).

keepSemanticTags

boolean

Preserve semantic HTML tag names (e.g. <button>, <a>) in the simplified output instead of normalising everything to generic element descriptors.

interactiveBlacklist

Element[]

Elements to exclude from the interactive element index. Useful for hiding agent UI elements from the LLM’s view of the page.

The `BrowserState` Type

getBrowserState() returns a BrowserState object. All fields are plain strings formatted for direct LLM consumption.

url

string

Current page URL (window.location.href).

title

string

Current page title (document.title).

header

string

Multi-line string containing the page title link, viewport/page dimensions, total scroll depth, and an above-fold scroll hint. Example:

Current Page: [My App](https://example.com/app)
Page info: 1440x900px viewport, 1440x2400px total page size, 0.0 pages above, 1.7 pages below, 2.7 total pages, at 0% of page

Interactive elements from top layer of the current page inside the viewport:

[Start of page]

content

string

Simplified HTML of all indexed interactive elements, formatted for LLM consumption. Each interactive element is given a numeric index like [42] that maps to the internal selectorMap.

footer

string

Below-fold scroll hint, or [End of page] when already at the bottom. Example: ... 1500 pixels below (1.7 pages) - scroll to see more ...
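Because every field is a plain string, a custom agent can build its prompt by joining header, content, and footer in reading order. The sketch below is one possible way to do that. The BrowserState interface is a local mirror of the fields documented above, formatStateForPrompt is a hypothetical helper that is not part of the package, and the sample content string is illustrative only.

```typescript
// Local mirror of the documented fields; the real type ships with the package.
interface BrowserState {
  url: string
  title: string
  header: string
  content: string
  footer: string
}

// Hypothetical helper: joins the three display fields in reading order,
// skipping any field that is empty after trimming.
function formatStateForPrompt(state: BrowserState): string {
  return [state.header, state.content, state.footer]
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join('\n')
}

// Illustrative values only; real content comes from getBrowserState().
const sampleState: BrowserState = {
  url: 'https://example.com/app',
  title: 'My App',
  header: 'Current Page: [My App](https://example.com/app)\n[Start of page]',
  content: '[0]<button>Save</button>\n[1]<a>Docs</a>',
  footer: '[End of page]',
}

const promptText = formatStateForPrompt(sampleState)
```

The url and title fields are left out of the joined text here because the header already carries both.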

Methods

`getBrowserState()`

controller.getBrowserState(): Promise<BrowserState>

Refreshes the DOM index (updateTree()) and returns the full BrowserState snapshot. This is the method the agent calls at the start of every observe phase. Elements blacklisted via data-page-agent-not-interactive or interactiveBlacklist are excluded from indexing.

`clickElement(index)`

controller.clickElement(index: number): Promise<ActionResult>

Simulates a full W3C-compliant click sequence on the element at index: pointerover → pointerenter → mouseover → mouseenter → pointerdown → mousedown → focus → pointerup → mouseup → click. Automatically scrolls the element into view first.

index

number

required

Zero-based interactive-element index from the most recent getBrowserState() call.

returns

ActionResult

{ success: boolean, message: string } — message contains a human-readable result including the element’s text description, or an error string on failure.

If the clicked element is an anchor with target="_blank", the returned message will note that the link opened in a new tab.
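The event order listed above can be pictured as a simple loop over event names. This is a simplified sketch, not the library's implementation: it dispatches plain Event objects on a bare EventTarget, whereas a real click simulation would use PointerEvent and MouseEvent instances with coordinates on a DOM element. CLICK_SEQUENCE and dispatchClickSequence are hypothetical names.

```typescript
// Event order as listed in the clickElement description.
const CLICK_SEQUENCE = [
  'pointerover', 'pointerenter', 'mouseover', 'mouseenter',
  'pointerdown', 'mousedown', 'focus', 'pointerup', 'mouseup', 'click',
] as const

// Simplified stand-in: fires one plain Event per name, in order.
function dispatchClickSequence(target: EventTarget): void {
  for (const type of CLICK_SEQUENCE) {
    target.dispatchEvent(new Event(type))
  }
}

// Record what a listener on the target would observe.
const clickTarget = new EventTarget()
const seenEvents: string[] = []
for (const type of CLICK_SEQUENCE) {
  clickTarget.addEventListener(type, (event) => {
    seenEvents.push(event.type)
  })
}
dispatchClickSequence(clickTarget)
```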

`inputText(index, text)`

controller.inputText(index: number, text: string): Promise<ActionResult>

Clicks the element at index then sets its value to text, dispatching the appropriate synthetic input / change events. Supports <input>, <textarea>, and contenteditable elements (including React-controlled inputs and most rich-text editors).

index

number

required

Interactive-element index.

text

string

required

Text to type into the element. Replaces any existing value.

returns

ActionResult

{ success: boolean, message: string }.

`selectOption(index, text)`

controller.selectOption(index: number, text: string): Promise<ActionResult>

Selects the <option> whose visible text matches text inside the <select> element at index, then dispatches a change event.

index

number

required

Interactive-element index of the <select> element.

text

string

required

Exact visible text of the option to select (whitespace-trimmed comparison).

returns

ActionResult

{ success: boolean, message: string }.
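The matching rule for selectOption can be illustrated with a small comparison over option labels. The sketch assumes that "exact" means a case-sensitive match after trimming whitespace on both sides; findOptionIndex is a hypothetical helper and operates on plain strings rather than real option elements.

```typescript
// Hypothetical helper: index of the first option whose trimmed text equals
// the trimmed search text, or -1 when nothing matches.
// Assumption: the comparison is case-sensitive.
function findOptionIndex(optionTexts: string[], wanted: string): number {
  const needle = wanted.trim()
  return optionTexts.findIndex((text) => text.trim() === needle)
}

const optionLabels = ['  Option A ', 'Option B', 'option b']
const matchIndex = findOptionIndex(optionLabels, ' Option B ') // whitespace is ignored
const missIndex = findOptionIndex(optionLabels, 'Option') // partial text does not match
```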

`scroll(options)`

controller.scroll(options: {
  down: boolean
  numPages: number
  pixels?: number
  index?: number
}): Promise<ActionResult>

Scrolls vertically. Without index it scrolls the page; with index it finds the nearest scrollable ancestor of the indexed element.

options

object

required


down

boolean

required

true to scroll down, false to scroll up.

numPages

number

required

Number of viewport heights ("pages") to scroll. Fractional values are allowed, e.g. 0.5 for half a viewport. Ignored when pixels is provided.

pixels

number

Exact pixel distance to scroll. Takes priority over numPages.

index

number

Optional element index. When provided, scroll is applied to the nearest scrollable ancestor of that element.

returns

ActionResult

{ success: boolean, message: string } with the actual pixels scrolled and whether a boundary (top/bottom) was reached.
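The precedence between pixels and numPages can be expressed as a one-line calculation. The sketch below shows how a signed scroll distance might be derived from the options. VerticalScrollOptions and resolveScrollDelta are hypothetical names, and the viewport height is passed in explicitly so the example does not depend on a browser.

```typescript
// Local mirror of the documented options object (hypothetical type name).
interface VerticalScrollOptions {
  down: boolean
  numPages: number
  pixels?: number
  index?: number
}

// Hypothetical helper: signed vertical distance in pixels (positive = down).
// pixels wins when present; otherwise numPages is scaled by the viewport height.
function resolveScrollDelta(options: VerticalScrollOptions, viewportHeight: number): number {
  const distance = options.pixels ?? options.numPages * viewportHeight
  return options.down ? distance : -distance
}

const halfPageDown = resolveScrollDelta({ down: true, numPages: 0.5 }, 900)
const exactPixelsUp = resolveScrollDelta({ down: false, numPages: 1, pixels: 120 }, 900)
```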

`scrollHorizontally(options)`

controller.scrollHorizontally(options: {
  right: boolean
  pixels: number
  index?: number
}): Promise<ActionResult>

Scrolls horizontally. Without index it scrolls the page; with index it finds the nearest horizontally scrollable ancestor.

options

object

required


right

boolean

required

true to scroll right, false to scroll left.

pixels

number

required

Pixel distance to scroll.

index

number

Optional element index for container-level scrolling.

returns

ActionResult

{ success: boolean, message: string }.

`executeJavascript(script, signal?)`

controller.executeJavascript(script: string, signal?: AbortSignal): Promise<ActionResult>

Evaluates script in an async function scope on the current page. The signal is exposed to the script as a variable named signal, allowing cooperative cancellation of long-running async code.

script

string

required

JavaScript source string. Supports async/await. Use signal.throwIfAborted() or pass signal to fetch() to honour cancellation.

signal

AbortSignal

Optional abort signal forwarded into the script scope.

returns

ActionResult

{ success: true, message: '✅ Executed JavaScript. Result: <returnValue>' } or { success: false, message: '❌ Error executing JavaScript: <error>' }.

executeJavascript is disabled by default in PageAgentCore. Enable it via experimentalScriptExecutionTool: true. It can have unpredictable side effects and may bypass data-masking mechanisms.
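To make the "async function scope" and the signal variable concrete, here is a minimal sketch of how such an evaluator could be built. It is an assumption about the general technique, not the library's actual code: runScript is a hypothetical function that wraps the source in an async arrow function and passes the abort signal in under the name signal.

```typescript
// Hypothetical stand-in for the documented behaviour, not the library's code.
function runScript(script: string, signal?: AbortSignal): Promise<unknown> {
  // Wrapping the source in an async arrow function lets it use `await` and
  // `return`; the function parameter makes `signal` visible inside the script.
  const factory = new Function('signal', `return (async () => { ${script}\n })()`)
  return factory(signal) as Promise<unknown>
}

const abortController = new AbortController()

const pendingResult = runScript(
  `
  signal.throwIfAborted() // cooperative cancellation check
  const value = await Promise.resolve(21)
  return value * 2
  `,
  abortController.signal,
)

pendingResult.then((value) => console.log('Result:', value))
```

Cancellation here is cooperative: aborting the controller has no effect unless the script checks signal itself or hands it to an API that honours it.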

`showMask()`

controller.showMask(): Promise<void>

Displays the visual mask overlay, blocking pointer events from reaching the page. Waits for the mask to finish initialising before showing. No-op if enableMask was false at construction.

`hideMask()`

controller.hideMask(): Promise<void>

Hides the visual mask overlay and restores pointer events to the page.

`cleanUpHighlights()`

controller.cleanUpHighlights(): Promise<void>

Removes all element highlight overlays injected during the previous updateTree() call. Called automatically at the end of every agent task.

`getLastUpdateTime()`

controller.getLastUpdateTime(): Promise<number>

Returns the Unix timestamp (milliseconds) of the most recent updateTree() call. The wait built-in tool uses this to subtract LLM call time from the requested wait duration.

returns

Promise<number>

Date.now() value from the last DOM tree refresh, or 0 if the tree has never been updated.
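The wait adjustment described above amounts to subtracting the time already elapsed since the last tree update from the requested duration. The following sketch shows that arithmetic under stated assumptions: remainingWaitMs is a hypothetical helper, the current time is passed in rather than read from Date.now(), and a timestamp of 0 is treated as "nothing to subtract".

```typescript
// Hypothetical helper: time left to wait once elapsed time is deducted.
function remainingWaitMs(requestedMs: number, lastUpdateTime: number, now: number): number {
  // 0 means the tree has never been updated, so there is nothing to subtract.
  if (lastUpdateTime === 0) return requestedMs
  const elapsed = Math.max(0, now - lastUpdateTime)
  // Never return a negative wait.
  return Math.max(0, requestedMs - elapsed)
}

// 1200 ms have passed since the last update, so 1800 of the 3000 ms remain.
const partialWait = remainingWaitMs(3000, 10_000, 11_200)
// More time has passed than was requested, so no wait remains.
const noWait = remainingWaitMs(1000, 10_000, 15_000)
```

In real use, lastUpdateTime would come from await controller.getLastUpdateTime() and now from Date.now().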

`dispose()`

controller.dispose(): void

Cleans up all internal state: removes highlight overlays, clears the element index and selector maps, and disposes the mask overlay. Called automatically by PageAgentCore.dispose().

Events

PageController extends EventTarget.

Event	Type	When fired
`beforeUpdate`	`Event`	At the start of `updateTree()`, before DOM extraction begins.
`afterUpdate`	`Event`	At the end of `updateTree()`, after the selector map is rebuilt.
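Since the class extends EventTarget, both events can be observed with addEventListener. The sketch below uses a stand-in class so that it runs without the package or a DOM: FakeController and its refresh() method are hypothetical and only mimic the order in which the two events are described as firing.

```typescript
// Stand-in for illustration; the real class fires these from updateTree().
class FakeController extends EventTarget {
  refresh(): void {
    this.dispatchEvent(new Event('beforeUpdate'))
    // ... DOM extraction and selector map rebuild would happen here ...
    this.dispatchEvent(new Event('afterUpdate'))
  }
}

const fakeController = new FakeController()
const eventLog: string[] = []

fakeController.addEventListener('beforeUpdate', () => {
  eventLog.push('before')
})
fakeController.addEventListener('afterUpdate', () => {
  eventLog.push('after')
})

fakeController.refresh()
```

With a real PageController the same two addEventListener calls apply, and the events fire whenever getBrowserState() refreshes the tree.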

Example

import { PageController } from '@page-agent/page-controller'

const controller = new PageController({ enableMask: true })

// Get a snapshot of the current page
const state = await controller.getBrowserState()
console.log(state.url, state.title)
console.log(state.content) // simplified HTML for LLM

// Interact with indexed elements
await controller.clickElement(3)
await controller.inputText(5, 'Hello, world!')
await controller.selectOption(7, 'Option B')

// Scroll down half a viewport
await controller.scroll({ down: true, numPages: 0.5 })

// Execute JavaScript (only safe in controlled environments)
await controller.executeJavascript(`
  const el = document.querySelector('#banner')
  if (el) el.remove()
  return 'removed'
`)

// Clean up
controller.dispose()

If you are integrating PageController with a custom agent, always call getBrowserState() (or updateTree()) before attempting any index-based action. Actions called before the DOM has been indexed throw an error with the message 'DOM tree not indexed yet.'

Related

PageAgent

Full agent with built-in panel — manages a PageController automatically.

PageAgentCore

Headless agent base class that accepts a PageController in its constructor.
