Page Agent is a JavaScript/TypeScript library that embeds a GUI agent directly inside your webpage. It reads the DOM, reasons over interactive elements, and executes multi-step tasks in response to natural language instructions — all without screenshots, multimodal models, or backend infrastructure.

Quickstart

Add Page Agent to a page in under 5 minutes with a single script tag or an npm install.

Models

Works with any OpenAI-compatible LLM — Qwen, GPT, Claude, Gemini, DeepSeek, and local models.

Custom Tools

Extend the agent with your own Zod-typed tools to call business logic and APIs.

API Reference

Full TypeScript API reference for PageAgent, PageAgentCore, and PageController.

How It Works

Page Agent runs entirely in the browser. On every step of a task, it extracts a lightweight text representation of the page’s interactive elements, sends it to an LLM, and executes the returned action (click, type, scroll, wait, etc.). No screenshots. No Python backend. No headless browser.
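
The step cycle described above can be sketched as a small observe, decide, act loop. This is a simplified illustration under stated assumptions, not Page Agent's implementation. InteractiveElement, Action, Decide, serialize, and runTask are hypothetical names. The page is modeled as plain objects rather than live DOM nodes. The "model" is a synchronous stand-in function, whereas a real LLM call is asynchronous.

```typescript
// Hypothetical types for illustration; not Page Agent's actual internals.
interface InteractiveElement {
  index: number;
  tag: string;
  text: string;
}

type Action =
  | { kind: "click"; index: number }
  | { kind: "type"; index: number; value: string }
  | { kind: "done"; summary: string };

// Stand-in for the LLM call. A real call would be async; this one is synchronous for brevity.
type Decide = (task: string, pageText: string, history: string[]) => Action;

// One line per interactive element, e.g. "[0]<button>Log in</button>".
function serialize(elements: InteractiveElement[]): string {
  return elements.map((e) => `[${e.index}]<${e.tag}>${e.text}</${e.tag}>`).join("\n");
}

function runTask(
  task: string,
  observe: () => InteractiveElement[],
  decide: Decide,
  execute: (action: Exclude<Action, { kind: "done" }>) => string,
  maxSteps = 10,
): { success: boolean; history: string[] } {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // Observe: re-read the page every step, because the last action may have changed it.
    const pageText = serialize(observe());
    // Think: ask the model for exactly one next action.
    const action = decide(task, pageText, history);
    if (action.kind === "done") {
      history.push(`done: ${action.summary}`);
      return { success: true, history };
    }
    // Act: perform the action and keep its outcome as context for the next step.
    history.push(execute(action));
  }
  // Step budget exhausted without the model declaring the task complete.
  return { success: false, history };
}

// Usage with a fake one-button page and a scripted "model".
let loggedIn = false;
const outcome = runTask(
  "Click the login button",
  () => (loggedIn ? [] : [{ index: 0, tag: "button", text: "Log in" }]),
  (_task, pageText) =>
    pageText.includes("Log in")
      ? { kind: "click", index: 0 }
      : { kind: "done", summary: "login button clicked" },
  (action) => {
    if (action.kind === "click") loggedIn = true;
    return `${action.kind} [${action.index}]`;
  },
);
```

Re-serializing the page on every iteration, rather than once per task, is what lets the loop react to navigation, modals, and other changes caused by its own actions.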

1. Install or embed

Add a single <script> tag from a CDN, or run npm install page-agent and import PageAgent.

2. Connect your LLM

Provide the baseURL, model, and apiKey of any OpenAI-compatible endpoint. Supports Qwen, GPT, Claude, Gemini, local models served by Ollama, and more.

3. Execute tasks

Call agent.execute("Click the login button") and the agent handles the rest — planning, acting, and retrying until the task is done.

4. Extend and customize

Add custom tools, per-page instructions, data masking, and lifecycle hooks to fit your app’s needs.
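
The "Connect your LLM" step asks only for baseURL, model, and apiKey. To show what "OpenAI-compatible" means on the wire, the sketch below builds the request such a server expects, without sending it: a POST to {baseURL}/chat/completions with a bearer token and a JSON body holding the model name and messages. Page Agent constructs its own requests internally. buildChatRequest, LLMConfig, and the example values (a local Ollama endpoint and model tag) are illustrative assumptions.

```typescript
// Hypothetical config mirroring the three settings above.
interface LLMConfig {
  baseURL: string;
  model: string;
  apiKey: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds, but does not send, an OpenAI-style Chat Completions request.
function buildChatRequest(config: LLMConfig, messages: ChatMessage[]): PreparedRequest {
  // Drop trailing slashes so ".../v1/" and ".../v1" resolve to the same endpoint.
  const base = config.baseURL.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}

// Example values for a local Ollama server, which exposes an OpenAI-compatible API under /v1.
const request = buildChatRequest(
  { baseURL: "http://localhost:11434/v1/", model: "qwen2.5", apiKey: "ollama" },
  [{ role: "user", content: "Click the login button" }],
);
// In a browser this could be sent with: fetch(request.url, request.init)
```

Because only these three values change between providers, switching from a cloud model to a local one is a configuration change rather than a code change.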

Key Features

Zero Backend

Pure in-page JavaScript. Import from CDN or npm — no server-side component required.

Text-Based DOM

Parses interactive elements into a compact text tree. No multimodal LLMs or screenshots needed.
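
As a rough illustration of a compact text tree, the sketch below walks a simplified node model. It gives each interactive element a numeric index, keeps plain text as unindexed context, and drops empty wrapper elements. PageNode, Snapshot, snapshot, and the output format are assumptions for illustration; Page Agent's actual serialization may differ. The byIndex map is what would let an action such as "click element 1" be resolved back to a node.

```typescript
// Hypothetical, simplified node model; a real implementation would walk live DOM nodes.
interface PageNode {
  tag: string;
  text?: string;
  interactive?: boolean;
  children?: PageNode[];
}

interface Snapshot {
  text: string; // what would be sent to the LLM
  byIndex: Map<number, PageNode>; // resolves an index chosen by the model back to a node
}

function snapshot(root: PageNode): Snapshot {
  const lines: string[] = [];
  const byIndex = new Map<number, PageNode>();
  let next = 0;

  const visit = (node: PageNode, depth: number): void => {
    const indent = "  ".repeat(depth);
    let childDepth = depth;
    if (node.interactive) {
      // Interactive elements get an index the model can refer to.
      const i = next++;
      byIndex.set(i, node);
      lines.push(`${indent}[${i}]<${node.tag}>${node.text ?? ""}</${node.tag}>`);
      childDepth = depth + 1;
    } else if (node.text) {
      // Plain text is kept as context for the model but cannot be acted on.
      lines.push(`${indent}${node.text}`);
    }
    // Text-less wrappers (div, span, ...) emit nothing; their children keep the parent's depth.
    for (const child of node.children ?? []) visit(child, childDepth);
  };

  visit(root, 0);
  return { text: lines.join("\n"), byIndex };
}

// Usage: a small sign-in form with one nested interactive element.
const page: PageNode = {
  tag: "div",
  children: [
    { tag: "h1", text: "Sign in" },
    {
      tag: "div",
      children: [
        { tag: "input", text: "Email", interactive: true },
        { tag: "button", text: "Log in", interactive: true },
        {
          tag: "select",
          text: "Country",
          interactive: true,
          children: [{ tag: "option", text: "China", interactive: true }],
        },
      ],
    },
  ],
};
const snap = snapshot(page);
```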

Bring Your Own LLM

Connect any OpenAI-compatible endpoint. Works with cloud providers and local runtimes like Ollama.

Custom Tools

Register Zod-typed tools so the agent can call your APIs, search knowledge bases, or trigger workflows.
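
The sketch below shows the general shape of a typed tool registry. Each tool validates the model's raw arguments before running, and validation errors are returned to the model as text instead of being thrown. Page Agent uses Zod schemas for this. To keep the example dependency-free, a hand-written parse function stands in for a Zod schema's parse. Tool, defineTool, callTool, and the lookupOrder tool are hypothetical and are not the library's API.

```typescript
// Hypothetical tool shape: validate raw model arguments, then run business logic.
interface Tool {
  description: string;
  run: (input: unknown) => string;
}

// Pairs a parser (the role a Zod schema's .parse plays) with the function it guards.
function defineTool<Args>(
  description: string,
  parse: (input: unknown) => Args,
  execute: (args: Args) => string,
): Tool {
  return { description, run: (input) => execute(parse(input)) };
}

// Dispatch a tool call from the model by name; failures go back to the model as text.
function callTool(tools: Map<string, Tool>, name: string, input: unknown): string {
  const tool = tools.get(name);
  if (!tool) return `error: unknown tool "${name}"`;
  try {
    return tool.run(input);
  } catch (err) {
    return `error: ${err instanceof Error ? err.message : String(err)}`;
  }
}

// Example business tool backed by an in-memory table.
const orders = new Map<string, string>([["A-100", "shipped"]]);

const tools = new Map<string, Tool>([
  [
    "lookupOrder",
    defineTool(
      "Look up the status of an order by its ID",
      (input) => {
        const orderId = (input as { orderId?: unknown } | null)?.orderId;
        if (typeof orderId !== "string") throw new Error("orderId must be a string");
        return { orderId };
      },
      ({ orderId }) => orders.get(orderId) ?? "not found",
    ),
  ],
]);
```

Returning errors as strings lets the model see what went wrong and retry with corrected arguments on its next step.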

Data Masking

Use transformPageContent to redact PII before content reaches the LLM.
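
Below is a minimal sketch of the kind of function one might pass as transformPageContent. It assumes the option receives the extracted page text as a string and returns the string to send instead; the API reference is the authority on the real signature. redactPII and PII_RULES are hypothetical names. The two patterns cover email addresses and 13 to 16 digit card-like numbers; they are examples, not a complete PII policy.

```typescript
// Each rule pairs a pattern with the placeholder that replaces every match.
const PII_RULES: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  // 13-16 digits, optionally separated by single spaces or dashes (payment-card-like).
  [/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD]"],
];

// Hypothetical transform: applies every rule in order to the page text.
function redactPII(content: string): string {
  return PII_RULES.reduce(
    (text, [pattern, placeholder]) => text.replace(pattern, placeholder),
    content,
  );
}
```

Regex redaction is best-effort. Patterns for phone numbers, ID numbers, or addresses would be added as further entries in PII_RULES. Short element indexes such as [12] are left intact because the card rule requires at least 13 digits.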

Chrome Extension

An optional browser extension that extends automation beyond a single page to tasks spanning multiple tabs and pages.

MCP Server

Control your browser from Claude Desktop, Cursor, and other MCP-compatible agent clients.

Lifecycle Hooks

Fine-grained onBeforeTask, onAfterTask, onBeforeStep, and onAfterStep hooks for observability and control.
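
The following sketch shows where each of the four hooks would fire relative to a task and its steps. Only the hook names come from this page. The argument lists, the Hooks interface, and runWithHooks are assumptions for illustration.

```typescript
// Hypothetical hook signatures; only the four hook names come from the documentation.
interface Hooks {
  onBeforeTask?: (task: string) => void;
  onAfterTask?: (task: string, success: boolean) => void;
  onBeforeStep?: (step: number) => void;
  onAfterStep?: (step: number, result: string) => void;
}

function runWithHooks(task: string, steps: Array<() => string>, hooks: Hooks): boolean {
  hooks.onBeforeTask?.(task);
  let success = true;
  try {
    steps.forEach((step, i) => {
      hooks.onBeforeStep?.(i);
      const result = step();
      hooks.onAfterStep?.(i, result);
    });
  } catch {
    success = false;
  } finally {
    // Fires on success and on failure alike.
    hooks.onAfterTask?.(task, success);
  }
  return success;
}

// Usage: record every hook invocation for observability.
const log: string[] = [];
const ok = runWithHooks("Fill the form", [() => "typed name", () => "clicked submit"], {
  onBeforeTask: (t) => log.push(`start: ${t}`),
  onBeforeStep: (i) => log.push(`step ${i}`),
  onAfterStep: (i, r) => log.push(`step ${i} -> ${r}`),
  onAfterTask: (_t, s) => log.push(`end: ${s ? "ok" : "failed"}`),
});
```

Calling onAfterTask from a finally block means it runs even when a step throws. That makes it a natural place for cleanup and metrics.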

Use Cases

  • SaaS AI Copilot — Ship an in-product AI assistant that operates your UI on behalf of users.
  • Smart Form Filling — Turn 20-click workflows into one sentence for ERP, CRM, and admin systems.
  • Accessibility — Provide a natural language interface for users who struggle with complex UI.
  • Interactive Onboarding — Demonstrate workflows live: “Let me show you how to submit an expense report.”
  • Multi-page Automation — Use the Chrome Extension to orchestrate tasks across browser tabs.

Page Agent is designed for client-side web enhancement, not server-side scraping. It targets web developers who want to add AI-powered interaction to their own applications.
