Page Agent is a JavaScript/TypeScript library that embeds a GUI agent directly inside your webpage. It reads the DOM, reasons over interactive elements, and executes multi-step tasks in response to natural language instructions, all without screenshots, multimodal models, or backend infrastructure.
Quickstart
Add Page Agent to a page in under 5 minutes with a single script tag or npm install.
Models
Works with any OpenAI-compatible LLM — Qwen, GPT, Claude, Gemini, DeepSeek, and local models.
Custom Tools
Extend the agent with your own Zod-typed tools to call business logic and APIs.
API Reference
Full TypeScript API reference for PageAgent, PageAgentCore, and PageController.
How It Works
Page Agent runs entirely in the browser. On every step of a task, it extracts a lightweight text representation of the page’s interactive elements, sends it to an LLM, and executes the returned action (click, type, scroll, wait, etc.). No screenshots. No Python backend. No headless browser.
Connect your LLM
Provide any OpenAI-compatible baseURL, model, and apiKey. Supports Qwen, GPT, Claude, Gemini, local Ollama, and more.
Execute tasks
Call agent.execute("Click the login button") and the agent handles the rest: planning, acting, and retrying until the task is done.
Key Features
Zero Backend
Pure in-page JavaScript. Import from CDN or npm — no server-side component required.
Text-Based DOM
Parses interactive elements into a compact text tree. No multimodal LLMs or screenshots needed.
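To make the idea of a compact text tree concrete, here is a minimal sketch that flattens a page structure into one indexed line per interactive element. It is not Page Agent's actual serializer: the UiNode shape, the toTextTree function, and the "[index] <tag> label" line format are all assumptions chosen for illustration, and the sketch works on plain objects rather than real DOM nodes so it can run outside a browser.

```typescript
// Hypothetical, simplified node shape; not Page Agent's internal types.
interface UiNode {
  tag: string;
  text?: string;
  interactive?: boolean;
  children?: UiNode[];
}

// Walk the tree depth-first and emit one indexed line per interactive element.
// Non-interactive nodes are skipped, but their children are still visited.
function toTextTree(root: UiNode): string {
  const lines: string[] = [];
  let nextIndex = 0;

  const visit = (node: UiNode, indent: string): void => {
    let childIndent = indent;
    if (node.interactive) {
      const label = node.text ? ` ${node.text}` : "";
      lines.push(`${indent}[${nextIndex}] <${node.tag}>${label}`);
      nextIndex += 1;
      // Only nesting inside another interactive element adds indentation.
      childIndent = indent + "  ";
    }
    for (const child of node.children ?? []) {
      visit(child, childIndent);
    }
  };

  visit(root, "");
  return lines.join("\n");
}

// Toy page: a heading plus a login form with two interactive elements.
const samplePage: UiNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Welcome" },
    {
      tag: "form",
      children: [
        { tag: "input", text: "Email", interactive: true },
        { tag: "button", text: "Log in", interactive: true },
      ],
    },
  ],
};

const tree = toTextTree(samplePage);
console.log(tree);
// [0] <input> Email
// [1] <button> Log in
```

The indices give a text-only model a stable way to refer to an element ("click 1") without coordinates or pixels, which is why this kind of representation does not need screenshots.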
Bring Your Own LLM
Connect any OpenAI-compatible endpoint. Works with cloud providers and local runtimes like Ollama.
Custom Tools
Register Zod-typed tools so the agent can call your APIs, search knowledge bases, or trigger workflows.
Data Masking
Use transformPageContent to redact PII before content reaches the LLM.
Chrome Extension
Optional extension for multi-tab, multi-page automation tasks beyond a single page.
MCP Server
Control your browser from Claude Desktop, Cursor, and other MCP-compatible agent clients.
Lifecycle Hooks
Fine-grained onBeforeTask, onAfterTask, onBeforeStep, and onAfterStep hooks for observability and control.
Use Cases
- SaaS AI Copilot — Ship an in-product AI assistant that operates your UI on behalf of users.
- Smart Form Filling — Turn 20-click workflows into one sentence for ERP, CRM, and admin systems.
- Accessibility — Provide a natural language interface for users who struggle with complex UI.
- Interactive Onboarding — Demonstrate workflows live: “Let me show you how to submit an expense report.”
- Multi-page Automation — Use the Chrome Extension to orchestrate tasks across browser tabs.
Page Agent is designed for client-side web enhancement, not server-side scraping. It targets web developers who want to add AI-powered interaction to their own applications.
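The per-step cycle described under How It Works (extract a text view of the page, send it to an LLM, execute the returned action) can be sketched as a small loop. This is an illustration under stated assumptions, not the library's implementation: AgentAction, PageView, Decide, maskEmails, and runTask are hypothetical names, the model is replaced by a scripted function, and the call is modeled as synchronous to keep the sketch short, whereas a real LLM request would be asynchronous. The masking step stands in for the kind of redaction a transformPageContent hook is described as enabling; its real signature is documented in the API Reference.

```typescript
// All names below are hypothetical and chosen for illustration only.
type AgentAction =
  | { kind: "click"; index: number }
  | { kind: "type"; index: number; text: string }
  | { kind: "done" };

interface PageView {
  describe(): string;                 // text view of the interactive elements
  perform(action: AgentAction): void; // applies one action to the page
}

// Stand-in for the LLM call: maps the task and page text to the next action.
type Decide = (task: string, pageText: string) => AgentAction;

// Redact e-mail addresses before the page text is handed to the model.
function maskEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[email]");
}

// Observe, decide, act; returns the number of actions executed.
function runTask(task: string, page: PageView, decide: Decide, maxSteps = 10): number {
  for (let step = 1; step <= maxSteps; step++) {
    const pageText = maskEmails(page.describe()); // 1. observe (masked)
    const action = decide(task, pageText);         // 2. ask the model
    if (action.kind === "done") return step - 1;
    page.perform(action);                          // 3. act
  }
  throw new Error(`Task not finished after ${maxSteps} steps`);
}

// Toy page: one field and one button, with state held in plain variables.
const performed: string[] = [];
let email = "";
let submitted = false;
const fakePage: PageView = {
  describe: () =>
    submitted
      ? "Signed in as " + email
      : `[0] <input> Email = "${email}"\n[1] <button> Log in`,
  perform: (a) => {
    if (a.kind === "type") email = a.text;
    if (a.kind === "click" && a.index === 1) submitted = true;
    performed.push(a.kind);
  },
};

// Scripted model: chooses the next action from the page text alone.
const seen: string[] = [];
const scriptedModel: Decide = (_task, pageText) => {
  seen.push(pageText);
  if (pageText.indexOf("Signed in") === 0) return { kind: "done" };
  if (pageText.indexOf('Email = ""') !== -1) {
    return { kind: "type", index: 0, text: "ada@example.com" };
  }
  return { kind: "click", index: 1 };
};

const actionsTaken = runTask("Log in as Ada", fakePage, scriptedModel);
console.log(actionsTaken, performed); // 2 [ 'type', 'click' ]
```

The maxSteps cap is a design choice of this sketch: because the model decides when the task is done, a bounded loop keeps a confused model from acting on the page indefinitely.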