Page Agent can be added to any webpage with a single script tag or a one-line npm install. This guide walks through both integration paths, explains every configuration field, and shows how to listen to agent events and handle API credentials safely in production.
1. Choose an integration path

Pick the approach that matches your project setup:
The CDN bundle includes a pre-configured demo LLM, so you can evaluate Page Agent without an API key. Just drop one <script> tag into your HTML:
index.html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.11.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>
Two mirrors are available:
Mirror | URL
Global (jsDelivr) | https://cdn.jsdelivr.net/npm/page-agent@1.11.0/dist/iife/page-agent.demo.js
China (npmmirror) | https://registry.npmmirror.com/page-agent/1.11.0/files/dist/iife/page-agent.demo.js
By default the script auto-initialises a demo agent when it loads. Add ?autoInit=false to the URL to suppress auto-init; you can then create your own instance with new window.PageAgent(...).
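The manual-initialisation path described above might look roughly like the sketch below. The helper names `demoBundleUrl` and `createAgentFromGlobal` are hypothetical, and the sketch assumes the demo bundle's global constructor accepts the same configuration object as the npm package.

```typescript
// URL taken from the mirror list above.
const DEMO_BUNDLE_URL =
  'https://cdn.jsdelivr.net/npm/page-agent@1.11.0/dist/iife/page-agent.demo.js'

// Hypothetical helper: returns the bundle URL, adding ?autoInit=false
// when auto-initialisation should be suppressed.
function demoBundleUrl(autoInit: boolean): string {
  const url = new URL(DEMO_BUNDLE_URL)
  if (!autoInit) url.searchParams.set('autoInit', 'false')
  return url.toString()
}

// Shape assumed for the global constructor exposed by the script.
type PageAgentCtor = new (config: Record<string, unknown>) => unknown

// Hypothetical helper: builds an agent from the global once the script has loaded.
// In a browser, globalThis is the same object as window.
function createAgentFromGlobal(config: Record<string, unknown>): unknown {
  const ctor = (globalThis as unknown as { PageAgent?: PageAgentCtor }).PageAgent
  if (!ctor) {
    throw new Error('PageAgent global not found; has the script finished loading?')
  }
  return new ctor(config)
}
```

Use the value of `demoBundleUrl(false)` as the `src` of the script tag, then call `createAgentFromGlobal(...)` from the script's load handler.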
The demo CDN bundle uses a free testing LLM API provided by the Page Agent team. It is for technical evaluation only and subject to usage limits. By using it, you agree to the Terms of Use. Do not use it in production.
2. Create an agent instance

Construct a PageAgent with your LLM credentials and preferred language. The agent attaches itself to the current page automatically:
agent.ts
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  language: 'en-US',
})

Configuration fields

model (string, required)
The model name as expected by your LLM provider, e.g. "qwen3.5-plus", "gpt-4o", or "llama3.2". The model must support structured tool/function calling.

baseURL (string, required)
Base URL of the OpenAI-compatible API endpoint. Examples:
  • Alibaba DashScope: https://dashscope.aliyuncs.com/compatible-mode/v1
  • OpenAI: https://api.openai.com/v1
  • Local Ollama: http://localhost:11434/v1

apiKey (string)
API key for your LLM provider. Required for cloud providers; can be any non-empty string for local models (Ollama, LM Studio) that don't check the key.

language ("en-US" | "zh-CN", default: "en-US")
UI and system-prompt language. Controls the language used by the built-in panel and in agent responses.

maxSteps (number, default: 40)
Maximum number of ReAct loop iterations per task. Increase it for very long multi-step workflows; lower it to cap LLM spend.

stepDelay (number, default: 0.4)
Seconds to wait between steps. Increase this if pages need extra time to settle after an action.

instructions.system (string)
Global system-level instructions injected into every LLM prompt. Use this to describe your application, restrict agent scope, or define domain terminology.
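To see how these fields fit together, here is a minimal sketch of a local-model configuration. `AgentConfigSketch`, `withDocumentedDefaults`, and `validateConfig` are hypothetical names, not exports of page-agent, and the validation rules are assumptions inferred from the field descriptions. The library applies its defaults itself; the helper only makes them visible.

```typescript
// Local mirror of the documented fields (hypothetical type, not a library export).
interface AgentConfigSketch {
  model: string
  baseURL: string
  apiKey?: string
  language?: 'en-US' | 'zh-CN'
  maxSteps?: number
  stepDelay?: number
  instructions?: { system?: string }
}

// Fills in the defaults listed above so the effective values are explicit.
function withDocumentedDefaults(config: AgentConfigSketch) {
  return { language: 'en-US' as const, maxSteps: 40, stepDelay: 0.4, ...config }
}

// Assumed sanity checks; returns a list of problems, empty when the config looks fine.
function validateConfig(config: AgentConfigSketch): string[] {
  const problems: string[] = []
  if (!config.model.trim()) problems.push('model is required')
  if (!config.baseURL.trim()) problems.push('baseURL is required')
  if (config.maxSteps !== undefined && (!Number.isInteger(config.maxSteps) || config.maxSteps < 1)) {
    problems.push('maxSteps must be a positive integer')
  }
  if (config.stepDelay !== undefined && config.stepDelay < 0) {
    problems.push('stepDelay must not be negative')
  }
  return problems
}

// Example: a local Ollama model with a tighter step budget and a slower pace.
const localConfig: AgentConfigSketch = {
  model: 'llama3.2',
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // any non-empty string; local servers don't check it
  maxSteps: 20,
  stepDelay: 1,
  instructions: { system: 'Only operate on the checkout form.' },
}
```

Under these assumptions, `localConfig` could be passed to `new PageAgent(...)` as-is.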
3. Execute a task

Call agent.execute() with a natural-language task string. It returns a promise that resolves with an ExecutionResult when the task finishes — whether the agent calls done, hits maxSteps, or encounters an unrecoverable error. The promise only rejects for pre-flight failures (e.g. calling execute() while already running):
agent.ts
const result = await agent.execute('Click the login button')

console.log(result.success) // true | false
console.log(result.data)    // agent's final summary text
console.log(result.history) // full history of every step taken
You can also show the built-in floating panel and let the user type instructions directly:
agent.panel.show()
execute() fails immediately, before any step runs, if the agent has already been disposed or if a task is already running (status is 'running'). In concurrent code, check agent.status before calling execute(), and await the call inside try/catch so the failure is handled.
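One way to apply that advice is a small wrapper like the sketch below. `AgentLike`, `ResultLike`, and `runTask` are hypothetical names that mirror only the members used in this guide; the real types come from page-agent.

```typescript
// Minimal structural stand-ins for the members this guide uses.
interface ResultLike {
  success: boolean
  data: string
  history: unknown[]
}

interface AgentLike {
  status: string
  execute(task: string): Promise<ResultLike>
}

// Runs a task only when the agent is free. Returns null when the task
// could not be started, instead of letting the failure propagate.
async function runTask(agent: AgentLike, task: string): Promise<ResultLike | null> {
  if (agent.status === 'running') return null // a task is already in flight
  try {
    // The call sits inside try, so a synchronous throw and a rejected
    // promise are both caught here.
    return await agent.execute(task)
  } catch (err) {
    console.warn('execute() failed before the task started:', err)
    return null
  }
}
```

Returning null keeps the pre-flight case separate from a task that ran and finished with `success: false`, which still arrives as a normal result.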
4. Listen to agent events

PageAgent extends EventTarget. Subscribe to events for real-time UI feedback and debugging:
agent-events.ts
// Status transitions: 'idle' → 'running' → 'completed' | 'error' | 'stopped'
agent.addEventListener('statuschange', () => {
  console.log('Agent status:', agent.status)
})

// Transient activity events — ideal for driving a live status indicator
agent.addEventListener('activity', (e: Event) => {
  const activity = (e as CustomEvent).detail
  // activity.type: 'thinking' | 'executing' | 'executed' | 'retrying' | 'error'
  if (activity.type === 'executing') {
    console.log(`Executing tool: ${activity.tool}`, activity.input)
  }
})

// History events — persistent, forms the agent's memory
agent.addEventListener('historychange', () => {
  console.log('History updated:', agent.history)
})
Event types at a glance:
Event | When it fires | Payload
statuschange | Agent status transitions | agent.status
activity | Real-time step activity (transient) | AgentActivity on e.detail
historychange | History array mutated | agent.history
dispose | Agent is cleaned up | (none listed)
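As an illustration of driving a live status indicator from the activity event, the sketch below maps each activity type to a short label. `ActivityLike`, `describeActivity`, and `trackActivity` are hypothetical names, the label text is invented, and only the `type`, `tool`, and `input` fields shown earlier are assumed to exist.

```typescript
// Only the fields used in this guide; the real AgentActivity type may carry more.
interface ActivityLike {
  type: 'thinking' | 'executing' | 'executed' | 'retrying' | 'error'
  tool?: string
  input?: unknown
}

// Maps an activity to a short label for a status indicator.
function describeActivity(activity: ActivityLike): string {
  switch (activity.type) {
    case 'thinking':
      return 'Thinking...'
    case 'executing':
      return `Running ${activity.tool ?? 'a tool'}...`
    case 'executed':
      return 'Step finished'
    case 'retrying':
      return 'Retrying...'
    case 'error':
      return 'Something went wrong'
  }
}

// Subscribes to 'activity' events and returns a function that unsubscribes.
function trackActivity(agent: EventTarget, onLabel: (label: string) => void): () => void {
  const listener = (e: Event) => {
    const activity = (e as Event & { detail: ActivityLike }).detail
    onLabel(describeActivity(activity))
  }
  agent.addEventListener('activity', listener)
  return () => agent.removeEventListener('activity', listener)
}
```

Call the returned function when the indicator is removed from the page so the listener does not outlive the UI that uses it.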
5. Stop or dispose the agent

Use stop() to cancel a running task gracefully. Use dispose() when you are done with the agent instance entirely (e.g. on component unmount):
agent-lifecycle.ts
// Cancel the current task and wait for it to fully settle
await agent.stop()
console.log('Agent stopped. Status:', agent.status) // 'stopped'

// Tear down the agent and its DOM overlay
agent.dispose()
// After dispose(), calling execute() will throw.
Never await agent.stop() inside a lifecycle hook (onBeforeStep, onAfterStep, etc.) — that would cause a deadlock. Call stop() from outside the agent’s own execution context.
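For cleanup on component unmount, the two calls can be combined into a single teardown helper that runs outside the agent's own hooks. This is a sketch only: `DisposableAgentLike` and `teardown` are hypothetical names, and `stop()` is assumed to return a promise that settles once the task has ended.

```typescript
// Only the members needed for cleanup, as used in this guide.
interface DisposableAgentLike {
  status: string
  stop(): Promise<void>
  dispose(): void
}

// Stops a running task first, then releases the agent.
// Call this from application code, never from a lifecycle hook.
async function teardown(agent: DisposableAgentLike): Promise<void> {
  if (agent.status === 'running') {
    await agent.stop()
  }
  agent.dispose()
}
```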

Production: Securing Your API Key

Never expose a real LLM API key in client-side code. Anyone who opens DevTools can read it, copy it, and run up charges on your account.
Passing apiKey directly in frontend code means the key is visible in your JavaScript bundle and network requests. For any production deployment, proxy the LLM call through your own backend and use customFetch to intercept requests:
secure-agent.ts
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-4o',
  baseURL: '/api/llm-proxy', // your backend endpoint
  apiKey: 'not-used',        // placeholder — your proxy validates the session
  language: 'en-US',

  // Intercept every LLM fetch and attach a session token instead
  customFetch: async (url, init) => {
    const headers = new Headers(init?.headers)
    headers.set('X-Session-Token', getSessionToken()) // your auth mechanism
    headers.delete('Authorization')                   // remove the placeholder key
    return fetch(url, { ...init, headers })
  },
})
Your backend proxy receives the session token, validates the user, attaches the real API key, and forwards the request to the LLM provider. This pattern keeps the key server-side at all times.
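On the server side, the proxy could look something like the following sketch, written against the standard Fetch API `Request` and `Response` types. Everything here is an assumption for illustration: `ProxyDeps` and `handleLlmProxy` are hypothetical names, session validation is left to a callback you supply, and the upstream path is assumed to be the OpenAI-compatible `/chat/completions` endpoint.

```typescript
// Dependencies are injected so the handler can be exercised without a network.
interface ProxyDeps {
  upstreamBaseURL: string // e.g. 'https://api.openai.com/v1'
  apiKey: string // the real key, read from server-side configuration
  isValidSession: (token: string) => Promise<boolean>
  fetchImpl: (url: string, init: RequestInit) => Promise<Response>
}

async function handleLlmProxy(req: Request, deps: ProxyDeps): Promise<Response> {
  // 1. Validate the session token attached by customFetch on the client.
  const token = req.headers.get('X-Session-Token')
  if (!token || !(await deps.isValidSession(token))) {
    return new Response('Unauthorized', { status: 401 })
  }

  // 2. Build fresh headers: the real key goes in, the session token stays out.
  const headers = new Headers({
    'Content-Type': 'application/json',
    Authorization: `Bearer ${deps.apiKey}`,
  })

  // 3. Forward the body unchanged to the provider (assumed endpoint path).
  return deps.fetchImpl(`${deps.upstreamBaseURL}/chat/completions`, {
    method: 'POST',
    headers,
    body: await req.text(),
  })
}
```

Building the upstream headers from scratch, rather than copying the incoming ones, means nothing the browser sent can leak through to the provider. A production proxy would likely also add rate limiting and restrict which models a session may request.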

Next Steps

Supported Models

Browse tested LLMs, including the free evaluation API and local model options.

Troubleshooting

Diagnose and fix the most common setup and runtime issues.
