Overview
Skyvern adds AI-powered methods directly on page objects, enabling you to perform actions, extract data, validate page state, and execute complex workflows using natural language.
Core AI Methods
page.act()
Perform actions on the page using natural language.
Parameters:
- prompt (str): Natural language description of the action to perform
Returns: None
Use Cases:
- Single-step actions with natural language
- Interactions where element location is ambiguous
- Quick prototyping without finding selectors
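A minimal sketch of how page.act() might be called for a couple of single-step actions. StubPage is a hypothetical stand-in that only records prompts so the snippet runs without a browser. With the SDK, page would be the Skyvern page object, and awaiting the call as shown is an assumption about the Python API.

```python
import asyncio


class StubPage:
    """Hypothetical stand-in for a Skyvern page; it only records prompts."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    async def act(self, prompt: str) -> None:
        # A real page would hand the prompt to the AI and perform the action.
        self.prompts.append(prompt)


async def dismiss_banner_and_search(page, query: str) -> None:
    # Each call describes one action in plain language; no selectors are needed.
    await page.act("Close the cookie banner if one is shown")
    await page.act(f"Type '{query}' into the search box and press Enter")


page = StubPage()
asyncio.run(dismiss_banner_and_search(page, "wireless headphones"))
print(page.prompts)
```

Keeping each prompt to one action matches the single-step use case listed above; longer flows are a better fit for the agent methods.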
page.extract()
Extract structured data from the page with an optional JSON schema.
Parameters:
- prompt (str): Description of what data to extract
- schema (dict | array | str, optional): JSON schema defining the expected output structure
- errorCodeMapping (dict, optional): Map error codes to custom messages (Python only)
- intention (str, optional): Additional context for extraction (Python only)
- data (str | dict, optional): Additional data context (Python only)
Returns: Record<string, unknown> | unknown[] | string | null - Extracted data matching the schema
Complex Schema Example:
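One possible shape for a nested schema passed as the schema argument. The field names (products, name, price, in_stock) are invented for illustration, and StubPage is a hypothetical stand-in that returns canned data in place of a real extraction. Awaiting the call is an assumption about the Python API.

```python
import asyncio

# JSON Schema describing a list of products; the field names are illustrative.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "in_stock": {"type": "boolean"},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["products"],
}


class StubPage:
    """Hypothetical stand-in for a Skyvern page; it returns canned data."""

    async def extract(self, prompt: str, schema=None):
        # A real page would ask the AI to read the page and fill the schema.
        return {
            "products": [
                {"name": "Desk lamp", "price": 24.5, "in_stock": True},
                {"name": "Floor lamp", "price": 79.0, "in_stock": False},
            ]
        }


async def cheapest_product(page):
    data = await page.extract(
        prompt="Extract every product listed on this page",
        schema=PRODUCT_SCHEMA,
    )
    # extract() may return null/None, so guard before reading the result.
    products = (data or {}).get("products", [])
    return min(products, key=lambda p: p["price"]) if products else None


result = asyncio.run(cheapest_product(StubPage()))
print(result)
```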
page.validate()
Validate page state using natural language.
Parameters:
- prompt (str): Natural language validation question
- model (dict | str, optional): LLM model configuration or model name
Returns: boolean - Validation result
Use Cases:
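One way page.validate() might be used is as a guard before a later step. The sketch below assumes the call is awaited and returns a plain boolean, as described above. StubPage is a hypothetical stand-in with a fixed answer, not the SDK's page object.

```python
import asyncio


class StubPage:
    """Hypothetical stand-in for a Skyvern page with a fixed login state."""

    def __init__(self, logged_in: bool) -> None:
        self.logged_in = logged_in
        self.actions: list[str] = []

    async def validate(self, prompt: str) -> bool:
        # A real page would let the AI answer the question from the page state.
        return self.logged_in

    async def act(self, prompt: str) -> None:
        self.actions.append(prompt)


async def open_account_settings(page) -> bool:
    # Continue only when the check passes; otherwise report failure.
    if not await page.validate("Is the user logged in?"):
        return False
    await page.act("Open the account settings page")
    return True


ok = asyncio.run(open_account_settings(StubPage(logged_in=True)))
blocked = asyncio.run(open_account_settings(StubPage(logged_in=False)))
print(ok, blocked)
```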
page.prompt()
Send arbitrary prompts to the LLM with an optional response schema.
Parameters:
- prompt (str): The prompt to send to the LLM
- schema (dict, optional): JSON schema for structured response
- model (dict | str, optional): LLM model configuration or model name
Returns: Record<string, unknown> | unknown[] | string | null - LLM response
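A short sketch of page.prompt() with a response schema. Because the documented return type can be a dict, a list, a string, or null, the caller checks the shape before using it. StubPage, SENTIMENT_SCHEMA, and the classify_review helper are hypothetical names for illustration, and awaiting the call is an assumption.

```python
import asyncio

# Illustrative response schema: a single sentiment label.
SENTIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
        }
    },
    "required": ["sentiment"],
}


class StubPage:
    """Hypothetical stand-in for a Skyvern page; it returns a canned answer."""

    async def prompt(self, prompt: str, schema=None):
        # A real page would forward the prompt and schema to the LLM.
        return {"sentiment": "positive"}


async def classify_review(page, text: str):
    response = await page.prompt(
        prompt=f"Classify the sentiment of this review: {text}",
        schema=SENTIMENT_SCHEMA,
    )
    # Accept only the dict shape the schema asks for; anything else is None.
    if isinstance(response, dict):
        return response.get("sentiment")
    return None


label = asyncio.run(classify_review(StubPage(), "Arrived early and works well."))
print(label)
```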
Agent Methods
The page.agent object provides higher-level workflow commands that execute multi-step AI-powered tasks.
page.agent.run_task()
Execute complex multi-step tasks in the context of the current page.
Parameters:
- prompt (str): Natural language description of the task
- engine (RunEngine, optional): Execution engine (default: skyvern_v2)
- model (dict, optional): LLM model configuration
- url (str, optional): URL to navigate to (defaults to current page URL)
- webhookUrl (str, optional): Webhook URL for progress notifications
- totpIdentifier (str, optional): TOTP identifier for 2FA
- totpUrl (str, optional): URL to fetch TOTP codes
- title (str, optional): Human-readable task title
- errorCodeMapping (dict, optional): Custom error code mappings
- dataExtractionSchema (dict | str, optional): Schema for data extraction
- maxSteps (int, optional): Maximum number of steps
- timeout (float, optional): Timeout in seconds (default: 1800)
Returns: TaskRunResponse with execution results
Advanced Example:
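A sketch of a run_task() call that sets several of the options listed above. It uses only the single-word parameters (prompt, url, title, timeout), since the exact spelling of the multi-word ones in the Python SDK is not confirmed here. StubAgent and StubPage are hypothetical stand-ins that record the call, and the URL and task text are made up.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for page.agent; it records the call it receives."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    async def run_task(self, prompt: str, **options):
        self.calls.append({"prompt": prompt, **options})
        # A real call would return a TaskRunResponse. Its fields are not
        # described in this section, so the stub returns nothing.
        return None


class StubPage:
    """Hypothetical stand-in for a Skyvern page exposing an agent object."""

    def __init__(self) -> None:
        self.agent = StubAgent()


async def file_expense_report(page):
    return await page.agent.run_task(
        prompt=(
            "Open the expenses section, create a new report for March, "
            "attach the uploaded receipts, and submit it for approval"
        ),
        url="https://example.com/expenses",  # start here, not on the current page
        title="March expense report",
        timeout=600,  # seconds; the documented default is 1800
    )


page = StubPage()
asyncio.run(file_expense_report(page))
print(page.agent.calls[0]["title"])
```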
page.agent.login()
Execute a login workflow with stored credentials.
page.agent.download_files()
Execute a file download workflow.
Parameters:
- prompt (str): Instructions for navigating to and downloading the file
- url (str, optional): Starting URL (defaults to current page)
- downloadSuffix (str, optional): Filename or suffix for downloaded file
- downloadTimeout (float, optional): Timeout for download operation
- maxStepsPerRun (int, optional): Maximum steps to execute
- webhookUrl (str, optional): Webhook notification URL
- totpIdentifier (str, optional): TOTP identifier
- totpUrl (str, optional): TOTP URL
- extraHttpHeaders (dict, optional): Additional HTTP headers
- timeout (float, optional): Overall timeout in seconds (default: 1800)
Returns: WorkflowRunResponse
page.agent.run_workflow()
Execute a pre-defined workflow by ID.
Parameters:
- workflowId (str): ID of the workflow to execute
- parameters (dict, optional): Workflow parameters
- template (bool, optional): Whether this is a template
- title (str, optional): Human-readable title
- webhookUrl (str, optional): Webhook notification URL
- totpUrl (str, optional): TOTP URL
- totpIdentifier (str, optional): TOTP identifier
- timeout (float, optional): Timeout in seconds (default: 1800)
Returns: WorkflowRunResponse
Method Comparison
| Method | Use Case | Complexity | Returns |
|---|---|---|---|
| page.act() | Single actions | Simple | None |
| page.extract() | Data extraction | Simple | Structured data |
| page.validate() | State validation | Simple | Boolean |
| page.prompt() | General AI queries | Simple | Flexible |
| page.agent.run_task() | Multi-step tasks | Complex | TaskRunResponse |
| page.agent.login() | Authentication | Complex | WorkflowRunResponse |
| page.agent.download_files() | File downloads | Complex | WorkflowRunResponse |
| page.agent.run_workflow() | Pre-defined workflows | Complex | WorkflowRunResponse |
Best Practices
1. Choose the Right Method
2. Provide Clear Prompts
3. Use Schemas for Structured Data
4. Combine Methods Strategically
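As one illustration of combining methods, the sketch below chains act(), validate(), and extract(): perform an action, confirm the page reached the expected state, then pull structured data. StubPage is a hypothetical stand-in with canned answers, the schema and prompts are invented, and awaiting the calls is an assumption about the Python API.

```python
import asyncio

# Illustrative schema: a flat list of result titles.
TITLES_SCHEMA = {
    "type": "object",
    "properties": {
        "titles": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["titles"],
}


class StubPage:
    """Hypothetical stand-in for a Skyvern page with canned answers."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def act(self, prompt: str) -> None:
        self.log.append("act")

    async def validate(self, prompt: str) -> bool:
        self.log.append("validate")
        return True

    async def extract(self, prompt: str, schema=None):
        self.log.append("extract")
        return {"titles": ["First result", "Second result"]}


async def search_and_collect(page, query: str) -> list:
    # 1. Single-step action described in natural language.
    await page.act(f"Search for '{query}' using the site search box")
    # 2. Check the page state before spending an extraction call.
    if not await page.validate("Are search results visible on the page?"):
        return []
    # 3. Pull structured data, guided by a schema.
    data = await page.extract(
        prompt="Extract the titles of the search results",
        schema=TITLES_SCHEMA,
    )
    return (data or {}).get("titles", [])


page = StubPage()
titles = asyncio.run(search_and_collect(page, "standing desk"))
print(titles)
```

The validate() step in the middle keeps a failed search from producing an empty or misleading extraction.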
See Also
- Browser Automation - Mix Playwright with AI methods
- Python SDK - Full Python SDK reference
- TypeScript SDK - Full TypeScript SDK reference