AI Commands

Overview

Skyvern adds AI-powered methods directly on page objects, enabling you to perform actions, extract data, validate page state, and execute complex workflows using natural language.

Core AI Methods

page.act()

Perform actions on the page using natural language.

await page.act("Click the login button and wait for the dashboard")

Parameters:

prompt (str): Natural language description of the action to perform

Returns: None

Use Cases:

Single-step actions with natural language
Interactions where element location is ambiguous
Quick prototyping without finding selectors

Examples:

# Simple click
await page.act("Click the login button")

# Multi-step action
await page.act("Scroll down to the footer and click the Privacy Policy link")

# Conditional action
await page.act("If there's a popup, close it")

page.extract()

Extract structured data from the page, optionally constrained by a JSON schema.

# Simple extraction
result = await page.extract("Get the product name and price")

# With schema
result = await page.extract(
    prompt="Extract order details",
    schema={
        "order_id": "string",
        "total": "number",
        "items": "array"
    }
)

Parameters:

prompt (str): Description of what data to extract
schema (dict | array | str, optional): JSON schema defining the expected output structure
errorCodeMapping (dict, optional): Map error codes to custom messages (Python only)
intention (str, optional): Additional context for extraction (Python only)
data (str | dict, optional): Additional data context (Python only)

Returns: Record<string, unknown> | unknown[] | string | null - Extracted data matching the schema

Complex Schema Example:

products = await page.extract(
    prompt="Extract all products from the page",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "rating": {"type": "number"},
                "in_stock": {"type": "boolean"},
                "image_url": {"type": "string"},
            },
            "required": ["name", "price"]
        }
    }
)

print(f"Found {len(products)} products")
for product in products:
    print(f"{product['name']}: ${product['price']}")
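Because the extraction is produced by an LLM, it can be worth checking the result against the schema's required fields before relying on it. The following is a minimal sketch, assuming `products` is the value returned by the call above; `split_valid_products` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not part of Skyvern.

```python
from typing import Any

# Hypothetical constant: mirrors the "required" list in the schema above.
REQUIRED_FIELDS = ("name", "price")


def split_valid_products(products: Any) -> tuple[list[dict], list[Any]]:
    """Separate items that carry every required field from those that do not."""
    valid: list[dict] = []
    rejected: list[Any] = []
    # extract() is documented as possibly returning null or a non-list value,
    # so anything that is not a list is treated as "no items".
    items = products if isinstance(products, list) else []
    for item in items:
        if isinstance(item, dict) and all(item.get(f) is not None for f in REQUIRED_FIELDS):
            valid.append(item)
        else:
            rejected.append(item)
    return valid, rejected
```

Calling `valid, rejected = split_valid_products(products)` lets the loop above iterate over `valid` only, while `rejected` can be logged for inspection.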

page.validate()

Validate page state using natural language. Returns a boolean.

# Check login state
is_logged_in = await page.validate("Check if the user is logged in")

if is_logged_in:
    print("User is logged in")
else:
    print("User is not logged in")

# Validate form submission
is_submitted = await page.validate("Check if the form was submitted successfully")

Parameters:

prompt (str): Natural language validation question
model (dict | str, optional): LLM model configuration or model name

Returns: boolean - Validation result

Use Cases:

# Conditional workflow
if await page.validate("Is there an error message displayed?"):
    error_msg = await page.extract("Get the error message text")
    print(f"Error: {error_msg}")
else:
    await page.act("Click continue")

# Wait for page state
import asyncio

while not await page.validate("Is the data table fully loaded?"):
    await asyncio.sleep(1)

print("Table loaded, proceeding...")
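The polling loop above never exits if the condition stays false. A bounded variant can be sketched as follows; `wait_until` and its `timeout` and `interval` parameters are hypothetical names, and the only Skyvern call it assumes is `page.validate()` as documented in this section.

```python
import asyncio
import time


async def wait_until(page, prompt: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll page.validate(prompt) until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if await page.validate(prompt):
            return True
        # Stop if another sleep would carry us past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        await asyncio.sleep(interval)
```

A caller might write `loaded = await wait_until(page, "Is the data table fully loaded?", timeout=60)` and raise or retry when `loaded` is false. Each poll is an LLM call, so a longer `interval` also reduces cost.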

page.prompt()

Send arbitrary prompts to the LLM, with an optional response schema.

# Simple prompt
summary = await page.prompt("Summarize what's on this page")

# With schema
analysis = await page.prompt(
    "Analyze the pricing tiers on this page",
    schema={
        "tiers": "array",
        "best_value": "string",
        "recommendation": "string"
    }
)

# With specific model
result = await page.prompt(
    "What's the main call-to-action?",
    model="gpt-4"
)

Parameters:

prompt (str): The prompt to send to the LLM
schema (dict, optional): JSON schema for structured response
model (dict | str, optional): LLM model configuration or model name

Returns: Record<string, unknown> | unknown[] | string | null - LLM response
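Since the return value may be an object, a list, a string, or null, calling code usually needs to branch on its shape. The sketch below shows one way to render any of those shapes for logging; `response_to_text` is a hypothetical helper, not a Skyvern function.

```python
import json
from typing import Any


def response_to_text(response: Any) -> str:
    """Render any of the documented return shapes as a printable string."""
    if response is None:
        return ""  # no answer was produced
    if isinstance(response, str):
        return response
    # dicts and lists are serialized so nested values stay readable
    return json.dumps(response, indent=2, default=str)
```

For example, `print(response_to_text(summary))` works regardless of whether a schema was supplied.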

Agent Methods

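The page.agent object provides higher-level workflow commands that execute multi-step AI-powered tasks. Parameter lists in this section use camelCase names; the Python examples pass the snake_case equivalents (for example, data_extraction_schema, download_suffix, and workflow_id).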

page.agent.run_task()

Execute complex multi-step tasks in the context of the current page.

# Simple task
task_result = await page.agent.run_task(
    "Fill out the contact form with: John Doe, [email protected], and message 'Hello'"
)

print(task_result.status)
print(task_result.extracted_information)

Parameters:

prompt (str): Natural language description of the task
engine (RunEngine, optional): Execution engine (default: skyvern_v2)
model (dict, optional): LLM model configuration
url (str, optional): URL to navigate to (defaults to current page URL)
webhookUrl (str, optional): Webhook URL for progress notifications
totpIdentifier (str, optional): TOTP identifier for 2FA
totpUrl (str, optional): URL to fetch TOTP codes
title (str, optional): Human-readable task title
errorCodeMapping (dict, optional): Custom error code mappings
dataExtractionSchema (dict | str, optional): Schema for data extraction
maxSteps (int, optional): Maximum number of steps
timeout (float, optional): Timeout in seconds (default: 1800)

Returns: TaskRunResponse with execution results

Advanced Example:

task_result = await page.agent.run_task(
    prompt="Search for 'wireless headphones', filter by price under $100, and add the top-rated item to cart",
    data_extraction_schema={
        "product_name": "string",
        "price": "number",
        "rating": "number",
        "added_to_cart": "boolean"
    },
    max_steps=20,
    timeout=300
)

if task_result.status == "completed":
    product = task_result.extracted_information
    print(f"Added {product['product_name']} to cart for ${product['price']}")
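The example above indexes into `extracted_information` as soon as the status is "completed". A slightly more defensive pattern is sketched below, assuming only the `status` and `extracted_information` attributes shown on this page; `extracted_or_none` is a hypothetical helper name.

```python
from typing import Any, Optional


def extracted_or_none(task_result: Any) -> Optional[dict]:
    """Return extracted_information only for a completed run with dict output."""
    # Compare against the same "completed" value used in the example above.
    if getattr(task_result, "status", None) != "completed":
        return None
    info = getattr(task_result, "extracted_information", None)
    # Guard against a missing or non-dict payload before callers index into it.
    return info if isinstance(info, dict) else None
```

With this in place, `product = extracted_or_none(task_result)` yields either a dict that is safe to index or `None`, so a failed or timed-out run cannot raise a `TypeError` in the print statement.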

page.agent.login()

Execute a login workflow with stored credentials.

from skyvern.schemas.run_blocks import CredentialType

# Skyvern credentials
workflow_result = await page.agent.login(
    credential_type=CredentialType.skyvern,
    credential_id="cred_123"
)

# Bitwarden credentials
workflow_result = await page.agent.login(
    credential_type=CredentialType.bitwarden,
    bitwarden_item_id="item_id",
    bitwarden_collection_id="collection_id"
)

See Also: Authentication Guide for complete login documentation.

page.agent.download_files()

Execute a file download workflow.

workflow_result = await page.agent.download_files(
    prompt="Navigate to the invoices page and download the latest invoice",
    download_suffix="invoice.pdf",
    timeout=300
)

print(f"Download status: {workflow_result.status}")

Parameters:

prompt (str): Instructions for navigating to and downloading the file
url (str, optional): Starting URL (defaults to current page)
downloadSuffix (str, optional): Filename or suffix for downloaded file
downloadTimeout (float, optional): Timeout for download operation
maxStepsPerRun (int, optional): Maximum steps to execute
webhookUrl (str, optional): Webhook notification URL
totpIdentifier (str, optional): TOTP identifier
totpUrl (str, optional): TOTP URL
extraHttpHeaders (dict, optional): Additional HTTP headers
timeout (float, optional): Overall timeout in seconds (default: 1800)

Returns: WorkflowRunResponse

page.agent.run_workflow()

Execute a pre-defined workflow by ID.

workflow_result = await page.agent.run_workflow(
    workflow_id="wkfl_123",
    parameters={
        "search_term": "laptop",
        "max_price": 1500
    },
    timeout=600
)

print(f"Workflow status: {workflow_result.status}")

Parameters:

workflowId (str): ID of the workflow to execute
parameters (dict, optional): Workflow parameters
template (bool, optional): Whether this is a template
title (str, optional): Human-readable title
webhookUrl (str, optional): Webhook notification URL
totpUrl (str, optional): TOTP URL
totpIdentifier (str, optional): TOTP identifier
timeout (float, optional): Timeout in seconds (default: 1800)

Returns: WorkflowRunResponse

Method Comparison

Method	Use Case	Complexity	Returns
`page.act()`	Single actions	Simple	None
`page.extract()`	Data extraction	Simple	Structured data
`page.validate()`	State validation	Simple	Boolean
`page.prompt()`	General AI queries	Simple	Flexible
`page.agent.run_task()`	Multi-step tasks	Complex	TaskRunResponse
`page.agent.login()`	Authentication	Complex	WorkflowRunResponse
`page.agent.download_files()`	File downloads	Complex	WorkflowRunResponse
`page.agent.run_workflow()`	Pre-defined workflows	Complex	WorkflowRunResponse

Best Practices

1. Choose the Right Method

# Simple action - use act()
await page.act("Click the submit button")

# Data extraction - use extract()
data = await page.extract("Get all product prices")

# Complex workflow - use agent.run_task()
await page.agent.run_task("Search, filter, and purchase the top item")

2. Provide Clear Prompts

# Too vague
await page.act("Click button")  # Which button?

# Better
await page.act("Click the blue 'Submit Order' button in the checkout form")

3. Use Schemas for Structured Data

# Without schema - returns unpredictable format
data = await page.extract("Get products")

# With schema - returns predictable structure
products = await page.extract(
    "Get products",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"}
            }
        }
    }
)
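Writing nested schemas by hand becomes repetitive when several extractions share the same shape. One possible approach is a small builder that produces the array-of-objects structure used above; `array_of_objects` is a hypothetical helper for illustration, and it only covers flat objects with primitive JSON Schema types.

```python
from typing import Optional


def array_of_objects(properties: dict[str, str], required: Optional[list[str]] = None) -> dict:
    """Build a JSON Schema for a list of flat objects from a name -> type mapping."""
    item: dict = {
        "type": "object",
        "properties": {name: {"type": type_name} for name, type_name in properties.items()},
    }
    # Only emit "required" when the caller asked for it.
    if required:
        item["required"] = list(required)
    return {"type": "array", "items": item}
```

The schema in the example above is then equivalent to `array_of_objects({"name": "string", "price": "number"})`, which can be passed as the `schema` argument.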

4. Combine Methods Strategically

# Check state before action
if await page.validate("Is the form visible?"):
    await page.act("Fill out the form")
    
    # Validate after action
    if await page.validate("Was the form submitted successfully?"):
        result = await page.extract("Get the confirmation number")
        print(f"Success! Confirmation: {result}")
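The same check, act, re-check, extract sequence can be wrapped in a reusable coroutine so that each failure path returns early. This is a sketch under the assumption that `page` exposes `validate()`, `act()`, and `extract()` as described on this page; `submit_and_confirm` and its `fill_prompt` parameter are hypothetical names.

```python
from typing import Any, Optional


async def submit_and_confirm(page, fill_prompt: str) -> Optional[Any]:
    """Fill a form and return the confirmation data, or None if any check fails."""
    if not await page.validate("Is the form visible?"):
        return None  # nothing to act on
    await page.act(fill_prompt)
    if not await page.validate("Was the form submitted successfully?"):
        return None  # submission could not be confirmed
    return await page.extract("Get the confirmation number")
```

A caller could use `result = await submit_and_confirm(page, "Fill out the form")` and treat `None` as a signal to retry or report an error.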
