
Overview

Skyvern adds AI-powered methods directly on page objects, enabling you to perform actions, extract data, validate page state, and execute complex workflows using natural language.

Core AI Methods

page.act()

Perform actions on the page using natural language.
await page.act("Click the login button and wait for the dashboard")
Parameters:
  • prompt (str): Natural language description of the action to perform
Returns: None
Use Cases:
  • Single-step actions with natural language
  • Interactions where element location is ambiguous
  • Quick prototyping without finding selectors
Examples:
# Simple click
await page.act("Click the login button")

# Multi-step action
await page.act("Scroll down to the footer and click the Privacy Policy link")

# Conditional action
await page.act("If there's a popup, close it")
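Because act() is a coroutine, every call must be awaited inside an async function. When a compound instruction such as the multi-step example above fails, splitting it into separate act() calls can make it easier to see which step went wrong. The sketch below assumes only that page.act(prompt) is awaitable as documented. `run_steps` and `RecordingPage` are hypothetical names, and the stub stands in for a real Skyvern page so the example runs on its own.

```python
import asyncio


class RecordingPage:
    """Hypothetical stand-in for a Skyvern page: records prompts instead of acting."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    async def act(self, prompt: str) -> None:
        self.prompts.append(prompt)


async def run_steps(page, steps: list[str]) -> int:
    """Run each natural-language step with its own act() call, in order.

    Returns the number of steps run. If act() raises, the error is re-raised
    as a RuntimeError that names the failing step.
    """
    for index, step in enumerate(steps):
        try:
            await page.act(step)
        except Exception as exc:
            raise RuntimeError(f"Step {index + 1} failed: {step!r}") from exc
    return len(steps)


async def main() -> RecordingPage:
    page = RecordingPage()  # replace with a real Skyvern page object
    await run_steps(page, [
        "Scroll down to the footer",
        "Click the Privacy Policy link",
    ])
    return page


page = asyncio.run(main())
print(page.prompts)
```

Running steps one by one trades a little speed (one LLM round trip per step) for clearer failure reporting.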

page.extract()

Extract structured data from the page with optional JSON schema.
# Simple extraction
result = await page.extract("Get the product name and price")

# With schema
result = await page.extract(
    prompt="Extract order details",
    schema={
        "order_id": "string",
        "total": "number",
        "items": "array"
    }
)
Parameters:
  • prompt (str): Description of what data to extract
  • schema (dict | list | str, optional): JSON schema defining the expected output structure
  • error_code_mapping (dict, optional): Map error codes to custom messages (Python only)
  • intention (str, optional): Additional context for extraction (Python only)
  • data (str | dict, optional): Additional data context (Python only)
Returns: dict | list | str | None - Extracted data matching the schema
Complex Schema Example:
products = await page.extract(
    prompt="Extract all products from the page",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "rating": {"type": "number"},
                "in_stock": {"type": "boolean"},
                "image_url": {"type": "string"},
            },
            "required": ["name", "price"]
        }
    }
)

print(f"Found {len(products)} products")
for product in products:
    print(f"{product['name']}: ${product['price']}")
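Extraction is LLM-driven, so even with a schema it can be worth checking the result before relying on it (the loop above would raise a KeyError if an item lacked `price`). The sketch below is a hypothetical helper, not part of Skyvern, that checks only the subset of JSON Schema used in this example: an array of objects, `required` keys, and the primitive types string, number, and boolean. For full validation, a dedicated library such as `jsonschema` is the usual choice.

```python
from typing import Any

# JSON Schema primitive type names used on this page, mapped to Python types.
_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_items(items: Any, schema: dict) -> list[str]:
    """Check an extracted list against an array-of-objects schema.

    Returns a list of problems; an empty list means none were found.
    Only `required` and primitive `type` checks are performed.
    """
    if not isinstance(items, list):
        return [f"expected a list, got {type(items).__name__}"]
    item_schema = schema.get("items", {})
    problems: list[str] = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: expected an object")
            continue
        for key in item_schema.get("required", []):
            if key not in item:
                problems.append(f"item {i}: missing required key {key!r}")
        for key, spec in item_schema.get("properties", {}).items():
            if key not in item:
                continue  # absence is handled by the required check above
            type_name = spec.get("type")
            expected = _TYPES.get(type_name)
            value = item[key]
            # bool is a subclass of int in Python, so reject it for "number".
            wrong = expected is not None and (
                not isinstance(value, expected)
                or (type_name == "number" and isinstance(value, bool))
            )
            if wrong:
                problems.append(f"item {i}: {key!r} should be {type_name}")
    return problems


schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
        "required": ["name", "price"],
    },
}
sample = [{"name": "Mouse", "price": 19.99}, {"name": "Cable"}]  # hypothetical data
print(check_items(sample, schema))  # ["item 1: missing required key 'price'"]
```

In a real script you would call `check_items(products, schema)` right after page.extract() and decide whether to retry or stop when the returned list is non-empty.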

page.validate()

Validate page state using natural language. Returns boolean.
# Check login state
is_logged_in = await page.validate("Check if the user is logged in")

if is_logged_in:
    print("User is logged in")
else:
    print("User is not logged in")

# Validate form submission
is_submitted = await page.validate("Check if the form was submitted successfully")
Parameters:
  • prompt (str): Natural language validation question
  • model (dict | str, optional): LLM model configuration or model name
Returns: bool - Validation result
Use Cases:
# Conditional workflow
if await page.validate("Is there an error message displayed?"):
    error_msg = await page.extract("Get the error message text")
    print(f"Error: {error_msg}")
else:
    await page.act("Click continue")

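# Wait for page state (bounded: give up after 30 checks instead of looping forever)
import asyncio

for _ in range(30):
    if await page.validate("Is the data table fully loaded?"):
        break
    await asyncio.sleep(1)
else:
    raise TimeoutError("The data table did not load after 30 checks")

print("Table loaded, proceeding...")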
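Since act() returns None, it does not tell you whether the action achieved its goal. A common pattern, sketched below on the assumption that act() and validate() behave as documented above, is to follow each action with a validate() check and retry a bounded number of times. `act_until_valid` and `FlakyPage` are hypothetical names; the stub simulates a page where the first click does not take effect.

```python
import asyncio


class FlakyPage:
    """Hypothetical stub: validate() only succeeds after act() has run twice."""

    def __init__(self) -> None:
        self.act_calls = 0

    async def act(self, prompt: str) -> None:
        self.act_calls += 1

    async def validate(self, prompt: str) -> bool:
        return self.act_calls >= 2


async def act_until_valid(page, action: str, check: str,
                          max_attempts: int = 3, delay: float = 1.0) -> bool:
    """Run `action`, then ask `check`; retry up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        await page.act(action)
        if await page.validate(check):
            return True
        if attempt < max_attempts:
            await asyncio.sleep(delay)  # give the page time to settle
    return False


page = FlakyPage()
ok = asyncio.run(act_until_valid(
    page,
    action="Click the 'Load more' button",
    check="Are more than 20 results displayed?",
    delay=0,
))
print(ok, page.act_calls)  # True 2
```

Retrying is only safe for actions that are harmless to repeat; for something like "Place the order", check state first rather than retrying blindly.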

page.prompt()

Send arbitrary prompts to the LLM with optional response schema.
# Simple prompt
summary = await page.prompt("Summarize what's on this page")

# With schema
analysis = await page.prompt(
    "Analyze the pricing tiers on this page",
    schema={
        "tiers": "array",
        "best_value": "string",
        "recommendation": "string"
    }
)

# With specific model
result = await page.prompt(
    "What's the main call-to-action?",
    model="gpt-4"
)
Parameters:
  • prompt (str): The prompt to send to the LLM
  • schema (dict, optional): JSON schema for structured response
  • model (dict | str, optional): LLM model configuration or model name
Returns: dict | list | str | None - LLM response
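Both prompt() and extract() can return a dict, a list, a string, or None, so downstream code may want to normalize the result. The sketch below is a hypothetical helper that passes dicts, lists, and None through, tries to parse strings that look like JSON, and otherwise returns the text unchanged. Whether string results ever contain JSON in practice is an assumption worth checking against your own runs.

```python
import json
from typing import Any


def coerce_json(result: Any) -> Any:
    """Normalize an LLM result: dict/list/None pass through, JSON text is parsed."""
    if result is None or isinstance(result, (dict, list)):
        return result
    if isinstance(result, str):
        text = result.strip()
        if text.startswith(("{", "[")):
            try:
                return json.loads(text)
            except json.JSONDecodeError:
                pass  # not valid JSON; fall through and keep the raw text
        return result
    raise TypeError(f"unexpected result type: {type(result).__name__}")


print(coerce_json('{"best_value": "Pro"}'))  # {'best_value': 'Pro'}
print(coerce_json("The main call-to-action is 'Start free trial'."))
print(coerce_json(None))  # None
```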

Agent Methods

The page.agent object provides higher-level workflow commands that execute multi-step AI-powered tasks.

page.agent.run_task()

Execute complex multi-step tasks in the context of the current page.
# Simple task
task_result = await page.agent.run_task(
    "Fill out the contact form with: John Doe, john@example.com, and message 'Hello'"
)

print(task_result.status)
print(task_result.extracted_information)
Parameters:
  • prompt (str): Natural language description of the task
  • engine (RunEngine, optional): Execution engine (default: skyvern_v2)
  • model (dict, optional): LLM model configuration
  • url (str, optional): URL to navigate to (defaults to current page URL)
  • webhook_url (str, optional): Webhook URL for progress notifications
  • totp_identifier (str, optional): TOTP identifier for 2FA
  • totp_url (str, optional): URL to fetch TOTP codes
  • title (str, optional): Human-readable task title
  • error_code_mapping (dict, optional): Custom error code mappings
  • data_extraction_schema (dict | str, optional): Schema for data extraction
  • max_steps (int, optional): Maximum number of steps
  • timeout (float, optional): Timeout in seconds (default: 1800)
Returns: TaskRunResponse with execution results
Advanced Example:
task_result = await page.agent.run_task(
    prompt="Search for 'wireless headphones', filter by price under $100, and add the top-rated item to cart",
    data_extraction_schema={
        "product_name": "string",
        "price": "number",
        "rating": "number",
        "added_to_cart": "boolean"
    },
    max_steps=20,
    timeout=300
)

if task_result.status == "completed":
    product = task_result.extracted_information
    print(f"Added {product['product_name']} to cart for ${product['price']}")
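The example above only handles the successful case. The sketch below also surfaces other outcomes, assuming only the `status` and `extracted_information` attributes used above and comparing against the string "completed" exactly as that example does. `require_completed` and `TaskNotCompleted` are hypothetical names, and `SimpleNamespace` objects stand in for real TaskRunResponse values so the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any


class TaskNotCompleted(Exception):
    """Raised when a task run ends in any status other than "completed"."""


def require_completed(task_result: Any) -> Any:
    """Return extracted_information, or raise if the run did not complete."""
    status = getattr(task_result, "status", None)
    if status != "completed":
        raise TaskNotCompleted(f"task ended with status {status!r}")
    return task_result.extracted_information


# Hypothetical stand-ins for TaskRunResponse objects.
done = SimpleNamespace(status="completed",
                       extracted_information={"product_name": "X1", "price": 89.0})
failed = SimpleNamespace(status="failed", extracted_information=None)

print(require_completed(done)["product_name"])  # X1
try:
    require_completed(failed)
except TaskNotCompleted as exc:
    print(exc)  # task ended with status 'failed'
```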

page.agent.login()

Execute a login workflow using stored credentials.
from skyvern.schemas.run_blocks import CredentialType

# Skyvern credentials
workflow_result = await page.agent.login(
    credential_type=CredentialType.skyvern,
    credential_id="cred_123"
)

# Bitwarden credentials
workflow_result = await page.agent.login(
    credential_type=CredentialType.bitwarden,
    bitwarden_item_id="item_id",
    bitwarden_collection_id="collection_id"
)
See Also: Authentication Guide for complete login documentation.
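Rather than hard-coding IDs such as `cred_123`, you may prefer to read them from the environment. The sketch below builds the keyword arguments for the Skyvern-credential call shown above. The environment variable name `SKYVERN_CREDENTIAL_ID` and the helper `skyvern_login_kwargs` are hypothetical choices; the result would be passed as `await page.agent.login(credential_type=CredentialType.skyvern, **kwargs)`.

```python
import os


def skyvern_login_kwargs(env_var: str = "SKYVERN_CREDENTIAL_ID") -> dict[str, str]:
    """Build keyword arguments for page.agent.login() from an environment variable.

    Only `credential_id` is produced; credential_type is passed separately at
    the call site. Raises if the variable is unset or empty, so a missing
    secret fails loudly instead of sending an empty ID.
    """
    credential_id = os.environ.get(env_var, "").strip()
    if not credential_id:
        raise RuntimeError(f"Set {env_var} to the ID of a stored Skyvern credential")
    return {"credential_id": credential_id}


os.environ["SKYVERN_CREDENTIAL_ID"] = "cred_123"  # normally set outside the script
print(skyvern_login_kwargs())  # {'credential_id': 'cred_123'}
```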

page.agent.download_files()

Execute a file download workflow.
workflow_result = await page.agent.download_files(
    prompt="Navigate to the invoices page and download the latest invoice",
    download_suffix="invoice.pdf",
    timeout=300
)

print(f"Download status: {workflow_result.status}")
Parameters:
  • prompt (str): Instructions for navigating to and downloading the file
  • url (str, optional): Starting URL (defaults to current page)
  • download_suffix (str, optional): Filename or suffix for the downloaded file
  • download_timeout (float, optional): Timeout for the download operation
  • max_steps_per_run (int, optional): Maximum steps to execute
  • webhook_url (str, optional): Webhook notification URL
  • totp_identifier (str, optional): TOTP identifier
  • totp_url (str, optional): TOTP URL
  • extra_http_headers (dict, optional): Additional HTTP headers
  • timeout (float, optional): Overall timeout in seconds (default: 1800)
Returns: WorkflowRunResponse

page.agent.run_workflow()

Execute a pre-defined workflow by ID.
workflow_result = await page.agent.run_workflow(
    workflow_id="wkfl_123",
    parameters={
        "search_term": "laptop",
        "max_price": 1500
    },
    timeout=600
)

print(f"Workflow status: {workflow_result.status}")
Parameters:
  • workflow_id (str): ID of the workflow to execute
  • parameters (dict, optional): Workflow parameters
  • template (bool, optional): Whether this is a template
  • title (str, optional): Human-readable title
  • webhook_url (str, optional): Webhook notification URL
  • totp_url (str, optional): TOTP URL
  • totp_identifier (str, optional): TOTP identifier
  • timeout (float, optional): Timeout in seconds (default: 1800)
Returns: WorkflowRunResponse
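Workflow parameters are passed as a dict. Assuming they are sent to the Skyvern API as JSON (an assumption not stated above), values such as `datetime` or `Decimal` would fail to serialize. The hypothetical helper below converts those two common types at the top level and checks the result with `json.dumps` before you call run_workflow(), so a bad value fails locally rather than mid-request.

```python
import json
from datetime import date
from decimal import Decimal
from typing import Any


def to_workflow_parameters(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of params with dates/decimals converted; fail early otherwise.

    Only top-level values are converted; nested structures are passed through
    and still checked by json.dumps.
    """
    converted = {}
    for key, value in params.items():
        if isinstance(value, date):  # also covers datetime, a subclass of date
            value = value.isoformat()
        elif isinstance(value, Decimal):
            value = float(value)
        converted[key] = value
    json.dumps(converted)  # raises TypeError if anything is still not serializable
    return converted


parameters = to_workflow_parameters({
    "search_term": "laptop",
    "max_price": Decimal("1500"),
    "since": date(2024, 1, 1),
})
print(parameters)
# {'search_term': 'laptop', 'max_price': 1500.0, 'since': '2024-01-01'}
```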

Method Comparison

| Method                      | Use Case              | Complexity | Returns             |
|-----------------------------|-----------------------|------------|---------------------|
| page.act()                  | Single actions        | Simple     | None                |
| page.extract()              | Data extraction       | Simple     | Structured data     |
| page.validate()             | State validation      | Simple     | Boolean             |
| page.prompt()               | General AI queries    | Simple     | Flexible            |
| page.agent.run_task()       | Multi-step tasks      | Complex    | TaskRunResponse     |
| page.agent.login()          | Authentication        | Complex    | WorkflowRunResponse |
| page.agent.download_files() | File downloads        | Complex    | WorkflowRunResponse |
| page.agent.run_workflow()   | Pre-defined workflows | Complex    | WorkflowRunResponse |

Best Practices

1. Choose the Right Method

# Simple action - use act()
await page.act("Click the submit button")

# Data extraction - use extract()
data = await page.extract("Get all product prices")

# Complex workflow - use agent.run_task()
await page.agent.run_task("Search, filter, and purchase the top item")

2. Provide Clear Prompts

# Too vague
await page.act("Click button")  # Which button?

# Better
await page.act("Click the blue 'Submit Order' button in the checkout form")

3. Use Schemas for Structured Data

# Without schema - the output shape may vary from run to run
data = await page.extract("Get products")

# With schema - the output follows the declared structure
products = await page.extract(
    "Get products",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"}
            }
        }
    }
)
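Writing the full array-of-objects schema by hand gets repetitive. The hypothetical helper below (not part of Skyvern) builds the same structure as the example above from keyword arguments, with an optional `required` list like the one in the complex schema example earlier.

```python
from typing import Optional


def array_of_objects(required: Optional[list[str]] = None, **properties: str) -> dict:
    """Build a JSON Schema for a list of objects whose fields have primitive types."""
    item_schema: dict = {
        "type": "object",
        "properties": {name: {"type": type_} for name, type_ in properties.items()},
    }
    if required:
        item_schema["required"] = list(required)
    return {"type": "array", "items": item_schema}


schema = array_of_objects(name="string", price="number")
print(schema)
```

The result can then be passed as `schema=schema` to page.extract("Get products", ...).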

4. Combine Methods Strategically

# Check state before action
if await page.validate("Is the form visible?"):
    await page.act("Fill out the form")
    
    # Validate after action
    if await page.validate("Was the form submitted successfully?"):
        result = await page.extract("Get the confirmation number")
        print(f"Success! Confirmation: {result}")
