Overview
Skyvern can extract structured data from web pages during task execution. By providing a data_extraction_schema, you ensure that Skyvern returns data in a consistent, predictable format that matches your application’s requirements.
Basic Data Extraction
The simplest way to extract data is to describe what you want in your prompt. For a consistent output structure, also provide a data_extraction_schema.
Using JSON Schema for Structured Extraction
Simple Schema Example
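As a minimal sketch, a flat schema for a product page might look like the following. The field names (`title`, `price`, `in_stock`) are hypothetical and chosen only for illustration; the schema itself is ordinary JSON Schema expressed as a Python dict.

```python
import json

# Hypothetical schema for illustration: the field names are assumptions,
# not taken from the Skyvern documentation.
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Product title as shown on the page"},
        "price": {"type": "number", "description": "Listed price, without the currency symbol"},
        "in_stock": {"type": "boolean", "description": "Whether the product can be ordered"},
    },
    "required": ["title", "price"],
}

# Serialize the schema so it can be sent as the data_extraction_schema value.
schema_json = json.dumps(product_schema, indent=2)
print(schema_json)
```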
Define the exact structure you want using JSON Schema.
Complex Schema with Nested Objects
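A nested structure can be expressed by giving a property the type `object` and its own `properties`. The sketch below assumes a hypothetical company profile page; none of the field names come from Skyvern itself.

```python
# Hypothetical nested schema: "headquarters" is an object inside the top-level object.
company_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Company name"},
        "headquarters": {
            "type": "object",
            "description": "Primary office address",
            "properties": {
                "city": {"type": "string", "description": "City of the main office"},
                "country": {"type": "string", "description": "Country of the main office"},
            },
            "required": ["city"],
        },
    },
    "required": ["name"],
}

# Nested fields are reached by walking the "properties" keys.
city_field = company_schema["properties"]["headquarters"]["properties"]["city"]
print(city_field["type"])
```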
Extract more complex, nested data structures.
Extracting Arrays of Data
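Lists are described with the `array` type and an `items` schema for each element. The following is a sketch under assumed names: both the schema and `sample_output` are made up for illustration and are not real Skyvern output.

```python
# Hypothetical schema: a page that lists several articles.
headlines_schema = {
    "type": "object",
    "properties": {
        "articles": {
            "type": "array",
            "description": "One entry per article listed on the page",
            "items": {
                "type": "object",
                "properties": {
                    "headline": {"type": "string", "description": "Article headline"},
                    "url": {"type": "string", "description": "Link to the full article"},
                },
                "required": ["headline"],
            },
        }
    },
    "required": ["articles"],
}

# Made-up output in the shape the schema describes.
sample_output = {
    "articles": [
        {"headline": "Example A", "url": "https://example.com/a"},
        {"headline": "Example B"},
    ]
}

# Iterate over the extracted list like any other Python list.
headlines = [article["headline"] for article in sample_output["articles"]]
print(headlines)
```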
Extract lists of items from a page.
Real-World Examples
Example 1: E-Commerce Price Extraction
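One way to set up price extraction is sketched below. The schema fields, the prompt text, and the URL are placeholders, and the payload keys (`prompt`, `url`, `data_extraction_schema`) are assumptions about the request shape that should be checked against the API reference.

```python
import json

# Hypothetical schema for a single product page.
price_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string", "description": "Name of the product"},
        "price": {"type": "number", "description": "Current price as a plain number"},
        "currency": {"type": "string", "description": "Three-letter currency code, e.g. USD"},
    },
    "required": ["product_name", "price"],
}

# Assumed request payload; the key names are not confirmed by this page.
payload = {
    "prompt": "Open the product page and extract the current price details.",
    "url": "https://example.com/products/123",  # placeholder URL
    "data_extraction_schema": price_schema,
}

# JSON body that would be sent with the task request.
body = json.dumps(payload)
print(body)
```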
Example 2: Invoice Data Extraction
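An invoice usually combines scalar fields with a list of line items. The sketch below uses a hypothetical schema and a made-up `sample_invoice`, and adds a small consistency check that can be run on extracted output before using it.

```python
import math

# Hypothetical invoice schema with an array of line items.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Invoice total including all line items"},
        "line_items": {
            "type": "array",
            "description": "One entry per billed item",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "What was billed"},
                    "quantity": {"type": "integer", "description": "Number of units"},
                    "unit_price": {"type": "number", "description": "Price per unit"},
                },
            },
        },
    },
    "required": ["invoice_number", "total"],
}

# Made-up output in the shape the schema describes.
sample_invoice = {
    "invoice_number": "INV-0001",
    "total": 44.98,
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 19.99},
        {"description": "Shipping", "quantity": 1, "unit_price": 5.00},
    ],
}

def totals_match(invoice):
    """Return True if the line items add up to the stated total (within half a cent)."""
    computed = sum(item["quantity"] * item["unit_price"] for item in invoice["line_items"])
    return math.isclose(computed, invoice["total"], abs_tol=0.005)

print(totals_match(sample_invoice))
```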
Example 3: Job Listing Extraction
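A job board can be modeled as an array of listing objects. Everything in this sketch is illustrative: the field names are assumptions and `sample_output` is invented data, shown only to demonstrate how the extracted list might be filtered afterwards.

```python
# Hypothetical schema for a page of job listings.
jobs_schema = {
    "type": "object",
    "properties": {
        "jobs": {
            "type": "array",
            "description": "One entry per job posting on the page",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string", "description": "Job title"},
                    "company": {"type": "string", "description": "Hiring company"},
                    "location": {"type": "string", "description": "Office location, if stated"},
                    "remote": {"type": "boolean", "description": "True if the role is remote"},
                },
                "required": ["title", "company"],
            },
        }
    },
    "required": ["jobs"],
}

# Made-up output in the shape the schema describes.
sample_output = {
    "jobs": [
        {"title": "Backend Engineer", "company": "Acme", "location": "Berlin", "remote": False},
        {"title": "Data Analyst", "company": "Globex", "remote": True},
    ]
}

# Keep only remote roles; .get() tolerates listings without a "remote" field.
remote_titles = [job["title"] for job in sample_output["jobs"] if job.get("remote")]
print(remote_titles)
```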
Example 4: Insurance Quote Extraction
The Skyvern README includes a real example of extracting insurance quote data.
Data Types Supported
Skyvern’s data extraction supports all standard JSON Schema types:

| Type | Description | Example |
|---|---|---|
| string | Text data | "Hello World" |
| integer | Whole numbers | 42 |
| number | Floating point numbers | 3.14 |
| boolean | True/false values | true |
| array | Lists of items | [1, 2, 3] |
| object | Nested structures | {"key": "value"} |
| null | Null values | null |
Best Practices
1. Always Include Descriptions
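To make this practice easy to enforce, a schema can be checked for fields that lack a description before it is sent. The helper below is a hypothetical utility, not part of Skyvern; it walks nested objects and array items and reports the paths of undocumented fields.

```python
def fields_missing_description(schema, path=""):
    """Return dotted paths of properties that have no non-empty description."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not prop.get("description"):
            missing.append(field_path)
        if prop.get("type") == "object":
            # Recurse into nested objects.
            missing.extend(fields_missing_description(prop, field_path))
        elif prop.get("type") == "array" and isinstance(prop.get("items"), dict):
            # Recurse into the element schema of arrays.
            missing.extend(fields_missing_description(prop["items"], field_path + "[]"))
    return missing

# Illustrative schema: "seller" and its "name" field are undocumented.
draft_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number", "description": "Listed price in USD"},
        "seller": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
    },
}

print(fields_missing_description(draft_schema))
```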
Provide clear descriptions for each field to help Skyvern understand what to extract.
2. Use Specific Field Names
Use descriptive, unambiguous field names.
3. Validate Data Types
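A quick client-side check can confirm that an extracted value has the type the schema declares. The function below is a simplified sketch covering only the seven basic type names; it is not a full JSON Schema validator, and it treats floats such as 3.0 as non-integers for simplicity.

```python
def matches_type(value, json_type):
    """Return True if a Python value corresponds to the given JSON Schema type name."""
    if json_type == "null":
        return value is None
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "array":
        return isinstance(value, list)
    if json_type == "object":
        return isinstance(value, dict)
    raise ValueError(f"Unknown JSON Schema type: {json_type!r}")

# A price returned as text would fail a "number" check.
print(matches_type("19.99", "number"))
print(matches_type(19.99, "number"))
```

A check like this catches the common case where a price or count comes back as text because the schema declared `string` instead of `number`.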
Use appropriate data types to ensure correct parsing.
4. Handle Missing Data
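Two common ways to allow for absent data are to leave a field out of `required` and to permit `null` alongside the real type. The sketch below uses standard JSON Schema syntax for both; whether Skyvern honors a type list such as `["string", "null"]` is an assumption to verify. The `read_salary` helper is hypothetical and shows a defensive read on the consuming side.

```python
# Hypothetical schema: "salary" is optional and may be null.
listing_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Listing title"},
        "salary": {
            "type": ["string", "null"],
            "description": "Salary range if the page shows one, otherwise null",
        },
    },
    "required": ["title"],  # "salary" is intentionally not required
}

def read_salary(output):
    """Return the salary text, or a fallback when the field is absent, null, or empty."""
    return output.get("salary") or "not listed"

print(read_salary({"title": "Engineer"}))
print(read_salary({"title": "Engineer", "salary": "$100k-$120k"}))
```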
Plan for fields that might not always be present.
5. Keep Schemas Focused
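One rough way to notice a schema growing too large is to count its leaf fields and nesting depth. The helper below is a hypothetical utility, and any threshold applied to its result would be a local convention rather than a Skyvern limit.

```python
def schema_stats(schema, depth=1):
    """Return (leaf_field_count, max_depth) for an object schema."""
    count, max_depth = 0, depth
    for prop in schema.get("properties", {}).values():
        # For arrays, inspect the element schema instead of the array itself.
        child = prop.get("items") if prop.get("type") == "array" else prop
        if isinstance(child, dict) and child.get("type") == "object":
            child_count, child_depth = schema_stats(child, depth + 1)
            count += child_count
            max_depth = max(max_depth, child_depth)
        else:
            count += 1
    return count, max_depth

# Illustrative schema with an object nested inside an array inside an object.
wide_schema = {
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {
            "type": "object",
            "properties": {
                "c": {"type": "integer"},
                "d": {
                    "type": "array",
                    "items": {"type": "object", "properties": {"e": {"type": "string"}}},
                },
            },
        },
    },
}

fields, depth = schema_stats(wide_schema)
print(fields, depth)
```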
Extract only what you need. Overly complex schemas can reduce accuracy.
Accessing Extracted Data
Python SDK
TypeScript SDK
REST API
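A polling loop for a run can be sketched as follows. The HTTP call is deliberately left out: `fetch_run` is a hypothetical callable supplied by the caller that returns the run as a dict. The status names and the `output` key are assumptions and should be checked against the API reference.

```python
import time

# Assumed terminal statuses; confirm the actual values in the API reference.
TERMINAL_STATUSES = {"completed", "failed", "terminated", "timed_out", "canceled"}

def wait_for_output(fetch_run, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_run() until the run reaches a terminal status, then return its output."""
    for _ in range(max_attempts):
        run = fetch_run()
        status = run.get("status")
        if status == "completed":
            return run.get("output")
        if status in TERMINAL_STATUSES:
            raise RuntimeError(f"Run ended with status {status!r}")
        # Not finished yet: wait before asking again.
        sleep(interval_s)
    raise TimeoutError("Run did not finish within the polling budget")

# Usage with canned responses standing in for real API calls.
responses = iter([
    {"status": "running"},
    {"status": "completed", "output": {"price": 9.5}},
])
result = wait_for_output(lambda: next(responses), sleep=lambda seconds: None)
print(result)
```

Passing `fetch_run` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to test without network access.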
After creating a task, poll for completion and retrieve the output.
Next Steps
Task Parameters
Learn about all available task configuration options
Monitoring Runs
Monitor task execution and view results