Overview
The extract() method extracts structured data from the current page using AI. It can return page text, answer questions, or extract data into custom schemas.
Syntax
// Get page text
const { pageText } = await stagehand.extract();
// Answer a question
const { extraction } = await stagehand.extract("What is the article about?");
// Extract with custom schema
const data = await stagehand.extract(instruction, schema, options?);
Overloads
await stagehand.extract();
await stagehand.extract(options?);
returns
Promise<{ pageText: string }>
Object containing the full page text
await stagehand.extract(instruction, options?);
instruction
string
required
Question or description of what to extract
returns
Promise<{ extraction: string }>
Object containing the extracted string
await stagehand.extract(instruction, schema, options?);
instruction
string
required
Description of what to extract
schema
StagehandZodSchema
required
Zod schema defining the structure of extracted data
Example:
import { z } from "zod";

const schema = z.object({
  title: z.string(),
  price: z.string(),
  inStock: z.boolean(),
});
returns
Extracted data matching the schema type
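The three call shapes above can be modelled as ordinary TypeScript overloads. The sketch below is not Stagehand's actual type declaration: `SchemaLike` and `fakeExtract` are hypothetical stand-ins that return canned values, and the `options` argument is left out for brevity. It only illustrates how the return type follows from the arguments passed.

```typescript
// Hypothetical stand-in for a Zod schema: anything with a parse() method.
interface SchemaLike<T> {
  parse(data: unknown): T;
}

// Overload signatures mirroring the three documented call shapes.
function fakeExtract(): Promise<{ pageText: string }>;
function fakeExtract(instruction: string): Promise<{ extraction: string }>;
function fakeExtract<T>(instruction: string, schema: SchemaLike<T>): Promise<T>;

// Fake implementation with canned values, so the shapes can be checked offline.
async function fakeExtract(
  instruction?: string,
  schema?: SchemaLike<unknown>
): Promise<unknown> {
  if (instruction === undefined) return { pageText: "full page text" };
  if (schema === undefined) return { extraction: `answer to: ${instruction}` };
  return schema.parse({ title: "Example" });
}
```

With no arguments the result has a `pageText` field, with only an instruction it has an `extraction` field, and with a schema it is whatever the schema's `parse()` returns.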
Options
model
Override the default model. Format: "provider/model" or { modelName, ...clientOptions }
timeout
Maximum time to wait, in milliseconds
selector
CSS selector to scope extraction to a specific element
page
Page | PlaywrightPage | PuppeteerPage | PatchrightPage
Page to extract from (defaults to active page)
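The model override accepts a "provider/model" string. A small pre-flight check can catch a malformed string before a call is made. The helper below is a hypothetical sketch, not part of Stagehand, and it assumes the provider is everything before the first slash.

```typescript
// Hypothetical helper: split a "provider/model" string at the first slash.
// Stagehand does its own parsing; this is only a local sanity check.
function splitModelString(model: string): { provider: string; modelName: string } {
  const slash = model.indexOf("/");
  // Reject strings with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Expected "provider/model", got "${model}"`);
  }
  return {
    provider: model.slice(0, slash),
    modelName: model.slice(slash + 1),
  };
}
```

For example, `splitModelString("openai/gpt-4.1-mini")` yields the provider `openai` and the model name `gpt-4.1-mini`, while a bare `"gpt-4.1-mini"` throws.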
Examples
Basic Usage
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();

const page = await stagehand.context.newPage();
await page.goto("https://example.com");

// Get all text content
const { pageText } = await stagehand.extract();
console.log(pageText);

await stagehand.close();
Answer Questions
await page.goto("https://news.ycombinator.com");

// Extract specific information
const { extraction } = await stagehand.extract(
  "What is the title of the top story?"
);
console.log(extraction); // e.g. "New AI Framework Released"
Extract with Schema
import { z } from "zod";

await page.goto("https://example-shop.com/product/123");

// Define schema
const productSchema = z.object({
  name: z.string(),
  price: z.string(),
  description: z.string(),
  inStock: z.boolean(),
  rating: z.number().optional(),
});

// Extract data
const product = await stagehand.extract(
  "Extract the product details",
  productSchema
);
console.log(product);
// {
//   name: "Wireless Mouse",
//   price: "$29.99",
//   description: "Ergonomic wireless mouse with...",
//   inStock: true,
//   rating: 4.5
// }
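In the schema above, `price` is extracted as a display string such as "$29.99". If a numeric value is needed downstream, it can be converted after extraction. The helper below is a hypothetical sketch, not a Stagehand feature, and it assumes "." is the decimal separator and "," a thousands separator.

```typescript
// Hypothetical helper: turn an extracted price string into a number.
// Assumes "." is the decimal separator; other locales need different handling.
function parsePrice(price: string): number | null {
  // Drop currency symbols, thousands separators, and surrounding text.
  const cleaned = price.replace(/[^0-9.-]/g, "");
  const value = Number.parseFloat(cleaned);
  // Return null rather than NaN when no number could be read.
  return Number.isNaN(value) ? null : value;
}
```

Keeping the schema field as a string and converting afterwards preserves the text exactly as it appears on the page, which makes unexpected formats easier to spot.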
Extract Arrays
import { z } from "zod";

await page.goto("https://example.com/articles");

const articleListSchema = z.object({
  articles: z.array(
    z.object({
      title: z.string(),
      author: z.string(),
      date: z.string(),
      summary: z.string().optional(),
    })
  ),
});

const { articles } = await stagehand.extract(
  "Extract all articles from the page",
  articleListSchema
);
console.log(`Found ${articles.length} articles`);
Scoped Extraction
// Extract only from a specific section
const headerData = await stagehand.extract(
  "Get the navigation links",
  z.object({
    links: z.array(z.object({ text: z.string(), url: z.string() })),
  }),
  { selector: "header nav" }
);
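Navigation links extracted this way may come back as relative URLs such as "/about". One way to normalise them is the standard `URL` constructor. The sketch below is illustrative only: `NavLink` and `toAbsoluteLinks` are hypothetical names, and the base URL is assumed to be the page the links were extracted from.

```typescript
// Shape of one entry in the `links` array from the schema above.
interface NavLink {
  text: string;
  url: string;
}

// Hypothetical helper: resolve each link against the page's base URL.
// new URL() throws on input it cannot parse, so wrap the call if that is a concern.
function toAbsoluteLinks(links: NavLink[], baseUrl: string): NavLink[] {
  return links.map((link) => ({
    text: link.text.trim(),
    url: new URL(link.url, baseUrl).href,
  }));
}
```

Links that are already absolute pass through unchanged, because the base is ignored when the first argument is a full URL.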
Complex Schema with Descriptions
import { z } from "zod";

const jobSchema = z.object({
  title: z.string().describe("Job title"),
  company: z.string().describe("Company name"),
  location: z.string().describe("Job location"),
  salary: z
    .string()
    .optional()
    .describe("Salary range if available"),
  remote: z.boolean().describe("Whether the job is remote"),
  requirements: z
    .array(z.string())
    .describe("List of job requirements"),
});

await page.goto("https://jobs.example.com/posting/123");

const job = await stagehand.extract(
  "Extract the job posting details",
  jobSchema
);
Custom Model
// Use a different model for this extraction.
// contactSchema is any Zod schema, such as the one under Contact Information.
const data = await stagehand.extract(
  "Extract contact information",
  contactSchema,
  {
    model: "anthropic/claude-3-5-sonnet-latest",
  }
);
Multiple Pages
const page1 = await stagehand.context.newPage();
const page2 = await stagehand.context.newPage();

await page1.goto("https://example.com/page1");
await page2.goto("https://example.com/page2");

// Extract from specific pages
const data1 = await stagehand.extract("Get the title", schema, { page: page1 });
const data2 = await stagehand.extract("Get the title", schema, { page: page2 });
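When the same extraction runs against several pages, the repeated calls can be folded into a loop. The helper below is a hypothetical sketch, not a Stagehand API. It takes the per-page call as a callback and runs the pages one at a time, since this reference does not say whether concurrent `extract()` calls are supported.

```typescript
// Hypothetical helper: run one extraction per page, sequentially, in order.
// `extractForPage` is supplied by the caller, for example
// (page) => stagehand.extract("Get the title", schema, { page }).
async function extractFromPages<P, T>(
  pages: P[],
  extractForPage: (page: P) => Promise<T>
): Promise<T[]> {
  const results: T[] = [];
  for (const page of pages) {
    // Await each call before starting the next one.
    results.push(await extractForPage(page));
  }
  return results;
}
```

The results array has one entry per page, in the same order as the `pages` argument.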
Real-World Examples
E-commerce Product
const productSchema = z.object({
  product: z.object({
    name: z.string(),
    brand: z.string(),
    price: z.object({
      current: z.string(),
      original: z.string().optional(),
      currency: z.string(),
    }),
    availability: z.enum(["in_stock", "out_of_stock", "pre_order"]),
    images: z.array(z.string().url()),
    specifications: z.record(z.string(), z.string()),
    reviews: z.object({
      averageRating: z.number(),
      totalReviews: z.number(),
    }).optional(),
  }),
});

const data = await stagehand.extract(
  "Extract complete product information",
  productSchema
);
News Articles
const newsSchema = z.object({
  article: z.object({
    headline: z.string(),
    subheading: z.string().optional(),
    author: z.string(),
    publishDate: z.string(),
    content: z.string(),
    tags: z.array(z.string()),
    relatedArticles: z.array(
      z.object({
        title: z.string(),
        url: z.string(),
      })
    ).optional(),
  }),
});

const article = await stagehand.extract(
  "Extract the article content and metadata",
  newsSchema
);
Contact Information
const contactSchema = z.object({
  contact: z.object({
    email: z.string().email().optional(),
    phone: z.string().optional(),
    address: z.object({
      street: z.string(),
      city: z.string(),
      state: z.string(),
      zip: z.string(),
      country: z.string(),
    }).optional(),
    socialMedia: z.object({
      twitter: z.string().optional(),
      linkedin: z.string().optional(),
      facebook: z.string().optional(),
    }).optional(),
  }),
});

const contact = await stagehand.extract(
  "Extract all contact information",
  contactSchema
);
Best Practices
Use descriptive schema fields:
z.string().describe("The product's full name including brand")
Mark fields that may be missing as optional:
z.object({
  required: z.string(),
  optional: z.string().optional(),
})
Use enums for known values:
status: z.enum(["available", "unavailable", "coming_soon"])
Validate extracted data:
const data = await stagehand.extract(instruction, schema);
const validated = schema.parse(data); // Throws if invalid
Scope to relevant sections:
// More accurate and faster
await stagehand.extract(instruction, schema, { selector: ".product-details" })
Use appropriate models:
// Use faster models for simple extraction
await stagehand.extract("Get title", schema, { model: "openai/gpt-4.1-mini" })
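For the "Validate extracted data" practice, `schema.parse()` throws on invalid data. Zod schemas also expose `safeParse()`, which returns a result object instead of throwing. The sketch below uses a structural stand-in for that result shape so it has no dependencies. `SafeParser` and `validateOrNull` are hypothetical names, not part of Stagehand or Zod.

```typescript
// Structural stand-in for the result shape returned by Zod's safeParse().
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: unknown };

// Anything with a compatible safeParse() method; a Zod schema is assumed to fit.
interface SafeParser<T> {
  safeParse(data: unknown): SafeParseResult<T>;
}

// Hypothetical helper: return the validated value, or null instead of throwing.
function validateOrNull<T>(schema: SafeParser<T>, data: unknown): T | null {
  const result = schema.safeParse(data);
  return result.success ? result.data : null;
}
```

Returning null lets the caller decide whether to retry the extraction, fall back to a default, or log the failure, without wrapping every call in try/catch.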