extract() Method

Overview

The extract() method extracts structured data from the current page using AI. It can return page text, answer questions, or extract data into custom schemas.

Syntax

// Get page text
const { pageText } = await stagehand.extract();

// Answer a question
const { extraction } = await stagehand.extract("What is the article about?");

// Extract with custom schema
const data = await stagehand.extract(instruction, schema, options?);

Overloads

1. Extract Page Text

await stagehand.extract();
await stagehand.extract(options?);

returns

Promise<{ pageText: string }>

Object containing the full page text

2. Extract with Instruction (Default Schema)

await stagehand.extract(instruction, options?);

instruction

string

required

Question or description of what to extract

returns

Promise<{ extraction: string }>

Object containing the extracted string

3. Extract with Custom Schema

await stagehand.extract(instruction, schema, options?);

instruction

string

required

Description of what to extract

schema

StagehandZodSchema

required

Zod schema defining the structure of the extracted data. Example:

import { z } from "zod";

const schema = z.object({
  title: z.string(),
  price: z.string(),
  inStock: z.boolean(),
});

returns

Promise<T>

Extracted data matching the schema type

Options

options

ExtractOptions


model

ModelConfiguration

Override the default model. Format: "provider/model" or { modelName, ...clientOptions }

timeout

number

Maximum time to wait (milliseconds)

selector

string

CSS selector to scope extraction to a specific element

page

Page | PlaywrightPage | PuppeteerPage | PatchrightPage

Page to extract from (defaults to active page)
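
When several calls share the same options, one way to avoid repeating them is to merge per-call values over a set of defaults. The sketch below is illustrative only: `ExtractOptionsSketch` and `mergeExtractOptions` are hypothetical names, the interface covers just a subset of the properties listed above (it omits `page` and the object form of `model`), and it is not the library's `ExtractOptions` type.

```typescript
// Illustrative subset of the options documented above; NOT the library's
// ExtractOptions type. `page` and the object form of `model` are omitted.
interface ExtractOptionsSketch {
  model?: string; // "provider/model"
  timeout?: number; // milliseconds
  selector?: string; // CSS selector
}

// Hypothetical helper: a per-call value wins over the shared default;
// a missing or undefined override falls back to the default.
function mergeExtractOptions(
  defaults: ExtractOptionsSketch,
  overrides: ExtractOptionsSketch = {}
): ExtractOptionsSketch {
  return {
    model: overrides.model ?? defaults.model,
    timeout: overrides.timeout ?? defaults.timeout,
    selector: overrides.selector ?? defaults.selector,
  };
}
```

The merged object can then be passed as the last argument of any overload, for example `mergeExtractOptions({ timeout: 30000 }, { selector: ".product-details" })`.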

Examples

Extract Page Text

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();

const page = await stagehand.context.newPage();
await page.goto("https://example.com");

// Get all text content
const { pageText } = await stagehand.extract();
console.log(pageText);

await stagehand.close();

Answer Questions

await page.goto("https://news.ycombinator.com");

// Extract specific information
const { extraction } = await stagehand.extract(
  "What is the title of the top story?"
);

console.log(extraction); // e.g. "New AI Framework Released"

Extract Structured Data

import { z } from "zod";

await page.goto("https://example-shop.com/product/123");

// Define schema
const productSchema = z.object({
  name: z.string(),
  price: z.string(),
  description: z.string(),
  inStock: z.boolean(),
  rating: z.number().optional(),
});

// Extract data
const product = await stagehand.extract(
  "Extract the product details",
  productSchema
);

console.log(product);
// {
//   name: "Wireless Mouse",
//   price: "$29.99",
//   description: "Ergonomic wireless mouse with...",
//   inStock: true,
//   rating: 4.5
// }
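
Because the schema above declares `price` as a string, the result keeps the page's formatting (for example "$29.99"). If a numeric value is needed downstream, a small post-processing step is one option. `parsePrice` below is a hypothetical helper, not part of Stagehand, and it assumes "." as the decimal separator and "," as the thousands separator.

```typescript
// Hypothetical helper (not part of Stagehand): converts an extracted price
// string such as "$29.99" or "1,299.00 USD" into a number.
// Assumes "." is the decimal separator and "," groups thousands.
function parsePrice(raw: string): number | null {
  // Drop thousands separators, then take the first numeric run.
  const match = raw.replace(/,/g, "").match(/\d+(?:\.\d+)?/);
  if (match === null) return null;
  const value = Number.parseFloat(match[0]);
  return Number.isFinite(value) ? value : null;
}
```

Returning `null` rather than throwing lets the caller decide how to treat values such as "Call for price".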

Extract Lists

import { z } from "zod";

await page.goto("https://example.com/articles");

const articleListSchema = z.object({
  articles: z.array(
    z.object({
      title: z.string(),
      author: z.string(),
      date: z.string(),
      summary: z.string().optional(),
    })
  ),
});

const { articles } = await stagehand.extract(
  "Extract all articles from the page",
  articleListSchema
);

console.log(`Found ${articles.length} articles`);
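
If a page shows the same item in more than one place (a featured slot and the main list, say), the extracted array may contain repeats. A minimal sketch of removing them after extraction follows; `Article` mirrors the schema above, and `dedupeByTitle` is a hypothetical helper that treats titles as equal when they match case-insensitively after trimming.

```typescript
// Mirrors the shape of one entry in articleListSchema above.
interface Article {
  title: string;
  author: string;
  date: string;
  summary?: string;
}

// Hypothetical helper: keeps the first occurrence of each title,
// comparing titles case-insensitively and ignoring surrounding whitespace.
function dedupeByTitle(articles: Article[]): Article[] {
  const seen = new Set<string>();
  const unique: Article[] = [];
  for (const article of articles) {
    const key = article.title.trim().toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(article);
  }
  return unique;
}
```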

Scoped Extraction

import { z } from "zod";

// Extract only from a specific section
const headerData = await stagehand.extract(
  "Get the navigation links",
  z.object({
    links: z.array(z.object({ text: z.string(), url: z.string() })),
  }),
  { selector: "header nav" }
);

Complex Schema with Descriptions

import { z } from "zod";

const jobSchema = z.object({
  title: z.string().describe("Job title"),
  company: z.string().describe("Company name"),
  location: z.string().describe("Job location"),
  salary: z
    .string()
    .optional()
    .describe("Salary range if available"),
  remote: z.boolean().describe("Whether the job is remote"),
  requirements: z
    .array(z.string())
    .describe("List of job requirements"),
});

await page.goto("https://jobs.example.com/posting/123");

const job = await stagehand.extract(
  "Extract the job posting details",
  jobSchema
);

Extract with Custom Model

// Use a different model for this call
// (contactSchema is defined under "Contact Information" below)
const data = await stagehand.extract(
  "Extract contact information",
  contactSchema,
  {
    model: "anthropic/claude-3-5-sonnet-latest",
  }
);

Extract from Multiple Pages

const page1 = await stagehand.context.newPage();
const page2 = await stagehand.context.newPage();

await page1.goto("https://example.com/page1");
await page2.goto("https://example.com/page2");

// Extract from specific pages (schema is any Zod schema defined earlier)
const data1 = await stagehand.extract("Get the title", schema, { page: page1 });
const data2 = await stagehand.extract("Get the title", schema, { page: page2 });
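
With more than two pages, the same pattern can be wrapped in a loop that records a failure per page instead of aborting on the first error. The sketch below is an assumption-laden illustration: `PageExtractor` is a structural stand-in for the instruction-plus-options overload described earlier, not the real Stagehand type, and `extractFromEach` is a hypothetical helper. It runs the calls one after another because this page does not say whether concurrent calls are safe.

```typescript
// Structural stand-in for `extract(instruction, options?)` with the `page`
// option; the real Stagehand types are richer than this.
interface PageExtractor<P> {
  extract(
    instruction: string,
    options: { page: P }
  ): Promise<{ extraction: string }>;
}

type PageResult =
  | { ok: true; value: string }
  | { ok: false; error: unknown };

// Hypothetical helper: runs the same instruction against each page in order
// and captures errors per page so one failure does not stop the rest.
async function extractFromEach<P>(
  client: PageExtractor<P>,
  pages: P[],
  instruction: string
): Promise<PageResult[]> {
  const results: PageResult[] = [];
  for (const page of pages) {
    try {
      const { extraction } = await client.extract(instruction, { page });
      results.push({ ok: true, value: extraction });
    } catch (error) {
      results.push({ ok: false, error });
    }
  }
  return results;
}
```

Results come back in the same order as `pages`, so index `i` of the output corresponds to `pages[i]`.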

Real-World Examples

E-commerce Product

const productSchema = z.object({
  product: z.object({
    name: z.string(),
    brand: z.string(),
    price: z.object({
      current: z.string(),
      original: z.string().optional(),
      currency: z.string(),
    }),
    availability: z.enum(["in_stock", "out_of_stock", "pre_order"]),
    images: z.array(z.string().url()),
    specifications: z.record(z.string(), z.string()),
    reviews: z.object({
      averageRating: z.number(),
      totalReviews: z.number(),
    }).optional(),
  }),
});

const data = await stagehand.extract(
  "Extract complete product information",
  productSchema
);

News Articles

const newsSchema = z.object({
  article: z.object({
    headline: z.string(),
    subheading: z.string().optional(),
    author: z.string(),
    publishDate: z.string(),
    content: z.string(),
    tags: z.array(z.string()),
    relatedArticles: z.array(
      z.object({
        title: z.string(),
        url: z.string(),
      })
    ).optional(),
  }),
});

const article = await stagehand.extract(
  "Extract the article content and metadata",
  newsSchema
);

Contact Information

const contactSchema = z.object({
  contact: z.object({
    email: z.string().email().optional(),
    phone: z.string().optional(),
    address: z.object({
      street: z.string(),
      city: z.string(),
      state: z.string(),
      zip: z.string(),
      country: z.string(),
    }).optional(),
    socialMedia: z.object({
      twitter: z.string().optional(),
      linkedin: z.string().optional(),
      facebook: z.string().optional(),
    }).optional(),
  }),
});

const contact = await stagehand.extract(
  "Extract all contact information",
  contactSchema
);

Best Practices

Use descriptive schema fields:

z.string().describe("The product's full name including brand")

Make optional fields optional:

z.object({
  required: z.string(),
  optional: z.string().optional(),
})

Use enums for known values:

status: z.enum(["available", "unavailable", "coming_soon"])

Validate extracted data:

const data = await stagehand.extract(instruction, schema);
const validated = schema.parse(data); // Throws if invalid

Scope to relevant sections:

// More accurate and faster
await stagehand.extract(instruction, schema, { selector: ".product-details" });

Use appropriate models:

// Use faster models for simple extraction
await stagehand.extract("Get title", schema, { model: "openai/gpt-4.1-mini" });
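
Building on the "Validate extracted data" practice above: model output can differ between calls, so one possible follow-up is to retry when validation throws. This is a sketch rather than a Stagehand feature. `extractWithRetry` is a hypothetical helper that takes the extraction call and the validator as plain functions, so it has no dependency on Stagehand or Zod.

```typescript
// Hypothetical helper (not part of Stagehand): calls `run`, passes the result
// to `validate`, and retries if either throws, up to `maxAttempts` times.
// Rethrows the last error once attempts are exhausted.
async function extractWithRetry<T>(
  run: () => Promise<unknown>,
  validate: (data: unknown) => T,
  maxAttempts = 3
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return validate(await run());
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}
```

With the names used above, a call might look like `extractWithRetry(() => stagehand.extract(instruction, schema), (data) => schema.parse(data))`. Each retry is a new model call, so keep `maxAttempts` small.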
