Signature

extract<TData = unknown>(options: ExtractOptions<TSchema>): Promise<TData>
Extracts structured data from the current page. Fields in the schema can target elements by element counter, by CSS selector, or by a special source such as the current URL; when no schema is given, extraction is AI-driven from a prompt. The method resolves each field's target and reads its value from the DOM.

Parameters

options (ExtractOptions<TSchema>, required)
Extraction configuration object. The examples on this page pass schema, description, and prompt.

Returns

data (TData)
The extracted data matching the schema structure. Nested objects and arrays are fully resolved.

Examples

Basic field extraction with element counters

import { Opensteer } from 'opensteer'

const opensteer = new Opensteer()
await opensteer.launch()
await opensteer.goto('https://example.com/product')

const html = await opensteer.snapshot({ mode: 'extraction' })
// Review HTML to find element counters (c="3", c="5", etc.)

const product = await opensteer.extract({
  schema: {
    title: { element: 3 },
    price: { element: 5, attribute: 'data-price' },
    url: { source: 'current_url' },
  },
})

console.log(product)
// { title: 'Premium Widget', price: '29.99', url: 'https://example.com/product' }

Extraction with CSS selectors

const article = await opensteer.extract({
  schema: {
    headline: { selector: 'h1.article-title' },
    author: { selector: '.author-name' },
    publishDate: { selector: 'time', attribute: 'datetime' },
    body: { selector: '.article-content' },
  },
})

Nested object extraction

const listing = await opensteer.extract({
  description: 'property-listing',
  schema: {
    address: {
      street: { element: 10 },
      city: { element: 11 },
      zip: { element: 12 },
    },
    pricing: {
      current: { element: 20 },
      original: { element: 21 },
    },
    features: [
      { name: { element: 30 }, value: { element: 31 } },
    ],
  },
})

Persisted extraction with description

// First run: resolves paths and persists to .opensteer/selectors/
const data = await opensteer.extract({
  description: 'product-info',
  schema: {
    title: { selector: 'h1.product-title' },
    price: { selector: '.price' },
    stock: { selector: '.stock-status' },
  },
})

// Subsequent runs: loads cached paths from disk
// Works even if element counters or the DOM structure change slightly
const updatedData = await opensteer.extract({
  description: 'product-info',
  schema: {
    title: { selector: 'h1.product-title' },
    price: { selector: '.price' },
    stock: { selector: '.stock-status' },
  },
})

AI-driven extraction without explicit schema

// AI extracts data based on page content and prompt
const summary = await opensteer.extract({
  prompt: 'Extract the main product details including name, price, and availability',
})

console.log(summary)
// AI returns structured data matching the prompt

Extracting arrays of items

const results = await opensteer.extract({
  description: 'search-results',
  schema: {
    items: [
      {
        title: { selector: 'h2.result-title' },
        link: { selector: 'a.result-link', attribute: 'href' },
        snippet: { selector: '.result-snippet' },
      },
    ],
  },
})

console.log(results.items)
// Array of objects with title, link, and snippet

Schema Structure

An ExtractSchema can combine the following field types:

Element counter field

{ element: 3, attribute?: 'href' }

CSS selector field

{ selector: '.price', attribute?: 'data-value' }

Special source field

{ source: 'current_url' }

Nested object

{
  user: {
    name: { element: 5 },
    email: { element: 6 },
  },
}

Array of objects

{
  results: [
    {
      title: { selector: '.title' },
      link: { selector: 'a', attribute: 'href' },
    },
  ],
}

Literal values

{
  type: 'product',
  version: 2,
  available: true,
}
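
The field shapes listed above can be summarized as a single recursive type. The following is a minimal sketch, not Opensteer's actual type definitions: the names ElementField, SelectorField, SourceField, SchemaNode, and classify are hypothetical and exist only to show how the shapes differ from one another.

```typescript
// Hypothetical types mirroring the field shapes on this page.
// They are NOT Opensteer's exported types.
type ElementField = { element: number; attribute?: string }
type SelectorField = { selector: string; attribute?: string }
type SourceField = { source: 'current_url' } // only source value shown on this page
type Literal = string | number | boolean
type SchemaNode =
  | ElementField
  | SelectorField
  | SourceField
  | Literal
  | SchemaNode[]
  | { [key: string]: SchemaNode }

type FieldKind = 'element' | 'selector' | 'source' | 'literal' | 'array' | 'object'

// Decide which of the documented shapes a schema node has.
function classify(node: SchemaNode): FieldKind {
  if (Array.isArray(node)) return 'array'
  if (typeof node !== 'object') return 'literal'
  const record = node as Record<string, unknown>
  if (typeof record.element === 'number') return 'element'
  if (typeof record.selector === 'string') return 'selector'
  if (record.source === 'current_url') return 'source'
  return 'object' // anything else is treated as a nested object
}

console.log(classify({ element: 3 })) // 'element'
console.log(classify({ user: { name: { element: 5 } } })) // 'object'
```

One ambiguity in this sketch: a nested object with a literal string under the key selector would be classified as a selector field. How Opensteer itself disambiguates such cases is not stated here.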

Resolution Chain

The extraction method follows this resolution order:
  1. Persisted paths - If description is provided and matching paths exist in .opensteer/selectors/, those are used
  2. Schema hints - Element counters, selectors, and sources in the schema are resolved directly
  3. AI planning - If no deterministic targets are found, the AI analyzes the page and generates an extraction plan
  4. Field extraction - Resolved targets are used to extract values from the DOM
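
The ordering above can be read as a simple decision function. This is an illustrative sketch only: chooseResolution and the ResolutionInput fields are hypothetical names, and the real implementation may apply the chain per field or with additional checks.

```typescript
// Hypothetical model of the resolution order described above.
type Resolution = 'persisted' | 'schema-hints' | 'ai-planning'

interface ResolutionInput {
  description?: string
  // True when matching paths exist under .opensteer/selectors/
  hasPersistedPaths: boolean
  // True when the schema contains element counters, selectors, or sources
  hasSchemaHints: boolean
}

function chooseResolution(input: ResolutionInput): Resolution {
  // 1. Persisted paths apply only when a description was provided.
  if (input.description !== undefined && input.hasPersistedPaths) return 'persisted'
  // 2. Deterministic hints in the schema are resolved directly.
  if (input.hasSchemaHints) return 'schema-hints'
  // 3. Otherwise the AI plans the extraction.
  return 'ai-planning'
  // 4. In every case, the resolved targets are then used to read values from the DOM.
}

console.log(chooseResolution({ hasPersistedPaths: false, hasSchemaHints: true })) // 'schema-hints'
```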

Caching and Persistence

When description is provided:
  • Element paths are persisted to .opensteer/selectors/{namespace}/{description}.json
  • A hash of the schema is stored alongside the paths so that schema changes can be detected
  • Subsequent runs with the same description and an unchanged schema load the cached paths
  • Delete the cached file to force re-extraction
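
To make the cache behavior concrete, here is a rough sketch of how a cache path and a schema hash could be derived and compared. Everything in it is an assumption for illustration: the helper names, the 'default' namespace, the CacheEntry shape, and the hash itself (a 32-bit FNV-1a variant over UTF-16 code units) are not taken from Opensteer, whose actual hashing and file format are not described on this page.

```typescript
// Serialize with sorted object keys so key order does not affect the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// 32-bit FNV-1a over UTF-16 code units (an assumed stand-in for the real hash).
function hashSchema(schema: unknown): string {
  const text = stableStringify(schema)
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16).padStart(8, '0')
}

// Path layout as documented: .opensteer/selectors/{namespace}/{description}.json
function cachePath(namespace: string, description: string): string {
  return `.opensteer/selectors/${namespace}/${description}.json`
}

// Hypothetical cache entry: only the stored schema hash matters for this check.
interface CacheEntry {
  schemaHash: string
}

// Cached paths are reused only when an entry exists and the schema is unchanged.
function canReuseCache(entry: CacheEntry | undefined, schema: unknown): boolean {
  return entry !== undefined && entry.schemaHash === hashSchema(schema)
}

console.log(cachePath('default', 'product-info'))
// .opensteer/selectors/default/product-info.json
```

Sorting keys before hashing means that reordering fields in the schema does not invalidate the cache in this sketch, while changing any selector does.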

Type Safety

interface Product {
  title: string
  price: string
  url: string
}

const product = await opensteer.extract<Product>({
  schema: {
    title: { element: 3 },
    price: { element: 5 },
    url: { source: 'current_url' },
  },
})

// product is typed as Product
console.log(product.title)
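
A TypeScript type parameter exists only at compile time, and this page does not say whether extract validates the result against TData at runtime. Assuming it does not, a small type guard like the hypothetical isProduct below can confirm the shape before the data is used.

```typescript
interface Product {
  title: string
  price: string
  url: string
}

// Runtime check that an unknown value has the Product shape.
function isProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) return false
  const record = value as Record<string, unknown>
  return (
    typeof record.title === 'string' &&
    typeof record.price === 'string' &&
    typeof record.url === 'string'
  )
}

// Stand-in for a value returned by extract(); a real call needs a live page.
const extracted: unknown = {
  title: 'Premium Widget',
  price: '29.99',
  url: 'https://example.com/product',
}

if (isProduct(extracted)) {
  // Inside this branch, extracted is narrowed to Product.
  console.log(extracted.title) // 'Premium Widget'
}
```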

Error Handling

try {
  const data = await opensteer.extract({
    description: 'product-data',
    schema: {
      title: { selector: '.missing-element' },
    },
  })
} catch (error) {
  // Extraction may fail if:
  // - Required elements are not found
  // - Selectors are invalid
  // - AI extraction planning fails
  // - Cached selector is incompatible with current schema
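  // Caught values are typed as unknown under strict TypeScript, so narrow first
  const message = error instanceof Error ? error.message : String(error)
  console.error('Extraction failed:', message)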
}
See Schema Types for detailed schema field options and ExtractionPlan for two-phase extraction.
