ExtractSchema

An ExtractSchema defines the structure of the data to extract from a page. Schemas support nested objects, arrays of objects, literal values, and three kinds of field descriptor: element counter, CSS selector, and current URL.
interface ExtractSchema {
  [key: string]: ExtractSchemaValue
}

type ExtractSchemaValue =
  | ExtractSchemaField
  | string
  | number
  | boolean
  | null
  | ExtractSchema
  | ExtractSchema[]
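
Because every member of the ExtractSchemaValue union is plain data, code that consumes a schema has to tell the cases apart by inspecting each value. The sketch below shows one way to do that. It redeclares the types locally so that it runs on its own. The helper classifySchemaValue, and the rule that any object carrying an element, selector, or source key counts as a field descriptor, are assumptions made for illustration and not Opensteer's documented behavior.

```typescript
// Local copies of the types above so the sketch is self-contained.
interface ExtractSchemaField {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}

type ExtractSchemaValue =
  | ExtractSchemaField
  | string
  | number
  | boolean
  | null
  | ExtractSchema
  | ExtractSchema[]

interface ExtractSchema {
  [key: string]: ExtractSchemaValue
}

type SchemaValueKind = 'field' | 'literal' | 'array' | 'object'

// Hypothetical helper (not part of Opensteer): names the union member a value belongs to.
function classifySchemaValue(value: ExtractSchemaValue): SchemaValueKind {
  // Strings, numbers, booleans, and null are literals.
  if (value === null || typeof value !== 'object') return 'literal'
  if (Array.isArray(value)) return 'array'
  // Assumption: a descriptor is recognised by an element, selector, or source key.
  const descriptorKeys = ['element', 'selector', 'source']
  const looksLikeField = descriptorKeys.some((key) =>
    Object.prototype.hasOwnProperty.call(value, key),
  )
  return looksLikeField ? 'field' : 'object'
}

console.log(classifySchemaValue({ selector: '.product-price' })) // field
console.log(classifySchemaValue({ pricing: { current: { element: 20 } } })) // object
```

Under this rule a nested object whose own keys include element, selector, or source would be read as a descriptor, so those names are best avoided as keys of nested groups.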

Schema Field Types

ExtractSchemaField

A field descriptor object that references a DOM element or a special source such as the current URL.

Element Counter Field

References an element by its counter from a snapshot:
{
  title: { element: 3 }
}
Extracts the textContent of the element with c="3".
{
  link: { element: 5, attribute: 'href' }
}
Extracts the href attribute of element c="5".

CSS Selector Field

References an element by CSS selector:
{
  price: { selector: '.product-price' }
}
Extracts text from the first matching element.
{
  image: { selector: 'img.hero', attribute: 'src' }
}
Extracts the src attribute of the first matching element.

Current URL Field

Extracts the current page URL:
{
  url: { source: 'current_url' }
}
No DOM lookup is required; the value comes from page.url().
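
The three descriptor kinds can be summarised in a few lines of code. The sketch below is a hypothetical helper, describeField, that turns a descriptor into a plain-language description of what it reads. The order in which the keys are checked, and the error for an empty descriptor, are assumptions; this page does not say how Opensteer treats a descriptor that sets more than one of element, selector, and source.

```typescript
// Local copy of the descriptor type so the sketch is self-contained.
interface ExtractSchemaField {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}

// Hypothetical helper (not part of Opensteer): describes what a descriptor reads.
function describeField(field: ExtractSchemaField): string {
  if (field.source === 'current_url') return 'current page URL'
  // Without an attribute, the element's text is read.
  const what =
    field.attribute !== undefined ? `attribute "${field.attribute}"` : 'text content'
  if (field.element !== undefined) return `${what} of element c="${field.element}"`
  if (field.selector !== undefined) return `${what} of first match for "${field.selector}"`
  // Assumption: a descriptor with none of the three keys is treated as invalid.
  throw new Error('descriptor needs element, selector, or source')
}

console.log(describeField({ element: 5, attribute: 'href' }))
// attribute "href" of element c="5"
```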

Nested Objects

Schemas support arbitrary nesting:
{
  product: {
    name: { element: 10 },
    pricing: {
      current: { element: 20 },
      original: { element: 21 },
      discount: { element: 22 }
    },
    metadata: {
      url: { source: 'current_url' },
      sku: { selector: '[data-sku]', attribute: 'data-sku' }
    }
  }
}
Result structure mirrors the schema:
{
  product: {
    name: "Widget",
    pricing: {
      current: "19.99",
      original: "29.99",
      discount: "33% off"
    },
    metadata: {
      url: "https://example.com/product",
      sku: "WDG-001"
    }
  }
}

Arrays

Arrays of objects are supported for extracting repeated structures:
{
  results: [
    {
      title: { selector: '.result-title' },
      link: { selector: '.result-link', attribute: 'href' },
      snippet: { selector: '.result-snippet' }
    }
  ]
}
Opensteer automatically:
  1. Finds all matching parent elements
  2. Extracts each field relative to each parent
  3. Returns an array of objects
{
  results: [
    { title: "First result", link: "/first", snippet: "..." },
    { title: "Second result", link: "/second", snippet: "..." },
    { title: "Third result", link: "/third", snippet: "..." }
  ]
}
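
The three steps above can be sketched without a browser. In the example below each parent element is modelled as a lookup from selector to its first matching descendant, so that the "relative to each parent" step is visible. FakeElement, FakeParent, and extractRows are hypothetical names, and returning null for a missing element or attribute is an assumption; how Opensteer locates the parent elements is not covered by this sketch.

```typescript
// A fake element: just enough to stand in for a DOM node.
interface FakeElement {
  text: string
  attrs: Record<string, string>
}

// Assumption: a parent is modelled as a map from selector to its first matching descendant.
type FakeParent = Record<string, FakeElement | undefined>

interface ItemField {
  selector: string
  attribute?: string
}

// Hypothetical helper: applies one item schema to every parent and returns one row per parent.
function extractRows(
  parents: FakeParent[],
  itemSchema: Record<string, ItemField>,
): Array<Record<string, string | null>> {
  return parents.map((container) => {
    const row: Record<string, string | null> = {}
    for (const [key, field] of Object.entries(itemSchema)) {
      const el = container[field.selector]
      if (el === undefined) {
        row[key] = null // assumption: a missing element yields null
      } else if (field.attribute !== undefined) {
        row[key] = el.attrs[field.attribute] ?? null
      } else {
        row[key] = el.text
      }
    }
    return row
  })
}

const fakeResults: FakeParent[] = [
  {
    '.result-title': { text: 'First result', attrs: {} },
    '.result-link': { text: 'Open', attrs: { href: '/first' } },
  },
  {
    '.result-title': { text: 'Second result', attrs: {} },
    '.result-link': { text: 'Open', attrs: { href: '/second' } },
  },
]

console.log(
  extractRows(fakeResults, {
    title: { selector: '.result-title' },
    link: { selector: '.result-link', attribute: 'href' },
  }),
)
```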

Literal Values

Schemas can include literal values:
{
  type: 'product',
  version: 2,
  extracted: true,
  timestamp: null,
  data: {
    title: { element: 3 }
  }
}
Literals are included as-is in the result:
{
  type: "product",
  version: 2,
  extracted: true,
  timestamp: null,
  data: { title: "Widget" }
}
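
The pass-through behavior can be illustrated with a small recursive walk. The sketch below copies literals unchanged, hands field descriptors to a caller-supplied reader, and recurses into nested objects. resolveSchema and its readField callback are hypothetical stand-ins for the real DOM lookup, arrays are left out to keep the example short, and the descriptor detection rule is an assumption.

```typescript
type Field = {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}
type LiteralValue = string | number | boolean | null
type NestedValue = Field | LiteralValue | NestedSchema
interface NestedSchema {
  [key: string]: NestedValue
}

// Assumption: a descriptor is recognised by an element, selector, or source key.
function isFieldDescriptor(value: object): boolean {
  return ['element', 'selector', 'source'].some((key) =>
    Object.prototype.hasOwnProperty.call(value, key),
  )
}

// Hypothetical walk: the result has the same keys and nesting as the schema.
function resolveSchema(
  schema: NestedSchema,
  readField: (field: Field) => string | null,
): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(schema)) {
    if (value === null || typeof value !== 'object') {
      out[key] = value // literal: included as-is
    } else if (isFieldDescriptor(value)) {
      out[key] = readField(value as Field) // descriptor: replaced by the extracted value
    } else {
      out[key] = resolveSchema(value as NestedSchema, readField) // nested object
    }
  }
  return out
}

const resolved = resolveSchema(
  { type: 'product', version: 2, extracted: true, timestamp: null, data: { title: { element: 3 } } },
  () => 'Widget', // stub reader standing in for the page lookup
)
console.log(JSON.stringify(resolved))
// {"type":"product","version":2,"extracted":true,"timestamp":null,"data":{"title":"Widget"}}
```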

Complete Example

import { Opensteer } from 'opensteer'

const opensteer = new Opensteer()
await opensteer.launch()
await opensteer.goto('https://store.example.com/product/123')

interface ProductData {
  meta: {
    url: string
    type: string
  }
  product: {
    name: string
    price: string
    images: Array<{ src: string; alt: string }>
  }
  reviews: Array<{
    author: string
    rating: string
    text: string
  }>
}

const data = await opensteer.extract<ProductData>({
  description: 'product-page',
  schema: {
    meta: {
      url: { source: 'current_url' },
      type: 'product',
    },
    product: {
      name: { selector: 'h1.product-name' },
      price: { selector: '.price-current' },
      images: [
        {
          src: { selector: 'img.product-image', attribute: 'src' },
          alt: { selector: 'img.product-image', attribute: 'alt' },
        },
      ],
    },
    reviews: [
      {
        author: { selector: '.review-author' },
        rating: { selector: '.review-rating', attribute: 'data-rating' },
        text: { selector: '.review-text' },
      },
    ],
  },
})

console.log(data.product.name)
console.log(data.reviews.length, 'reviews')

ExtractionPlan

An ExtractionPlan is an intermediate representation returned by AI extraction or used for two-phase extraction with extractFromPlan().
interface ExtractionPlan {
  fields?: Record<string, ExtractionFieldPlan>
  paths?: Record<string, ElementPath>
  data?: unknown
}

ExtractionFieldPlan

interface ExtractionFieldPlan {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}
Has the same shape as ExtractSchemaField, but is used in plans generated by AI or built programmatically.

extractFromPlan()

Extract data using a pre-built extraction plan with explicit field mappings and element paths.

Signature

extractFromPlan<TData>(options: ExtractFromPlanOptions): Promise<ExtractionRunResult<TData>>

Parameters

options (ExtractFromPlanOptions, required): the schema, the plan to execute, and an optional description.

Returns

ExtractionRunResult<TData>: an object with namespace, persisted, pathFile, data, and paths fields.

Example: Two-Phase Extraction

import { Opensteer } from 'opensteer'

const opensteer = new Opensteer()
await opensteer.launch()
await opensteer.goto('https://example.com/data')

// Phase 1: AI generates extraction plan
const html = await opensteer.snapshot({ mode: 'extraction' })
const plan = await analyzePageWithAI(html) // Returns ExtractionPlan

// Phase 2: Execute plan with extractFromPlan
const result = await opensteer.extractFromPlan({
  description: 'ai-generated-plan',
  schema: {
    title: { element: 0 }, // Placeholder schema
    content: { element: 0 },
  },
  plan: {
    fields: {
      title: { element: 5 },
      content: { element: 10 },
    },
  },
})

console.log(result.data) // { title: "...", content: "..." }
console.log(result.persisted) // true if description was provided
console.log(result.paths) // ElementPath objects for each field
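
When a plan comes from an external step such as the analyzePageWithAI call above, it can be worth checking it before execution. The sketch below reports the top-level schema keys that a plan covers in neither fields nor paths. It is a hypothetical helper with locally declared types, and it assumes that plan keys match schema keys one to one, as they do in the flat examples on this page; how nested or array fields are keyed is not specified here.

```typescript
// Local stand-ins for the plan types so the sketch is self-contained.
interface ExtractionFieldPlan {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}

interface PlanLike {
  fields?: Record<string, ExtractionFieldPlan>
  paths?: Record<string, unknown> // ElementPath in the real types
}

function hasOwnKey(record: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(record, key)
}

// Hypothetical check: lists schema keys that the plan does not cover.
function uncoveredPlanKeys(schemaKeys: string[], plan: PlanLike): string[] {
  const fields = plan.fields ?? {}
  const paths = plan.paths ?? {}
  return schemaKeys.filter((key) => !hasOwnKey(fields, key) && !hasOwnKey(paths, key))
}

const missingKeys = uncoveredPlanKeys(['title', 'content'], {
  fields: { title: { element: 5 } },
})
console.log(missingKeys) // [ 'content' ]
```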

Example: Using Pre-Resolved Paths

// Assumes `opensteer` is the launched instance from the previous example.
import type { ElementPath } from 'opensteer'

const titlePath: ElementPath = {
  context: [],
  nodes: [
    { tag: 'h1', match: [{ kind: 'class', value: 'page-title' }] },
  ],
}

const result = await opensteer.extractFromPlan({
  schema: {
    title: { selector: '.page-title' },
  },
  plan: {
    paths: {
      title: titlePath,
    },
  },
})

console.log(result.data.title)

Type Definitions

Complete TypeScript types:
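import type { ElementPath } from 'opensteer'
// SnapshotOptions and ActionWaitOptions, used by ExtractOptions below, are not defined in this listing.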

export interface ExtractSchemaField {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}

export type ExtractSchemaValue =
  | ExtractSchemaField
  | string
  | number
  | boolean
  | null
  | ExtractSchema
  | ExtractSchema[]

export interface ExtractSchema {
  [key: string]: ExtractSchemaValue
}

export interface ExtractionFieldPlan {
  element?: number
  selector?: string
  attribute?: string
  source?: 'current_url'
}

export interface ExtractionPlan {
  fields?: Record<string, ExtractionFieldPlan>
  paths?: Record<string, ElementPath>
  data?: unknown
}

export interface ExtractOptions<TSchema = ExtractSchema> {
  schema?: TSchema
  description?: string
  prompt?: string
  snapshot?: SnapshotOptions
  element?: number
  selector?: string
  wait?: false | ActionWaitOptions
}

export interface ExtractFromPlanOptions<TSchema = ExtractSchema> {
  description?: string
  schema: TSchema
  plan: ExtractionPlan
}

export interface ExtractionRunResult<T = unknown> {
  namespace: string
  persisted: boolean
  pathFile: string | null
  data: T
  paths: Record<string, ElementPath>
}

Best Practices

Use element counters for dynamic content

// Generate snapshot first
const html = await opensteer.snapshot({ mode: 'extraction' })
// Inspect HTML to find counters
// Then extract using counters
const data = await opensteer.extract({
  schema: { title: { element: 3 } },
})
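
The "inspect HTML to find counters" step can be partly automated. The sketch below scans a snapshot string for c="N" attributes and lists each counter with its tag name. It assumes the snapshot is HTML text in which counters appear as a c attribute, as the c="3" notation earlier on this page suggests; listCounters and the sample markup are hypothetical, and the regular expression is a rough scan for inspection rather than a full HTML parser.

```typescript
interface CounterHit {
  counter: number
  tag: string
}

// Hypothetical helper: finds opening tags that carry a c="N" attribute.
function listCounters(snapshotHtml: string): CounterHit[] {
  const hits: CounterHit[] = []
  // Tag name, then any attributes, then whitespace followed by c="digits".
  const pattern = /<([a-zA-Z][\w-]*)\b[^>]*?\sc="(\d+)"/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(snapshotHtml)) !== null) {
    hits.push({ counter: Number(match[2]), tag: match[1] })
  }
  return hits
}

// Made-up snapshot fragment for illustration only.
const sampleSnapshot = '<h1 c="3">Widget</h1><a href="/buy" c="5">Buy now</a><p>No counter</p>'
console.log(listCounters(sampleSnapshot))
// [ { counter: 3, tag: 'h1' }, { counter: 5, tag: 'a' } ]
```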

Use selectors for stable structures

// Semantic selectors work across page changes
const data = await opensteer.extract({
  description: 'article-data',
  schema: {
    headline: { selector: 'article h1' },
    author: { selector: '.author-name' },
    date: { selector: 'time', attribute: 'datetime' },
  },
})

Cache with descriptions

// Persisted paths survive element counter changes
const data = await opensteer.extract({
  description: 'product-listing', // Enables caching
  schema: {
    name: { selector: '.product-name' },
    price: { selector: '.price' },
  },
})

Type your results

interface Article {
  title: string
  author: string
  publishDate: string
  body: string
}

const article = await opensteer.extract<Article>({
  schema: {
    title: { selector: 'h1' },
    author: { selector: '.author' },
    publishDate: { selector: 'time', attribute: 'datetime' },
    body: { selector: '.article-content' },
  },
})

// Fully typed!
article.title.toUpperCase()
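
The type parameter is a compile-time annotation: TypeScript erases it when the code runs, so on its own it cannot confirm what the page actually contained. This page does not say whether Opensteer validates results against the type, so the sketch below assumes it does not and adds a runtime check. isArticle is a hypothetical type guard, and the sample value stands in for the result of an extract call.

```typescript
interface Article {
  title: string
  author: string
  publishDate: string
  body: string
}

// Hypothetical runtime guard: true only if every expected field is a string.
function isArticle(value: unknown): value is Article {
  if (typeof value !== 'object' || value === null) return false
  const record = value as Record<string, unknown>
  return ['title', 'author', 'publishDate', 'body'].every(
    (key) => typeof record[key] === 'string',
  )
}

// Stand-in for an extraction result in which one selector matched nothing.
const extracted: unknown = { title: 'Hello', author: null, publishDate: '2024-01-01', body: '...' }

if (isArticle(extracted)) {
  console.log(extracted.title.toUpperCase())
} else {
  console.log('extraction result did not match Article')
}
```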
