
Overview

Libretto supports four distinct approaches to capturing data and automating web interactions. Each makes different trade-offs among detection risk, setup complexity, data quality, and control; understanding those trade-offs helps you pick the right tool for your target site.
| Approach | Bot detection risk | Best for |
| --- | --- | --- |
| Regular Playwright | Moderate | Simple DOM extraction, server-rendered sites |
| Passive interception (page.on('response')) | Low | SPAs that load data via API calls during navigation |
| In-browser fetch (pageRequest()) | Low to moderate | Deep pagination, bulk queries without UI clicking |
| Direct HTTP from Node.js | Very high | Public/documented APIs with no bot detection |
Recommended hybrid: Combine Regular Playwright for navigation with passive page.on('response') interception for data capture. This gives you browser-based reliability with structured API data quality at minimal detection risk.

Approach details

Standard Playwright usage — navigate pages, click elements, fill forms, and read DOM content using selectors and page.evaluate().
// Navigate and interact
await page.goto('https://example.com/search');
await page.fill('#query', 'search term');
await page.click('#submit');
await page.waitForSelector('.results');

// Extract data from the DOM
const results = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.result-item')).map(el => ({
    title: el.querySelector('h2')?.textContent,
    price: el.querySelector('.price')?.textContent,
  }));
});
Pros:
  • Simplest approach — uses Playwright as intended
  • No need to understand the site’s API structure
  • Works with any site regardless of how data is rendered (server-side, client-side, or hybrid)
  • Data extraction is visual/DOM-based, which maps naturally to what a user sees
  • Easy to debug with headless: false and Playwright’s trace viewer
  • Integrates directly with Libretto’s step-based workflow, recovery, and extraction features
Cons:
  • Slower than API-based approaches — requires full page rendering
  • Fragile against DOM changes — selectors break when the site updates its markup
  • Harder to get structured data — you’re scraping rendered HTML rather than clean API responses
  • Cannot access data that isn’t rendered in the DOM (e.g., API responses with fields the UI doesn’t display)
Bot detection risk: MODERATE. Plain Playwright is detectable by browser fingerprinting (Layer 1). Sites with any enterprise bot protection will likely flag it; sites without active detection won’t notice.
Use playwright-extra with the stealth plugin to patch common fingerprint leaks, or run Playwright with a persistent browser context that looks more like a real browser profile.
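A minimal sketch of both mitigations, assuming the playwright-extra and puppeteer-extra-plugin-stealth packages are installed (and that playwright-extra forwards launchPersistentContext, which you should verify for your version). The helper name is ours, not part of Libretto’s API:

```javascript
// Hypothetical helper: launch a hardened browser for scraping.
// Requires are lazy so this file still loads if the packages are absent.
async function launchStealthBrowser({ userDataDir } = {}) {
  const { chromium } = require('playwright-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  chromium.use(StealthPlugin()); // patches navigator.webdriver and other common leaks

  if (userDataDir) {
    // A persistent profile (cookies, history, local storage) looks more like
    // a real user's browser than a fresh incognito context.
    return chromium.launchPersistentContext(userDataDir, { headless: false });
  }
  return chromium.launch({ headless: false });
}
```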

Comparison matrix

| Criteria | Regular Playwright | Passive interception | In-browser fetch | Direct HTTP |
| --- | --- | --- | --- | --- |
| Bot detection risk | Moderate | Low | Low–Moderate | Very high |
| Browser fingerprint risk | Yes | Yes | Yes | N/A (wrong fingerprint) |
| Network fingerprint risk | None (browser requests) | None (browser requests) | None (browser requests) | Very high |
| API monitoring risk | None | None | Low (fetch patching) | N/A |
| Data quality | DOM-dependent | Structured JSON | Structured JSON | Structured JSON |
| Setup complexity | Low | Medium | Medium–High | Low–Medium |
| API reverse-engineering needed | No | Partial (identify endpoints) | Yes (full) | Yes (full) |
| Control over data fetching | Low | Low | High | High |
| Speed | Slow | Medium | Medium–Fast | Fast |
| Resource usage | High | High | High | Low |
| Resilience to DOM changes | Low | High | High | High |
| Resilience to API changes | Medium | Low | Low | Low |

Decision guide

Use Regular Playwright when:
  • The data you need is visible in the DOM and straightforward to extract with selectors
  • The site doesn’t have aggressive bot protection, or you’re using stealth plugins
  • You want the simplest implementation that integrates with Libretto’s recovery and extraction features
  • The data is rendered server-side and doesn’t come from a separate API call
Use passive interception (page.on('response')) when:
  • The site loads data via API calls during normal navigation (most modern SPAs)
  • You want structured JSON data without reverse-engineering the full API
  • Minimizing detection risk is important
  • You’re already navigating through the UI and want to passively capture data along the way
Use in-browser fetch (pageRequest()) when:
  • You need data from API endpoints that the UI doesn’t naturally trigger (e.g., deep pagination, bulk exports)
  • You’ve verified the site doesn’t monkey-patch fetch (or you can work around it)
  • You want maximum control over which data you fetch and when
  • You’ve already reverse-engineered the relevant API endpoints
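Conceptually, an in-browser fetch (what a helper like pageRequest() does) can be sketched with plain Playwright’s page.evaluate, so the request carries the site’s cookies, origin, and browser network fingerprint. The endpoint shape and parameter names below are assumptions — substitute whatever you reverse-engineered for your target:

```javascript
// Build a paginated API URL. The path and query parameters are hypothetical.
function buildPageUrl(base, page, pageSize = 100) {
  const url = new URL(base);
  url.searchParams.set('page', String(page));
  url.searchParams.set('pageSize', String(pageSize));
  return url.toString();
}

// Run fetch *inside* the page context. `page` is a Playwright Page; the
// request is indistinguishable at the network layer from the site's own calls.
async function fetchFromPage(page, url) {
  return page.evaluate(async (u) => {
    const res = await fetch(u, { headers: { accept: 'application/json' } });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  }, url);
}
```

For deep pagination, loop over buildPageUrl(base, n) and call fetchFromPage once per page, ideally with a small delay between requests.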
Use Direct Node.js HTTP when:
  • The target site has zero bot detection
  • Speed and resource efficiency are the primary concerns
  • You’re hitting a public/documented API (not scraping a website)
  • You need to make thousands of concurrent requests
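A minimal sketch of the direct approach using Node 18+’s built-in fetch; the endpoint and pagination parameters are placeholders for a real documented API:

```javascript
// Query a documented JSON API directly from Node.js — no browser involved.
// The base URL and query parameter names are placeholders.
async function fetchJsonPage(baseUrl, page, perPage = 50) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(page));
  url.searchParams.set('per_page', String(perPage));

  const res = await fetch(url, { headers: { accept: 'application/json' } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

Because there is no page rendering, you can run many of these concurrently (e.g. with Promise.all over a batch of page numbers) at a fraction of the resource cost of a browser.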

For most browser automation workflows, combine Approach 1 and Approach 2: use Regular Playwright to navigate and interact with the site (handling popups, login flows, and anything requiring UI interaction with Libretto’s recovery features), and passively intercept API responses with page.on('response') to capture structured data. This gives you the reliability of browser-based navigation with the data quality of API responses, at minimal detection risk.
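A minimal sketch of this hybrid, assuming the target’s data endpoints live under a /api/ path (the URL filter and content-type check are assumptions — tune both to your site):

```javascript
// Heuristic: which responses are worth capturing?
function isCapturableResponse(url, contentType, urlFilter = '/api/') {
  return url.includes(urlFilter) && /\bapplication\/json\b/.test(contentType || '');
}

// Attach a passive listener before navigating. It records matching JSON
// bodies into `captured` while ordinary Playwright steps drive the UI.
function captureApiResponses(page, captured, urlFilter) {
  page.on('response', async (response) => {
    const ct = response.headers()['content-type'];
    if (!isCapturableResponse(response.url(), ct, urlFilter)) return;
    try {
      captured.push({ url: response.url(), body: await response.json() });
    } catch {
      // Body may be unavailable (e.g. for redirects) — skip quietly.
    }
  });
}
```

In practice you would call captureApiResponses(page, results) before page.goto(...), run your normal navigation steps, then read results once the page settles.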
