Libretto supports four distinct approaches to capturing data and automating web interactions. Each makes different trade-offs between detection risk, setup complexity, data quality, and control; understanding those trade-offs helps you pick the right tool for your target site.
| Approach | Bot detection risk | Best for |
| --- | --- | --- |
| Regular Playwright | Moderate | Simple DOM extraction, server-rendered sites |
| Passive interception (page.on('response')) | Low | SPAs that load data via API calls during navigation |
| In-browser fetch (pageRequest()) | Low to moderate | Deep pagination, bulk queries without UI clicking |
| Direct HTTP from Node.js | Very high | Public/documented APIs with no bot detection |
Recommended hybrid: Combine Regular Playwright for navigation with passive page.on('response') interception for data capture. This gives you browser-based reliability with structured API data quality at minimal detection risk.
Standard Playwright usage — navigate pages, click elements, fill forms, and read DOM content using selectors and page.evaluate().
```typescript
// Navigate and interact
await page.goto('https://example.com/search');
await page.fill('#query', 'search term');
await page.click('#submit');
await page.waitForSelector('.results');

// Extract data from the DOM
const results = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.result-item')).map(el => ({
    title: el.querySelector('h2')?.textContent,
    price: el.querySelector('.price')?.textContent,
  }));
});
```
Pros:
Simplest approach — uses Playwright as intended
No need to understand the site’s API structure
Works with any site regardless of how data is rendered (server-side, client-side, or hybrid)
Data extraction is visual/DOM-based, which maps naturally to what a user sees
Easy to debug with headless: false and Playwright’s trace viewer
Integrates directly with Libretto’s step-based workflow, recovery, and extraction features
Cons:
Slower than API-based approaches — requires full page rendering
Fragile against DOM changes — selectors break when the site updates its markup
Harder to get structured data — you’re scraping rendered HTML rather than clean API responses
Cannot access data that isn’t rendered in the DOM (e.g., API responses with fields the UI doesn’t display)
Bot detection risk: MODERATE

Plain Playwright is detectable by browser fingerprinting (Layer 1). Sites with any enterprise bot protection will likely flag it; sites without active detection won't notice.
Use playwright-extra with the stealth plugin to patch common fingerprint leaks, or run Playwright with a persistent browser context that looks more like a real browser profile.
Listen to network responses the browser naturally makes as you navigate. You don’t make any extra requests — you just capture the data flowing through.
```typescript
const capturedData: any[] = [];

page.on('response', async (response) => {
  const url = response.url();
  if (url.includes('/api/search/results')) {
    const json = await response.json();
    capturedData.push(json);
  }
});

// Trigger the data load by interacting with the UI normally
await page.goto('https://example.com/search?q=term');
await page.waitForSelector('.results');

// capturedData now has the raw API response
```
Pros:
Zero additional bot detection risk from network requests — you’re not making any extra calls. The requests that happen are the ones the site’s own code triggers.
Gets clean, structured API data (JSON) rather than scraped DOM content
API responses often contain more data than the UI displays (hidden fields, IDs, metadata)
Not fragile against DOM changes — the API contract tends to be more stable than CSS selectors
Works with Playwright’s existing page context — no additional setup
Cons:
You only get data that the page naturally loads — you must trigger the right UI flow to cause the requests you need
Still requires Playwright browser automation to drive the page, so you have the browser fingerprinting risk for the navigation itself
Timing can be tricky — you must set up the listener before the navigation that triggers the request
Responses may be paginated or partial — the site’s UI might lazy-load data, requiring you to trigger scrolling or “load more” interactions
If the site uses GraphQL or batched API calls, parsing the right data out of responses requires understanding the API structure
Some responses may be encrypted or obfuscated by bot protection services
Bot detection risk: LOW

The network requests themselves carry zero additional risk since they originate from the site's own JavaScript. The only risk comes from the browser automation layer needed to drive the UI. No extra fetch calls means no anomalous network patterns for API-level monitoring to flag.
Execute fetch calls from within the browser page’s JavaScript context. The requests originate from the browser process itself with all the right credentials and fingerprints. Libretto’s pageRequest() function provides a typed wrapper for this pattern.
Requests come from the real browser — same TLS fingerprint, same cookies, same origin, same HTTP/2 settings. From the server’s perspective, it looks identical to a request the site’s own JS would make.
Pros:
Full control over which endpoints you call and with what parameters — no need to trigger UI flows
Can call endpoints the UI doesn’t naturally hit (e.g., fetch page 50 of results without clicking “next” 49 times)
Gets clean, structured API data (JSON)
Faster than driving the UI — skip page rendering and go straight to the data
Cons:
Requires understanding the site’s API — you need to know the endpoint URLs, required headers, authentication tokens, and request body format. This requires reverse-engineering the site’s network traffic first.
Vulnerable to fetch/XHR monkey-patching — if the site wraps window.fetch, your calls may be intercepted and flagged because the call stack won’t match the site’s expected code paths
Still requires a Playwright browser to be running (for the execution context)
API endpoints can change without notice
Must handle authentication tokens and CSRF tokens that the site’s own code normally manages
Bot detection risk: LOW to MODERATE

The network-level risk is very low — the requests are genuine browser requests. The risk comes from browser fingerprinting (same as regular Playwright), fetch/XHR monkey-patching detecting unexpected call stacks, and timing/pattern analysis if your requests don't match normal UI flow patterns.
Most sites do not implement fetch call stack monitoring. This approach is effectively undetectable on the vast majority of sites. Only sites with enterprise-grade bot protection from services like PerimeterX or Shape Security are likely to catch this.
Make HTTP requests directly from Node.js using fetch, axios, got, or similar libraries. No browser involved.
Pros:
Fastest approach — no browser overhead, no page rendering, minimal memory usage
Simple code — just HTTP requests, no browser lifecycle management
Easy to parallelize — make many concurrent requests without launching multiple browser instances
Lowest resource consumption — suitable for high-volume data collection
Cons:
No cookies unless manually managed — you must extract cookies from a browser session and replicate them, including HttpOnly cookies you can’t access from JS
No browser-specific headers — sec-ch-ua, sec-fetch-*, and other headers that browsers add automatically must be manually fabricated
No JavaScript execution — if the site requires JS to set cookies, generate tokens, or solve challenges, you can’t do it
CSRF and auth tokens must be manually extracted and refreshed
Breaks easily — API changes, new security headers, or updated bot protection will break requests with no fallback
Bot detection risk: VERY HIGH

This approach is detectable at nearly every layer. TLS fingerprinting alone will catch Node.js HTTP clients on any site with even basic bot protection — the TLS fingerprint is fundamentally different from any real browser's, and this is one of the strongest detection signals. This approach only works reliably against sites with zero bot detection, or against documented public APIs that expect programmatic access.
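For completeness, a sketch of the direct approach using Node 18+'s built-in fetch. Every header and cookie value below is a hand-fabricated placeholder, which is precisely the fragility described above:

```typescript
// Headers a browser would send automatically must be faked by hand.
// All values here are placeholders, not taken from a real session.
function browserLikeHeaders(cookie: string): Record<string, string> {
  return {
    accept: 'application/json',
    'user-agent': 'Mozilla/5.0 (placeholder UA string)',
    'sec-ch-ua': '"Chromium";v="120"',
    'sec-fetch-site': 'same-origin',
    cookie, // must include HttpOnly cookies copied out of a browser session
  };
}

async function directFetch(url: string, cookie: string): Promise<unknown> {
  // Node's fetch has a non-browser TLS fingerprint; TLS-fingerprinting
  // bot protection keys on that regardless of how good the headers look.
  const res = await fetch(url, { headers: browserLikeHeaders(cookie) });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

Even with perfect headers, the TLS handshake gives the client away, which is why this approach belongs only on undefended sites and documented public APIs.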
For most browser automation workflows, combine Approach 1 and Approach 2: use Regular Playwright to navigate and interact with the site (handling popups, login flows, and anything requiring UI interaction with Libretto's recovery features), and passively intercept API responses with page.on('response') to capture structured data. This gives you the reliability of browser-based navigation with the data quality of API responses, at minimal detection risk.