
Overview

Bot detection operates at multiple independent layers. A site doesn’t need to implement all of them — even one active layer can block your automation. Knowing which layers are present on your target site is the first step to choosing an approach that won’t get flagged. The five layers range from passive browser fingerprinting (which fires the moment your browser connects) to enterprise bot protection services that aggregate all signals and apply machine learning. Each layer has different mitigations.
Layer 1: Browser fingerprinting

When a browser connects to a site, the site can inspect dozens of signals to determine if the browser is real or automated:
  • navigator.webdriver: Set to true in automated browsers. Detection scripts check this immediately. Playwright sets this by default.
  • Browser plugin and extension footprint: Real browsers expose a plugin list (such as a PDF viewer), installed fonts, and supported media codecs. Automated browsers often expose none. In headless mode, navigator.plugins.length === 0 is a strong signal.
  • WebGL and Canvas fingerprinting: The site renders invisible graphics and hashes the output. Headless browsers produce distinct rendering artifacts.
  • Screen and window dimensions: Headless browsers often report unusual viewport sizes or have window.outerWidth === 0.
  • User-Agent consistency: The User-Agent string must match actual browser behavior. Claiming to be Chrome 120 but having Firefox-like JS engine behavior is a red flag.
  • CDP detection: Some sites detect whether a Chrome DevTools Protocol (CDP) session is attached, which is how Playwright controls the browser.
  • Headless-specific object detection: Automated browsers are missing objects and properties that exist in real headed Chrome. Detection scripts check for a missing chrome.runtime, absent Notification.permission prompts, navigator.permissions.query() behaving differently, and window.chrome being undefined or incomplete. navigator.languages may also be empty or contain only "en" in headless mode.
  • Iframe and sandbox detection: Some sites check whether their code is running inside an iframe or sandboxed context by comparing window.self !== window.top, inspecting window.frameElement, or checking whether document.hasFocus() returns false (common in headless or background contexts) and whether document.visibilityState is something other than "visible".
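To make these checks concrete, here is a hypothetical sketch of the kind of scoring a detection script performs; the signal names and the snapshot shape are illustrative, not taken from any real product.

```javascript
// Hypothetical check a detection script might run on page load.
// The snapshot would be collected in-page from navigator/window.
function suspiciousSignals(s) {
  const flags = [];
  if (s.webdriver === true) flags.push('navigator.webdriver is true');
  if (s.pluginCount === 0) flags.push('navigator.plugins is empty');
  if (s.outerWidth === 0) flags.push('window.outerWidth is 0');
  if (!s.hasChromeObject) flags.push('window.chrome is missing');
  if (!s.languages || s.languages.length === 0) flags.push('navigator.languages is empty');
  return flags;
}

// Example snapshot resembling default headless automation:
const snapshot = {
  webdriver: true,
  pluginCount: 0,
  outerWidth: 0,
  hasChromeObject: false,
  languages: [],
};
console.log(suspiciousSignals(snapshot).length); // 5
```

Real products combine dozens of such signals into a risk score rather than flagging any single one.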
Affected approaches: All Playwright-based approaches (Regular Playwright, passive interception, in-browser fetch). Direct HTTP has a different problem — a completely wrong fingerprint.
Layer 2: Behavioral analysis

Beyond the browser itself, detection systems analyze how the user behaves:
  • Mouse movement patterns: Real users have natural mouse trajectories with acceleration curves. Automated clicks happen without preceding mouse movement.
  • Typing cadence: Real typing has variable delays between keystrokes. page.fill() inserts text instantly, and page.type() with a fixed delay option produces perfectly uniform keystroke timing.
  • Scroll behavior: Real users scroll with momentum and variable speed. Programmatic scrolling is instant or perfectly uniform.
  • Navigation timing: Real users take time to read content before clicking. Bots navigate instantly between actions.
  • Interaction sequence: Clicking a submit button without first clicking or focusing the input fields is suspicious.
Mitigation: Add realistic delays between actions. Use page.type() with random inter-key delays instead of page.fill() for sensitive fields. Add scroll interactions before clicking. Never navigate faster than a human could read.
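A minimal sketch of those mitigations in Playwright; the selectors are placeholders, and randomBetween is a helper defined here, not a Playwright API.

```javascript
// Helper introduced for illustration: uniform random delay in a range.
function randomBetween(min, max) {
  return min + Math.random() * (max - min);
}

// Type with variable inter-key delays instead of page.fill().
async function humanFill(page, selector, text) {
  await page.hover(selector); // move the mouse before interacting
  await page.click(selector); // focus the field before typing
  for (const char of text) {
    await page.keyboard.type(char);
    await page.waitForTimeout(randomBetween(60, 180));
  }
}

// Scroll and pause ("reading time") before navigating.
async function humanNavigate(page, linkSelector) {
  await page.mouse.wheel(0, randomBetween(200, 600));
  await page.waitForTimeout(randomBetween(1500, 4000));
  await page.click(linkSelector);
}
```

The exact delay ranges are assumptions; tune them to what a real user would plausibly do on the target page.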
Layer 3: Network-level fingerprinting

The network request itself carries signals independent of what the browser reports:
  • TLS fingerprint (JA3/JA4): Every HTTP client has a unique TLS handshake fingerprint based on the cipher suites, extensions, and elliptic curves it offers. Node.js fetch/axios have a completely different TLS fingerprint than Chrome. This is one of the strongest detection signals and is very hard to fake from outside a browser.
  • HTTP/2 fingerprint: The SETTINGS frame, WINDOW_UPDATE behavior, and header ordering in HTTP/2 differ between browsers and HTTP libraries.
  • Header ordering and values: Browsers send headers in a specific order — Chrome always sends sec-ch-ua headers, for example. Node.js HTTP clients send headers in a different order or omit browser-specific headers entirely.
  • Cookie state: Requests from a real browser session carry the full cookie jar. External HTTP requests must manually replicate cookies and may miss HttpOnly cookies or cookies set by JavaScript.
  • Referer and Origin: Browser requests automatically include the correct Referer and Origin headers based on navigation state. External requests must fabricate these.
Affected approaches: Direct HTTP is maximally exposed here. Playwright-based requests (including in-browser fetch) use the real browser’s TLS stack and header ordering, so they pass network-level checks.
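If you do drop down to external HTTP requests, the browser's cookie jar can at least be copied out of a Playwright BrowserContext (which includes HttpOnly cookies, since it reads the jar directly). A minimal sketch; toCookieHeader is a helper introduced here, and the URL is a placeholder:

```javascript
// Serialize Playwright cookie objects into a Cookie request header.
function toCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Usage inside an async Playwright script (context is a BrowserContext):
// const cookies = await context.cookies('https://target-site.com');
// const res = await fetch('https://target-site.com/api/items', {
//   headers: { cookie: toCookieHeader(cookies) },
// });

console.log(toCookieHeader([{ name: 'a', value: '1' }, { name: 'b', value: '2' }]));
// a=1; b=2
```

Note that this replicates cookie state only; the external client's TLS and header-order fingerprints remain non-browser-like.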
Layer 4: Runtime monitoring of frontend code

Some sophisticated sites monitor the behavior of their own frontend code at runtime:
  • Fetch/XHR monkey-patching: The site overrides window.fetch and/or XMLHttpRequest.prototype.open with wrapper functions that log every request, including its call stack. If a fetch() call originates from code that isn’t part of the site’s own bundle, it can be flagged.
// What the site does (runs very early, before your code):
const _fetch = window.fetch;
window.fetch = function(...args) {
  const stack = new Error().stack;
  if (!isExpectedCallSite(stack)) {
    reportAnomaly({ url: args[0], stack });
  }
  return _fetch.apply(this, args);
};
  • Proxy-based interception: Instead of replacing fetch, some sites use Proxy objects to wrap it. This is harder to detect because fetch.toString() still returns "function fetch() { [native code] }".
  • Timing correlation: The site knows which API calls its own code makes and when. If an endpoint is called at a time when the UI flow wouldn’t trigger it, that’s anomalous.
  • Request frequency and patterns: The site’s own code calls APIs in predictable patterns — pagination calls come in sequence, search calls follow debounce timings. Automation that deviates from these patterns can be flagged.
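The Proxy-based variant can be sketched like this; a stand-in function replaces the real fetch so the example is self-contained:

```javascript
const calls = [];
const realFetch = (url) => `response for ${url}`; // stand-in for native fetch

// Wrap without replacing: the apply trap records every call, then forwards it.
const wrappedFetch = new Proxy(realFetch, {
  apply(target, thisArg, args) {
    calls.push({ url: args[0], stack: new Error().stack }); // telemetry hook
    return Reflect.apply(target, thisArg, args);
  },
});

console.log(wrappedFetch('/api/items')); // forwards to the original
console.log(calls.length);               // 1 -- the call was recorded
```

Because Function.prototype.toString on a callable Proxy reports "[native code]", the toString() check described later does not expose this wrapper.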
Affected approaches: In-browser fetch (pageRequest()) is the primary target here. Passive interception (page.on('response')) is immune — it makes no additional fetch calls at all.
Most sites do not implement Layer 4 monitoring. It is primarily found on sites with enterprise-grade bot protection from services like PerimeterX or Shape Security. Check for fetch patching before committing to an in-browser fetch approach.
Layer 5: Enterprise bot protection services

Many sites don’t build their own detection — they use third-party services that combine all the layers above into a continuously updated product:
Common indicators by service:
  • Akamai Bot Manager: scripts from *.akamaized.net, _abck cookie, sensor_data payloads
  • PerimeterX (HUMAN): scripts loading from *.perimeterx.net or *.px-cdn.net, _px cookies
  • DataDome: scripts from *.datadome.co, datadome cookie, interstitial challenge pages
  • Cloudflare Bot Management: cf_clearance cookie, challenge pages with “Checking your browser” message
  • Shape Security (F5): obfuscated inline scripts that collect telemetry, _imp_apg_r_ style cookies
  • Kasada: scripts from *.kasada.io, x-kpsdk-* headers
These services push updates frequently. An automation that works today may break next week with no changes on your end.

Identifying bot detection on your target site

Before building your automation, audit the target site to understand what you’re up against.
Step 1: Check for enterprise bot protection

Open the site in a normal browser with DevTools open on the Network tab:
  1. Filter by JS in the Network tab. Look for domains associated with known bot protection services (listed in the table above).
  2. In DevTools Application > Cookies, look for telltale cookies like _abck, _px, datadome, cf_clearance, etc.
  3. Navigate around the site. If you see a “Checking your browser…” interstitial, the site uses active bot protection.
  4. View source and look at the first <script> tags. Enterprise bot protection scripts are typically injected before any application code.
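The cookie check above can be partly scripted from the console; a sketch with an illustrative helper. Note that several of these cookies may be HttpOnly and therefore invisible to document.cookie, so the Application panel remains the reliable check.

```javascript
// Illustrative helper: scan a cookie string for known bot-protection names.
const PROTECTION_COOKIES = ['_abck', '_px', 'datadome', 'cf_clearance'];

function detectProtectionCookies(cookieString) {
  return PROTECTION_COOKIES.filter((name) =>
    cookieString.split('; ').some((c) => c.startsWith(name))
  );
}

// In the browser console: detectProtectionCookies(document.cookie)
console.log(detectProtectionCookies('_abck=xyz; session=1')); // [ '_abck' ]
```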
Step 2: Check if fetch/XHR is patched

Open the browser console and run:
// Check if fetch has been wrapped
window.fetch.toString()
// Native (safe):     "function fetch() { [native code] }"
// Patched (flagged): will show actual JavaScript source

// Check XMLHttpRequest
XMLHttpRequest.prototype.open.toString()
// Native: "function open() { [native code] }"

// Check for property descriptor tampering
Object.getOwnPropertyDescriptor(window, 'fetch')
// Native: { value: ƒ, writable: true, enumerable: true, configurable: true }
// Tampered: may have getters/setters or different configurability
If the site uses Proxy to wrap fetch, the toString() check will still return "[native code]". To detect Proxy-based wrapping:
try {
  const desc = Object.getOwnPropertyDescriptor(window, 'fetch');
  console.log('configurable:', desc.configurable);
  console.log('writable:', desc.writable);
  console.log(window.fetch instanceof Function); // should be true
  console.log(window.fetch.prototype); // native fetch has no prototype
} catch (e) {
  console.log('fetch access is trapped');
}
Step 3: Check for behavioral monitoring

Look for signs that the site collects behavioral telemetry:
// Check if common event listeners are heavily registered
getEventListeners(document)
// In Chrome DevTools, this shows all listeners. An unusually large number
// of mousemove, keydown, scroll, and touchstart listeners suggests telemetry.

// Check for known telemetry globals
// PerimeterX:
typeof window._pxAppId !== 'undefined'
// Akamai:
typeof window.bmak !== 'undefined'
// DataDome:
typeof window.ddjskey !== 'undefined'
Step 4: Test with plain Playwright

The simplest test: run a basic Playwright script against the site and observe what happens.
import { chromium } from 'playwright';
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://target-site.com');
// If you get a challenge page, CAPTCHA, or block — bot detection is active.
If plain Playwright gets blocked, the site has browser-level detection. If it works, the site likely has only basic or no detection.

Infrastructure considerations

Even with a well-fingerprinted browser, infrastructure-level signals can expose automation.

IP reputation and rate limiting
  • Cloud provider IP ranges (AWS, GCP, Azure) are well-known and flagged by most bot protection services. Requests from these ranges face higher scrutiny or outright blocking regardless of browser fingerprint quality.
  • Even without bot detection, sites enforce per-IP request limits. Hitting the same site too frequently from one IP triggers throttling or temporary bans.
  • If your IP geolocates to one region but your browser reports a timezone and locale from another, that inconsistency is a signal.
  • Residential proxy services provide IP addresses from real ISPs, making requests appear to originate from normal households. Rotating proxies distribute requests across many IPs to avoid rate limits.
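A sketch of wiring this into Playwright; the proxy server, credentials, and timezone are placeholders to be replaced with your provider's values and matched to where the proxy IP geolocates.

```javascript
// Placeholder proxy endpoint and credentials; not a real service.
const launchOptions = {
  proxy: {
    server: 'http://proxy.example.com:8080',
    username: 'user',
    password: 'pass',
  },
};

// Keep timezone and locale consistent with the proxy's geolocation,
// so the browser doesn't contradict the IP address (see above).
const contextOptions = {
  timezoneId: 'America/New_York',
  locale: 'en-US',
};

// Usage:
// const browser = await chromium.launch(launchOptions);
// const context = await browser.newContext(contextOptions);
```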
CAPTCHA and challenge handling
  • reCAPTCHA v2: The checkbox or image-selection challenge. Can sometimes be bypassed in automated browsers if the risk score is low enough (it evaluates browser fingerprint and behavior first).
  • reCAPTCHA v3: Invisible — returns a score from 0.0 to 1.0 with no user interaction. A well-fingerprinted browser with natural behavior scores higher.
  • hCaptcha: Similar to reCAPTCHA v2. Cloudflare uses it as a fallback.
  • Cloudflare Turnstile: Non-interactive challenge that evaluates browser signals. Replaces traditional CAPTCHAs on many Cloudflare-protected sites.
If a CAPTCHA is triggered during automation, it usually means the browser fingerprint or behavior failed earlier checks. Fixing the root cause — better stealth, slower interaction patterns — is more effective than trying to solve CAPTCHAs programmatically.
Block manifestations
  • Soft blocks: The site returns degraded results (fewer items, missing data, slower responses) without an explicit error. These are hard to detect — you may not realize you’re getting incomplete data.
  • Hard blocks: HTTP 403, CAPTCHA pages, “Access Denied” responses, or redirects to a challenge page.
  • Cookie consent and GDPR banners: Not bot detection, but a common obstacle. These overlays block interactions with the underlying page and must be detected and dismissed before proceeding.
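These manifestations can be checked for in a response handler. A rough sketch; the phrase list and the item-count threshold are illustrative assumptions, not a complete detector.

```javascript
// Rough heuristic classifier for responses during automation.
function classifyResponse({ status, bodyText, itemCount, expectedMinItems }) {
  const blockPhrases = /access denied|checking your browser|verify you are human/i;
  if (status === 403 || blockPhrases.test(bodyText)) return 'hard-block';
  if (typeof itemCount === 'number' && itemCount < expectedMinItems) {
    return 'possible-soft-block'; // degraded results without an explicit error
  }
  return 'ok';
}

console.log(classifyResponse({ status: 403, bodyText: '' })); // hard-block
```

Soft blocks in particular warrant comparing item counts against a known-good baseline run, since nothing in the response explicitly signals the degradation.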
Anti-detection maintenance

Bot detection is adversarial — both sides are continuously updating. Enterprise bot protection services push updates frequently. Browser updates change fingerprints. Stealth patches need to keep pace with detection updates. Budget time for ongoing maintenance of any automation targeting a site with active bot protection.

Automation approaches

Compare the four integration strategies and their detection risk profiles.

Sessions and profiles

Manage named browser sessions and persist authenticated state across runs.
