StealthyFetcher provides advanced anti-bot bypass capabilities using a stealth-patched Chromium browser. It can automatically solve Cloudflare Turnstile challenges and evade many common bot-detection systems.
## Basic Usage

### One-Off Requests

```python
from scrapling.fetchers import StealthyFetcher

# Simple stealth fetch
page = StealthyFetcher.fetch(
    'https://nopecha.com/demo/cloudflare',
    headless=True
)
data = page.css('#padded_content a').getall()

# With Cloudflare bypass
page = StealthyFetcher.fetch(
    'https://protected-site.com',
    headless=True,
    solve_cloudflare=True,
    network_idle=True
)
```
### With StealthySession

For multiple requests, use StealthySession to keep the browser open:

```python
from scrapling.fetchers import StealthySession

with StealthySession(headless=True, solve_cloudflare=True) as session:
    # First request
    page1 = session.fetch('https://protected-site.com/page1')
    # Second request (browser stays open, cookies maintained)
    page2 = session.fetch('https://protected-site.com/page2')
    # Third request
    page3 = session.fetch('https://protected-site.com/page3')
```
## Key Features

### Cloudflare Bypass

Automatically solve Cloudflare Turnstile and Interstitial challenges:

```python
page = StealthyFetcher.fetch(
    'https://nopecha.com/demo/cloudflare',
    headless=True,
    solve_cloudflare=True,  # Enable automatic solving
    timeout=60000  # 60-second timeout
)
```

Supported Cloudflare challenges:

- Non-interactive Turnstile
- Interactive Turnstile
- Interstitial pages
- “Just a moment” waiting pages
### Fingerprint Spoofing

Multiple techniques to avoid detection:

```python
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    hide_canvas=True,  # Add noise to canvas fingerprinting
    block_webrtc=True,  # Prevent WebRTC IP leaks
    allow_webgl=True  # Keep WebGL enabled (recommended)
)
```
- **Canvas Fingerprinting**: adds random noise to canvas operations to prevent tracking.
- **WebRTC Blocking**: forces WebRTC to respect proxy settings, preventing local IP leaks.
- **WebGL**: kept enabled by default; many WAFs check for WebGL support.
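To make the canvas-noise idea concrete, here is a toy sketch that perturbs pixel bytes the way a stealth patch might. It is purely illustrative: the real patch operates inside the browser's canvas APIs, and `add_canvas_noise` is a hypothetical helper, not part of Scrapling.

```python
import random

def add_canvas_noise(pixels, amplitude=1, seed=None):
    """Perturb each RGBA byte by at most `amplitude`, clamped to the 0-255 range."""
    rng = random.Random(seed)
    return [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in pixels]

# Four RGBA bytes before and after noise; the change is invisible to the eye
# but breaks byte-exact fingerprint hashes of the rendered canvas.
noisy = add_canvas_noise([0, 128, 255, 64], amplitude=1, seed=42)
```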
### Google Search Referer

Make requests appear as if they came from a Google search:

```python
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    google_search=True  # Default: True
)
```

This sets the referer header to look like: `https://www.google.com/search?q=example.com`
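As an illustration of what such a referer contains (not Scrapling's actual implementation), one could build it from the target URL's hostname:

```python
from urllib.parse import urlparse, quote_plus

def google_referer(url: str) -> str:
    """Build a Google-search-style referer for a target URL (illustrative only)."""
    host = urlparse(url).hostname or ''
    return f'https://www.google.com/search?q={quote_plus(host)}'

print(google_referer('https://example.com/page'))
# https://www.google.com/search?q=example.com
```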
## Request Parameters

### StealthyFetcher.fetch()

```python
StealthyFetcher.fetch(
    url='https://example.com',

    # Browser configuration
    headless=True,  # Run in headless mode
    real_chrome=False,  # Use installed Chrome instead of Chromium

    # Anti-detection
    solve_cloudflare=False,  # Auto-solve Cloudflare challenges
    hide_canvas=False,  # Canvas fingerprint protection
    block_webrtc=False,  # Block WebRTC IP leaks
    allow_webgl=True,  # Enable WebGL (recommended)

    # Headers and referer
    google_search=True,  # Add Google search referer
    extra_headers={'Custom': 'value'},  # Additional headers
    useragent='Mozilla/5.0...',  # Custom user agent

    # Timing and waits
    timeout=30000,  # Operation timeout (ms)
    wait=0,  # Extra wait after load (ms)
    network_idle=False,  # Wait for network idle
    load_dom=True,  # Wait for DOM load

    # Selectors and actions
    wait_selector='#content',  # Wait for selector
    wait_selector_state='attached',  # Selector state: attached/visible/hidden
    page_action=lambda page: page.click('#button'),  # Custom actions

    # Resources and performance
    disable_resources=False,  # Block images, fonts, etc.
    blocked_domains={'analytics.com', 'ads.com'},  # Block domains

    # Session and state
    cookies=[{'name': 'session', 'value': 'xyz', 'domain': 'example.com'}],
    user_data_dir='/path/to/profile',  # Persistent browser profile
    init_script='/path/to/script.js',  # JavaScript to run on page creation

    # Locale and timezone
    locale='en-US',  # Browser locale
    timezone_id='America/New_York',  # Browser timezone

    # Advanced
    proxy='http://proxy:8080',  # Proxy configuration
    cdp_url='http://localhost:9222',  # Connect to an existing browser
    extra_flags=['--flag=value'],  # Additional Chrome flags
)
```
## Advanced Features

### Page Actions

Execute custom automation before returning the response:

```python
def custom_action(page):
    # Click a button
    page.click('button#load-more')
    # Wait for new content
    page.wait_for_selector('.new-content')
    # Scroll to the bottom
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')

page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    page_action=custom_action
)
```
### Wait for Selectors

Wait for specific elements before returning:

```python
page = StealthyFetcher.fetch(
    'https://spa-site.com',
    headless=True,
    wait_selector='.dynamic-content',
    wait_selector_state='visible',  # Options: attached, visible, hidden
    timeout=60000
)
```
### Resource Blocking

Block unnecessary resources for faster loading:

```python
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    disable_resources=True,  # Blocks fonts, images, media, stylesheets, etc.
    blocked_domains={'analytics.google.com', 'doubleclick.net'}
)
```

Blocked resource types:

- font, image, media
- beacon, object, imageset
- texttrack, websocket
- csp_report, stylesheet
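For intuition, domain blocking typically matches a request's host against the blocklist, including subdomains. The sketch below is a hypothetical illustration of that matching logic, not Scrapling's actual implementation:

```python
from urllib.parse import urlparse

def is_blocked(url: str, blocked_domains: set) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in blocked_domains)

blocked = {'doubleclick.net', 'analytics.google.com'}
is_blocked('https://ad.doubleclick.net/pixel', blocked)  # True (subdomain match)
is_blocked('https://example.com/', blocked)              # False
```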
### Persistent Browser Profiles

Maintain browser state across sessions:

```python
with StealthySession(
    headless=True,
    user_data_dir='./browser_profile',  # Persistent profile
    solve_cloudflare=True
) as session:
    # First run: solve Cloudflare and save cookies
    page1 = session.fetch('https://protected-site.com')
    # Subsequent requests reuse the saved cookies
    page2 = session.fetch('https://protected-site.com/data')
```
### Real Chrome vs. Chromium

Use your installed Chrome browser for maximum compatibility:

```python
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    real_chrome=True  # Use system Chrome instead of Chromium
)
```

Using `real_chrome=True` requires Chrome to be installed on your system.
### CDP URL Connection

Connect to an existing browser instance:

```python
# Start Chrome with remote debugging:
# google-chrome --remote-debugging-port=9222

page = StealthyFetcher.fetch(
    'https://example.com',
    cdp_url='http://localhost:9222'
)
```
## Session Management

### Basic Session

```python
from scrapling.fetchers import StealthySession

with StealthySession(headless=True) as session:
    page1 = session.fetch('https://example.com/login')
    page2 = session.fetch('https://example.com/dashboard')  # Cookies maintained
```
### Async Session

```python
import asyncio

from scrapling.fetchers import AsyncStealthySession

async def scrape():
    async with AsyncStealthySession(
        headless=True,
        max_pages=3  # Pool of 3 browser tabs
    ) as session:
        # Concurrent requests
        tasks = [
            session.fetch('https://example.com/page1'),
            session.fetch('https://example.com/page2'),
            session.fetch('https://example.com/page3'),
        ]
        results = await asyncio.gather(*tasks)
        # Check pool stats
        print(session.get_pool_stats())  # {busy: 0, free: 3, error: 0}

asyncio.run(scrape())
```
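When you have more URLs than tabs, you can cap concurrency yourself with a semaphore so the pool is never oversubscribed. The sketch below is generic asyncio code; `fetch_one` is a stand-in coroutine that you would replace with `session.fetch(url)` in practice:

```python
import asyncio

async def gather_limited(coros, limit: int):
    """Run coroutines concurrently, at most `limit` at a time, preserving order."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

# Stand-in for session.fetch(url); replace with a real fetch in practice.
async def fetch_one(url):
    await asyncio.sleep(0)  # placeholder for real network I/O
    return url

urls = [f'https://example.com/page{i}' for i in range(10)]
results = asyncio.run(gather_limited((fetch_one(u) for u in urls), limit=3))
```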
## Adaptive Mode

Enable adaptive element finding to survive website changes:

```python
from scrapling.fetchers import StealthyFetcher

# Enable adaptive mode globally
StealthyFetcher.adaptive = True

page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    network_idle=True
)

# First run: save element signatures
products = page.css('.product', auto_save=True)

# Later, if the website structure changes:
products = page.css('.product', adaptive=True)  # Finds elements even if the CSS changed
```
## Error Handling

```python
from playwright.sync_api import TimeoutError, Error

from scrapling.fetchers import StealthyFetcher

try:
    page = StealthyFetcher.fetch(
        'https://example.com',
        headless=True,
        timeout=30000,
        solve_cloudflare=True
    )
except TimeoutError:
    print("Request timed out")
except Error as e:
    print(f"Browser error: {e}")
```
## Best Practices

- **Use sessions for multiple requests.** Always use StealthySession when making multiple requests; keeping the browser open and reusing cookies significantly improves performance.
- **Allow time for Cloudflare solving.** Solving can take 10-30 seconds, so set `timeout=60000` (60 s) when using `solve_cloudflare=True`.
- **Block unnecessary resources.** Enable `disable_resources=True` to block images, fonts, and stylesheets for faster page loads.
- **Keep WebGL enabled.** Many anti-bot systems check for WebGL support; keep `allow_webgl=True` (the default) for better stealth.
- **Keep the Google referer.** Leave `google_search=True` (the default) to make requests appear more legitimate.
- **Run headless in production.** Use `headless=True` in production; set it to `False` only for debugging.
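Even with a generous timeout, a slow challenge can occasionally still fail; a simple retry with exponential backoff is one way to handle that. This is a generic helper, not part of Scrapling, and `do_fetch` is a hypothetical stand-in for whatever fetch call you wrap:

```python
import time

def fetch_with_retry(do_fetch, attempts=3, backoff=2.0):
    """Call `do_fetch()` up to `attempts` times, doubling the pause between tries."""
    delay = backoff
    for attempt in range(1, attempts + 1):
        try:
            return do_fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts; surface the last error
            time.sleep(delay)
            delay *= 2

# Usage sketch:
# page = fetch_with_retry(
#     lambda: StealthyFetcher.fetch(url, solve_cloudflare=True, timeout=60000)
# )
```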
## Comparison with Other Fetchers

| Feature | Fetcher | StealthyFetcher | DynamicFetcher |
| --- | --- | --- | --- |
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡ |
| Cloudflare Bypass | ❌ | ✅ | ❌ |
| Canvas Protection | ❌ | ✅ | ❌ |
| WebRTC Blocking | ❌ | ✅ | ❌ |
| JavaScript Execution | ❌ | ✅ | ✅ |
| Resource Usage | Low | Medium | High |
## Next Steps

- **Browser Automation**: learn about DynamicFetcher for general automation
- **Sessions**: master session management
- **Proxy Rotation**: rotate proxies for stealth sessions