StealthyFetcher provides advanced anti-bot bypass capabilities using a stealth-patched Chromium browser. It can automatically solve Cloudflare Turnstile challenges and bypass most online bot detection systems.

Basic Usage

One-Off Requests

from scrapling.fetchers import StealthyFetcher

# Simple stealth fetch
page = StealthyFetcher.fetch(
    'https://nopecha.com/demo/cloudflare',
    headless=True
)
data = page.css('#padded_content a').getall()

# With Cloudflare bypass
page = StealthyFetcher.fetch(
    'https://protected-site.com',
    headless=True,
    solve_cloudflare=True,
    network_idle=True
)

With StealthySession

For multiple requests, use StealthySession to keep the browser open:
from scrapling.fetchers import StealthySession

with StealthySession(headless=True, solve_cloudflare=True) as session:
    # First request
    page1 = session.fetch('https://protected-site.com/page1')
    
    # Second request (browser stays open, cookies maintained)
    page2 = session.fetch('https://protected-site.com/page2')
    
    # Third request
    page3 = session.fetch('https://protected-site.com/page3')

Key Features

Cloudflare Bypass

Automatically solve Cloudflare Turnstile and Interstitial challenges:
page = StealthyFetcher.fetch(
    'https://nopecha.com/demo/cloudflare',
    headless=True,
    solve_cloudflare=True,  # Enable automatic solving
    timeout=60000           # 60 seconds timeout
)
Supported Cloudflare challenges:
  • Non-interactive Turnstile
  • Interactive Turnstile
  • Interstitial pages
  • “Just a moment” waiting pages

Fingerprint Spoofing

Multiple techniques to avoid detection:
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    hide_canvas=True,       # Add noise to canvas fingerprinting
    block_webrtc=True,      # Prevent WebRTC IP leaks
    allow_webgl=True        # Keep WebGL enabled (recommended)
)
  • Canvas Fingerprinting: Adds random noise to canvas operations to prevent tracking.
  • WebRTC Blocking: Forces WebRTC to respect proxy settings, preventing local IP leaks.
  • WebGL: Keep enabled (the default) - many WAFs check for WebGL support.

Google Search Referer

Make requests appear as if they came from Google search:
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    google_search=True  # Default: True
)
This sets the Referer header to something like: https://www.google.com/search?q=example.com
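
The referer is derived from the target URL's host. As a rough illustration of how such a header could be built (the exact logic inside Scrapling may differ):

```python
from urllib.parse import urlparse, quote_plus

def google_referer(url: str) -> str:
    """Build a Google-search-style referer for a target URL (illustrative only)."""
    host = urlparse(url).netloc or url
    return f"https://www.google.com/search?q={quote_plus(host)}"

print(google_referer('https://example.com/products'))
# https://www.google.com/search?q=example.com
```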

Request Parameters

StealthyFetcher.fetch()

StealthyFetcher.fetch(
    url='https://example.com',
    
    # Browser configuration
    headless=True,                    # Run in headless mode
    real_chrome=False,                # Use installed Chrome instead of Chromium
    
    # Anti-detection
    solve_cloudflare=False,           # Auto-solve Cloudflare challenges
    hide_canvas=False,                # Canvas fingerprint protection
    block_webrtc=False,               # Block WebRTC IP leaks
    allow_webgl=True,                 # Enable WebGL (recommended)
    
    # Headers and referer
    google_search=True,               # Add Google search referer
    extra_headers={'Custom': 'value'},  # Additional headers
    useragent='Mozilla/5.0...',       # Custom user agent
    
    # Timing and waits
    timeout=30000,                    # Operation timeout (ms)
    wait=0,                           # Extra wait after load (ms)
    network_idle=False,               # Wait for network idle
    load_dom=True,                    # Wait for DOM load
    
    # Selectors and actions
    wait_selector='#content',         # Wait for selector
    wait_selector_state='attached',   # Selector state: attached/visible/hidden
    page_action=lambda page: page.click('#button'),  # Custom actions
    
    # Resources and performance
    disable_resources=False,          # Block images, fonts, etc.
    blocked_domains={'analytics.com', 'ads.com'},  # Block domains
    
    # Session and state
    cookies=[{'name': 'session', 'value': 'xyz', 'domain': 'example.com'}],
    user_data_dir='/path/to/profile', # Persistent browser profile
    init_script='/path/to/script.js', # JavaScript to run on page creation
    
    # Locale and timezone
    locale='en-US',                   # Browser locale
    timezone_id='America/New_York',   # Browser timezone
    
    # Advanced
    proxy='http://proxy:8080',        # Proxy configuration
    cdp_url='http://localhost:9222',  # Connect to existing browser
    extra_flags=['--flag=value'],     # Additional Chrome flags
)
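
Since most of these parameters rarely change between calls, one convenient pattern is to keep your defaults in a dict and unpack it into each call. A small sketch (the default values here are illustrative, not Scrapling's own):

```python
# Shared defaults for protected sites (assumed values; tune per target).
STEALTH_DEFAULTS = {
    'headless': True,
    'solve_cloudflare': True,
    'timeout': 60000,       # Cloudflare solving can take 10-30 seconds
    'network_idle': True,
}

def fetch_kwargs(**overrides):
    """Merge per-call overrides over the shared defaults."""
    return {**STEALTH_DEFAULTS, **overrides}

# page = StealthyFetcher.fetch('https://protected-site.com', **fetch_kwargs(timeout=90000))
```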

Advanced Features

Page Actions

Execute custom automation before returning the response:
def custom_action(page):
    # Click a button
    page.click('button#load-more')
    
    # Wait for new content
    page.wait_for_selector('.new-content')
    
    # Scroll to bottom
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')

page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    page_action=custom_action
)

Wait for Selectors

Wait for specific elements before returning:
page = StealthyFetcher.fetch(
    'https://spa-site.com',
    headless=True,
    wait_selector='.dynamic-content',
    wait_selector_state='visible',  # Options: attached, visible, hidden
    timeout=60000
)

Resource Blocking

Block unnecessary resources for faster loading:
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    disable_resources=True,  # Blocks: fonts, images, media, stylesheets, etc.
    blocked_domains={'analytics.google.com', 'doubleclick.net'}
)
Blocked resource types:
  • font, image, media
  • beacon, object, imageset
  • texttrack, websocket
  • csp_report, stylesheet
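
If you route requests yourself (for example inside a page_action), the same filtering can be expressed as a small predicate. The type names below simply mirror the list above; this helper is a sketch, not part of Scrapling's API:

```python
# Resource types that disable_resources=True blocks (per the list above).
BLOCKED_RESOURCE_TYPES = {
    'font', 'image', 'media',
    'beacon', 'object', 'imageset',
    'texttrack', 'websocket',
    'csp_report', 'stylesheet',
}

def should_block(resource_type: str) -> bool:
    """Return True for resource types that would be blocked."""
    return resource_type in BLOCKED_RESOURCE_TYPES

print(should_block('image'), should_block('document'))  # True False
```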

Persistent Browser Profiles

Maintain browser state across sessions:
with StealthySession(
    headless=True,
    user_data_dir='./browser_profile',  # Persistent profile
    solve_cloudflare=True
) as session:
    # First time: solve Cloudflare and save cookies
    page1 = session.fetch('https://protected-site.com')
    
    # Subsequent requests use saved cookies
    page2 = session.fetch('https://protected-site.com/data')
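
To decide up front whether a Cloudflare solve is likely needed (e.g. on first run with a fresh profile), a simple directory check works. This is a sketch using only the standard library, not a Scrapling feature:

```python
from pathlib import Path

def profile_exists(user_data_dir: str) -> bool:
    """True if the persistent profile directory already has saved state."""
    p = Path(user_data_dir)
    return p.is_dir() and any(p.iterdir())
```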

Real Chrome vs Chromium

Use your installed Chrome browser for maximum compatibility:
page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    real_chrome=True  # Use system Chrome instead of Chromium
)
Using real_chrome=True requires Chrome to be installed on your system.

CDP URL Connection

Connect to an existing browser instance:
# Start Chrome with remote debugging:
# google-chrome --remote-debugging-port=9222

page = StealthyFetcher.fetch(
    'https://example.com',
    cdp_url='http://localhost:9222'
)

Session Management

Basic Session

from scrapling.fetchers import StealthySession

with StealthySession(headless=True) as session:
    page1 = session.fetch('https://example.com/login')
    page2 = session.fetch('https://example.com/dashboard')  # Cookies maintained

Async Session

import asyncio
from scrapling.fetchers import AsyncStealthySession

async def scrape():
    async with AsyncStealthySession(
        headless=True,
        max_pages=3  # Pool of 3 browser tabs
    ) as session:
        # Concurrent requests
        tasks = [
            session.fetch('https://example.com/page1'),
            session.fetch('https://example.com/page2'),
            session.fetch('https://example.com/page3'),
        ]
        results = await asyncio.gather(*tasks)
        
        # Check pool stats
        print(session.get_pool_stats())  # e.g. {'busy': 0, 'free': 3, 'error': 0}

asyncio.run(scrape())
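
If you have more URLs than tabs in the pool, you can cap concurrency yourself with a semaphore so requests queue instead of overloading the pool. This is a generic asyncio pattern, not a StealthySession feature; fake_fetch below stands in for session.fetch:

```python
import asyncio

async def gather_limited(coros, limit=3):
    """Run coroutines with at most `limit` in flight at once, preserving order."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

# Dummy coroutine standing in for session.fetch(url):
async def fake_fetch(i):
    await asyncio.sleep(0)
    return i

results = asyncio.run(gather_limited([fake_fetch(i) for i in range(5)], limit=2))
print(results)  # [0, 1, 2, 3, 4]
```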

Adaptive Mode

Enable adaptive element finding so selectors keep working after website structure changes:
from scrapling.fetchers import StealthyFetcher

# Enable adaptive mode globally
StealthyFetcher.adaptive = True

page = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    network_idle=True
)

# First time: save element signatures
products = page.css('.product', auto_save=True)

# Later, if website structure changes:
products = page.css('.product', adaptive=True)  # Finds elements even if CSS changed

Error Handling

from scrapling.fetchers import StealthyFetcher
from playwright.sync_api import TimeoutError, Error

try:
    page = StealthyFetcher.fetch(
        'https://example.com',
        headless=True,
        timeout=30000,
        solve_cloudflare=True
    )
except TimeoutError:
    print("Request timed out")
except Error as e:
    print(f"Browser error: {e}")
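
Transient timeouts are common with Cloudflare-protected sites, so a small retry wrapper is often worthwhile. A generic sketch (fetch_fn stands in for a StealthyFetcher.fetch call; the backoff numbers are arbitrary):

```python
import time

def fetch_with_retry(fetch_fn, retries=3, backoff=2.0):
    """Call fetch_fn(), retrying on exceptions with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch_fn()
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries: propagate the last error
            time.sleep(backoff * (2 ** attempt))

# page = fetch_with_retry(lambda: StealthyFetcher.fetch('https://example.com', headless=True))
```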

Best Practices

  • Always use StealthySession when making multiple requests. This keeps the browser open and maintains cookies, significantly improving performance.
  • Cloudflare solving can take 10-30 seconds. Set timeout=60000 (60s) when using solve_cloudflare=True.
  • Enable disable_resources=True to block images, fonts, and stylesheets for faster page loads.
  • Many anti-bot systems check for WebGL support. Keep allow_webgl=True (the default) for better stealth.
  • Keep google_search=True (the default) to make requests appear more legitimate.
  • Use headless=True in production; set it to False only for debugging.

Comparison with Other Fetchers

Feature                 Fetcher    StealthyFetcher    DynamicFetcher
Speed                   ⚡⚡⚡        ⚡                  ⚡
Cloudflare Bypass       ❌          ✅                  ❌
Canvas Protection       ❌          ✅                  ❌
WebRTC Blocking         ❌          ✅                  ❌
JavaScript Execution    ❌          ✅                  ✅
Resource Usage          Low        Medium             High

Next Steps

Browser Automation

Learn about DynamicFetcher for general automation

Sessions

Master session management

Proxy Rotation

Rotate proxies for stealth sessions
