StealthyFetcher

A Fetcher class that uses a completely stealthy browser built on top of Chromium. It behaves like a real browser, passing almost all online bot tests and protections, and offers many customization options.
from scrapling import StealthyFetcher

response = StealthyFetcher.fetch(
    'https://example.com',
    headless=True,
    solve_cloudflare=True
)
print(response.status)
StealthyFetcher uses a real Chromium browser with stealth modifications to bypass bot detection and pass anti-bot tests.

Methods

fetch()

Opens up a browser and performs your request based on your chosen options.
StealthyFetcher.fetch(url: str, **kwargs) -> Response
url (str, required): Target URL to fetch.

headless (bool, default: True): Run the browser in headless/hidden (the default) or headful/visible mode.

disable_resources (bool, default: False): Drop requests for unnecessary resources for a speed boost. The dropped request types are: font, image, media, beacon, object, imageset, texttrack, websocket, csp_report, and stylesheet.

blocked_domains (set): A set of domain names to block requests to. Subdomains are also matched (e.g., "example.com" blocks "sub.example.com" too).

useragent (str): A user-agent string to use. If omitted, the fetcher generates a real user agent for the same browser and uses it.

cookies (dict): Cookies to set for the next request.

network_idle (bool, default: False): Wait until the page has had no network connections for at least 500 ms.

timeout (int, default: 30000): The timeout in milliseconds used for all operations and waits on the page.

wait (int, default: 0): The time in milliseconds the fetcher waits after everything finishes, before closing the page and returning the Response object.

page_action (Callable): Added for automation. A function that receives the page object and performs the automation you need.

wait_selector (str): Wait for a specific CSS selector to be in a specific state.

wait_selector_state (str, default: "attached"): The state to wait for on the selector given with wait_selector. Options: attached, detached, visible, hidden.

init_script (str): An absolute path to a JavaScript file executed on page creation for all pages in this session.

locale (str): The user locale, for example en-GB or de-DE. The locale affects the navigator.language value, the Accept-Language request header, and number and date formatting rules. Defaults to the system locale.

timezone_id (str): Changes the browser's timezone. Defaults to the system timezone.

solve_cloudflare (bool, default: False): Solve all types of Cloudflare's Turnstile/Interstitial challenges before returning the response.

real_chrome (bool, default: False): If you have Chrome installed on your device, enable this and the fetcher will launch an instance of your browser and use it.

hide_canvas (bool, default: False): Add random noise to canvas operations to prevent fingerprinting.

block_webrtc (bool, default: False): Force WebRTC to respect proxy settings, preventing local IP address leaks.

allow_webgl (bool, default: True): Enabled by default. Disabling it turns off WebGL and WebGL 2.0 support entirely, which is not recommended because many WAFs now check whether WebGL is enabled.

load_dom (bool, default: True): Enabled by default; wait for all JavaScript on the page(s) to fully load and execute.

cdp_url (str): Instead of launching a new browser instance, connect to this CDP URL to control real browsers through CDP.

google_search (bool, default: True): Enabled by default; Scrapling sets the Referer header as if the request came from a Google search for this website's domain name.

extra_headers (dict): A dictionary of extra headers to add to the request. If used together, the referer set by the google_search argument takes priority over a referer set here.

proxy (str | dict): The proxy to use for requests. Either a string or a dictionary with only the keys 'server', 'username', and 'password'.

user_data_dir (str): Path to a user data directory, which stores browser session data like cookies and local storage. By default, a temporary directory is created.

extra_flags (list): A list of additional browser flags to pass to the browser on launch.

selector_config (dict): The arguments passed at the end when creating the final Selector class.

additional_args (dict): Additional arguments passed to Playwright's context as extra settings; these take priority over Scrapling's settings.
Returns: Response, an object containing the fetched page data.
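The options above compose. As a sketch, a locale-consistent stealth profile might combine several of them; the values below are illustrative placeholders, not recommendations:

```python
# Illustrative combination of documented fetch() options.
# All keys come from the parameter reference above; values are placeholders.
stealth_options = {
    'headless': True,                # run without a visible window
    'locale': 'en-GB',               # affects navigator.language and Accept-Language
    'timezone_id': 'Europe/London',  # keep the timezone consistent with the locale
    'block_webrtc': True,            # prevent local IP leaks through WebRTC
    'hide_canvas': True,             # add noise to canvas fingerprinting
    'network_idle': True,            # wait for the network to go quiet
}
# response = StealthyFetcher.fetch('https://example.com', **stealth_options)
```

Keeping locale and timezone_id consistent with each other (and with the proxy's region, if any) avoids an easily detectable mismatch.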

async_fetch()

Asynchronous version of fetch(). Opens up a browser and performs your request.
import asyncio
from scrapling import StealthyFetcher

async def main():
    response = await StealthyFetcher.async_fetch(
        'https://example.com',
        solve_cloudflare=True
    )
    print(response.status)

asyncio.run(main())
StealthyFetcher.async_fetch(url: str, **kwargs) -> Response
All parameters are identical to fetch().
Returns: Response, an awaitable that resolves to an object containing the fetched page data.
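Because async_fetch is awaitable, several pages can be fetched concurrently. A minimal sketch, assuming scrapling is installed (the import is deferred so the helper can be defined without it):

```python
import asyncio

async def fetch_all(urls):
    # Deferred import; assumes scrapling is installed when fetch_all is called
    from scrapling import StealthyFetcher
    # Schedule every fetch at once and wait for all of them together
    tasks = [StealthyFetcher.async_fetch(url) for url in urls]
    return await asyncio.gather(*tasks)

# responses = asyncio.run(fetch_all(['https://example.com', 'https://example.org']))
```

Note that each call opens its own browser page, so very large URL lists may warrant batching.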

Usage Examples

Basic Stealth Request

from scrapling import StealthyFetcher

response = StealthyFetcher.fetch('https://example.com')
print(response.text)

Solve Cloudflare Challenge

response = StealthyFetcher.fetch(
    'https://protected-site.com',
    solve_cloudflare=True,
    timeout=60000  # Longer timeout for challenge solving
)

Custom Page Automation

def click_button(page):
    page.click('#submit-button')
    page.wait_for_selector('.results')

response = StealthyFetcher.fetch(
    'https://example.com',
    page_action=click_button,
    wait_selector='.results',
    wait_selector_state='visible'
)
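A page_action can also drive multi-step interactions before the page is captured. In this sketch the selector names are hypothetical; fill, click, and wait_for_selector follow the Playwright-style page API used in the example above:

```python
# A hypothetical page_action that fills a search form before capture.
# Selector names ('#query', '#search-button', '.results') are placeholders.
def search_and_submit(page):
    page.fill('#query', 'laptops')      # type into the search box
    page.click('#search-button')        # submit the form
    page.wait_for_selector('.results')  # wait until results render

# response = StealthyFetcher.fetch('https://example.com', page_action=search_and_submit)
```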

With Proxy

response = StealthyFetcher.fetch(
    'https://example.com',
    proxy='http://username:[email protected]:8080'
)
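Per the parameter reference, proxy also accepts a dictionary form with only the keys 'server', 'username', and 'password'; the values below are placeholders:

```python
# Dictionary form of the proxy argument; only these three keys
# are recognized. Values are placeholders.
proxy_config = {
    'server': 'http://proxy.example.com:8080',
    'username': 'username',
    'password': 'password',
}
# response = StealthyFetcher.fetch('https://example.com', proxy=proxy_config)
```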

Performance Optimization

response = StealthyFetcher.fetch(
    'https://example.com',
    disable_resources=True,  # Block images, fonts, etc.
    blocked_domains={'ads.example.com', 'tracking.example.com'}
)
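The blocked_domains matching rule (a blocked domain also blocks its subdomains) can be illustrated with a small helper. This mirrors the documented behavior only; it is not Scrapling's internal implementation:

```python
# Mirrors the documented blocked_domains rule: a host is blocked if it
# equals a blocked domain or is a subdomain of one. Illustration only.
def is_blocked(host, blocked_domains):
    return any(host == d or host.endswith('.' + d) for d in blocked_domains)
```

So with blocked_domains={'example.com'}, requests to both example.com and sub.example.com are dropped, while an unrelated host like notexample.com is not.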