Overview

Scrapling provides three main fetcher types, each optimized for different scraping scenarios. All fetchers return the same Response object, making it easy to switch between them.

Fetcher

Fast HTTP requests with browser impersonation

DynamicFetcher

Browser automation for JavaScript sites

StealthyFetcher

Stealth browser with anti-detection

Fetcher (HTTP Client)

The basic Fetcher uses curl_cffi for fast HTTP requests with browser fingerprint impersonation.

When to Use

  • Static HTML pages
  • APIs and JSON endpoints
  • Sites that don’t require JavaScript
  • High-performance scraping (100+ requests/second)
  • When you need HTTP/2 or HTTP/3

Basic Usage

from scrapling import Fetcher

response = Fetcher.fetch('https://httpbin.org/get')
print(response.status)  # 200
print(response.text)    # Response body

Browser Impersonation

Scrapling can impersonate various browsers to bypass basic fingerprint detection:
# Impersonate Chrome (default)
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    impersonate='chrome'
)

# Impersonate Firefox
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    impersonate='firefox'
)

# Random browser from list
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    impersonate=['chrome', 'firefox', 'safari', 'edge']
)

Stealth Headers

By default, Scrapling generates realistic browser headers:
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    stealthy_headers=True  # Enabled by default
)

# Headers include:
# - User-Agent (matches impersonated browser)
# - Accept, Accept-Language, Accept-Encoding
# - Referer (simulates Google search)
# - sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform
# - And more realistic browser headers
Disable it when you want full control over the headers sent:
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    stealthy_headers=False,
    headers={'User-Agent': 'MyBot/1.0'}
)
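For reference, the kind of header set described in the comments above might look like the dict below. The values are illustrative examples of a Chrome-like profile, not Scrapling's exact generated output:

```python
# Illustrative sketch of what stealthy_headers=True produces;
# values are examples, not Scrapling's exact output.
stealth_headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",  # simulates arriving from a Google search
    "sec-ch-ua": '"Chromium";v="120", "Google Chrome";v="120"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
}

# The User-Agent should always match the impersonated browser
print("Chrome" in stealth_headers["User-Agent"])  # True
```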

Request Parameters

response = Fetcher.fetch(
    'https://httpbin.org/get',
    params={'key': 'value'},        # Query parameters
    headers={'Custom': 'Header'},   # Custom headers
    cookies={'session': 'abc123'},  # Cookies
    timeout=30,                     # Timeout in seconds
    retries=3,                      # Number of retries
    retry_delay=1,                  # Delay between retries
    follow_redirects=True,          # Follow redirects
    max_redirects=30,               # Max redirect hops
    verify=True,                    # Verify SSL certificates
    proxy='http://user:pass@host:port',  # Proxy URL
    http3=False                     # Enable HTTP/3
)

POST Requests

# Form data
response = Fetcher.post(
    'https://httpbin.org/post',
    data={'key': 'value'}
)

# JSON data
response = Fetcher.post(
    'https://httpbin.org/post',
    json={'key': 'value'}
)

# Files (use a context manager so the file handle is closed after the upload)
with open('data.txt', 'rb') as f:
    response = Fetcher.post(
        'https://httpbin.org/post',
        files={'file': f}
    )

Async Support

from scrapling import AsyncFetcher
import asyncio

async def scrape():
    response = await AsyncFetcher.get('https://httpbin.org/get')
    return response.json()

data = asyncio.run(scrape())

DynamicFetcher (Browser Automation)

The DynamicFetcher uses Playwright to control a real Chromium browser, perfect for JavaScript-heavy sites.

When to Use

  • Single-page applications (SPAs)
  • Sites with dynamic content loaded by JavaScript
  • Pages requiring user interaction
  • Sites that check for browser features
  • When you need to execute custom JavaScript

Basic Usage

from scrapling import DynamicFetcher

response = DynamicFetcher.fetch('https://example.com')
print(response.status)  # 200
title = response.css('title::text').get()

JavaScript Execution

Wait for JavaScript to fully load:
response = DynamicFetcher.fetch(
    'https://spa-site.com',
    load_dom=True,       # Wait for DOMContentLoaded (default: True)
    network_idle=True,   # Wait for network idle (default: False)
    wait=1000           # Additional wait in milliseconds
)

Resource Blocking

Speed up requests by blocking unnecessary resources:
response = DynamicFetcher.fetch(
    'https://example.com',
    disable_resources=True  # Blocks: images, fonts, media, etc.
)

# Block specific domains
response = DynamicFetcher.fetch(
    'https://example.com',
    blocked_domains={'google-analytics.com', 'facebook.com', 'ads.example.com'}
)

Wait for Selectors

Wait for specific elements before returning:
response = DynamicFetcher.fetch(
    'https://example.com',
    wait_selector='.product-list',           # CSS selector
    wait_selector_state='visible'            # attached, visible, hidden
)

Page Automation

Execute custom browser actions:
def automate(page):
    # Click button
    page.click('button#load-more')
    
    # Fill form
    page.fill('input[name="search"]', 'query')
    page.press('input[name="search"]', 'Enter')
    
    # Scroll
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    
    # Wait for element
    page.wait_for_selector('.results')

response = DynamicFetcher.fetch(
    'https://example.com',
    page_action=automate
)

Custom JavaScript

Inject JavaScript on page load:
# Create init.js file
with open('/path/to/init.js', 'w') as f:
    f.write('''
        // Runs on every page load
        Object.defineProperty(navigator, 'webdriver', {get: () => false});
        console.log('Custom script loaded');
    ''')

response = DynamicFetcher.fetch(
    'https://example.com',
    init_script='/path/to/init.js'
)

Browser Configuration

response = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,               # Run in headless mode (default)
    useragent='Custom UA',       # Custom user agent
    locale='en-US',              # Browser locale
    timeout=30000,               # Timeout in milliseconds
    proxy='http://host:port',    # Proxy configuration
    extra_headers={'X-Custom': 'Value'},  # Extra headers
    extra_flags=['--flag1', '--flag2']    # Browser flags
)

Connect to Existing Browser

# Use real Chrome installation
response = DynamicFetcher.fetch(
    'https://example.com',
    real_chrome=True
)

# Connect to remote browser via CDP
response = DynamicFetcher.fetch(
    'https://example.com',
    cdp_url='http://localhost:9222'
)

Async Browser Automation

import asyncio

from scrapling import DynamicFetcher

async def scrape():
    response = await DynamicFetcher.async_fetch('https://example.com')
    return response.css('title::text').get()

title = asyncio.run(scrape())
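Scraping several pages concurrently follows the standard asyncio.gather pattern. In the sketch below, a stub coroutine stands in for the `DynamicFetcher.async_fetch` call so the shape of the code is clear:

```python
import asyncio

async def fetch_title(url: str) -> str:
    # Stand-in for: response = await DynamicFetcher.async_fetch(url)
    #               return response.css('title::text').get()
    await asyncio.sleep(0)  # simulates the network round trip
    return f"title of {url}"

async def scrape_all(urls: list[str]) -> list[str]:
    # Launch all fetches concurrently; results come back in input order
    return await asyncio.gather(*(fetch_title(u) for u in urls))

titles = asyncio.run(scrape_all(['https://example.com', 'https://example.org']))
print(titles)
```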

StealthyFetcher (Anti-Detection)

The StealthyFetcher extends DynamicFetcher with advanced anti-detection techniques.

When to Use

  • Sites with bot detection (Cloudflare, DataDome, PerimeterX)
  • Sites that check for headless browsers
  • Sites with aggressive fingerprinting
  • When you need to bypass CAPTCHAs
  • Production scraping at scale

Basic Usage

from scrapling import StealthyFetcher

response = StealthyFetcher.fetch('https://protected-site.com')

Cloudflare Solver

Automatically solve Cloudflare challenges:
response = StealthyFetcher.fetch(
    'https://cloudflare-protected.com',
    solve_cloudflare=True  # Solves Turnstile and Interstitial challenges
)

Anti-Fingerprinting

response = StealthyFetcher.fetch(
    'https://protected-site.com',
    hide_canvas=True,     # Randomize canvas fingerprint
    block_webrtc=True,    # Prevent WebRTC IP leak
    allow_webgl=True      # Keep WebGL enabled (recommended)
)

Stealth Configuration

All DynamicFetcher options plus:
response = StealthyFetcher.fetch(
    'https://protected-site.com',
    headless=True,                # Stealth works in headless mode
    hide_canvas=True,            # Canvas noise injection
    block_webrtc=True,           # WebRTC leak prevention
    allow_webgl=True,            # WebGL support (recommended)
    user_data_dir='/path/to/profile',  # Persistent browser profile
    timezone_id='America/New_York'     # Custom timezone
)

Stealth Features

The StealthyFetcher automatically:
  • Patches navigator.webdriver detection
  • Randomizes browser fingerprints
  • Adds canvas noise
  • Blocks WebRTC leaks
  • Mimics real user behavior
  • Passes most bot detection tests

Choosing the Right Fetcher

Use Fetcher when:

  • Site is static HTML
  • No JavaScript required
  • Speed is critical
  • API endpoints
  • Simple scraping tasks

Use DynamicFetcher when:

  • JavaScript is required
  • SPA or dynamic content
  • Need to interact with page
  • Custom automation needed

Use StealthyFetcher when:

  • Bot detection present
  • Cloudflare protection
  • Aggressive fingerprinting
  • Production at scale
  • Need maximum stealth
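The decision tree above boils down to two questions, which can be sketched as a plain helper (the helper itself is illustrative, not part of Scrapling's API):

```python
def choose_fetcher(needs_js: bool = False, bot_detection: bool = False) -> str:
    """Pick a fetcher class name from the two questions that matter most."""
    if bot_detection:
        return "StealthyFetcher"   # anti-detection browser
    if needs_js:
        return "DynamicFetcher"    # plain browser automation
    return "Fetcher"               # fast HTTP client

print(choose_fetcher())                      # Fetcher
print(choose_fetcher(needs_js=True))         # DynamicFetcher
print(choose_fetcher(bot_detection=True))    # StealthyFetcher
```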

Performance Comparison

  • Fetcher: ~100-200 req/s
  • DynamicFetcher: ~5-10 req/s
  • StealthyFetcher: ~3-8 req/s

Unified Response API

All fetchers return the same Response type with identical parsing capabilities:
# All these work identically
response1 = Fetcher.fetch(url)
response2 = DynamicFetcher.fetch(url)
response3 = StealthyFetcher.fetch(url)

# Same parsing API
title1 = response1.css('title::text').get()
title2 = response2.css('title::text').get()
title3 = response3.css('title::text').get()

# Same HTTP metadata
print(response1.status, response1.headers)
print(response2.status, response2.headers)
print(response3.status, response3.headers)

Error Handling

All fetchers support automatic retries:
try:
    response = Fetcher.fetch(
        'https://example.com',
        retries=3,        # Retry up to 3 times
        retry_delay=1     # Wait 1 second between retries
    )
except Exception as e:
    print(f"Failed after retries: {e}")
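The retry semantics are equivalent to a simple loop like the one below (a sketch of the behaviour, not Scrapling's implementation):

```python
import time

def fetch_with_retries(fetch, url, retries=3, retry_delay=1):
    """Call fetch(url), retrying up to `retries` times with a fixed delay."""
    last_error = None
    for attempt in range(retries + 1):  # initial attempt plus `retries` retries
        try:
            return fetch(url)
        except Exception as e:
            last_error = e
            if attempt < retries:
                time.sleep(retry_delay)
    raise last_error

# Example with a flaky stand-in that succeeds on the third attempt
attempts = []
def flaky(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(fetch_with_retries(flaky, 'https://example.com', retry_delay=0))  # ok
```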

Advanced: Proxy Rotation

All fetchers support the ProxyRotator for automatic proxy rotation:
from scrapling.fetchers import Fetcher, ProxyRotator

# Create proxy pool
rotator = ProxyRotator([
    'http://proxy1.com:8080',
    'http://proxy2.com:8080',
    {'server': 'http://proxy3.com:8080', 'username': 'user', 'password': 'pass'}
])

# Use with session (see Sessions concept)
from scrapling.fetchers import FetcherSession

with FetcherSession(proxy_rotator=rotator) as session:
    # Automatically rotates proxies on failure
    response1 = session.get('https://httpbin.org/ip')
    response2 = session.get('https://httpbin.org/ip')
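At its simplest, a rotator of this kind is a cycle over the proxy pool. A minimal round-robin sketch (an illustration of the idea, not Scrapling's ProxyRotator implementation, which also handles rotation on failure):

```python
from itertools import cycle

class RoundRobinRotator:
    """Hand out proxies in order, wrapping around at the end of the pool."""
    def __init__(self, proxies):
        self._pool = cycle(proxies)

    def next_proxy(self):
        return next(self._pool)

rotator = RoundRobinRotator(['http://proxy1.com:8080', 'http://proxy2.com:8080'])
print(rotator.next_proxy())  # http://proxy1.com:8080
print(rotator.next_proxy())  # http://proxy2.com:8080
print(rotator.next_proxy())  # http://proxy1.com:8080 (wraps around)
```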

Implementation Details

Fetcher Hierarchy

# From scrapling/fetchers/requests.py
from scrapling.engines.static import FetcherClient
from scrapling.engines.toolbelt.custom import BaseFetcher

__FetcherClientInstance__ = FetcherClient()

class Fetcher(BaseFetcher):
    get = __FetcherClientInstance__.get
    post = __FetcherClientInstance__.post
    put = __FetcherClientInstance__.put
    delete = __FetcherClientInstance__.delete

DynamicFetcher Internals

# From scrapling/fetchers/chrome.py
from scrapling.engines._browsers._controllers import DynamicSession

class DynamicFetcher(BaseFetcher):
    @classmethod
    def fetch(cls, url: str, **kwargs):
        # Launches browser session for single request
        with DynamicSession(**kwargs) as session:
            return session.fetch(url)

StealthyFetcher Internals

# From scrapling/fetchers/stealth_chrome.py  
from scrapling.engines._browsers._stealth import StealthySession

class StealthyFetcher(BaseFetcher):
    @classmethod
    def fetch(cls, url: str, **kwargs):
        # Uses stealth-enhanced browser session
        with StealthySession(**kwargs) as engine:
            return engine.fetch(url)

Next Steps

Parsing

Learn to extract data from responses

Sessions

Use sessions for persistent connections

API Reference

Complete fetcher API documentation
