Overview

Scrapling provides three main fetcher types, each optimized for different scraping scenarios. All fetchers return the same Response object, making it easy to switch between them.

Fetcher

Fast HTTP requests with browser impersonation

DynamicFetcher

Browser automation for JavaScript sites

StealthyFetcher

Stealth browser with anti-detection

Fetcher (HTTP Client)

The basic Fetcher uses curl_cffi for fast HTTP requests with browser fingerprint impersonation.

When to Use

  • Static HTML pages
  • APIs and JSON endpoints
  • Sites that don’t require JavaScript
  • High-performance scraping (100+ requests/second)
  • When you need HTTP/2 or HTTP/3

Basic Usage

from scrapling import Fetcher

response = Fetcher.fetch('https://httpbin.org/get')
print(response.status)  # 200
print(response.text)    # Response body

Browser Impersonation

Scrapling can impersonate various browsers to bypass basic fingerprint detection:
# Impersonate Chrome (default)
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    impersonate='chrome'
)

# Impersonate Firefox
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    impersonate='firefox'
)

# Random browser from list
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    impersonate=['chrome', 'firefox', 'safari', 'edge']
)

Stealth Headers

By default, Scrapling generates realistic browser headers:
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    stealthy_headers=True  # Enabled by default
)

# Headers include:
# - User-Agent (matches impersonated browser)
# - Accept, Accept-Language, Accept-Encoding
# - Referer (simulates Google search)
# - sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform
# - And more realistic browser headers
Disable it when you want full control over the headers sent:
response = Fetcher.fetch(
    'https://httpbin.org/headers',
    stealthy_headers=False,
    headers={'User-Agent': 'MyBot/1.0'}
)
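For reference, the kind of header set described in the comments above might look like the dict below. The values are illustrative examples of a Chrome-like profile, not Scrapling's exact generated output:

```python
# Illustrative sketch of what stealthy_headers=True produces;
# values are examples, not Scrapling's exact output.
stealth_headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",  # simulates arriving from a Google search
    "sec-ch-ua": '"Chromium";v="120", "Google Chrome";v="120"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
}

# The User-Agent should always match the impersonated browser
print("Chrome" in stealth_headers["User-Agent"])  # True
```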

Request Parameters

response = Fetcher.fetch(
    'https://httpbin.org/get',
    params={'key': 'value'},        # Query parameters
    headers={'Custom': 'Header'},   # Custom headers
    cookies={'session': 'abc123'},  # Cookies
    timeout=30,                     # Timeout in seconds
    retries=3,                      # Number of retries
    retry_delay=1,                  # Delay between retries
    follow_redirects=True,          # Follow redirects
    max_redirects=30,               # Max redirect hops
    verify=True,                    # Verify SSL certificates
    proxy='http://user:pass@host:port',  # Proxy URL
    http3=False                     # Enable HTTP/3
)

POST Requests

# Form data
response = Fetcher.post(
    'https://httpbin.org/post',
    data={'key': 'value'}
)

# JSON data
response = Fetcher.post(
    'https://httpbin.org/post',
    json={'key': 'value'}
)

# Files (use a context manager so the file handle is closed after the upload)
with open('data.txt', 'rb') as f:
    response = Fetcher.post(
        'https://httpbin.org/post',
        files={'file': f}
    )

Async Support

from scrapling import AsyncFetcher
import asyncio

async def scrape():
    response = await AsyncFetcher.get('https://httpbin.org/get')
    return response.json()

data = asyncio.run(scrape())

DynamicFetcher (Browser Automation)

The DynamicFetcher uses Playwright to control a real Chromium browser, perfect for JavaScript-heavy sites.

When to Use

  • Single-page applications (SPAs)
  • Sites with dynamic content loaded by JavaScript
  • Pages requiring user interaction
  • Sites that check for browser features
  • When you need to execute custom JavaScript

Basic Usage

from scrapling import DynamicFetcher

response = DynamicFetcher.fetch('https://example.com')
print(response.status)  # 200
title = response.css('title::text').get()

JavaScript Execution

Wait for JavaScript to fully load:
response = DynamicFetcher.fetch(
    'https://spa-site.com',
    load_dom=True,       # Wait for DOMContentLoaded (default: True)
    network_idle=True,   # Wait for network idle (default: False)
    wait=1000           # Additional wait in milliseconds
)

Resource Blocking

Speed up requests by blocking unnecessary resources:
response = DynamicFetcher.fetch(
    'https://example.com',
    disable_resources=True  # Blocks: images, fonts, media, etc.
)

# Block specific domains
response = DynamicFetcher.fetch(
    'https://example.com',
    blocked_domains={'google-analytics.com', 'facebook.com', 'ads.example.com'}
)

Wait for Selectors

Wait for specific elements before returning:
response = DynamicFetcher.fetch(
    'https://example.com',
    wait_selector='.product-list',           # CSS selector
    wait_selector_state='visible'            # attached, visible, hidden
)

Page Automation

Execute custom browser actions:
def automate(page):
    # Click button
    page.click('button#load-more')
    
    # Fill form
    page.fill('input[name="search"]', 'query')
    page.press('input[name="search"]', 'Enter')
    
    # Scroll
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    
    # Wait for element
    page.wait_for_selector('.results')

response = DynamicFetcher.fetch(
    'https://example.com',
    page_action=automate
)

Custom JavaScript

Inject JavaScript on page load:
# Create init.js file
with open('/path/to/init.js', 'w') as f:
    f.write('''
        // Runs on every page load
        Object.defineProperty(navigator, 'webdriver', {get: () => false});
        console.log('Custom script loaded');
    ''')

response = DynamicFetcher.fetch(
    'https://example.com',
    init_script='/path/to/init.js'
)

Browser Configuration

response = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,               # Run in headless mode (default)
    useragent='Custom UA',       # Custom user agent
    locale='en-US',              # Browser locale
    timeout=30000,               # Timeout in milliseconds
    proxy='http://host:port',    # Proxy configuration
    extra_headers={'X-Custom': 'Value'},  # Extra headers
    extra_flags=['--flag1', '--flag2']    # Browser flags
)

Connect to Existing Browser

# Use real Chrome installation
response = DynamicFetcher.fetch(
    'https://example.com',
    real_chrome=True
)

# Connect to remote browser via CDP
response = DynamicFetcher.fetch(
    'https://example.com',
    cdp_url='http://localhost:9222'
)

Async Browser Automation

import asyncio

from scrapling import DynamicFetcher

async def scrape():
    response = await DynamicFetcher.async_fetch('https://example.com')
    return response.css('title::text').get()

title = asyncio.run(scrape())
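Scraping several pages concurrently follows the standard asyncio.gather pattern. In the sketch below, a stub coroutine stands in for the `DynamicFetcher.async_fetch` call so the shape of the code is clear:

```python
import asyncio

async def fetch_title(url: str) -> str:
    # Stand-in for: response = await DynamicFetcher.async_fetch(url)
    #               return response.css('title::text').get()
    await asyncio.sleep(0)  # simulates the network round trip
    return f"title of {url}"

async def scrape_all(urls: list[str]) -> list[str]:
    # Launch all fetches concurrently; results come back in input order
    return await asyncio.gather(*(fetch_title(u) for u in urls))

titles = asyncio.run(scrape_all(['https://example.com', 'https://example.org']))
print(titles)
```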

StealthyFetcher (Anti-Detection)

The StealthyFetcher extends DynamicFetcher with advanced anti-detection techniques.

When to Use

  • Sites with bot detection (Cloudflare, DataDome, PerimeterX)
  • Sites that check for headless browsers
  • Sites with aggressive fingerprinting
  • When you need to bypass CAPTCHAs
  • Production scraping at scale

Basic Usage

from scrapling import StealthyFetcher

response = StealthyFetcher.fetch('https://protected-site.com')

Cloudflare Solver

Automatically solve Cloudflare challenges:
response = StealthyFetcher.fetch(
    'https://cloudflare-protected.com',
    solve_cloudflare=True  # Solves Turnstile and Interstitial challenges
)

Anti-Fingerprinting

response = StealthyFetcher.fetch(
    'https://protected-site.com',
    hide_canvas=True,     # Randomize canvas fingerprint
    block_webrtc=True,    # Prevent WebRTC IP leak
    allow_webgl=True      # Keep WebGL enabled (recommended)
)

Stealth Configuration

All DynamicFetcher options plus:
response = StealthyFetcher.fetch(
    'https://protected-site.com',
    headless=True,                # Stealth works in headless mode
    hide_canvas=True,            # Canvas noise injection
    block_webrtc=True,           # WebRTC leak prevention
    allow_webgl=True,            # WebGL support (recommended)
    user_data_dir='/path/to/profile',  # Persistent browser profile
    timezone_id='America/New_York'     # Custom timezone
)

Stealth Features

The StealthyFetcher automatically:
  • Patches navigator.webdriver detection
  • Randomizes browser fingerprints
  • Adds canvas noise
  • Blocks WebRTC leaks
  • Mimics real user behavior
  • Passes most bot detection tests

Choosing the Right Fetcher

Use Fetcher when:

  • Site is static HTML
  • No JavaScript required
  • Speed is critical
  • API endpoints
  • Simple scraping tasks

Use DynamicFetcher when:

  • JavaScript is required
  • SPA or dynamic content
  • Need to interact with page
  • Custom automation needed

Use StealthyFetcher when:

  • Bot detection present
  • Cloudflare protection
  • Aggressive fingerprinting
  • Production at scale
  • Need maximum stealth
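The decision tree above boils down to two questions, which can be sketched as a plain helper (the helper itself is illustrative, not part of Scrapling's API):

```python
def choose_fetcher(needs_js: bool = False, bot_detection: bool = False) -> str:
    """Pick a fetcher class name from the two questions that matter most."""
    if bot_detection:
        return "StealthyFetcher"   # anti-detection browser
    if needs_js:
        return "DynamicFetcher"    # plain browser automation
    return "Fetcher"               # fast HTTP client

print(choose_fetcher())                      # Fetcher
print(choose_fetcher(needs_js=True))         # DynamicFetcher
print(choose_fetcher(bot_detection=True))    # StealthyFetcher
```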

Performance Comparison

  • Fetcher: ~100-200 req/s
  • DynamicFetcher: ~5-10 req/s
  • StealthyFetcher: ~3-8 req/s

Unified Response API

All fetchers return the same Response type with identical parsing capabilities:
# All these work identically
response1 = Fetcher.fetch(url)
response2 = DynamicFetcher.fetch(url)
response3 = StealthyFetcher.fetch(url)

# Same parsing API
title1 = response1.css('title::text').get()
title2 = response2.css('title::text').get()
title3 = response3.css('title::text').get()

# Same HTTP metadata
print(response1.status, response1.headers)
print(response2.status, response2.headers)
print(response3.status, response3.headers)

Error Handling

All fetchers support automatic retries:
try:
    response = Fetcher.fetch(
        'https://example.com',
        retries=3,        # Retry up to 3 times
        retry_delay=1     # Wait 1 second between retries
    )
except Exception as e:
    print(f"Failed after retries: {e}")
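The retry semantics are equivalent to a simple loop like the one below (a sketch of the behaviour, not Scrapling's implementation):

```python
import time

def fetch_with_retries(fetch, url, retries=3, retry_delay=1):
    """Call fetch(url), retrying up to `retries` times with a fixed delay."""
    last_error = None
    for attempt in range(retries + 1):  # initial attempt plus `retries` retries
        try:
            return fetch(url)
        except Exception as e:
            last_error = e
            if attempt < retries:
                time.sleep(retry_delay)
    raise last_error

# Example with a flaky stand-in that succeeds on the third attempt
attempts = []
def flaky(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(fetch_with_retries(flaky, 'https://example.com', retry_delay=0))  # ok
```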

Advanced: Proxy Rotation

All fetchers support the ProxyRotator for automatic proxy rotation:
from scrapling.fetchers import Fetcher, ProxyRotator

# Create proxy pool
rotator = ProxyRotator([
    'http://proxy1.com:8080',
    'http://proxy2.com:8080',
    {'server': 'http://proxy3.com:8080', 'username': 'user', 'password': 'pass'}
])

# Use with session (see Sessions concept)
from scrapling.fetchers import FetcherSession

with FetcherSession(proxy_rotator=rotator) as session:
    # Automatically rotates proxies on failure
    response1 = session.get('https://httpbin.org/ip')
    response2 = session.get('https://httpbin.org/ip')
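At its simplest, a rotator of this kind is a cycle over the proxy pool. A minimal round-robin sketch (an illustration of the idea, not Scrapling's ProxyRotator implementation, which also handles rotation on failure):

```python
from itertools import cycle

class RoundRobinRotator:
    """Hand out proxies in order, wrapping around at the end of the pool."""
    def __init__(self, proxies):
        self._pool = cycle(proxies)

    def next_proxy(self):
        return next(self._pool)

rotator = RoundRobinRotator(['http://proxy1.com:8080', 'http://proxy2.com:8080'])
print(rotator.next_proxy())  # http://proxy1.com:8080
print(rotator.next_proxy())  # http://proxy2.com:8080
print(rotator.next_proxy())  # http://proxy1.com:8080 (wraps around)
```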

Implementation Details

Fetcher Hierarchy

# From scrapling/fetchers/requests.py
from scrapling.engines.static import FetcherClient
from scrapling.engines.toolbelt.custom import BaseFetcher

__FetcherClientInstance__ = FetcherClient()

class Fetcher(BaseFetcher):
    get = __FetcherClientInstance__.get
    post = __FetcherClientInstance__.post
    put = __FetcherClientInstance__.put
    delete = __FetcherClientInstance__.delete

DynamicFetcher Internals

# From scrapling/fetchers/chrome.py
from scrapling.engines._browsers._controllers import DynamicSession

class DynamicFetcher(BaseFetcher):
    @classmethod
    def fetch(cls, url: str, **kwargs):
        # Launches browser session for single request
        with DynamicSession(**kwargs) as session:
            return session.fetch(url)

StealthyFetcher Internals

# From scrapling/fetchers/stealth_chrome.py  
from scrapling.engines._browsers._stealth import StealthySession

class StealthyFetcher(BaseFetcher):
    @classmethod
    def fetch(cls, url: str, **kwargs):
        # Uses stealth-enhanced browser session
        with StealthySession(**kwargs) as engine:
            return engine.fetch(url)

Next Steps

Parsing

Learn to extract data from responses

Sessions

Use sessions for persistent connections

API Reference

Complete fetcher API documentation
