DynamicFetcher provides full browser automation capabilities using Playwright. It’s ideal for scraping JavaScript-heavy single-page applications (SPAs) and sites requiring complex user interactions.

Basic Usage

One-Off Requests

from scrapling.fetchers import DynamicFetcher

# Simple fetch
page = DynamicFetcher.fetch(
    'https://quotes.toscrape.com/',
    headless=True
)
quotes = page.css('.quote .text::text').getall()

# Wait for network idle
page = DynamicFetcher.fetch(
    'https://spa-site.com',
    headless=True,
    network_idle=True,
    load_dom=True
)

With DynamicSession

For multiple requests, use DynamicSession to keep the browser open:
from scrapling.fetchers import DynamicSession

with DynamicSession(
    headless=True,
    network_idle=True,
    disable_resources=False
) as session:
    # First request
    page1 = session.fetch('https://example.com/page1')
    
    # Second request (browser stays open)
    page2 = session.fetch('https://example.com/page2')
    
    # XPath selector
    data = page2.xpath('//span[@class="text"]/text()').getall()

Key Features

JavaScript Execution

page = DynamicFetcher.fetch(
    'https://spa-site.com',
    headless=True,
    load_dom=True  # Wait for JavaScript to execute
)

Network Idle

Wait for all network requests to complete:
page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    network_idle=True  # Wait until no network connections for 500ms
)

Custom Page Actions

Execute custom automation before returning the response:
def interact(page):
    # Click a button
    page.click('button.load-more')
    
    # Fill a form
    page.fill('input[name="search"]', 'query')
    page.press('input[name="search"]', 'Enter')
    
    # Wait for element
    page.wait_for_selector('.results')
    
    # Scroll page
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')

page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    page_action=interact
)

Request Parameters

DynamicFetcher.fetch()

DynamicFetcher.fetch(
    url='https://example.com',
    
    # Browser configuration
    headless=True,                    # Run in headless mode
    real_chrome=False,                # Use installed Chrome instead of Chromium
    
    # Timing and waits
    timeout=30000,                    # Operation timeout (ms)
    wait=0,                           # Extra wait after load (ms)
    network_idle=False,               # Wait for network idle
    load_dom=True,                    # Wait for DOM load
    
    # Selectors and actions
    wait_selector='#content',         # Wait for selector
    wait_selector_state='attached',   # Selector state: attached/visible/hidden
    page_action=lambda page: page.click('#button'),  # Custom actions
    
    # Headers and referer
    google_search=True,               # Add Google search referer
    extra_headers={'Custom': 'value'},  # Additional headers
    useragent='Mozilla/5.0...',       # Custom user agent
    
    # Resources and performance
    disable_resources=False,          # Block images, fonts, etc.
    blocked_domains={'ads.com'},      # Block specific domains
    
    # Session and state
    cookies=[{'name': 'session', 'value': 'xyz', 'domain': 'example.com'}],
    init_script='/path/to/script.js', # JavaScript to run on page creation
    
    # Locale
    locale='en-US',                   # Browser locale
    
    # Advanced
    proxy='http://proxy:8080',        # Proxy configuration
    cdp_url='http://localhost:9222',  # Connect to existing browser
    extra_flags=['--flag=value'],     # Additional Chrome flags
)

Advanced Features

Wait for Selectors

Wait for specific elements before returning:
page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    wait_selector='.dynamic-content',
    wait_selector_state='visible',  # Options: attached, visible, hidden
    timeout=60000
)
Selector states:
  • attached: Element exists in DOM
  • visible: Element is visible on page
  • hidden: Element exists but is hidden
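When a page signals readiness by hiding one element (such as a loading spinner) and then showing another, a single wait_selector is not enough. A page_action can chain several waits instead. A minimal sketch, assuming hypothetical .loading-spinner and .results selectors and Playwright's wait_for_selector API:

```python
def wait_for_results(page):
    # Wait until the (hypothetical) loading spinner disappears...
    page.wait_for_selector('.loading-spinner', state='hidden')
    # ...then until the results container is visible
    page.wait_for_selector('.results', state='visible')
```

Pass the helper via page_action=wait_for_results, as in the Custom Page Actions examples above.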

Resource Blocking

Block unnecessary resources for faster loading:
page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    disable_resources=True,  # Block fonts, images, media, stylesheets
    blocked_domains={'analytics.google.com', 'facebook.com'}
)
Blocked resource types when disable_resources=True:
  • font, image, media
  • beacon, object, imageset
  • texttrack, websocket
  • csp_report, stylesheet

Initialization Scripts

Run JavaScript on every page creation:
# Create init.js
with open('init.js', 'w') as f:
    f.write('''
    // Override navigator.webdriver
    Object.defineProperty(navigator, 'webdriver', {get: () => false});
    
    // Add custom properties
    window.customProp = 'value';
    ''')

page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    init_script='init.js'  # Path to the script created above
)

Real Chrome vs Chromium

Use your installed Chrome browser:
page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    real_chrome=True  # Use system Chrome
)
Requires Chrome to be installed on your system.

CDP Connection

Connect to an existing browser instance:
# Start Chrome with remote debugging first (from a shell):
#   google-chrome --remote-debugging-port=9222

page = DynamicFetcher.fetch(
    'https://example.com',
    cdp_url='http://localhost:9222'
)

Custom Headers and Referer

page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    google_search=True,  # Add Google search referer (default)
    extra_headers={
        'Authorization': 'Bearer token',
        'Custom-Header': 'value'
    }
)

Session Management

Basic Session

from scrapling.fetchers import DynamicSession

with DynamicSession(
    headless=True,
    disable_resources=True,
    network_idle=True
) as session:
    page1 = session.fetch('https://example.com/login')
    page2 = session.fetch('https://example.com/dashboard')  # Cookies maintained

Async Session with Page Pool

import asyncio
from scrapling.fetchers import AsyncDynamicSession

async def scrape():
    async with AsyncDynamicSession(
        headless=True,
        max_pages=5  # Pool of 5 browser tabs
    ) as session:
        # Concurrent requests (tabs are reused from the pool)
        tasks = [
            session.fetch(f'https://example.com/page{i}')
            for i in range(10)
        ]
        results = await asyncio.gather(*tasks)
        
        # Check pool status after the batch completes
        print(session.get_pool_stats())  # e.g. {'busy': 0, 'free': 5, 'error': 0}

asyncio.run(scrape())

Per-Request Proxy Override

with DynamicSession(headless=True) as session:
    # Default session proxy
    page1 = session.fetch('https://example.com')
    
    # Override with different proxy
    page2 = session.fetch(
        'https://example.com',
        proxy='http://different-proxy:8080'
    )

Practical Examples

Infinite Scroll

def scroll_page(page):
    for _ in range(5):  # Scroll 5 times
        page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        page.wait_for_timeout(2000)  # Wait 2s between scrolls

page = DynamicFetcher.fetch(
    'https://infinite-scroll-site.com',
    headless=True,
    page_action=scroll_page
)

items = page.css('.item').getall()

Form Submission

def submit_form(page):
    # Fill form fields
    page.fill('input[name="username"]', 'user')
    page.fill('input[name="password"]', 'pass')
    
    # Submit
    page.click('button[type="submit"]')
    
    # Wait for redirect
    page.wait_for_url('**/dashboard')

page = DynamicFetcher.fetch(
    'https://example.com/login',
    headless=True,
    page_action=submit_form
)

Load More Button

def load_all(page):
    while True:
        try:
            # Click "Load More" button
            page.click('button.load-more', timeout=3000)
            page.wait_for_timeout(1000)
        except Exception:
            # Button no longer exists
            break

page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    page_action=load_all
)

all_items = page.css('.item').getall()

Screenshot Capture

def capture_screenshot(page):
    page.screenshot(path='screenshot.png', full_page=True)

page = DynamicFetcher.fetch(
    'https://example.com',
    headless=True,
    page_action=capture_screenshot
)

Error Handling

from scrapling.fetchers import DynamicFetcher
from playwright.sync_api import TimeoutError, Error

try:
    page = DynamicFetcher.fetch(
        'https://example.com',
        headless=True,
        timeout=30000,
        wait_selector='.content',
        wait_selector_state='visible'
    )
except TimeoutError:
    print("Request or selector wait timed out")
except Error as e:
    print(f"Browser error: {e}")
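If timeouts are common on a target site, the try/except above can be wrapped in a small retry helper. A sketch with the fetch callable injected so it works with any fetcher; fetch_with_retries and its parameters are illustrative, not part of Scrapling:

```python
import time

def fetch_with_retries(fetch, url, retries=3, backoff=1.0, exceptions=(Exception,)):
    """Call fetch(url) up to `retries` times, doubling the pause after each failure."""
    delay = backoff
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except exceptions:
            if attempt == retries:
                raise  # Out of attempts: propagate the last error
            time.sleep(delay)
            delay *= 2

# Usage (illustrative):
# page = fetch_with_retries(
#     lambda u: DynamicFetcher.fetch(u, headless=True, timeout=30000),
#     'https://example.com',
#     exceptions=(TimeoutError,)
# )
```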

Best Practices

  • Always use DynamicSession when making multiple requests: it keeps the browser open and maintains cookies.
  • Enable disable_resources=True to block images, fonts, and stylesheets for faster page loads.
  • Give complex SPAs longer timeouts: set timeout=60000 or higher for slow-loading pages.
  • Only use network_idle=True when necessary, since it adds extra wait time; for most cases, load_dom=True is sufficient.
  • Use page_action instead of making multiple fetch calls for interactions on the same page.
  • Use max_pages to control concurrent tab usage in async sessions (default is 1).
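Several of these practices combine naturally: open one session, block heavy resources, and do every interaction for a page inside a single page_action. A sketch with hypothetical selectors, assuming session.fetch accepts the same per-request options as DynamicFetcher.fetch:

```python
def collect_items(page):
    # One interaction pass per page: click the (hypothetical) "show all"
    # control, then wait for the items to render.
    page.click('button.show-all')
    page.wait_for_selector('.item', state='visible')

# Usage (illustrative):
# from scrapling.fetchers import DynamicSession
# with DynamicSession(headless=True, disable_resources=True, timeout=60000) as session:
#     page = session.fetch('https://example.com/catalog', page_action=collect_items)
#     items = page.css('.item::text').getall()
```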

Comparison with Other Fetchers

| Feature           | Fetcher      | StealthyFetcher | DynamicFetcher   |
|-------------------|--------------|-----------------|------------------|
| Speed             | ⚡⚡⚡       | ⚡⚡            | ⚡               |
| JavaScript        | ✗            | ✓               | ✓                |
| Cloudflare Bypass | ✗            | ✓               | ✗                |
| Page Actions      | ✗            | ✓               | ✓                |
| Stealth Features  | ✗            | ✓               | ✗                |
| Resource Usage    | Low          | Medium          | High             |
| Best For          | Static sites | Anti-bot bypass | SPAs, automation |

Next Steps

Stealthy Mode

Learn about StealthyFetcher for anti-bot bypass

Sessions

Master session management

Proxy Rotation

Rotate proxies automatically
