Scrapling can automatically solve Cloudflare’s Turnstile challenges, including the “Just a moment…” interstitial page and interactive captchas.

Quick Start

Enable Cloudflare solving with a single parameter:
from scrapling import StealthyFetcher

response = StealthyFetcher.fetch(
    'https://cloudflare-protected-site.com',
    solve_cloudflare=True
)

print(response.status)  # 200 - Challenge solved!
print(response.text)    # Actual page content
Cloudflare solving requires a timeout of at least 60 seconds. Scrapling automatically raises the timeout to this minimum when solve_cloudflare=True is enabled.
Source: scrapling/engines/_browsers/_validators.py:131-133
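The timeout floor can be illustrated with a minimal sketch. This is not Scrapling's actual validator code; the function name and constant are hypothetical, and only the behavior described above (raising sub-60-second timeouts when solving is enabled) is assumed:

```python
# Illustrative sketch (not Scrapling's internals): enforce a 60-second
# minimum timeout whenever Cloudflare solving is enabled.
MIN_CF_TIMEOUT_MS = 60_000

def adjust_timeout(timeout_ms: int, solve_cloudflare: bool) -> int:
    """Return the timeout, raised to 60 000 ms if solve_cloudflare is set."""
    if solve_cloudflare and timeout_ms < MIN_CF_TIMEOUT_MS:
        return MIN_CF_TIMEOUT_MS
    return timeout_ms

print(adjust_timeout(30_000, True))    # 60000 - raised to the floor
print(adjust_timeout(90_000, True))    # 90000 - already above the floor
print(adjust_timeout(30_000, False))   # 30000 - untouched without solving
```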

Challenge Types

Scrapling detects and solves three types of Cloudflare challenge pages, plus embedded Turnstile widgets:

Non-Interactive Challenge

The “Just a moment…” page that solves automatically:
response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True
)
Detection logic: scrapling/engines/_browsers/_base.py:520-533
Implementation:
# Scrapling waits for the challenge to disappear
while "<title>Just a moment...</title>" in page_content:
    page.wait_for_timeout(1000)
    page.wait_for_load_state()
Source: scrapling/engines/_browsers/_stealth.py:124-130

Managed Challenge

Interactive checkbox challenge:
response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True,
    headless=True  # Works in headless mode!
)
Scrapling:
  1. Detects the challenge type from page content
  2. Locates the checkbox iframe
  3. Calculates precise click coordinates
  4. Clicks with human-like delay (100-200ms)
  5. Waits for network to settle
Source: scrapling/engines/_browsers/_stealth.py:132-186

Interactive Challenge

More complex interactive challenges are handled the same way:
response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True
)

Embedded Turnstile

Turnstile widgets embedded directly in pages:
# Detects embedded turnstile from script tags
if selector.css('script[src*="challenges.cloudflare.com/turnstile/v"]'):
    challenge_type = "embedded"
Source: scrapling/engines/_browsers/_base.py:530-532

How It Works

Challenge Detection

Scrapling detects challenges by analyzing page content:
def _detect_cloudflare(page_content: str) -> str | None:
    """Detect Cloudflare challenge type"""
    challenge_types = (
        "non-interactive",
        "managed",
        "interactive",
    )
    for ctype in challenge_types:
        if f"cType: '{ctype}'" in page_content:
            return ctype
    
    # Check for embedded turnstile
    if 'challenges.cloudflare.com/turnstile' in page_content:
        return "embedded"
    
    return None
Source: scrapling/engines/_browsers/_base.py:502-534

Solving Process

  1. Wait for page stability - Ensure challenge is fully loaded
  2. Detect challenge type - Identify which Cloudflare challenge is present
  3. Locate challenge iframe - Find the Turnstile iframe using a URL regex pattern
  4. Calculate click coordinates - Precise positioning with random offset
  5. Human-like interaction - Click with realistic delay
  6. Wait for resolution - Monitor page for challenge completion
  7. Retry if needed - Recursive solving for stubborn challenges
Main solver: scrapling/engines/_browsers/_stealth.py:111-186
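Step 3 can be tested in isolation: the Turnstile iframe is identified by matching frame URLs against the challenge-platform pattern (the same regex appears in the coordinate-calculation snippet below; the example URL here is illustrative):

```python
import re

# URL pattern used to locate the Cloudflare challenge iframe (step 3)
CHALLENGE_FRAME = re.compile(
    r"^https?://challenges\.cloudflare\.com/cdn-cgi/challenge-platform/.*"
)

# A challenge-platform frame URL matches; an ordinary frame does not
print(bool(CHALLENGE_FRAME.match(
    "https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/b/turnstile"
)))  # True
print(bool(CHALLENGE_FRAME.match("https://example.com/frame")))  # False
```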

Click Coordinate Calculation

# Find the challenge iframe
iframe = page.frame(url=re.compile(
    r"^https?://challenges\.cloudflare\.com/cdn-cgi/challenge-platform/.*"
))

# Get bounding box
outer_box = iframe.frame_element().bounding_box()

# Calculate click position with random offset (26-28, 25-27)
captcha_x = outer_box["x"] + randint(26, 28)
captcha_y = outer_box["y"] + randint(25, 27)

# Click with human-like delay
page.mouse.click(captcha_x, captcha_y, delay=randint(100, 200))
Source: scrapling/engines/_browsers/_stealth.py:159-163
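The coordinate math can be re-created as a pure function (a sketch mirroring the snippet above, not Scrapling's internals): the fixed 26-28 px horizontal and 25-27 px vertical offsets land inside the Turnstile checkbox regardless of where the iframe sits on the page.

```python
from random import Random

def click_point(box: dict, rng: Random) -> tuple[float, float]:
    """Return an (x, y) click position inside the Turnstile checkbox,
    given the iframe's bounding box {'x', 'y', 'width', 'height'}."""
    # Random jitter within the checkbox keeps clicks from being identical
    x = box["x"] + rng.randint(26, 28)
    y = box["y"] + rng.randint(25, 27)
    return x, y

box = {"x": 100.0, "y": 400.0, "width": 300.0, "height": 65.0}
x, y = click_point(box, Random(0))
# The click always lands 26-28 px right and 25-27 px below the box origin
assert 126 <= x <= 128 and 425 <= y <= 427
```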

Usage Patterns

One-off Requests

from scrapling import StealthyFetcher

response = StealthyFetcher.fetch(
    'https://cloudflare-site.com',
    solve_cloudflare=True,
    timeout=60000  # At least 60 seconds
)

Session-based Scraping

from scrapling import StealthySession

with StealthySession(solve_cloudflare=True) as session:
    # First request solves challenge and gets cookies
    response1 = session.fetch('https://example.com/page1')
    
    # Subsequent requests use same cookies (no re-solving)
    response2 = session.fetch('https://example.com/page2')
    response3 = session.fetch('https://example.com/page3')

Spider Integration

from scrapling import Spider, StealthySession

class CloudflareSpider(Spider):
    name = 'cf_spider'
    start_urls = ['https://cloudflare-protected.com']
    
    def configure_sessions(self, manager):
        manager.add('default', StealthySession(
            solve_cloudflare=True,
            timeout=60000,
            headless=True
        ))
    
    async def parse(self, response):
        # Challenge already solved!
        yield {'title': response.css('title::text').get()}
        
        # Follow links - cookies preserved
        for link in response.css('a::attr(href)').getall():
            yield response.follow(link, callback=self.parse)

Async Usage

from scrapling import AsyncStealthySession
import asyncio

async def scrape():
    async with AsyncStealthySession(solve_cloudflare=True) as session:
        response = await session.fetch('https://example.com')
        return response.text

result = asyncio.run(scrape())

Advanced Configuration

Custom Timeout

Some challenges take longer to solve:
response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True,
    timeout=90000  # 90 seconds for slower challenges
)

With Proxy Rotation

from scrapling import StealthySession, ProxyRotator

rotator = ProxyRotator([
    'http://proxy1:8080',
    'http://proxy2:8080',
])

with StealthySession(
    solve_cloudflare=True,
    proxy_rotator=rotator
) as session:
    response = session.fetch('https://example.com')

Page Actions After Solving

Perform actions after the challenge is solved:
def after_solve(page):
    # Cloudflare is already solved at this point
    page.click('button#load-more')
    page.wait_for_timeout(2000)

response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True,
    page_action=after_solve
)
Note: page_action runs after Cloudflare solving completes.
Source: scrapling/engines/_browsers/_stealth.py:243-252

Wait for Specific Content

Combine with selectors to wait for content after solving:
response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True,
    wait_selector='div.content',
    wait_selector_state='visible'
)

Troubleshooting

Scrapling logs challenge detection:
# Check logs for:
# "No Cloudflare challenge found."
# "The turnstile version discovered is 'managed'"
If no challenge is found, the page may not be protected or uses a different system.
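To surface these messages, enable Python's standard logging at INFO level or below. The logger name "scrapling" is an assumption here; adjust it if the library registers its logger under a different name:

```python
import logging

# Show INFO-level messages, including challenge detection/solving status.
# (Logger name "scrapling" is assumed, not confirmed from the source.)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logging.getLogger("scrapling").setLevel(logging.INFO)
```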
Increase timeout for slow challenges:
response = StealthyFetcher.fetch(
    'https://example.com',
    solve_cloudflare=True,
    timeout=120000  # 2 minutes
)
The solver retries recursively:
# After 10 seconds, if challenge persists:
if "<title>Just a moment...</title>" in page_content:
    log.info("Cloudflare captcha is still present, solving again")
    return self._cloudflare_solver(page)  # Recursive retry
Source: scrapling/engines/_browsers/_stealth.py:184-186
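The retry behavior can be sketched as a bounded polling loop (illustrative only; Scrapling's actual solver recurses and waits between checks, as shown above):

```python
def wait_for_challenge_clear(get_content, max_attempts: int = 10) -> bool:
    """Poll page content until the 'Just a moment...' title is gone.

    get_content is any callable returning the current page HTML; the real
    solver also sleeps between polls (page.wait_for_timeout)."""
    for _ in range(max_attempts):
        if "<title>Just a moment...</title>" not in get_content():
            return True   # challenge cleared
    return False          # still blocked after max_attempts polls

# Stubbed page whose content clears on the third poll:
contents = iter([
    "<title>Just a moment...</title>",
    "<title>Just a moment...</title>",
    "<title>Real page</title>",
])
print(wait_for_challenge_clear(lambda: next(contents)))  # True
```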
Some sites show multiple challenges. Use sessions to maintain cookies:
with StealthySession(solve_cloudflare=True) as session:
    # Only first request solves challenge
    for url in urls:
        response = session.fetch(url)

Limitations

  • CAPTCHA challenges - Image-based CAPTCHAs require manual solving or third-party services
  • Rate limiting - Solving challenges too frequently may trigger additional protections
  • WAF rules - Some sites use custom WAF rules beyond Cloudflare’s standard challenges

Best Practices

  1. Use sessions - Reuse cookies across requests to avoid re-solving
  2. Set adequate timeout - Minimum 60 seconds, 90-120 seconds recommended
  3. Monitor logs - Check for challenge detection and solving status
  4. Combine with other features - Use with hide_canvas, block_webrtc, etc.
  5. Respect rate limits - Add delays between requests
# Recommended configuration
import time

from scrapling import StealthySession

with StealthySession(
    solve_cloudflare=True,
    timeout=90000,
    hide_canvas=True,
    block_webrtc=True,
    google_search=True,
    disable_resources=True
) as session:
    for url in urls:
        response = session.fetch(url)
        # Process response
        time.sleep(2)  # Respect rate limits (use asyncio.sleep in async code)

Related Pages

  • Anti-Bot Bypass - General anti-bot bypass strategies
  • Handling Blocked Requests - Detect and retry blocked requests
  • Error Handling - Handle errors and timeouts
  • Performance Tips - Optimize scraping performance
