Overview
Scrapling is built on a modular, layered architecture that separates fetching, parsing, and data extraction. The framework is designed around three core pillars:
- Fetchers: Handle HTTP requests and browser automation
- Parser: Process and navigate HTML/XML documents
- Sessions: Manage persistent connections and state
Core Components
Fetcher Layer
The fetcher layer provides a unified interface for making web requests, abstracting away the differences between various HTTP clients and browser automation tools. Every fetcher returns a Response object that inherits from the Selector class, providing immediate parsing capabilities.
Parser Layer
At the heart of Scrapling is the Selector class, built on top of lxml for high-performance HTML/XML parsing.
Key Features:
- CSS and XPath selector support
- Adaptive element relocation (survives page structure changes)
- Rich text extraction with TextHandler
- Attribute handling with AttributesHandler
- Tree navigation (parent, children, siblings)
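To make the idea of adaptive element relocation concrete, here is a minimal sketch that uses only the Python standard library. It is not Scrapling's algorithm or API: `fingerprint` and `relocate` are hypothetical helpers, and `difflib` string similarity stands in for whatever matching Scrapling actually performs.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def fingerprint(element: ET.Element) -> str:
    """Flatten tag, sorted attributes, and text into one comparable string.

    Hypothetical helper for illustration; not part of Scrapling.
    """
    attrs = " ".join(f"{k}={v}" for k, v in sorted(element.attrib.items()))
    text = (element.text or "").strip()
    return f"{element.tag}|{attrs}|{text}"


def relocate(saved: str, root: ET.Element) -> ET.Element:
    """Return the element in the new tree whose fingerprint is most similar to `saved`."""
    return max(
        root.iter(),
        key=lambda el: SequenceMatcher(None, saved, fingerprint(el)).ratio(),
    )


# Old page: the price lives in <span class="price">.
old = ET.fromstring('<div><span class="price">19.99</span></div>')
saved = fingerprint(old.find("span"))

# New page: the tag and nesting changed, so the old path "span" no longer matches.
new = ET.fromstring(
    '<div><h1 class="title">Widget</h1>'
    '<section><p class="price">19.99</p></section></div>'
)
found = relocate(saved, new)  # picks the <p class="price"> element
```

The point of the sketch is the workflow: store a description of the element while the selector still works, then search the changed page for the closest match once it stops working.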
Response System
The Response class extends Selector with HTTP-specific metadata.
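The inheritance relationship can be illustrated with two small stand-in classes. This is a sketch under assumptions, not Scrapling's source: the stand-in `Selector` is backed by the standard library rather than lxml, `find_text` is a hypothetical method, and the metadata fields (`url`, `status`, `headers`) are assumed names.

```python
import xml.etree.ElementTree as ET


class Selector:
    """Stand-in parser class (stdlib-backed; not Scrapling's Selector)."""

    def __init__(self, markup: str) -> None:
        self._root = ET.fromstring(markup)

    def find_text(self, path: str) -> list[str]:
        """Return the text of every element matching a simple ElementTree path."""
        return [(el.text or "").strip() for el in self._root.findall(path)]


class Response(Selector):
    """Adds HTTP metadata on top of the inherited parsing methods."""

    def __init__(self, markup: str, url: str, status: int, headers: dict[str, str]) -> None:
        super().__init__(markup)
        # Assumed metadata fields, chosen for illustration only.
        self.url = url
        self.status = status
        self.headers = headers


resp = Response(
    "<html><body><h1>Hello</h1></body></html>",
    url="https://example.com/",
    status=200,
    headers={"content-type": "text/html"},
)
titles = resp.find_text(".//h1")  # parsing method inherited from Selector
```

Because the response is itself a selector, no separate parse step is needed between fetching and extracting.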
Architecture Diagram
Request Flow
Core Inheritance
Engine Layer
Scrapling uses different engines depending on the fetcher type:
Static Engine (curl_cffi)
Used by Fetcher and AsyncFetcher for fast HTTP requests with browser impersonation:
- Browser impersonation (Chrome, Firefox, Safari, Edge)
- HTTP/2 and HTTP/3 support
- Automatic header generation
- Connection pooling with sessions
Browser Engine (Playwright)
Used by DynamicFetcher and StealthyFetcher for JavaScript-heavy sites:
- Real browser automation (Chromium-based)
- JavaScript execution
- Network interception and resource blocking
- Page pooling for sessions
- Stealth mode with anti-detection evasion
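The fetcher-to-engine pairing described in this section can be summarized as a lookup table. The sketch below only restates that pairing; the `Engine` enum and the helper functions are hypothetical names and are not part of Scrapling.

```python
from enum import Enum


class Engine(Enum):
    """The two engine families named in this section (hypothetical enum)."""

    STATIC = "curl_cffi"
    BROWSER = "Playwright"


# Fetcher-to-engine pairing as stated above.
ENGINE_FOR_FETCHER = {
    "Fetcher": Engine.STATIC,
    "AsyncFetcher": Engine.STATIC,
    "DynamicFetcher": Engine.BROWSER,
    "StealthyFetcher": Engine.BROWSER,
}


def engine_for(fetcher_name: str) -> Engine:
    """Look up the engine behind a fetcher; raises KeyError for unknown names."""
    return ENGINE_FOR_FETCHER[fetcher_name]


def needs_browser(fetcher_name: str) -> bool:
    """True when the fetcher drives a real browser rather than an HTTP client."""
    return engine_for(fetcher_name) is Engine.BROWSER
```

A practical reading of the table: reach for a browser-backed fetcher only when the page needs JavaScript execution, since the static engine avoids the cost of launching a browser.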
Data Flow
1. Request Initialization
2. Configuration Merging
The fetcher merges default settings with request-specific parameters.
3. Engine Execution
The appropriate engine executes the request.
4. Response Creation
Raw responses are converted to unified Response objects.
5. Parsing Layer Access
The Response inherits all Selector parsing methods.
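The five steps can be traced end to end in a small, network-free sketch. Everything here is an assumption made for illustration: `DEFAULTS`, `merge_config`, `RawResult`, `fake_engine`, and the fields on the stand-in `Response` are hypothetical and do not mirror Scrapling's internals.

```python
from dataclasses import dataclass, field

# Hypothetical default settings.
DEFAULTS = {"timeout": 30, "follow_redirects": True, "headers": {"accept": "text/html"}}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Step 2: request-specific values win; header dicts are merged, not replaced."""
    merged = {**defaults, **overrides}
    merged["headers"] = {**defaults.get("headers", {}), **overrides.get("headers", {})}
    return merged


@dataclass
class RawResult:
    """What an engine hands back before conversion (hypothetical shape)."""

    body: str
    status: int


@dataclass
class Response:
    """Step 4: the unified object every engine result is converted into."""

    url: str
    status: int
    body: str
    config: dict = field(default_factory=dict)

    def contains(self, needle: str) -> bool:
        """Stand-in for the parsing methods reached in step 5."""
        return needle in self.body


def fake_engine(url: str, config: dict) -> RawResult:
    """Step 3: stand-in engine that returns canned HTML instead of using the network."""
    return RawResult(body=f"<html><body>fetched {url}</body></html>", status=200)


def fetch(url: str, **overrides) -> Response:
    config = merge_config(DEFAULTS, overrides)  # step 2
    raw = fake_engine(url, config)  # step 3
    return Response(url=url, status=raw.status, body=raw.body, config=config)  # step 4


# Step 1: the caller initiates a request with per-request overrides.
page = fetch("https://example.com/", timeout=5, headers={"user-agent": "demo"})
```

Note that the merge builds new dictionaries, so per-request overrides never leak back into the shared defaults.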
Design Principles
Lazy Imports
Scrapling uses lazy imports for faster startup times.
Unified Response Interface
All fetchers return the same Response type, making it easy to switch between fetching strategies.
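The benefit of a shared return type is that extraction code does not need to know how a page was fetched. The sketch below shows this with two stand-in fetch functions; `static_fetch`, `browser_fetch`, `page_title`, and the fields of this `Response` are hypothetical and return canned data rather than calling Scrapling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Stand-in for the shared response type (hypothetical fields)."""

    url: str
    status: int
    body: str


def static_fetch(url: str) -> Response:
    """Stand-in for an HTTP-client-backed fetcher."""
    return Response(url, 200, "<title>static</title>")


def browser_fetch(url: str) -> Response:
    """Stand-in for a browser-backed fetcher."""
    return Response(url, 200, "<title>rendered</title>")


def page_title(fetch: Callable[[str], Response], url: str) -> str:
    """Extraction depends only on Response, not on how the page was fetched."""
    body = fetch(url).body
    start = body.index("<title>") + len("<title>")
    return body[start:body.index("</title>")]


# Swapping the fetching strategy requires no change to the extraction code.
titles = [page_title(f, "https://example.com/") for f in (static_fetch, browser_fetch)]
```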
Separation of Concerns
Each layer has a clear responsibility:
- Fetchers: Network communication and browser control
- Engines: Low-level HTTP/browser implementation
- Parser: HTML/XML processing and navigation
- Sessions: State management and connection pooling
- Toolbelt: Shared utilities (proxy rotation, fingerprints, etc.)
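Returning to the lazy-import principle listed first among the design principles: this section does not show how Scrapling implements it, so the following is a generic sketch of the technique. `LazyModule` is a hypothetical proxy class, and the standard-library `colorsys` module stands in for a heavy dependency.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyModule:
    """Proxy that imports the real module on first attribute access.

    Hypothetical helper for illustration; not Scrapling's mechanism.
    """

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def __getattr__(self, attr: str):
        # Only reached for names not set in __init__, i.e. names on the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


colorsys = LazyModule("colorsys")  # the proxy has not imported anything yet
loaded_before = colorsys._module is not None
rgb = colorsys.hls_to_rgb(0.0, 0.5, 0.0)  # first access triggers the import
loaded_after = colorsys._module is not None
```

Another common way to get the same effect is a module-level `__getattr__` function (PEP 562), which defers submodule imports until a name is first requested from the package.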
Performance Optimizations
Cached Properties:
Extension Points
Scrapling is designed to be extensible:
Custom Storage System
Implement custom storage for adaptive element relocation.
Custom Fetcher
Extend BaseFetcher for custom fetching logic.
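Both extension points follow the same pattern: subclass a base class and fill in its required methods. The sketch below illustrates that pattern with local stand-ins. `StorageBase` and this `BaseFetcher` are hypothetical abstract classes defined here for the example, and their method names (`save`, `retrieve`, `fetch`) are assumptions; Scrapling's real base classes may differ.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StorageBase(ABC):
    """Hypothetical storage contract: save and retrieve element fingerprints by key."""

    @abstractmethod
    def save(self, identifier: str, fingerprint: dict) -> None: ...

    @abstractmethod
    def retrieve(self, identifier: str) -> Optional[dict]: ...


class InMemoryStorage(StorageBase):
    """Custom backend that keeps fingerprints in a dict instead of on disk."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def save(self, identifier: str, fingerprint: dict) -> None:
        self._items[identifier] = dict(fingerprint)  # copy to avoid aliasing caller data

    def retrieve(self, identifier: str) -> Optional[dict]:
        return self._items.get(identifier)


class BaseFetcher(ABC):
    """Hypothetical stand-in base class; Scrapling's real BaseFetcher may differ."""

    @abstractmethod
    def fetch(self, url: str) -> str: ...


class CannedFetcher(BaseFetcher):
    """Custom fetcher that serves pre-recorded pages, e.g. for offline tests."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    def fetch(self, url: str) -> str:
        return self._pages[url]


storage = InMemoryStorage()
storage.save("price", {"tag": "span", "class": "price"})
fetcher = CannedFetcher({"https://example.com/": "<html>ok</html>"})
html = fetcher.fetch("https://example.com/")
```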
File Structure
Next Steps
- Fetchers: Learn about different fetcher types
- Parsing: Deep dive into the parsing system
- Sessions: Understand session management