## Core Concepts Comparison
| Scrapy Concept | Scrapling Equivalent | Notes |
|---|---|---|
| `scrapy.Spider` | `scrapling.spiders.Spider` | Similar base class with `name`, `start_urls` |
| `scrapy.Request` | `scrapling.spiders.Request` | Similar API, but simpler |
| `scrapy.Response` | `scrapling.engines.Response` | Extends `Selector` with additional methods |
| `parse()` method | `parse()` method | Must be an async generator in Scrapling |
| `yield Request` | `yield Request` | Same pattern |
| `yield item` | `yield dict` | Just yield dictionaries |
| Item classes | Python dicts | No need for `Item` classes |
| Item Pipelines | `on_scraped_item()` hook | Simpler approach |
| Middlewares | Session configuration | Different architecture |
| `scrapy crawl` | `spider.start()` | Programmatic approach |
| Settings | Class attributes | Direct configuration |
## Spider Structure Comparison

### Basic Spider
- `parse()` must be an async generator in Scrapling
- Type hint `Response` for better IDE support
- Must specify `callback=self.parse` explicitly in follow requests
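The points above can be illustrated with a runnable sketch. The `Spider`, `Request`, and `Response` classes below are local stubs, not Scrapling's real ones (those live in `scrapling.spiders` / `scrapling.engines` per the table, with signatures this guide doesn't spell out); only the async-generator contract, dict items, and the explicit `callback` are taken from the document:

```python
import asyncio
from dataclasses import dataclass, field

# --- Local stand-ins so the pattern runs anywhere (NOT Scrapling's real classes) ---
@dataclass
class Request:
    url: str
    callback: object = None  # Scrapling requires callback=self.parse explicitly
    meta: dict = field(default_factory=dict)

@dataclass
class Response:
    url: str

class Spider:
    name: str = ""
    start_urls: list = []

# --- The migrated spider: parse() is an async *generator* yielding dicts/Requests ---
class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.example.com/page/1/"]

    async def parse(self, response: Response):
        # Items are plain dicts -- no Item classes needed
        yield {"page": response.url}
        # Follow-up requests name their callback explicitly
        yield Request("https://quotes.example.com/page/2/", callback=self.parse)

async def collect(spider: Spider, response: Response) -> list:
    """Drain one parse() call; a real engine would also schedule the Requests."""
    return [thing async for thing in spider.parse(response)]

if __name__ == "__main__":
    yielded = asyncio.run(collect(QuotesSpider(), Response("https://quotes.example.com/page/1/")))
    print(yielded)
```

The driver at the bottom stands in for the framework's engine: it simply drains one `parse()` call so you can see the mixed stream of items and follow-up requests.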
### Running the Spider
## Advanced Features Comparison

### Multiple Callbacks

### Request Metadata
### Concurrency Control
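Because Scrapling replaces `settings.py` with class attributes (per the comparison table), concurrency limits would be declared on the spider class itself. The attribute names below (`concurrency`, `download_delay`) are hypothetical placeholders, not confirmed Scrapling settings; the base class is a local stub:

```python
class Spider:  # local stand-in for scrapling.spiders.Spider
    pass

class ShopSpider(Spider):
    name = "shop"
    start_urls = ["https://shop.example.com/"]

    # Hypothetical attribute names -- check Scrapling's reference for the real ones.
    concurrency = 8       # Scrapy equivalent: CONCURRENT_REQUESTS in settings.py
    download_delay = 0.5  # Scrapy equivalent: DOWNLOAD_DELAY
```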
### Allowed Domains
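Given the class-attribute configuration style, an `allowed_domains` list (the name is carried over from Scrapy as an assumption) boils down to simple host filtering. The stub below shows that filtering logic, which is what an off-site check does regardless of framework:

```python
from urllib.parse import urlparse

class Spider:  # local stand-in for scrapling.spiders.Spider
    allowed_domains: list = []

    def is_allowed(self, url: str) -> bool:
        # Off-site filter: accept the listed domain itself or any subdomain of it.
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)

class ShopSpider(Spider):
    name = "shop"
    allowed_domains = ["shop.example.com"]

if __name__ == "__main__":
    s = ShopSpider()
    print(s.is_allowed("https://shop.example.com/items"))  # True
    print(s.is_allowed("https://othersite.example.org/"))  # False
```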
## Item Processing
### Item Pipelines vs Hooks
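The `on_scraped_item()` hook named in the table does in one method what a Scrapy project spreads across `pipelines.py`. Its exact signature is not specified in this guide, so the async form and the return-the-item convention below are assumptions; the base class is a local stub:

```python
import asyncio

class Spider:  # local stand-in for scrapling.spiders.Spider
    pass

class BookSpider(Spider):
    name = "books"

    async def on_scraped_item(self, item: dict) -> dict:
        # Pipeline-style cleanup, all in one hook instead of separate pipeline classes
        item["title"] = item["title"].strip()
        item["price"] = float(item["price"].lstrip("$"))
        return item

if __name__ == "__main__":
    raw = {"title": "  Migration Guide ", "price": "$12.50"}
    print(asyncio.run(BookSpider().on_scraped_item(raw)))
```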
## Session Management (Middlewares Alternative)

### Using Different Session Types
Scrapy uses middlewares for request/response processing; Scrapling uses a session-based architecture instead.

### Proxy Configuration
## Pause & Resume

Scrapy requires jobs directory configuration and command-line management; Scrapling makes it simple.

## Lifecycle Hooks
## Logging
## Selector Syntax

Good news! Scrapling uses the same selector syntax as Scrapy.

## Streaming Results
Scrapy doesn’t have built-in streaming; Scrapling does.
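`spider.stream()` is the only name this guide gives for the streaming API, so the sketch below fakes it with a local async generator purely to show the consumption pattern: async iteration that receives each item as soon as it is scraped, rather than waiting for the crawl to finish:

```python
import asyncio

class NewsSpider:  # local stand-in; real streaming would come from Scrapling
    name = "news"

    async def stream(self):
        # A real spider would yield items as pages finish scraping;
        # this stub emits a fixed sequence, yielding to the event loop in between.
        for i in range(3):
            await asyncio.sleep(0)
            yield {"n": i, "spider": self.name}

async def main() -> list:
    items = []
    async for item in NewsSpider().stream():
        items.append(item)  # each item is usable immediately, mid-crawl
    return items

if __name__ == "__main__":
    print(asyncio.run(main()))
```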
## Complete Migration Example

Here’s a complete Scrapy spider migrated to Scrapling:

## Key Advantages of Scrapling
- Modern Async/Await: Native async/await instead of Twisted deferreds
- Simpler Architecture: No need for separate settings.py, items.py, pipelines.py
- Built-in Sessions: Multiple fetcher types (HTTP, browser, stealth) in one spider
- Easy Pause/Resume: Just pass the `crawldir` parameter
- Real-time Streaming: Stream items as they’re scraped with `spider.stream()`
- Better Performance: Optimized parsing that’s faster than Scrapy’s Parsel
- Type Hints: Full type coverage for better IDE support
- Simpler API: Less boilerplate, more Pythonic
## What Scrapling Doesn’t Have
- No built-in commands system (like `scrapy genspider`)
- No extensions system (use Python decorators/inheritance)
- No contracts for testing (use standard Python testing)
- Simpler than Scrapy’s full framework approach