The Request class represents a single request in Scrapling’s spider framework. It encapsulates the URL, callback, priority, metadata, and session parameters for fetching and processing web pages.

Class Definition

from scrapling.spiders import Request

class Request:
    """Represents a request to be processed by a Spider."""

Constructor

def __init__(
    self,
    url: str,
    sid: str = "",
    callback: Callable[[Response], AsyncGenerator] | None = None,
    priority: int = 0,
    dont_filter: bool = False,
    meta: dict[str, Any] | None = None,
    _retry_count: int = 0,
    **kwargs: Any
)
url (str, required)
    The URL to request.
sid (str, default: "")
    Session ID to use for this request. If empty, the spider’s default session is used.
callback (Callable[[Response], AsyncGenerator] | None, default: None)
    Async generator function to process the response. If None, the spider’s parse() method is used.
priority (int, default: 0)
    Request priority. Higher values are processed first.
dont_filter (bool, default: False)
    If True, this request won’t be filtered by the duplicate filter, even if it’s already been seen.
meta (dict[str, Any] | None, default: None)
    Arbitrary metadata dictionary to pass along with the request. Merged with response.meta.
_retry_count (int, default: 0)
    Internal retry counter (managed automatically by the engine).
**kwargs (Any)
    Additional session-specific keyword arguments (e.g., headers, proxy, method, data, json). These are passed to the session’s fetch method.

Attributes

url (str)
    The request URL.
sid (str)
    Session ID for this request.
callback (Callable | None)
    Response processing callback.
priority (int)
    Request priority for scheduling.
dont_filter (bool)
    Whether to bypass duplicate filtering.
meta (dict[str, Any])
    Metadata dictionary.
domain (str)
    Cached property that extracts the domain from the URL (e.g., “example.com”).
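The domain extraction behind this cached property presumably boils down to standard URL parsing; a minimal stdlib sketch of the same idea (illustrative, not Scrapling’s internals):

```python
from urllib.parse import urlsplit

def extract_domain(url: str) -> str:
    """Return the host portion of a URL, stripping any port suffix."""
    netloc = urlsplit(url).netloc
    return netloc.split(":")[0]  # drop ":8080"-style ports

print(extract_domain("https://example.com:8443/path?q=1"))  # example.com
```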

Methods

copy

def copy(self) -> Request
Create a copy of this request. Useful when retrying or modifying requests.

Returns: A new Request instance with copied attributes.

Example:
original = Request("https://example.com", priority=5)
retry = original.copy()
retry.priority = 10  # Increase priority for retry

update_fingerprint

def update_fingerprint(
    self,
    include_kwargs: bool = False,
    include_headers: bool = False,
    keep_fragments: bool = False,
) -> bytes
Generate a unique fingerprint for deduplication. The fingerprint is cached in self._fp after first computation.
include_kwargs (bool, default: False)
    Include session kwargs (except data/json) in the fingerprint.
include_headers (bool, default: False)
    Include request headers in the fingerprint.
keep_fragments (bool, default: False)
    Keep URL fragments when canonicalizing the URL for fingerprinting.
Returns: SHA-1 hash bytes representing the unique fingerprint
The fingerprint is based on: URL (canonicalized), session ID, HTTP method, request body (data/json), and optionally headers and kwargs.
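To illustrate the deduplication idea (this is a sketch, not Scrapling’s exact algorithm), a fingerprint of this shape can be built by canonicalizing the URL, then hashing the stable parts together:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def fingerprint(url: str, sid: str = "", method: str = "GET",
                keep_fragments: bool = False) -> bytes:
    """SHA-1 over (canonical URL, session id, method) - a sketch of the idea."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))  # order-independent query
    fragment = parts.fragment if keep_fragments else ""
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, query, fragment))
    h = hashlib.sha1()
    for piece in (canonical, sid, method):
        h.update(piece.encode())
        h.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
    return h.digest()

# Equivalent URLs with reordered query params collapse to one fingerprint:
assert fingerprint("https://x.com/p?a=1&b=2") == fingerprint("https://x.com/p?b=2&a=1")
# Fragments are dropped unless keep_fragments=True:
assert fingerprint("https://x.com/p#top") == fingerprint("https://x.com/p")
```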

Special Methods

Comparison

Requests can be compared for priority-based sorting:
# Higher priority requests are "greater than" lower priority ones
req1 = Request("https://example.com", priority=5)
req2 = Request("https://example.com", priority=10)

req2 > req1  # True
req1 < req2  # True
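This priority ordering is what lets requests sit directly in a heap-based scheduler. A stand-alone sketch of the pattern (class and field names here are illustrative, not Scrapling’s):

```python
import heapq
from dataclasses import dataclass

@dataclass
class Req:
    url: str
    priority: int = 0

    def __lt__(self, other: "Req") -> bool:
        # heapq pops the *smallest* item first, so invert the comparison
        # to make higher-priority requests come out first.
        return self.priority > other.priority

queue: list[Req] = []
heapq.heappush(queue, Req("https://example.com/list", priority=0))
heapq.heappush(queue, Req("https://example.com/detail", priority=10))
print(heapq.heappop(queue).url)  # the priority=10 request pops first
```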

Equality

Requests are equal if they have the same fingerprint:
req1 = Request("https://example.com")
req2 = Request("https://example.com")

# Must generate fingerprints first
req1.update_fingerprint()
req2.update_fingerprint()

req1 == req2  # True (same URL, same session, same method)
You must call update_fingerprint() before comparing requests with ==, otherwise a RuntimeError is raised.

String Representation

req = Request("https://example.com", priority=5, callback=spider.parse)

print(req)
# Output: https://example.com

print(repr(req))
# Output: <Request(https://example.com) priority=5 callback=parse>

Serialization

Requests support pickling for checkpoint/resume functionality. The callback is stored as a method name string and restored from the spider instance.
import pickle

req = Request("https://example.com", callback=spider.parse)
serialized = pickle.dumps(req)
restored = pickle.loads(serialized)

# Callback is None after unpickling
restored._restore_callback(spider)  # Restore from spider
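The store-the-method-name pattern described above can be sketched with a minimal stand-alone class (all names here are illustrative, not Scrapling’s internals):

```python
import pickle

class PicklableRequest:
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback

    def __getstate__(self):
        # Bound methods can't be pickled, so store the method's name instead.
        state = self.__dict__.copy()
        cb = state.pop("callback")
        state["_callback_name"] = cb.__name__ if cb else None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.callback = None  # must be restored from the spider afterwards

    def restore_callback(self, spider):
        name = getattr(self, "_callback_name", None)
        if name:
            self.callback = getattr(spider, name)

class Spider:
    def parse(self, response): ...

spider = Spider()
req = PicklableRequest("https://example.com", callback=spider.parse)
restored = pickle.loads(pickle.dumps(req))
assert restored.callback is None        # callback dropped during pickling
restored.restore_callback(spider)
assert restored.callback == spider.parse  # rebound from the spider instance
```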

Usage Examples

Basic Request

from scrapling.spiders import Request

# Simple GET request
request = Request("https://api.example.com/data")

POST Request with JSON

request = Request(
    "https://api.example.com/search",
    method="POST",
    json={"query": "scrapling", "limit": 10},
    headers={"Authorization": "Bearer token"}
)

Request with Custom Callback

class MySpider(Spider):
    async def parse(self, response):
        # Extract detail page links
        for link in response.css("a.detail::attr(href)").getall():
            yield Request(
                response.urljoin(link),
                callback=self.parse_detail,
                priority=10  # Higher priority for detail pages
            )
    
    async def parse_detail(self, response):
        yield {
            "title": response.css("h1::text").get(),
            "content": response.css(".content::text").get()
        }

Request with Metadata

async def parse(self, response):
    # Pass data between callbacks using meta
    category = response.css(".category::text").get()
    
    for product in response.css(".product"):
        link = product.css("a::attr(href)").get()
        yield Request(
            response.urljoin(link),
            callback=self.parse_product,
            meta={"category": category, "page": 1}
        )

async def parse_product(self, response):
    # Access metadata from response.meta
    yield {
        "name": response.css(".name::text").get(),
        "category": response.meta["category"],
        "page": response.meta["page"]
    }

Request with Different Session

class MySpider(Spider):
    def configure_sessions(self, manager):
        from scrapling.fetchers import FetcherSession, AsyncStealthySession
        
        manager.add("default", FetcherSession())
        manager.add("stealth", AsyncStealthySession())
    
    async def parse(self, response):
        # Use stealth session for sensitive pages
        yield Request(
            "https://example.com/protected",
            sid="stealth",
            callback=self.parse_protected
        )

Request with Proxy

from scrapling.engines.toolbelt import ProxyRotator

class MySpider(Spider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.proxy_rotator = ProxyRotator([
            "http://proxy1:8080",
            "http://proxy2:8080"
        ])
    
    async def parse(self, response):
        yield Request(
            "https://example.com/data",
            proxy=self.proxy_rotator.get_proxy()
        )

Bypassing Duplicate Filter

async def parse(self, response):
    # Normal request - will be filtered if seen before
    yield Request("https://example.com/data")
    
    # Force re-fetch even if seen before
    yield Request(
        "https://example.com/data",
        dont_filter=True,
        meta={"reason": "forced_update"}
    )
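The duplicate filter that dont_filter bypasses can be pictured as a seen-set keyed by fingerprint; a sketch of the behavior (not the engine’s actual filter):

```python
class DupeFilter:
    def __init__(self):
        self._seen: set[bytes] = set()

    def should_fetch(self, fp: bytes, dont_filter: bool = False) -> bool:
        if dont_filter:
            return True          # bypass: always fetch
        if fp in self._seen:
            return False         # already scheduled once, filter out
        self._seen.add(fp)
        return True

f = DupeFilter()
fp = b"\x01" * 20  # stand-in for a SHA-1 fingerprint
assert f.should_fetch(fp) is True                     # first time: fetch
assert f.should_fetch(fp) is False                    # duplicate: filtered
assert f.should_fetch(fp, dont_filter=True) is True   # forced re-fetch
```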

Internal Attributes

_retry_count (int)
    Number of times this request has been retried (managed by CrawlerEngine).
_session_kwargs (dict)
    Dictionary of keyword arguments to pass to the session’s fetch method.
_fp (bytes | None)
    Cached fingerprint bytes. None until update_fingerprint() is called.
_callback_name (str | None)
    Temporary attribute used during pickling to store the callback method name.
