
Overview

Scrapling provides multiple selection methods for finding elements in HTML documents: CSS3 selectors, XPath expressions, and text-content searches.

CSS Selectors

Search the DOM tree using CSS3 selectors.

from scrapling import Fetcher

page = Fetcher.fetch('https://example.com')

# Find elements with CSS
links = page.css('a.nav-link')
headers = page.css('h1, h2, h3')

Method Signature

def css(
    selector: str,
    identifier: str = "",
    adaptive: bool = False,
    auto_save: bool = False,
    percentage: int = 0,
) -> Selectors

  • selector (str, required): The CSS3 selector to be used.
  • identifier (str, default: ""): A string used to save/retrieve the element’s data in adaptive mode. If not provided, the selector itself is used as the identifier. Setting identifier is recommended if you plan to use a different selector later and still want to relocate the same element(s).
  • adaptive (bool, default: False): When enabled, the method tries to relocate the element if it was saved before.
  • auto_save (bool, default: False): Automatically save new elements for adaptive mode later.
  • percentage (int, default: 0): The minimum similarity percentage to accept while adaptive mode is working. The percentage calculation depends on the page structure.

Examples

# Select by class
products = page.css('.product-card')

# Select by ID
header = page.css('#main-header')

# Complex selectors
active_links = page.css('nav a.active[href*="/products/"]')

XPath Selectors

Search the DOM tree using XPath expressions. XPath provides more powerful querying capabilities than CSS.

Method Signature

def xpath(
    selector: str,
    identifier: str = "",
    adaptive: bool = False,
    auto_save: bool = False,
    percentage: int = 0,
    **kwargs: Any,
) -> Selectors

  • selector (str, required): The XPath selector to be used.
  • identifier (str, default: ""): A string used to save/retrieve the element’s data in adaptive mode. If not provided, the selector itself is used as the identifier.
  • adaptive (bool, default: False): When enabled, the method tries to relocate the element if it was saved before.
  • auto_save (bool, default: False): Automatically save new elements for adaptive mode later.
  • percentage (int, default: 0): The minimum similarity percentage to accept while adaptive mode is working.
  • **kwargs (Any): Additional keyword arguments are passed as XPath variables into the XPath expression.

Examples

# Find all links
links = page.xpath('//a')

# Find elements by attribute
products = page.xpath('//div[@class="product"]')

# Complex XPath
titles = page.xpath('//article//h2[contains(@class, "title")]')

Find Methods

Find elements using flexible filters including tag names, attributes, regex patterns, and custom functions.

find_all()

Find all elements matching the specified criteria.

def find_all(
    *args: str | Iterable[str] | Pattern | Callable | Dict[str, str],
    **kwargs: str,
) -> Selectors

  • args (str | Iterable[str] | Pattern | Callable | Dict[str, str]): Positional filters, any of:
      • tag name(s) as strings
      • an iterable of tag names
      • regex patterns to match against text content
      • a callable that takes a Selector and returns a bool
      • a dictionary of attribute name-value pairs
  • kwargs (str): Attribute names and their values to filter elements by. Use class_ for the class attribute and for_ for the for attribute.

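
Because class and for are reserved words in Python, they cannot be used as keyword arguments directly; the trailing underscore follows the usual Python convention. A minimal sketch of the kind of normalization a library might perform internally (the helper name is hypothetical, not Scrapling's actual code):

```python
def normalize_attr_kwargs(kwargs: dict) -> dict:
    """Map Pythonic kwarg names (class_, for_) back to HTML attribute names."""
    return {
        ("class" if k == "class_" else "for" if k == "for_" else k): v
        for k, v in kwargs.items()
    }

print(normalize_attr_kwargs({"class_": "main", "for_": "email", "id": "nav"}))
# {'class': 'main', 'for': 'email', 'id': 'nav'}
```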
# Find all div elements
divs = page.find_all('div')

# Find multiple tag types
headings = page.find_all('h1', 'h2', 'h3')

# Using iterable
tags = ['article', 'section']
elements = page.find_all(tags)
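
To build intuition for how mixed positional filters could be told apart, here is a plain-Python sketch that dispatches on filter type, mirroring the signature above. This is hypothetical logic with simplified inputs, not Scrapling's implementation; in the real API, a callable filter receives a Selector object.

```python
import re
from collections.abc import Iterable

def criterion_matches(tag: str, text: str, attrs: dict, criterion) -> bool:
    """Dispatch one positional filter by its type (illustrative only)."""
    if isinstance(criterion, str):          # single tag name
        return tag == criterion
    if isinstance(criterion, re.Pattern):   # regex against text content
        return bool(criterion.search(text))
    if isinstance(criterion, dict):         # attribute name-value pairs
        return all(attrs.get(k) == v for k, v in criterion.items())
    if callable(criterion):                 # custom predicate
        return bool(criterion(tag, text, attrs))
    if isinstance(criterion, Iterable):     # iterable of tag names
        return tag in criterion
    return False

# A <div class="product">$9.99</div> stand-in:
assert criterion_matches("div", "$9.99", {"class": "product"}, "div")
assert criterion_matches("div", "$9.99", {}, re.compile(r"\$\d+"))
assert criterion_matches("div", "", {"class": "product"}, {"class": "product"})
```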

find()

Find the first element matching the criteria, or return None.

def find(
    *args: str | Iterable[str] | Pattern | Callable | Dict[str, str],
    **kwargs: str,
) -> Optional[Selector]

Accepts the same parameters as find_all(), but returns only the first match.

# Find first matching element
header = page.find('header', class_="main")

if header:
    print(header.text)
else:
    print("Header not found")

Find elements by their text content.

find_by_text()

Find elements with matching text content.

def find_by_text(
    text: str,
    first_match: bool = True,
    partial: bool = False,
    case_sensitive: bool = False,
    clean_match: bool = True,
) -> Selector | Selectors

  • text (str, required): The text query to match.
  • first_match (bool, default: True): Return only the first element that matches the conditions.
  • partial (bool, default: False): If enabled, return elements whose text contains the input text.
  • case_sensitive (bool, default: False): If enabled, letter case is taken into consideration while matching.
  • clean_match (bool, default: True): If enabled, all whitespace and consecutive spaces are ignored while matching.

# Find element with exact text
button = page.find_by_text('Submit', first_match=True)
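
clean_match-style comparison can be pictured as collapsing runs of whitespace before matching. A rough stdlib sketch of that idea (illustrative, not Scrapling's exact cleaning logic):

```python
import re

def clean(text: str) -> str:
    """Collapse consecutive whitespace and trim, roughly what clean_match implies."""
    return re.sub(r"\s+", " ", text).strip()

# '  Submit \n form ' and 'Submit form' compare equal after cleaning
assert clean("  Submit \n form ") == clean("Submit form")
```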

find_by_regex()

Find elements whose text content matches a regex pattern.

def find_by_regex(
    query: str | Pattern[str],
    first_match: bool = True,
    case_sensitive: bool = False,
    clean_match: bool = True,
) -> Selector | Selectors

  • query (str | Pattern[str], required): The regex query/pattern to match.
  • first_match (bool, default: True): Return only the first element that matches the conditions.
  • case_sensitive (bool, default: False): If enabled, letter case is taken into consideration while matching.
  • clean_match (bool, default: True): If enabled, all whitespace and consecutive spaces are ignored while matching.

import re

# Find prices
price = page.find_by_regex(r'\$\d+\.\d{2}')

# Find all email addresses
emails = page.find_by_regex(
    r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
    first_match=False
)

# Case-sensitive pattern
code = page.find_by_regex(r'[A-Z]{3}-\d{4}', case_sensitive=True)
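
The patterns above can be verified with plain re, independent of any page:

```python
import re

# Price pattern: dollar sign, digits, exactly two decimal places
assert re.search(r"\$\d+\.\d{2}", "Total: $19.99 today").group() == "$19.99"

# Email pattern applied to a string with two addresses
assert re.findall(
    r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "Contact a@example.com or b@test.org",
) == ["a@example.com", "b@test.org"]

# Code pattern: three uppercase letters, a hyphen, four digits
assert re.search(r"[A-Z]{3}-\d{4}", "ref ABC-1234") is not None
```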

Advanced: Find Similar Elements

Find elements that are similar to the current element based on structure and attributes.

def find_similar(
    similarity_threshold: float = 0.2,
    ignore_attributes: List | Tuple = ("href", "src"),
    match_text: bool = False,
) -> Selectors

  • similarity_threshold (float, default: 0.2): The percentage threshold for attribute matching. Elements are pre-filtered by depth, tag name, and parent structure before attributes are compared.
  • ignore_attributes (List | Tuple, default: ("href", "src")): Attribute names to ignore while matching. URL attributes are ignored by default because they often differ between otherwise similar elements.
  • match_text (bool, default: False): If enabled, element text content is included in the similarity calculation.

This function is inspired by AutoScraper and is useful for finding repeated patterns, such as product cards in a listing.

# Find one product card
first_product = page.css('.product').first

# Find all similar product cards
all_products = first_product.find_similar(similarity_threshold=0.3)

for product in all_products:
    print(product.css('.title').text)
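
For intuition about similarity_threshold, here is a toy attribute-overlap score in the spirit of the description above: a Jaccard-style measure over attribute name-value pairs with URL attributes ignored. Scrapling's actual calculation may differ.

```python
def attribute_similarity(a: dict, b: dict, ignore=("href", "src")) -> float:
    """Share of attribute name-value pairs two elements have in common."""
    a_items = {(k, v) for k, v in a.items() if k not in ignore}
    b_items = {(k, v) for k, v in b.items() if k not in ignore}
    if not a_items and not b_items:
        return 1.0
    return len(a_items & b_items) / len(a_items | b_items)

card1 = {"class": "product", "data-id": "1", "href": "/p/1"}
card2 = {"class": "product", "data-id": "2", "href": "/p/2"}
# 'href' is ignored; only 'class' matches out of three distinct pairs
print(attribute_similarity(card1, card2))
```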

Selectors vs Selector

  • Selector: Represents a single element
  • Selectors: A list-like container of multiple Selector objects
Both classes have similar methods, with Selectors applying operations across all contained elements:

# Single element (Selector)
element = page.css('.container').first
text = element.text  # TextHandler

# Multiple elements (Selectors)
elements = page.css('.item')
texts = elements.getall()  # List of TextHandler
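
The single/multiple split can be sketched with a toy pair of classes. This is purely illustrative: the real Selector and Selectors classes carry far more functionality.

```python
class Selector:
    """Toy single-element stand-in exposing .text."""
    def __init__(self, text: str):
        self.text = text

class Selectors(list):
    """Toy list-like container that broadcasts an operation over its items."""
    def getall(self) -> list:
        return [sel.text for sel in self]

items = Selectors([Selector("first"), Selector("second")])
print(items.getall())  # ['first', 'second']
```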
