Overview
Scrapling provides multiple selector methods to find elements in HTML documents. You can use CSS3 selectors, XPath expressions, or search by text content.
CSS Selectors
Search the DOM tree using CSS3 selectors.
Method Signature
The CSS3 selector to be used
A string that will be used to save/retrieve the element’s data in adaptive mode. If not provided, the selector will be used as the identifier.
When enabled, the function will try to relocate the element if it was saved before
Automatically save new elements for adaptive mode later
The minimum match percentage to accept while adaptive mode is working. The percentage calculation depends on the page structure.
Examples
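The snippet below is a rough, standard-library-only sketch of the idea behind the identifier, auto-save, and minimum-percentage options described above. It does not use Scrapling and is not its implementation: `save_element`, `relocate`, the fingerprint format, and the scoring method are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory store: identifier -> saved element fingerprint.
_saved = {}


def fingerprint(element):
    """Flatten an element (a plain dict here) into a comparable string."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(element["attrs"].items()))
    return f"{element['tag']} {attrs} {element['text']}"


def save_element(identifier, element):
    """Remember an element under an identifier (the 'auto save' step)."""
    _saved[identifier] = fingerprint(element)


def relocate(identifier, candidates, percentage=0):
    """Return the candidate most similar to the saved element, or None
    if nothing was saved or the best score is below the minimum percentage."""
    saved = _saved.get(identifier)
    if saved is None or not candidates:
        return None
    scored = [
        (SequenceMatcher(None, saved, fingerprint(c)).ratio() * 100, c)
        for c in candidates
    ]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= percentage else None


# First visit: the element is found and saved under an identifier.
old = {"tag": "div", "attrs": {"class": "price"}, "text": "$19.99"}
save_element("product-price", old)

# Later visit: the class was renamed, so the original selector would miss it.
new_page = [
    {"tag": "div", "attrs": {"class": "cost"}, "text": "$19.99"},
    {"tag": "p", "attrs": {"class": "footer-note"}, "text": "All rights reserved"},
]
match = relocate("product-price", new_page, percentage=50)
```

A higher minimum percentage makes relocation stricter: with `percentage=100` only an identical fingerprint would be accepted, so this sketch would return `None` for the changed page.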
XPath Selectors
Search the DOM tree using XPath expressions. XPath provides more powerful querying capabilities than CSS.
Method Signature
The XPath selector to be used
A string that will be used to save/retrieve the element’s data in adaptive mode. If not provided, the selector will be used as the identifier.
When enabled, the function will try to relocate the element if it was saved before
Automatically save new elements for adaptive mode later
The minimum match percentage to accept while adaptive mode is working
Additional keyword arguments will be passed as XPath variables in the XPath expression
Examples
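As a library-independent illustration of XPath-style queries, the sketch below uses Python's standard `xml.etree.ElementTree`. That module supports only a limited XPath subset and has no XPath variables, so it shows the query style rather than Scrapling's own call; the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup.
doc = ET.fromstring(
    "<html><body>"
    "<div class='product'><h2>Laptop</h2><span class='price'>999</span></div>"
    "<div class='product'><h2>Phone</h2><span class='price'>599</span></div>"
    "<div class='ad'><h2>Sale!</h2></div>"
    "</body></html>"
)

# Attribute predicate: every <div> whose class is "product".
products = doc.findall(".//div[@class='product']")

# Child-text predicate: the <div> whose <h2> child has the text "Phone".
# Selecting a parent by its child's text is something CSS3 selectors cannot express.
phone = doc.find(".//div[h2='Phone']")

# Chained steps: the price text inside each product.
prices = [
    span.text
    for span in doc.findall(".//div[@class='product']/span[@class='price']")
]
```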
Find Methods
Find elements using flexible filters including tag names, attributes, regex patterns, and custom functions.
find_all()
Find all elements matching the specified criteria.
- Tag name(s) as strings
- Iterable of tag names
- Regex patterns to match against text
- Callable function that takes a Selector and returns bool
- Dictionary of attribute name-value pairs
Attribute names and their values to filter elements. Use class_ for the class attribute and for_ for the for attribute.
find()
Find the first element matching the criteria, or return None.
Takes the same arguments as find_all() but returns only the first match.
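To make the filter types listed above concrete, here is a minimal standard-library sketch of how such filters could be dispatched by type. The `Element` class and these `find_all`/`find` functions are hypothetical stand-ins written for illustration; they are not Scrapling's classes or its actual matching logic.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """Stand-in for a parsed element; not Scrapling's Selector class."""
    tag: str
    text: str = ""
    attrib: dict = field(default_factory=dict)


def matches(element, *filters, **attributes):
    """Return True if the element satisfies every given filter."""
    # Trailing underscores avoid Python keywords: class_ -> class, for_ -> for.
    wanted = {name.rstrip("_"): value for name, value in attributes.items()}
    tags = set()
    for flt in filters:
        if isinstance(flt, str):              # a single tag name
            tags.add(flt)
        elif isinstance(flt, dict):           # attribute name-value pairs
            wanted.update(flt)
        elif isinstance(flt, re.Pattern):     # regex matched against the text
            if not flt.search(element.text):
                return False
        elif callable(flt):                   # custom predicate returning bool
            if not flt(element):
                return False
        else:                                 # an iterable of tag names
            tags.update(flt)
    if tags and element.tag not in tags:
        return False
    return all(element.attrib.get(k) == v for k, v in wanted.items())


def find_all(elements, *filters, **attributes):
    return [el for el in elements if matches(el, *filters, **attributes)]


def find(elements, *filters, **attributes):
    """Same filters as find_all, but only the first match (or None)."""
    return next(iter(find_all(elements, *filters, **attributes)), None)


page = [
    Element("div", "Laptop $999", {"class": "product"}),
    Element("div", "Phone $599", {"class": "product"}),
    Element("span", "Sale!", {"class": "ad"}),
    Element("label", "Email", {"for": "email"}),
]

products = find_all(page, "div", class_="product")
priced = find_all(page, ["div", "span"], re.compile(r"\$\d+"))
label = find(page, {"for": "email"})
short = find(page, lambda el: len(el.text) < 6)
```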
Text-Based Search
Find elements by their text content.
find_by_text()
Find elements with matching text content.
Text query to match
Returns the first element that matches conditions
If enabled, returns elements that contain the input text
If enabled, letter case will be taken into consideration
If enabled, ignores all whitespace and consecutive spaces while matching
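The options above interact, so a small sketch may help. This is one plausible reading of them using only the standard library, not Scrapling's code: the function and parameter names, their defaults, and the exact whitespace handling are assumptions.

```python
import re


def text_matches(element_text, query, partial=False,
                 case_sensitive=False, clean_match=True):
    """Decide whether an element's text matches the query."""
    if clean_match:
        # Collapse runs of whitespace and trim both ends before comparing.
        element_text = re.sub(r"\s+", " ", element_text).strip()
        query = re.sub(r"\s+", " ", query).strip()
    if not case_sensitive:
        element_text, query = element_text.lower(), query.lower()
    # Partial: the query may appear anywhere; otherwise the whole text must match.
    return query in element_text if partial else query == element_text


def find_by_text(texts, query, first_match=True, **flags):
    """Return the first matching text (or None), or every match as a list."""
    hits = [t for t in texts if text_matches(t, query, **flags)]
    if first_match:
        return hits[0] if hits else None
    return hits


buttons = ["Home", "  Add   to\ncart ", "ADD TO CART", "View cart"]

first = find_by_text(buttons, "add to cart")
every = find_by_text(buttons, "add to cart", first_match=False)
containing = find_by_text(buttons, "cart", first_match=False, partial=True)
exact_case = find_by_text(buttons, "ADD TO CART", first_match=False,
                          case_sensitive=True)
```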
find_by_regex()
Find elements whose text content matches a regex pattern.
Regex query/pattern to match
Returns the first element that matches conditions
If enabled, letter case will be taken into consideration
If enabled, ignores all whitespace and consecutive spaces while matching
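A short sketch of how the case and whitespace options could affect a regex match, using only Python's `re` module. The helper name, its defaults, and the use of `re.search` are assumptions for illustration rather than Scrapling's behaviour.

```python
import re


def regex_matches(element_text, pattern, case_sensitive=False, clean_match=True):
    """Check an element's text against a regex pattern."""
    if clean_match:
        # Normalise whitespace so line breaks and double spaces do not block a match.
        element_text = re.sub(r"\s+", " ", element_text).strip()
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, element_text, flags) is not None


texts = ["Price: $19.99", "price:\n  $5", "Free shipping"]
pattern = r"price: \$\d+"

cleaned = [t for t in texts if regex_matches(t, pattern)]
raw = [t for t in texts if regex_matches(t, pattern, clean_match=False)]
strict = [t for t in texts if regex_matches(t, pattern, case_sensitive=True)]
```

Without whitespace cleaning, the second text fails because the pattern expects a single space after the colon; with case sensitivity on, the first text fails because of the capital "P".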
Advanced: Find Similar Elements
Find elements that are similar to the current element based on structure and attributes.
The percentage threshold for attribute matching. Elements are pre-filtered by same depth, tag name, and parent structure before attribute comparison.
Attribute names to ignore while matching. URLs are ignored by default as they often differ between similar elements.
If True, element text content will be included in the similarity calculation
This function is inspired by AutoScraper and is useful for finding repeated patterns like product cards in a list.
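The two-stage process described above (structural pre-filter, then attribute comparison against a threshold) can be sketched with the standard library. Elements are plain dicts here, and the scoring formula, function names, and default values are assumptions; Scrapling's real calculation may differ.

```python
from difflib import SequenceMatcher


def attribute_similarity(a, b, ignore_attributes=("href", "src")):
    """Average similarity (0.0 to 1.0) of attribute values, skipping ignored names."""
    names = (set(a) | set(b)) - set(ignore_attributes)
    if not names:
        return 1.0
    scores = [
        SequenceMatcher(None, a.get(n, ""), b.get(n, "")).ratio() for n in names
    ]
    return sum(scores) / len(scores)


def find_similar(target, candidates, similarity_threshold=0.2,
                 ignore_attributes=("href", "src"), match_text=False):
    similar = []
    for el in candidates:
        if el is target:
            continue
        # Pre-filter: same tag name, same depth, same parent tag.
        if (el["tag"], el["depth"], el["parent"]) != (
            target["tag"], target["depth"], target["parent"]
        ):
            continue
        score = attribute_similarity(target["attrs"], el["attrs"], ignore_attributes)
        if match_text:
            # Optionally blend in how similar the text content is.
            text_score = SequenceMatcher(None, target["text"], el["text"]).ratio()
            score = (score + text_score) / 2
        if score >= similarity_threshold:
            similar.append(el)
    return similar


card1 = {"tag": "a", "depth": 3, "parent": "li", "text": "Laptop",
         "attrs": {"class": "product-card", "href": "/p/1"}}
card2 = {"tag": "a", "depth": 3, "parent": "li", "text": "Phone",
         "attrs": {"class": "product-card sale", "href": "/p/2"}}
banner = {"tag": "a", "depth": 1, "parent": "body", "text": "Ad",
          "attrs": {"class": "product-card"}}
other = {"tag": "a", "depth": 3, "parent": "li", "text": "Misc",
         "attrs": {"id": "x9"}}

elements = [card1, card2, banner, other]
similar_cards = find_similar(card1, elements, similarity_threshold=0.5)
```

Here the differing `href` values do not count against `card2` because URLs are ignored, while `banner` is dropped by the structural pre-filter even though its class matches exactly.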
Selectors vs Selector
- Selector: Represents a single element
- Selectors: A list-like container of multiple Selector objects
Operations called on a Selectors object are applied across all contained elements.
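The single-element versus container distinction can be pictured with a small standard-library sketch. `Element` and `ElementList` are hypothetical stand-ins invented for this example, and the method names are assumptions rather than Scrapling's API.

```python
class Element:
    """Stand-in for a single element (not Scrapling's Selector)."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)

    def find_children(self, tag):
        # A query on one element returns a container of results.
        return ElementList(c for c in self.children if c.tag == tag)


class ElementList(list):
    """List-like container that forwards operations to every element."""

    def find_children(self, tag):
        # Run the query on each contained element and flatten the results.
        return ElementList(hit for el in self for hit in el.find_children(tag))

    @property
    def texts(self):
        return [el.text for el in self]


rows = ElementList([
    Element("tr", children=[Element("td", "a"), Element("td", "b")]),
    Element("tr", children=[Element("td", "c"), Element("th", "h")]),
])

cells = rows.find_children("td")   # applied to both rows at once
first_row = rows[0]                # indexing gives back a single element
```

Because the container subclasses `list`, indexing, slicing, iteration, and `len()` keep working as usual while the extra methods operate on every element.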