Migrating from BeautifulSoup

If you’re already familiar with BeautifulSoup, you’re in for a treat. Scrapling is much faster, offers the same parsing capabilities as BeautifulSoup plus some that BeautifulSoup lacks, and introduces powerful new features for fetching and handling modern web pages. This guide will help you quickly adapt your existing BeautifulSoup code to take advantage of them.

The table below covers the most common operations you’ll perform when scraping web pages. Each row shows how to accomplish a task in BeautifulSoup and the corresponding method in Scrapling.

You will notice that some BeautifulSoup shortcuts are missing in Scrapling. Those shortcuts are one of the reasons BeautifulSoup is slower than Scrapling. The point is: if a feature already fits in a short one-liner, there is no need to sacrifice performance to make that line even shorter.

Quick Reference Table

Task	BeautifulSoup Code	Scrapling Code
Parser import	`from bs4 import BeautifulSoup`	`from scrapling.parser import Selector`
Parsing HTML from string	`soup = BeautifulSoup(html, 'html.parser')`	`page = Selector(html)`
Finding a single element	`element = soup.find('div', class_='example')`	`element = page.find('div', class_='example')`
Finding multiple elements	`elements = soup.find_all('div', class_='example')`	`elements = page.find_all('div', class_='example')`
Finding with attributes dict	`element = soup.find('div', attrs={"class": "example"})`	`element = page.find('div', {"class": "example"})`
Finding with regex	`element = soup.find(re.compile("^b"))`	`element = page.find(re.compile(r"^b"))` or `element = page.find_by_regex(r"^b")`
Finding with function	`element = soup.find(lambda e: len(list(e.children)) > 0)`	`element = page.find(lambda e: len(e.children) > 0)`
Finding multiple tags	`element = soup.find(["a", "b"])`	`element = page.find(["a", "b"])`
Find by text content	`element = soup.find(string="some text")`	`element = page.find_by_text("some text", partial=False)`
CSS selector (first)	`element = soup.select_one('div.example')`	`element = page.css('div.example').first`
CSS selector (all)	`elements = soup.select('div.example')`	`elements = page.css('div.example')`
Prettified HTML	`prettified = soup.prettify()`	`prettified = page.prettify()`
Raw HTML	`source = str(soup)`	`source = page.html_content`
Get tag name	`name = element.name`	`name = element.tag`
Get text content	`string = element.string`	`string = element.text`
Get all text	`text = soup.get_text(strip=True)`	`text = page.get_all_text(strip=True)`
Get attributes dict	`attrs = element.attrs`	`attrs = element.attrib`
Get attribute value	`attr = element['href']`	`attr = element['href']`
Navigate to parent	`parent = element.parent`	`parent = element.parent`
Get all parents	`parents = list(element.parents)`	`parents = list(element.iterancestors())`
Find parent by tag	`target_parent = element.find_parent("a")`	`target_parent = element.find_ancestor(lambda p: p.tag == 'a')`
Get siblings	N/A	`siblings = element.siblings`
Get next sibling	`next_element = element.next_sibling`	`next_element = element.next`
Find next sibling	`target = element.find_next_sibling("a")`	`target = element.siblings.search(lambda s: s.tag == 'a')`
Find all next siblings	`targets = element.find_next_siblings("a")`	`targets = element.siblings.filter(lambda s: s.tag == 'a')`
Get previous sibling	`prev = element.previous_sibling`	`prev = element.previous`
Navigate to children	`children = list(element.children)`	`children = element.children`
Get all descendants	`descendants = list(element.descendants)`	`descendants = element.below_elements`

Important: BS4’s find_previous/find_all_previous search all preceding elements in document order, while Scrapling’s path returns only ancestors (the parent chain). The two are not exact equivalents, but an ancestor search covers the most common use case.
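
The gap between "all preceding elements" and "ancestors only" can be seen without either library. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree purely as a stand-in; the sample markup and the `label` helper are made up for illustration, and nothing here calls BS4 or Scrapling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
doc = ET.fromstring(
    "<html><body>"
    "<div id='intro'><p id='first'>one</p></div>"
    "<div id='main'><span id='target'>two</span></div>"
    "</body></html>"
)

# ElementTree keeps no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in doc.iter() for child in parent}

target = doc.find(".//span[@id='target']")

# Ancestor chain: walk upwards from the target until the root is passed.
ancestors = []
node = parent_of.get(target)
while node is not None:
    ancestors.append(node)
    node = parent_of.get(node)

# Preceding elements: everything that opens before the target in document order.
in_order = list(doc.iter())
preceding = in_order[: in_order.index(target)]


def label(element):
    """Return the element's id if it has one, otherwise its tag name."""
    return element.get("id") or element.tag


ancestor_labels = [label(e) for e in ancestors]
preceding_labels = [label(e) for e in preceding]

print(ancestor_labels)   # ['main', 'body', 'html']
print(preceding_labels)  # ['html', 'body', 'intro', 'first', 'main']
```

In this sample, the elements labeled intro and first precede the target but are not its ancestors, so a search restricted to the parent chain would never visit them.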

Key Differences

One key point to remember: BeautifulSoup offers features for modifying and manipulating the page after it has been parsed. Scrapling focuses on scraping the page faster, leaving you free to do what you want with the extracted information. Both tools can be used for web scraping, but only Scrapling specializes in it.

Different Parsers

BeautifulSoup lets you choose which parser engine to use, and lxml is one of the options. Scrapling doesn’t offer that choice: it uses the lxml library for performance reasons.

Element Types

In BeautifulSoup, elements are Tag objects; in Scrapling, they are Selector objects. However, they provide similar methods and properties for navigation and data extraction.

Error Handling

Both libraries return None when an element is not found (e.g., soup.find() or page.find()). In Scrapling, page.css() returns an empty Selectors list when no elements match, and you can use page.css('.foo').first to safely get the first match or None. To avoid errors, check for None or empty results before accessing properties.
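
One way to centralize the None check is a small guard function. This is only a sketch: `text_or_default` is a hypothetical helper, not part of either library, and the lookup results are simulated with plain stand-in objects so the snippet runs without BeautifulSoup or Scrapling installed. It assumes only that a found element exposes a `text` attribute.

```python
from types import SimpleNamespace
from typing import Any, Optional


def text_or_default(element: Optional[Any], default: str = "") -> str:
    """Return element.text, or `default` when the lookup came back as None."""
    if element is None:
        return default
    return str(element.text)


# Stand-ins for lookup results: a hit that exposes .text, and a miss (None).
found = SimpleNamespace(text="Widget")
missing = None

print(text_or_default(found))           # Widget
print(text_or_default(missing, "n/a"))  # n/a
```

Wrapping each lookup this way keeps a single missing element from raising AttributeError in the middle of a loop.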

Text Extraction

Scrapling provides additional methods for handling text through TextHandler, such as clean(), which can help remove extra whitespace, consecutive spaces, or unwanted characters.

Side-by-Side Examples

Example 1: Scraping All Links

Here’s a simple example of scraping a web page to extract all the links.

import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# href=True skips anchors without an href, which would otherwise raise KeyError
links = soup.find_all('a', href=True)
for link in links:
    print(link['href'])

Scrapling can simplify this process by combining fetching and parsing into a single step through its built-in fetchers, making your code cleaner and more efficient.

Example 2: Extracting Product Information

import requests
from bs4 import BeautifulSoup

url = 'https://example.com/products'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

products = []
for product in soup.find_all('div', class_='product'):
    title = product.find('h2', class_='title').get_text(strip=True)
    price = product.find('span', class_='price').get_text(strip=True)
    products.append({'title': title, 'price': price})

for product in products:
    print(f"{product['title']}: {product['price']}")

Example 3: Advanced Text Processing

import requests
from bs4 import BeautifulSoup
import re

url = 'https://example.com/article'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Get all text; the separator keeps text from adjacent elements from running together
text = soup.get_text(separator=' ', strip=True)
# Remove extra whitespace
text = re.sub(r'\s+', ' ', text)
print(text)
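
The whitespace cleanup step can be tried in isolation, since it does not depend on any parser. The sketch below wraps it in a hypothetical `normalize_whitespace` helper and runs it on a made-up sample string.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


# Made-up sample with mixed spaces, tabs, and line breaks.
sample = "  Breaking\n\tnews:   markets \r\n rally  "
cleaned = normalize_whitespace(sample)

print(cleaned)  # Breaking news: markets rally

# Splitting on whitespace and re-joining gives the same result for this input.
same_as_split_join = cleaned == " ".join(sample.split())
```

The regex form is handy when the pattern needs to grow later (for example, to also strip other unwanted characters), while the split/join form avoids the `re` import for the simple case.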

Example 4: Finding Elements by Text

import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all elements containing specific text
elements = soup.find_all(string=lambda text: text and 'Search' in text)
for elem in elements:
    print(elem.parent.name, elem.strip())

Example 5: Working with Element Attributes

import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

for img in soup.find_all('img'):
    src = img.get('src')
    alt = img.get('alt', 'No alt text')
    if img.has_attr('class'):
        classes = ' '.join(img['class'])
        print(f"{src} - {alt} - {classes}")
    else:
        print(f"{src} - {alt}")
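
One detail to watch when porting this loop: BS4 treats class as a multi-valued attribute and returns it as a list, which is why the code above joins it. lxml-style attribute mappings return the raw string instead. Assuming Scrapling's attrib behaves the same way (an assumption worth checking against the Scrapling documentation), a small normalizer lets the rest of the code ignore the difference. `class_names` is a hypothetical helper, not part of either library.

```python
from typing import List, Union


def class_names(value: Union[str, List[str], None]) -> List[str]:
    """Normalize a class attribute to a list of names.

    Accepts a list (as BS4 returns), a space-separated string
    (as lxml-style mappings return), or None for a missing attribute.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return value.split()
    return list(value)


print(class_names(["hero", "wide"]))  # ['hero', 'wide']
print(class_names("hero wide"))       # ['hero', 'wide']
print(class_names(None))              # []
```

Without a guard like this, calling `' '.join(...)` on a plain string would insert a space between every character.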

Performance Benefits

Scrapling significantly outperforms BeautifulSoup in text extraction and parsing tasks:

Library	Time (ms)	vs Scrapling
Scrapling	2.02	1.0x
BS4 with Lxml	1584.31	~784.3x slower
BS4 with html5lib	3391.91	~1679.1x slower

Additional Features in Scrapling

Beyond the familiar BeautifulSoup API, Scrapling offers:

Adaptive Scraping: Elements can be automatically relocated when website structures change
Advanced Fetchers: Built-in HTTP clients with browser impersonation, stealth mode, and Cloudflare bypass
Spider Framework: Build full-scale concurrent crawlers with pause/resume capabilities
CSS Pseudo-elements: Extract text and attributes directly with ::text and ::attr(name)
Enhanced Navigation: More powerful DOM traversal with methods like find_similar() and below_elements
Better Performance: Optimized for speed while maintaining a familiar API

Next Steps

The documentation provides more details on Scrapling’s features and the complete list of arguments that can be passed to all methods. This guide should make your transition from BeautifulSoup to Scrapling smooth and straightforward. Happy scraping!
