Scrapling provides powerful methods to extract data from HTML elements. Whether you need text content, HTML markup, attributes, or structured data, Scrapling has you covered.
For Selector (single element): Returns a single-element list containing the element’s serialized string.
For Selectors (multiple elements): Serializes all elements and returns them as a TextHandlers list.

# Single element returns list with one item
element = page.css('h1').first
result = element.getall()
print(result)  # ['<h1>Title</h1>']
Aliases for backward compatibility with other scraping libraries.
extract = getall
extract_first = get

# These are equivalent
text1 = page.css('p').extract_first()
text2 = page.css('p').get()

# These are equivalent
texts1 = page.css('p').extract()
texts2 = page.css('p').getall()
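One common way to provide aliases like these in Python is to bind a second class attribute to the same function object. The sketch below uses a hypothetical stand-in class (`MiniSelectorList` is not a Scrapling class) and makes no claim about how Scrapling implements its aliases internally; it only illustrates why both spellings behave identically.

```python
from typing import List, Optional


class MiniSelectorList:
    """Hypothetical stand-in for a list of selected elements (not Scrapling's class)."""

    def __init__(self, serialized: List[str]) -> None:
        self._items = list(serialized)

    def getall(self) -> List[str]:
        """Return every serialized element."""
        return list(self._items)

    def get(self, default: Optional[str] = None) -> Optional[str]:
        """Return the first serialized element, or `default` if there is none."""
        return self._items[0] if self._items else default

    # Aliases: each name is bound to the same function object as the original.
    extract = getall
    extract_first = get


paragraphs = MiniSelectorList(["<p>One</p>", "<p>Two</p>"])
print(paragraphs.extract() == paragraphs.getall())
print(paragraphs.extract_first() == paragraphs.get())
```

Because the alias is the same function rather than a wrapper, there is no behavioral difference to keep in sync between the two names.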
Returns the text content of the element. For text nodes, returns the text value. For HTML elements, returns the element’s direct text (not including children).
# Get text content
title = page.css('h1').first
if title:
    print(title.text)  # Returns: "Title" (not <h1>Title</h1>)

# Text is a TextHandler with useful methods
price_text = page.css('.price').first.text
price_clean = price_text.strip().replace('$', '')
price_value = float(price_clean)
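The `float(...)` conversion in the price example raises `ValueError` on inputs such as `$1,299.99` or an empty string. A slightly more defensive sketch using only the standard library follows. `parse_price` is a hypothetical helper, not part of Scrapling, and it assumes a dot as the decimal separator and commas as thousands separators.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_price(raw: str) -> Optional[Decimal]:
    """Parse a price string into a Decimal, or return None if it is not numeric.

    Hypothetical helper: assumes '.' is the decimal separator and that
    everything except ASCII digits, '.', and '-' can be discarded.
    """
    # Drop currency symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[^0-9.\-]", "", raw.strip())
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Leftovers such as "1.2.3" or a lone "-" are not valid numbers.
        return None


for sample in ["$1,299.99", "  $19.99 ", "N/A", ""]:
    print(repr(sample), "->", parse_price(sample))
```

`Decimal` is used instead of `float` so that monetary values are not subject to binary floating-point rounding; swap in `float(cleaned)` if that precision is not needed.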
Returns an AttributesHandler containing all attributes of the element.
# Get specific attribute
link = page.css('a').first
href = link.attrib.get('href')
class_name = link.attrib.get('class')

# Check if attribute exists
if 'data-id' in link.attrib:
    data_id = link['data-id']

# Check if attribute exists
link = page.css('a').first
if 'href' in link:
    print(f"Link points to: {link['href']}")
if 'target' in link:
    print(f"Opens in: {link['target']}")
else:
    print("Opens in same window")

# Get tag name
element = page.css('.content').first
print(element.tag)  # e.g., "div"

# For text nodes
text_node = page.xpath('//p/text()').first
print(text_node.tag)  # "#text"

# Check for class
element = page.css('.container').first
if element.has_class('active'):
    print("Element is active")
if element.has_class('container'):
    print("Element is a container")
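A class check of this kind is normally token-based: the `class` attribute is split on whitespace and the name must match a whole token, so `active` does not match `inactive`. The hypothetical helper below illustrates that behavior on a plain string. It is an assumption about the intended semantics, not Scrapling's actual implementation.

```python
def has_class_token(class_attr: str, name: str) -> bool:
    """Return True if `name` is one of the whitespace-separated class tokens.

    Hypothetical helper for illustration only (not a Scrapling function).
    """
    return name in class_attr.split()


class_attr = "container inactive"

# A naive substring test gives a false positive for "active".
print("active" in class_attr)

# The token-based test only matches whole class names.
print(has_class_token(class_attr, "active"))
print(has_class_token(class_attr, "container"))
```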
# Extract data from HTML table
table = page.css('table.data-table').first
if table:
    headers = [th.text.strip() for th in table.css('thead th')]
    rows = []
    for tr in table.css('tbody tr'):
        row = {}
        cells = tr.css('td')
        for i, cell in enumerate(cells):
            header = headers[i] if i < len(headers) else f'column_{i}'
            row[header] = cell.text.strip()
        rows.append(row)
    print(f"Extracted {len(rows)} rows")
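The header-to-cell mapping in the table example can be separated from the selectors so that it is testable without a page. A minimal sketch, assuming the header and cell texts have already been extracted into plain lists of strings; `rows_to_dicts` is a hypothetical helper, not part of Scrapling.

```python
from typing import Dict, List


def rows_to_dicts(headers: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each cell with its header, falling back to 'column_<i>' for extra cells.

    Hypothetical helper for illustration only (not a Scrapling function).
    """
    records = []
    for cells in rows:
        record = {}
        for i, cell in enumerate(cells):
            # Rows may have more cells than there are headers.
            key = headers[i] if i < len(headers) else f"column_{i}"
            record[key] = cell.strip()
        records.append(record)
    return records


headers = ["Name", "Price"]
rows = [["Widget ", "$5"], ["Gadget", "$7", "in stock"]]
for record in rows_to_dicts(headers, rows):
    print(record)
```

Keeping this step as a pure function means the fallback naming for ragged rows can be checked with ordinary unit tests, independent of any HTML input.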