Extracting data from web pages requires navigating the DOM, executing JavaScript, and handling dynamic content. This recipe shows how to use bdg to extract structured data efficiently.
The accessibility tree provides semantic structure without HTML noise.

    # Query for product elements by role
    bdg dom a11y query role=article

    # Get semantic representation (70-99% token reduction)
    bdg dom get ".product-card" --json

This is more token-efficient than parsing full HTML.
4. Handle pagination to extract all pages
Navigate through paginated results to collect the complete dataset.

    # Extract first page
    bdg dom eval 'Array.from(document.querySelectorAll(".product-card")).map(c => ({title: c.querySelector(".product-title")?.textContent}))' --json > page1.json

    # Click "Next" button
    bdg dom click ".pagination-next"
    sleep 2

    # Extract second page
    bdg dom eval 'Array.from(document.querySelectorAll(".product-card")).map(c => ({title: c.querySelector(".product-title")?.textContent}))' --json > page2.json

    # Merge results
    jq -s 'add' page1.json page2.json > all-products.json

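When the number of pages is not known in advance, the two-page example above can be generalized into a loop. The following is a minimal sketch, not a documented bdg feature. It assumes the same example selectors, that `bdg dom query ... --json` reports matches under `.data.nodes` as in the other examples in this recipe, and that the Next control is absent from the DOM on the last page. The helper name `extract_all_pages` and its two parameters are hypothetical.

```bash
# Extraction script reused for every page (same selectors as the example above)
EXTRACT_JS='Array.from(document.querySelectorAll(".product-card")).map(c => ({title: c.querySelector(".product-title")?.textContent}))'

# Hypothetical helper: extract pages until no "Next" control is found.
#   $1 = safety cap on pages (default 20)
#   $2 = seconds to wait after each click (default 2)
# Prints the number of pages extracted.
extract_all_pages() {
  local max_pages="${1:-20}" delay="${2:-2}"
  local page=1 extracted=0 next_count
  while [ "$page" -le "$max_pages" ]; do
    # Zero-padded names keep the files in page order when globbed
    bdg dom eval "$EXTRACT_JS" --json > "page-$(printf '%03d' "$page").json"
    extracted=$page
    # Assumption: the Next control is absent from the DOM on the last page
    next_count=$(bdg dom query ".pagination-next" --json | jq '.data.nodes | length')
    [ "${next_count:-0}" -gt 0 ] || break
    bdg dom click ".pagination-next" > /dev/null
    sleep "$delay"
    page=$((page + 1))
  done
  echo "$extracted"
}
```

Running `extract_all_pages 50 2` followed by `jq -s 'add' page-*.json > all-products.json` would then merge whatever was collected. Sites that keep a disabled Next button on the last page need a different stop condition, for example comparing the first item title between consecutive pages.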
5. Scroll to load lazy-loaded content
Some sites load content only when it is scrolled into view.

    # Scroll to bottom to trigger lazy loading
    bdg dom scroll --bottom

    # Wait for content to load
    sleep 2

    # Check if more products appeared
    bdg dom query ".product-card" --json | jq '.data.nodes | length'

    # Count all cards now in the DOM (including lazy-loaded ones)
    bdg dom eval 'document.querySelectorAll(".product-card").length'

6. Extract data from specific product page
Navigate to a product detail page and extract comprehensive data.

    # Navigate to product page via JavaScript
    bdg dom eval 'window.location.href = "https://example.com/products/123"'

    # Or click a product link instead (use one approach, not both)
    # bdg dom click ".product-card:first-child a"

    # Wait for navigation
    sleep 3

    # Extract detailed product data
    bdg dom eval '({
      title: document.querySelector(".product-title")?.textContent,
      price: document.querySelector(".price")?.textContent,
      description: document.querySelector(".description")?.textContent,
      images: Array.from(document.querySelectorAll(".product-image img")).map(img => img.src),
      specifications: Array.from(document.querySelectorAll(".specs tr")).map(row => ({
        name: row.querySelector("th")?.textContent,
        value: row.querySelector("td")?.textContent
      })),
      reviews: Array.from(document.querySelectorAll(".review")).map(review => ({
        author: review.querySelector(".author")?.textContent,
        rating: review.querySelector(".rating")?.getAttribute("aria-label"),
        text: review.querySelector(".review-text")?.textContent
      }))
    })' --json > product-123.json

7. Handle dynamic content (AJAX-loaded)
Wait for AJAX content to load before extraction.

    # Trigger action that loads content
    bdg dom click "#load-reviews"

    # Wait for network activity to complete
    sleep 2

    # Verify content loaded by checking network
    bdg network list --filter "domain:api.example.com" --last 1

    # Extract newly loaded content
    bdg dom query ".review" --json | jq '.data.nodes | length'

8. Extract tabular data
For data in HTML tables, extract as structured JSON.
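One way to do this is to pair each header cell with the cell in the same column of every body row. The sketch below is illustrative only: it assumes the table uses `<thead>` and `<tbody>`, that `"table"` is a placeholder selector to replace with the real one, and that `bdg dom eval` accepts a multi-line expression and serializes the returned array with `--json` in the same way as the object examples in this recipe. The names `TABLE_JS` and `extract_table` are hypothetical.

```bash
# JavaScript evaluated in the page: one object per body row, keyed by header text.
# Assumption: "table" is a placeholder selector for the target table.
TABLE_JS=$(cat <<'EOF'
(() => {
  const table = document.querySelector("table");
  if (!table) return [];
  const headers = Array.from(table.querySelectorAll("thead th"))
    .map(th => th.textContent.trim());
  return Array.from(table.querySelectorAll("tbody tr")).map(tr => {
    const cells = Array.from(tr.querySelectorAll("td"))
      .map(td => td.textContent.trim());
    // Pair each header with the cell in the same column (null if missing)
    return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? null]));
  });
})()
EOF
)

# Hypothetical wrapper: run the script in the current page and print JSON
extract_table() {
  bdg dom eval "$TABLE_JS" --json
}
```

Calling `extract_table > table.json` would save the rows to a file. Tables that lack a `<thead>`, or that use `<th>` cells as row labels, need the header lookup adjusted.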
Solution: Scroll incrementally and check for new content.

    # Scroll down in chunks
    for i in {1..10}; do
      bdg dom scroll --down 500
      sleep 1
      COUNT=$(bdg dom query ".product-card" --json | jq '.data.nodes | length')
      echo "Items loaded: $COUNT"
    done

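The fixed ten-iteration loop above can be turned into one that stops as soon as scrolling no longer adds items. This is a rough sketch under stated assumptions: the 500-pixel step comes from the example above, while the helper names `count_items` and `scroll_until_stable`, the round limit, and the delay are hypothetical defaults to tune per site.

```bash
# Count elements matching a selector, using the JSON shape shown in this recipe
count_items() {
  bdg dom query "$1" --json | jq '.data.nodes | length'
}

# Hypothetical helper: scroll until the item count stops growing.
#   $1 = selector, $2 = max scroll rounds (default 30), $3 = delay in seconds (default 1)
# Prints the final item count.
scroll_until_stable() {
  local selector="$1" max_rounds="${2:-30}" delay="${3:-1}"
  local previous=-1 current=0 round=0
  while [ "$round" -lt "$max_rounds" ]; do
    current=$(count_items "$selector")
    current=${current:-0}
    # No new items since the last scroll: assume everything is loaded
    [ "$current" -eq "$previous" ] && break
    previous=$current
    bdg dom scroll --down 500 > /dev/null
    sleep "$delay"
    round=$((round + 1))
  done
  echo "$current"
}
```

For example, `scroll_until_stable ".product-card" 40 2` would scroll at most 40 times with a two-second pause. A single unchanged count can also mean a slow response, so a longer delay is the safer choice on slow sites.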
Dynamic class names (React/CSS modules)
Solution: Use stable attributes like data-testid or semantic roles.

    # Instead of brittle class names
    # ❌ bdg dom query ".css-1x2y3z4"

    # Use data attributes
    bdg dom query "[data-testid='product-card']"

    # Or accessibility roles
    bdg dom a11y query role=article

Content in iframes
Solution: Use CDP to switch to iframe context.

    # List all frames
    bdg cdp Page.getFrameTree --json

    # Execute in specific frame (requires CDP scripting)
    # Note: Direct iframe support is limited; consider extracting iframe src and navigating to it
    bdg dom eval 'document.querySelector("iframe").src'

JavaScript-rendered content with delay
Solution: Poll for element existence before extraction.

    # Wait for element to appear
    for i in {1..10}; do
      EXISTS=$(bdg dom query ".dynamic-content" --json | jq '.data.nodes | length')
      if [ "${EXISTS:-0}" -gt 0 ]; then
        echo "Content loaded"
        break
      fi
      sleep 1
    done

    # Extract content (optional chaining avoids an error if it never appeared)
    bdg dom eval 'document.querySelector(".dynamic-content")?.textContent'

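The polling loop above can be wrapped in a reusable function that reports a timeout through its exit status, so extraction only runs when the element was actually found. This is a sketch rather than a bdg feature: `wait_for_selector` and its parameters are hypothetical, and it relies on the same `.data.nodes` JSON shape used throughout this recipe.

```bash
# Hypothetical helper: poll until a selector matches or attempts run out.
#   $1 = selector, $2 = max attempts (default 10), $3 = delay in seconds (default 1)
# Returns 0 when the element is found, 1 on timeout.
wait_for_selector() {
  local selector="$1" attempts="${2:-10}" delay="${3:-1}" found
  while [ "$attempts" -gt 0 ]; do
    found=$(bdg dom query "$selector" --json | jq '.data.nodes | length')
    if [ "${found:-0}" -gt 0 ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $selector" >&2
  return 1
}
```

A typical call chains the extraction behind the check, for example `wait_for_selector ".dynamic-content" 10 1 && bdg dom eval 'document.querySelector(".dynamic-content").textContent'`.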

    # Extract and save page by page
    for page in {1..10}; do
      bdg dom eval "/* extraction script */" --json > "page-$page.json"
      bdg dom click ".next-page"
      sleep 2
    done
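
The per-page files written by the loop above still need to be combined. A plain `page-*.json` glob sorts `page-10.json` before `page-2.json`, so the sketch below walks the page numbers in order instead. It assumes each file holds a JSON array, as in the pagination step, and reuses the `jq -s 'add'` merge from that step. The function name `merge_pages` and the default output name are hypothetical.

```bash
# Hypothetical helper: merge page-1.json .. page-N.json in numeric order.
#   $1 = last page number, $2 = output file (default all-products.json)
# Returns 1 if no page files were found.
merge_pages() {
  local last="$1" out="${2:-all-products.json}" page=1
  set --   # reuse the positional parameters as the ordered file list
  while [ "$page" -le "$last" ]; do
    [ -f "page-$page.json" ] && set -- "$@" "page-$page.json"
    page=$((page + 1))
  done
  [ "$#" -gt 0 ] || return 1
  # -s reads all files into one array of arrays; add concatenates them
  jq -s 'add' "$@" > "$out"
}
```

After the loop finishes, `merge_pages 10` would write the combined array to `all-products.json`, skipping any page file that is missing.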