The inspect command fetches and analyzes a website to help you understand its structure and build scraper configurations. It supports three modes: lightweight HTTP, browser-based (for JavaScript sites), and Cloudflare bypass.
🌐 Using browser for JS-rendered content
Launching browser...
Navigating to: https://spa-site.com
Waiting for page load...
Content rendered!
Analysis saved to: data/myproject/inspect/spa_site_com/
  • page.html (rendered HTML after JS execution)
  • screenshot.png (page screenshot)
  • metadata.json
Browser mode waits for JavaScript to execute and renders the final DOM. This is the HTML you should analyze for extraction selectors.
xvfb-run -a ./scrapai inspect https://protected-site.com --cloudflare
Or install xvfb:
sudo apt-get install xvfb
macOS/Windows:
🖥️ Display available - using native browser for Cloudflare bypass
Solving Cloudflare challenge...
Challenge solved!
Extracting cookies...
Analysis saved to: data/myproject/inspect/protected_site_com/
  • page.html (final HTML after bypass)
  • cookies.json (Cloudflare session cookies)
  • metadata.json
The --browser and --cloudflare flags cannot be used together. Choose one based on your needs.
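If you drive inspect from a script, the one-mode-at-a-time rule can be enforced before the command is ever run. The following is a minimal sketch, not part of scrapai: `build_inspect_command` is a hypothetical helper, and it assumes the flags are appended after the URL as in the Cloudflare example above.

```python
def build_inspect_command(url, browser=False, cloudflare=False):
    """Return the argv list for ./scrapai inspect, rejecting both modes at once."""
    if browser and cloudflare:
        # Mirrors the documented restriction: only one of the two flags is allowed.
        raise ValueError("choose either --browser or --cloudflare, not both")
    command = ["./scrapai", "inspect", url]
    if browser:
        command.append("--browser")
    elif cloudflare:
        command.append("--cloudflare")
    return command


print(build_inspect_command("https://spa-site.com", browser=True))
```

The returned list can be passed to `subprocess.run` as-is; with neither flag set, the command falls back to the lightweight HTTP mode.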
# Analyze HTML structure
./scrapai analyze data/news/inspect/example_com/page.html
# Test a CSS selector
./scrapai analyze page.html --test "article.post h1.title"
# Find elements with keyword
./scrapai analyze page.html --find "author"
$ ./scrapai analyze page.html
📄 Analyzing: page.html
📊 HTML size: 45231 bytes
💡 TIP: Use --find 'keyword' to search for specific elements
============================================================
🏷️ HEADERS (h1, h2)
============================================================
H1 - Found 1:
  [1] h1.article-headline
      Text: UK economy grows 0.4% in February
H2 - Found 5:
  [1] h2.section-title
      Text: Economic Growth
  [2] h2.section-title
      Text: Market Response
  ...
============================================================
📝 CONTENT CONTAINERS
============================================================
  [1] article.main-article
      Size: 3,245 chars
      Preview: The UK economy grew by 0.4% in February, official figures show...
  [2] div.article-body
      Size: 2,891 chars
      Preview: Economists had expected growth of 0.2%, making this a positive...
============================================================
📅 DATES
============================================================
  time.published-date: February 28, 2026
  span.updated-time: Updated 2 hours ago
============================================================
✍️ AUTHORS
============================================================
  span.author-name: Economics Reporter
  a.byline: By John Smith
============================================================
$ ./scrapai analyze page.html --test "h1.article-headline::text"
🔍 Testing selector: h1.article-headline::text
============================================================
✓ Found 1 element(s)
[1] h1
    Classes: ['article-headline']
    Text (62 chars): UK economy grows 0.4% in February
Use the ::text pseudo-selector to extract an element's text content instead of its HTML.
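To make the difference between text content and HTML concrete, here is a rough stand-in written with Python's standard `html.parser`. It is not how the analyze command is implemented; the `HeadlineText` class and the sample markup are assumptions for illustration. Unlike a real selector engine, it collects all text inside the matching `h1`, nested tags included.

```python
from html.parser import HTMLParser


class HeadlineText(HTMLParser):
    """Collect the text inside <h1 class="article-headline">, dropping markup."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while the parser is inside the target h1
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "article-headline" in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


# Hypothetical markup shaped like the headline in the sample output.
html = '<article><h1 class="article-headline">UK economy grows 0.4% in February</h1></article>'
parser = HeadlineText()
parser.feed(html)
text = "".join(parser.chunks)
print(text)
```

The element's HTML would include the `<h1 ...>` tags and attributes; the text is only the headline string, which is usually what a scraper field should store.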
$ ./scrapai analyze page.html --find "author"
🔎 Finding elements with keyword: 'author'
============================================================
  span.author-name
    Text: Economics Reporter
  div.author-bio
    Text: Economics Reporter specializes in UK economic policy and analysis.
  a.author-profile
    Text: View profile
✓ Found 3 elements
Use extract-urls after inspect to analyze URL patterns and design spider rules. Look for common patterns like /articles/[year]/[month]/[slug] to write effective regex rules.
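A pattern like /articles/[year]/[month]/[slug] can be checked against a handful of URL paths before it goes into a spider rule. The sketch below uses Python's `re` module; the exact regex (four-digit year, two-digit month, lowercase slug) and the sample paths are assumptions, so adjust them to what the site's URLs actually look like.

```python
import re

# Hypothetical rule for paths shaped like /articles/[year]/[month]/[slug].
ARTICLE_RE = re.compile(
    r"^/articles/(?P<year>\d{4})/(?P<month>\d{2})/(?P<slug>[a-z0-9-]+)/?$"
)


def classify(paths):
    """Split URL paths into article matches and everything else."""
    articles, other = [], []
    for path in paths:
        match = ARTICLE_RE.match(path)
        if match:
            articles.append(match.groupdict())
        else:
            other.append(path)
    return articles, other


# Made-up sample paths for illustration only.
sample = [
    "/articles/2026/02/uk-economy-grows",
    "/articles/2026/02/",
    "/about",
]
articles, other = classify(sample)
print(articles)
print(other)
```

Anchoring the pattern with `^` and `$` keeps listing pages such as `/articles/2026/02/` from being treated as articles, which is a common source of noisy crawl results.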