Scrapling’s extract commands allow you to fetch web pages and save content in various formats without writing code.
HTTP Methods
GET Request
Perform a GET request and save content:
scrapling extract get https://example.com output.html
Options
--headers, -H - HTTP headers (repeatable)
--cookies - Cookie string
--timeout - Timeout in seconds (default: 30)
--proxy - Proxy URL
--css-selector, -s - CSS selector for specific content
--params, -p - Query parameters (repeatable)
--follow-redirects / --no-follow-redirects - Follow redirects (default: true)
--verify / --no-verify - Verify SSL certificates (default: true)
--impersonate - Browser to impersonate
--stealthy-headers / --no-stealthy-headers - Use browser-like headers (default: true)
Examples
# Save as HTML
scrapling extract get https://example.com page.html
# Save as Markdown
scrapling extract get https://example.com page.md
# Save as plain text
scrapling extract get https://example.com page.txt
# With custom headers
scrapling extract get https://api.example.com/data output.json \
-H "Authorization: Bearer token123" \
-H "Accept: application/json"
# Extract specific element
scrapling extract get https://example.com content.md \
--css-selector "article.main-content"
# With proxy
scrapling extract get https://example.com page.html \
--proxy "http://user:pass@proxy.example.com:8080"
# Impersonate browsers (single or random)
scrapling extract get https://example.com page.html --impersonate chrome
scrapling extract get https://example.com page.html --impersonate "chrome,firefox,safari"
POST Request
Perform a POST request with form data or JSON:
scrapling extract post https://example.com/api output.json --json '{"key": "value"}'
Additional Options
--data, -d - Form data ("param1=value1&param2=value2")
--json, -j - JSON data as string
Examples
# POST with JSON
scrapling extract post https://api.example.com/endpoint response.json \
--json '{"name": "John", "age": 30}'
# POST with form data
scrapling extract post https://example.com/form result.html \
--data "username=user&password=pass"
# POST with headers and cookies
scrapling extract post https://api.example.com/data output.json \
--json '{"query": "search term"}' \
-H "Content-Type: application/json" \
--cookies "session=abc123; token=xyz789"
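Longer JSON bodies are awkward to quote inline. Since --json takes the JSON as a string, one option is to read it from a file first. The sketch below is a hypothetical helper (the name `post_json_file` and the payload file are assumptions, not part of Scrapling) that uses only the `post` command and `--json` option described above.

```shell
# Hypothetical helper: send the contents of a JSON file as the POST body.
post_json_file() {
  url=$1
  out=$2
  payload_file=$3
  # Fail early if the payload is unreadable rather than sending an empty body.
  [ -r "$payload_file" ] || { echo "cannot read $payload_file" >&2; return 1; }
  # Command substitution strips trailing newlines, so the file content
  # is passed to --json as one string argument.
  scrapling extract post "$url" "$out" --json "$(cat "$payload_file")"
}
```

For example, `post_json_file https://api.example.com/endpoint response.json payload.json` would send whatever `payload.json` contains.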
PUT Request
Update resources with PUT:
scrapling extract put https://api.example.com/resource/1 response.json \
--json '{"status": "updated"}'
DELETE Request
Delete resources:
scrapling extract delete https://api.example.com/resource/1 response.html
Browser Automation
Dynamic Fetcher
Use Playwright for JavaScript-heavy sites:
scrapling extract fetch https://example.com output.html
Options
--headless / --no-headless - Headless mode (default: true)
--disable-resources / --enable-resources - Block unnecessary resources (default: false)
--network-idle / --no-network-idle - Wait for network idle (default: false)
--timeout - Timeout in milliseconds (default: 30000)
--wait - Additional wait time in milliseconds (default: 0)
--css-selector, -s - Extract specific content
--wait-selector - CSS selector to wait for
--locale - Browser locale
--real-chrome / --no-real-chrome - Use real Chrome installation (default: false)
--proxy - Proxy URL
--extra-headers, -H - Extra headers (repeatable)
Examples
# Basic browser fetch
scrapling extract fetch https://spa-website.com page.html
# Wait for specific element
scrapling extract fetch https://example.com content.md \
--wait-selector "div.loaded-content" \
--network-idle
# Visible browser with delay
scrapling extract fetch https://example.com page.html \
--no-headless \
--wait 3000
# Speed boost by blocking resources
scrapling extract fetch https://example.com content.txt \
--disable-resources
# Custom locale
scrapling extract fetch https://example.com page.html \
--locale "de-DE"
Stealthy Fetcher
Use advanced stealth features for protected sites:
scrapling extract stealthy-fetch https://protected-site.com output.html
Additional Options
--block-webrtc / --allow-webrtc - Block WebRTC (default: false)
--solve-cloudflare / --no-solve-cloudflare - Solve Cloudflare challenges (default: false)
--allow-webgl / --block-webgl - Allow WebGL (default: true)
--hide-canvas / --show-canvas - Add canvas noise (default: false)
--real-chrome / --no-real-chrome - Use real Chrome (default: false)
Examples
# Bypass Cloudflare protection
scrapling extract stealthy-fetch https://protected.com page.html \
--solve-cloudflare
# Maximum stealth configuration
scrapling extract stealthy-fetch https://protected.com content.md \
--solve-cloudflare \
--block-webrtc \
--hide-canvas \
--network-idle
# Extract specific content after Cloudflare
scrapling extract stealthy-fetch https://protected.com data.txt \
--solve-cloudflare \
--css-selector "div.main-content" \
--wait-selector "div.loaded"
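Because the three fetchers take the same URL and output arguments, a script can start with the lightest one and escalate only when needed. The following is a minimal sketch, not a Scrapling feature: the function name `extract_with_fallback` is hypothetical, and it assumes a failed run either exits non-zero or leaves the output file missing or empty, which this page does not confirm.

```shell
# Hypothetical helper: try get, then fetch, then stealthy-fetch,
# stopping at the first one that produces a non-empty output file.
# Assumption: failure shows up as a non-zero exit or an empty/missing file.
extract_with_fallback() {
  url=$1
  out=$2
  for mode in get fetch stealthy-fetch; do
    # Remove any leftover file so the size check reflects this attempt only.
    rm -f "$out"
    if scrapling extract "$mode" "$url" "$out" && [ -s "$out" ]; then
      # Report which fetcher succeeded.
      printf '%s\n' "$mode"
      return 0
    fi
  done
  return 1
}
```

Calling `extract_with_fallback https://example.com page.html` prints the name of the fetcher that worked, or returns a non-zero status if none did.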
The output format is determined by file extension:
scrapling extract get https://example.com page.html
Saves raw HTML content.
scrapling extract get https://example.com page.md
Converts HTML to Markdown format.
scrapling extract get https://example.com page.txt
Extracts plain text content.
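As a rough mental model of the extension rule, the mapping can be written as a small shell function. This is an illustrative sketch only: the name `output_format_for` and the `unknown` fallback are assumptions, and how the CLI treats other extensions is not described here.

```shell
# Illustrative only: mirrors the documented extension-to-format mapping.
output_format_for() {
  # ${1##*.} keeps everything after the last dot in the file name.
  case "${1##*.}" in
    html) echo "html" ;;
    md)   echo "markdown" ;;
    txt)  echo "text" ;;
    *)    echo "unknown" ;;  # behavior for other extensions is not documented here
  esac
}
```

A script could call `output_format_for "$file"` before running an extract command to check that the chosen file name yields the intended format.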
All extract commands support the --css-selector option to extract specific elements:
# Extract all matching elements
scrapling extract get https://news.com articles.md \
--css-selector "article.post"
# Extract specific section
scrapling extract fetch https://example.com main.html \
--css-selector "main#content"
When using --css-selector, all matching elements are returned, not just the first one.
Common Patterns
API requests with authentication
scrapling extract get https://api.example.com/data response.json \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Accept: application/json"
Scraping with session cookies
scrapling extract get https://example.com/protected page.html \
--cookies "session_id=abc123; user_token=xyz789"
Waiting for dynamic content
scrapling extract fetch https://spa-site.com content.md \
--wait-selector "div.content-loaded" \
--network-idle
Handling Cloudflare protection
scrapling extract stealthy-fetch https://protected.com page.html \
--solve-cloudflare \
--wait 2000
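Saving several pages in one run
The commands above handle one URL at a time, so batch jobs need a loop around them. A minimal sketch, assuming a hypothetical helper named `extract_urls_to_markdown` and a file-naming scheme chosen purely for illustration; only the `get` command and its `--timeout` option come from this page.

```shell
# Hypothetical helper: save each URL as Markdown under an output directory.
extract_urls_to_markdown() {
  outdir=$1
  shift
  mkdir -p "$outdir"
  for url in "$@"; do
    # Build a file name from the URL: drop the scheme, then replace
    # every non-alphanumeric character with an underscore.
    name=$(printf '%s' "$url" | sed -e 's|^[a-z]*://||' -e 's|[^A-Za-z0-9]|_|g')
    # The .md extension selects Markdown output.
    scrapling extract get "$url" "$outdir/$name.md" --timeout 30
  done
}
```

For example, `extract_urls_to_markdown out https://example.com/a https://example.com/b` would write `out/example_com_a.md` and `out/example_com_b.md`.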