Scrapling’s extract commands allow you to fetch web pages and save content in various formats without writing code.

HTTP Methods

GET Request

Perform a GET request and save content:
scrapling extract get https://example.com output.html

Options

  • --headers, -H - HTTP headers (repeatable)
  • --cookies - Cookie string
  • --timeout - Timeout in seconds (default: 30)
  • --proxy - Proxy URL
  • --css-selector, -s - CSS selector for specific content
  • --params, -p - Query parameters (repeatable)
  • --follow-redirects / --no-follow-redirects - Follow redirects (default: true)
  • --verify / --no-verify - Verify SSL certificates (default: true)
  • --impersonate - Browser to impersonate
  • --stealthy-headers / --no-stealthy-headers - Use browser-like headers (default: true)

Examples

# Save as HTML
scrapling extract get https://example.com page.html

# Save as Markdown
scrapling extract get https://example.com page.md

# Save as plain text
scrapling extract get https://example.com page.txt

# With custom headers
scrapling extract get https://api.example.com/data output.json \
  -H "Authorization: Bearer token123" \
  -H "Accept: application/json"

# Extract specific element
scrapling extract get https://example.com content.md \
  --css-selector "article.main-content"

# With proxy
scrapling extract get https://example.com page.html \
  --proxy "http://user:pass@proxy.example.com:8080"

# Impersonate a single browser, or pick randomly from a comma-separated list
scrapling extract get https://example.com page.html --impersonate chrome
scrapling extract get https://example.com page.html --impersonate "chrome,firefox,safari"
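
Because -H is repeatable, a script can keep its headers in a list and expand them into flags. The following is a minimal sketch rather than part of Scrapling: build_get_args is a hypothetical helper that reads one header per line from standard input and prints the resulting argument list, one argument per line, without executing anything.

```shell
#!/bin/sh
# Sketch: expand a list of headers (one per line on stdin) into repeated -H flags.
# build_get_args is a hypothetical helper, not part of Scrapling.
build_get_args() {
  url="$1"
  output="$2"
  # Reuse the positional parameters as the argument list being built.
  set -- scrapling extract get "$url" "$output"
  while IFS= read -r header; do
    # Skip blank lines so they do not become empty -H values.
    [ -n "$header" ] || continue
    set -- "$@" -H "$header"
  done
  # Print one argument per line for review; nothing is executed here.
  printf '%s\n' "$@"
}

build_get_args "https://api.example.com/data" output.json <<'EOF'
Authorization: Bearer token123
Accept: application/json
EOF
```

To run the request instead of previewing it, the final printf inside the helper could be replaced with "$@".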

POST Request

Perform a POST request with form data or JSON:
scrapling extract post https://example.com/api output.json --json '{"key": "value"}'

Additional Options

  • --data, -d - Form data ("param1=value1&param2=value2")
  • --json, -j - JSON data as string

Examples

# POST with JSON
scrapling extract post https://api.example.com/endpoint response.json \
  --json '{"name": "John", "age": 30}'

# POST with form data
scrapling extract post https://example.com/form result.html \
  --data "username=user&password=pass"

# POST with headers and cookies
scrapling extract post https://api.example.com/data output.json \
  --json '{"query": "search term"}' \
  -H "Content-Type: application/json" \
  --cookies "session=abc123; token=xyz789"

PUT Request

Update resources with PUT:
scrapling extract put https://api.example.com/resource/1 response.json \
  --json '{"status": "updated"}'

DELETE Request

Delete resources:
scrapling extract delete https://api.example.com/resource/1 response.html

Browser Automation

Dynamic Fetcher

Use Playwright for JavaScript-heavy sites:
scrapling extract fetch https://example.com output.html

Options

  • --headless / --no-headless - Headless mode (default: true)
  • --disable-resources / --enable-resources - Block unnecessary resources (default: false)
  • --network-idle / --no-network-idle - Wait for network idle (default: false)
  • --timeout - Timeout in milliseconds (default: 30000)
  • --wait - Additional wait time in milliseconds (default: 0)
  • --css-selector, -s - Extract specific content
  • --wait-selector - CSS selector to wait for
  • --locale - Browser locale
  • --real-chrome / --no-real-chrome - Use real Chrome installation (default: false)
  • --proxy - Proxy URL
  • --extra-headers, -H - Extra headers (repeatable)
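
Note that --timeout is in milliseconds here, while the get command documents it in seconds. A script that uses both may want to keep a single value and convert it. This is a small sketch with illustrative variable names; the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch: keep one timeout value in seconds and convert it for the browser fetcher.
timeout_s=45                        # seconds, as documented for get
timeout_ms=$(( timeout_s * 1000 ))  # milliseconds, as documented for fetch

# Print the commands for review instead of running them.
echo "scrapling extract get https://example.com page.html --timeout $timeout_s"
echo "scrapling extract fetch https://example.com page.html --timeout $timeout_ms"
```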

Examples

# Basic browser fetch
scrapling extract fetch https://spa-website.com page.html

# Wait for specific element
scrapling extract fetch https://example.com content.md \
  --wait-selector "div.loaded-content" \
  --network-idle

# Visible browser with delay
scrapling extract fetch https://example.com page.html \
  --no-headless \
  --wait 3000

# Speed boost by blocking resources
scrapling extract fetch https://example.com content.txt \
  --disable-resources

# Custom locale
scrapling extract fetch https://example.com page.html \
  --locale "de-DE"

Stealthy Fetcher

Use advanced stealth features for protected sites:
scrapling extract stealthy-fetch https://protected-site.com output.html

Additional Options

  • --block-webrtc / --allow-webrtc - Block WebRTC (default: false)
  • --solve-cloudflare / --no-solve-cloudflare - Solve Cloudflare challenges (default: false)
  • --allow-webgl / --block-webgl - Allow WebGL (default: true)
  • --hide-canvas / --show-canvas - Add canvas noise (default: false)
  • --real-chrome / --no-real-chrome - Use real Chrome (default: false)

Examples

# Bypass Cloudflare protection
scrapling extract stealthy-fetch https://protected.com page.html \
  --solve-cloudflare

# Maximum stealth configuration
scrapling extract stealthy-fetch https://protected.com content.md \
  --solve-cloudflare \
  --block-webrtc \
  --hide-canvas \
  --network-idle

# Extract specific content after Cloudflare
scrapling extract stealthy-fetch https://protected.com data.txt \
  --solve-cloudflare \
  --css-selector "div.main-content" \
  --wait-selector "div.loaded"

Output Formats

The output format is determined by the file extension of the output path:
  • .html - Saves raw HTML content
  • .md - Saves the content as Markdown
  • .txt - Saves the content as plain text
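
Since the extension selects the format, saving one page in several formats comes down to a loop over extensions. This is a sketch under stated assumptions: run and save_all_formats are hypothetical helpers, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: save one page as HTML, Markdown, and plain text.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: issue one GET per output extension.
save_all_formats() {
  url="$1"
  base="$2"
  for ext in html md txt; do
    run scrapling extract get "$url" "${base}.${ext}"
  done
}

save_all_formats "https://example.com" "page"
```

Each extension triggers a separate request in this sketch, so with DRY_RUN=0 the page is fetched three times.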

CSS Selector Extraction

All extract commands support the --css-selector option to extract specific elements:
# Extract all matching elements
scrapling extract get https://news.com articles.md \
  --css-selector "article.post"

# Extract specific section
scrapling extract fetch https://example.com main.html \
  --css-selector "main#content"
When using --css-selector, all matching elements are returned, not just the first one.

Common Patterns

# API request with authentication headers
scrapling extract get https://api.example.com/data response.json \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json"

# Page that requires session cookies
scrapling extract get https://example.com/protected page.html \
  --cookies "session_id=abc123; user_token=xyz789"

# JavaScript-rendered page
scrapling extract fetch https://spa-site.com content.md \
  --wait-selector "div.content-loaded" \
  --network-idle

# Cloudflare-protected page
scrapling extract stealthy-fetch https://protected.com page.html \
  --solve-cloudflare \
  --wait 2000
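
When it is unclear which fetcher a site needs, a script could try the lightest one first and escalate. The sketch below is illustrative only: fetch_with_fallback is a hypothetical helper, and it assumes that scrapling exits with a non-zero status when a request fails, which this page does not confirm.

```shell
#!/bin/sh
# Sketch: try the lightest fetcher first and escalate on failure.
# Assumption: scrapling exits with a non-zero status when a request fails.

# Hypothetical helper: try get, then fetch, then stealthy-fetch.
fetch_with_fallback() {
  url="$1"
  output="$2"
  for mode in get fetch stealthy-fetch; do
    if scrapling extract "$mode" "$url" "$output"; then
      echo "saved $output using $mode"
      return 0
    fi
    echo "$mode failed" >&2
  done
  echo "all fetchers failed for $url" >&2
  return 1
}

# Example call (commented out so the sketch makes no requests when sourced):
# fetch_with_fallback "https://example.com" page.md
```

A page that loads but returns a challenge screen may still exit successfully, so checking the saved file's content is a sensible extra step.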
