Scrapling’s extract commands allow you to fetch web pages and save content in various formats without writing code.

HTTP Methods

GET Request

Perform a GET request and save content:
scrapling extract get https://example.com output.html

Options

  • --headers, -H - HTTP headers (repeatable)
  • --cookies - Cookie string
  • --timeout - Timeout in seconds (default: 30)
  • --proxy - Proxy URL
  • --css-selector, -s - CSS selector for specific content
  • --params, -p - Query parameters (repeatable)
  • --follow-redirects / --no-follow-redirects - Follow redirects (default: true)
  • --verify / --no-verify - Verify SSL certificates (default: true)
  • --impersonate - Browser to impersonate
  • --stealthy-headers / --no-stealthy-headers - Use browser-like headers (default: true)

Examples

# Save as HTML
scrapling extract get https://example.com page.html

# Save as Markdown
scrapling extract get https://example.com page.md

# Save as plain text
scrapling extract get https://example.com page.txt

# With custom headers
scrapling extract get https://api.example.com/data output.json \
  -H "Authorization: Bearer token123" \
  -H "Accept: application/json"

# Extract specific element
scrapling extract get https://example.com content.md \
  --css-selector "article.main-content"

# With proxy
scrapling extract get https://example.com page.html \
  --proxy "http://user:[email protected]:8080"

# Impersonate browsers (single or random)
scrapling extract get https://example.com page.html --impersonate chrome
scrapling extract get https://example.com page.html --impersonate "chrome,firefox,safari"
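Because get is an ordinary CLI command, it composes naturally with shell loops. A minimal batch sketch (the URL list and the filename derivation are illustrative, not part of Scrapling):

```shell
# Fetch several pages as Markdown, deriving output names from the URLs
for url in "https://example.com" "https://example.org/docs"; do
  # strip the scheme and replace path separators to get a safe filename
  name=$(printf '%s' "$url" | sed -E 's#^https?://##; s#[/:]#_#g')
  echo "$name.md"
  if command -v scrapling >/dev/null 2>&1; then
    scrapling extract get "$url" "$name.md"
  fi
done
```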

POST Request

Perform a POST request with form data or JSON:
scrapling extract post https://example.com/api output.json --json '{"key": "value"}'

Additional Options

  • --data, -d - Form data (e.g. "param1=value1&param2=value2")
  • --json, -j - JSON data as string

Examples

# POST with JSON
scrapling extract post https://api.example.com/endpoint response.json \
  --json '{"name": "John", "age": 30}'

# POST with form data
scrapling extract post https://example.com/form result.html \
  --data "username=user&password=pass"

# POST with headers and cookies
scrapling extract post https://api.example.com/data output.json \
  --json '{"query": "search term"}' \
  -H "Content-Type: application/json" \
  --cookies "session=abc123; token=xyz789"
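Since --json takes the payload as a raw string, a malformed quote only fails at request time. One way to catch that earlier is to validate the string locally first (the python3 -m json.tool check is a generic shell trick, not a Scrapling feature):

```shell
# Validate the JSON payload before handing it to --json
payload='{"query": "search term"}'
if printf '%s' "$payload" | python3 -m json.tool >/dev/null 2>&1; then
  valid=yes
  if command -v scrapling >/dev/null 2>&1; then
    scrapling extract post https://api.example.com/data output.json \
      --json "$payload"
  fi
else
  valid=no
fi
echo "payload valid: $valid"
```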

PUT Request

Update resources with PUT:
scrapling extract put https://api.example.com/resource/1 response.json \
  --json '{"status": "updated"}'

DELETE Request

Delete resources:
scrapling extract delete https://api.example.com/resource/1 response.html
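A sketch of an authenticated delete, assuming the subcommand accepts the same --headers/-H option documented for get above (the token is a placeholder):

```shell
# DELETE with an Authorization header (-H assumed shared with get/post)
url="https://api.example.com/resource/1"
echo "deleting $url"
if command -v scrapling >/dev/null 2>&1; then
  scrapling extract delete "$url" response.json \
    -H "Authorization: Bearer YOUR_TOKEN"
fi
```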

Browser Automation

Dynamic Fetcher

Use Playwright for JavaScript-heavy sites:
scrapling extract fetch https://example.com output.html

Options

  • --headless / --no-headless - Headless mode (default: true)
  • --disable-resources / --enable-resources - Block unnecessary resources (default: false)
  • --network-idle / --no-network-idle - Wait for network idle (default: false)
  • --timeout - Timeout in milliseconds (default: 30000)
  • --wait - Additional wait time in milliseconds (default: 0)
  • --css-selector, -s - Extract specific content
  • --wait-selector - CSS selector to wait for
  • --locale - Browser locale
  • --real-chrome / --no-real-chrome - Use real Chrome installation (default: false)
  • --proxy - Proxy URL
  • --extra-headers, -H - Extra headers (repeatable)

Examples

# Basic browser fetch
scrapling extract fetch https://spa-website.com page.html

# Wait for specific element
scrapling extract fetch https://example.com content.md \
  --wait-selector "div.loaded-content" \
  --network-idle

# Visible browser with delay
scrapling extract fetch https://example.com page.html \
  --no-headless \
  --wait 3000

# Speed boost by blocking resources
scrapling extract fetch https://example.com content.txt \
  --disable-resources

# Custom locale
scrapling extract fetch https://example.com page.html \
  --locale "de-DE"
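The options above combine freely; a sketch that renders in the browser, waits for an article to appear, and saves only that element as Markdown (the article.main-content selector is hypothetical):

```shell
# Render with the browser, wait for the article, save only it as Markdown
out="article.md"
if command -v scrapling >/dev/null 2>&1; then
  scrapling extract fetch https://example.com "$out" \
    --wait-selector "article.main-content" \
    --css-selector "article.main-content" \
    --network-idle \
    --timeout 60000
fi
echo "saved to $out"
```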

Stealthy Fetcher

Use advanced stealth features for protected sites:
scrapling extract stealthy-fetch https://protected-site.com output.html

Additional Options

  • --block-webrtc / --allow-webrtc - Block WebRTC (default: false)
  • --solve-cloudflare / --no-solve-cloudflare - Solve Cloudflare challenges (default: false)
  • --allow-webgl / --block-webgl - Allow WebGL (default: true)
  • --hide-canvas / --show-canvas - Add canvas noise (default: false)
  • --real-chrome / --no-real-chrome - Use real Chrome (default: false)

Examples

# Bypass Cloudflare protection
scrapling extract stealthy-fetch https://protected.com page.html \
  --solve-cloudflare

# Maximum stealth configuration
scrapling extract stealthy-fetch https://protected.com content.md \
  --solve-cloudflare \
  --block-webrtc \
  --hide-canvas \
  --network-idle

# Extract specific content after Cloudflare
scrapling extract stealthy-fetch https://protected.com data.txt \
  --solve-cloudflare \
  --css-selector "div.main-content" \
  --wait-selector "div.loaded"

Output Formats

The output format is determined by the output file's extension:

  • .html - Raw HTML content
  • .md - Content converted to Markdown
  • .txt - Plain text content

For example, this saves the raw HTML:
scrapling extract get https://example.com page.html
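Because the format is driven purely by the extension, the same URL can be saved in all three formats with one loop (a sketch; assumes scrapling is on PATH):

```shell
# Same page, three output formats selected by extension
for ext in html md txt; do
  echo "page.$ext"
  if command -v scrapling >/dev/null 2>&1; then
    scrapling extract get https://example.com "page.$ext"
  fi
done
```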

CSS Selector Extraction

All extract commands support the --css-selector option to extract specific elements:
# Extract all matching elements
scrapling extract get https://news.com articles.md \
  --css-selector "article.post"

# Extract specific section
scrapling extract fetch https://example.com main.html \
  --css-selector "main#content"
When using --css-selector, all matching elements are returned, not just the first one.

Common Patterns

# Authenticated API request
scrapling extract get https://api.example.com/data response.json \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json"

# Session cookies for protected pages
scrapling extract get https://example.com/protected page.html \
  --cookies "session_id=abc123; user_token=xyz789"

# Wait for a single-page app to finish loading
scrapling extract fetch https://spa-site.com content.md \
  --wait-selector "div.content-loaded" \
  --network-idle

# Bypass Cloudflare, then wait before extracting
scrapling extract stealthy-fetch https://protected.com page.html \
  --solve-cloudflare \
  --wait 2000
