Scrapling’s MCP server provides six tools for web scraping. Each tool targets a different use case and level of site protection.

Available Tools

get

Make stealth HTTP GET requests to fetch web pages.
Best for: Low to mid protection levels, simple HTTP requests
Parameters:
url (string, required): The URL to request
impersonate (string, default: "chrome"): Browser to impersonate (chrome, firefox, safari, etc.)
extraction_type (string, default: "markdown"): Output format: markdown, html, or text
css_selector (string): CSS selector to extract specific content
main_content_only (boolean, default: true): Extract only content within <body> tag
headers (object): Custom HTTP headers
cookies (object): Cookies to include in request
proxy (string): Proxy URL (format: "http://user:pass@host:port")
timeout (number, default: 30): Request timeout in seconds
stealthy_headers (boolean, default: true): Use real browser headers
Example usage:
{
  "url": "https://example.com",
  "extraction_type": "markdown",
  "css_selector": "article.main",
  "impersonate": "chrome"
}
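The JSON above is the tool's arguments object. An MCP client sends it inside a JSON-RPC 2.0 `tools/call` request. The sketch below shows that envelope using only the standard library. The helper name `make_tool_call` is hypothetical, and it assumes the server exposes the tool under the plain name `get` (some clients add a namespace prefix).

```python
import json


def make_tool_call(name, arguments, request_id=1):
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Same arguments as the example usage above.
get_request = make_tool_call(
    "get",  # assumed tool name as registered by the server
    {
        "url": "https://example.com",
        "extraction_type": "markdown",
        "css_selector": "article.main",
        "impersonate": "chrome",
    },
)
print(json.dumps(get_request, indent=2))
```

Most MCP clients build this envelope for you, so the sketch is mainly useful when debugging raw traffic.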

bulk_get

Fetch multiple URLs concurrently with HTTP GET requests.
Best for: Scraping multiple pages efficiently
Parameters: Same as get, but accepts urls (array) instead of url (string).
urls (array[string], required): List of URLs to fetch concurrently
Example usage:
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "extraction_type": "markdown",
  "impersonate": "firefox"
}
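For long URL lists it can be convenient to split the work into several bulk_get calls. The sketch below builds one arguments object per batch. The helper name `batched_bulk_args` and the batch size are assumptions for illustration; the source does not state any limit on the length of urls.

```python
def batched_bulk_args(urls, batch_size, **shared):
    """Yield one bulk_get arguments dict per batch of URLs (hypothetical helper).

    `shared` holds the options applied to every batch, such as extraction_type.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield {"urls": urls[start:start + batch_size], **shared}


pages = [f"https://example.com/page{n}" for n in range(1, 6)]
batches = list(
    batched_bulk_args(pages, 2, extraction_type="markdown", impersonate="firefox")
)
for batch in batches:
    print(batch["urls"])
```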

fetch

Use Playwright browser automation for JavaScript-heavy sites.
Best for: Single-page applications, sites requiring JavaScript execution
Parameters:
url (string, required): The URL to fetch
extraction_type (string, default: "markdown"): Output format: markdown, html, or text
headless (boolean, default: true): Run browser in headless mode
disable_resources (boolean, default: false): Block images, fonts, media for speed boost
network_idle (boolean, default: false): Wait for no network activity for 500ms
timeout (number, default: 30000): Timeout in milliseconds
wait (number, default: 0): Additional wait time in milliseconds
wait_selector (string): CSS selector to wait for before proceeding
wait_selector_state (string, default: "attached"): State to wait for: attached, detached, visible, hidden
real_chrome (boolean, default: false): Use real Chrome installation instead of Chromium
Set referer as Google search of domain
Example usage:
{
  "url": "https://spa-website.com",
  "extraction_type": "markdown",
  "wait_selector": "div.content-loaded",
  "network_idle": true,
  "disable_resources": true
}
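Note that the timeout unit differs between tools: get takes seconds (default 30), while fetch takes milliseconds (default 30000). A minimal sketch of a conversion helper follows. The name `timeout_for` is hypothetical, and it assumes the stealthy and bulk variants use milliseconds too, since they are documented as sharing fetch's parameters.

```python
# Tools documented as taking fetch's parameters, so timeout is in milliseconds.
BROWSER_TOOLS = {"fetch", "bulk_fetch", "stealthy_fetch", "bulk_stealthy_fetch"}


def timeout_for(tool, seconds):
    """Return a timeout value in the unit the given tool expects (hypothetical helper)."""
    if tool in BROWSER_TOOLS:
        return int(seconds * 1000)  # browser tools: milliseconds
    return seconds  # get and bulk_get: seconds


fetch_args = {
    "url": "https://spa-website.com",
    "wait_selector": "div.content-loaded",
    "timeout": timeout_for("fetch", 45),
}
print(fetch_args["timeout"])
```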

bulk_fetch

Fetch multiple URLs with browser automation concurrently.
Best for: Scraping multiple JavaScript-heavy pages
Parameters: Same as fetch, but accepts urls (array) instead of url (string).
Example usage:
{
  "urls": [
    "https://app1.example.com",
    "https://app2.example.com"
  ],
  "headless": true,
  "network_idle": true
}

stealthy_fetch

Advanced stealth browser automation with Cloudflare bypass.
Best for: High protection sites, Cloudflare-protected pages
Parameters: All fetch parameters, plus:
solve_cloudflare (boolean, default: false): Automatically solve Cloudflare challenges
block_webrtc (boolean, default: false): Block WebRTC to prevent IP leaks
allow_webgl (boolean, default: true): Allow WebGL (recommended for stealth)
hide_canvas (boolean, default: false): Add noise to canvas fingerprinting
additional_args (object): Additional Playwright context settings
Example usage:
{
  "url": "https://protected-site.com",
  "extraction_type": "markdown",
  "solve_cloudflare": true,
  "block_webrtc": true,
  "hide_canvas": true,
  "wait": 2000
}

bulk_stealthy_fetch

Fetch multiple protected URLs with advanced stealth.
Best for: Scraping multiple Cloudflare-protected sites
Parameters: Same as stealthy_fetch, but accepts urls (array) instead of url (string).
Example usage:
{
  "urls": [
    "https://protected1.com",
    "https://protected2.com"
  ],
  "solve_cloudflare": true,
  "network_idle": true
}

Response Format

All tools return a structured response:
{
  "status": 200,
  "content": ["Extracted content in requested format"],
  "url": "https://example.com"
}
For bulk operations, an array of responses is returned:
[
  {
    "status": 200,
    "content": ["Content from URL 1"],
    "url": "https://example.com/page1"
  },
  {
    "status": 200,
    "content": ["Content from URL 2"],
    "url": "https://example.com/page2"
  }
]
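A bulk result can be sorted by status so that failed pages can be retried. The sketch below works on the response shape shown above. It assumes a failed page comes back in the same shape with a non-2xx status, which the source does not confirm, and the 404 entry is invented for illustration.

```python
def split_by_status(responses):
    """Separate responses with a 2xx status from all others (hypothetical helper)."""
    succeeded, failed = [], []
    for response in responses:
        bucket = succeeded if 200 <= response["status"] < 300 else failed
        bucket.append(response)
    return succeeded, failed


bulk_result = [
    {"status": 200, "content": ["Content from URL 1"], "url": "https://example.com/page1"},
    # Invented failure entry; the real error shape may differ.
    {"status": 404, "content": [], "url": "https://example.com/missing"},
]
succeeded, failed = split_by_status(bulk_result)
retry_urls = [response["url"] for response in failed]
print(retry_urls)
```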

Extraction Types

Converts HTML to clean Markdown format:
{"extraction_type": "markdown"}
Best for: Readable text, content processing, AI consumption

CSS Selectors

All tools support CSS selectors for targeted extraction:
# Extract all articles
{"css_selector": "article.post"}

# Extract main content
{"css_selector": "main#content"}

# Extract specific elements
{"css_selector": "div.product-info"}
When css_selector matches multiple elements, all matches are returned in the content array.
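Because each match becomes its own item in the content array, a client can handle matches one by one. A minimal sketch follows, with a made-up response for a selector such as article.post. The helper name `label_matches` is hypothetical.

```python
def label_matches(response):
    """Pair each item in the content array with its 1-based match position."""
    return list(enumerate(response["content"], start=1))


# Made-up response in the documented shape, with two selector matches.
sample = {
    "status": 200,
    "content": ["First post", "Second post"],
    "url": "https://example.com",
}
labeled = label_matches(sample)
for position, text in labeled:
    print(f"match {position}: {text}")
```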

Authentication

HTTP Basic Auth

{
  "url": "https://example.com",
  "auth": {
    "username": "user",
    "password": "pass"
  }
}

Proxy Authentication

{
  "url": "https://example.com",
  "proxy": "http://proxy.example.com:8080",
  "proxy_auth": {
    "username": "proxy_user",
    "password": "proxy_pass"
  }
}
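The get tool's proxy parameter is documented with credentials embedded in the URL (http://user:pass@host:port). If you use that form, special characters in the username or password need percent-encoding. The sketch below assembles such a URL with the standard library. The helper name `proxy_url` and the sample password are assumptions for illustration.

```python
from urllib.parse import quote


def proxy_url(host, port, username, password, scheme="http"):
    """Build a proxy URL with percent-encoded credentials (hypothetical helper)."""
    user = quote(username, safe="")  # encode every reserved character
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


args = {
    "url": "https://example.com",
    "proxy": proxy_url("proxy.example.com", 8080, "proxy_user", "p@ss:word"),
}
print(args["proxy"])
```

Without encoding, the "@" and ":" in the password would be read as URL delimiters.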

Common Patterns

{
  "url": "https://example.com",
  "extraction_type": "markdown"
}
{
  "url": "https://news.example.com/article",
  "css_selector": "article.content",
  "extraction_type": "markdown",
  "main_content_only": true
}
{
  "url": "https://spa.example.com",
  "wait_selector": "div.loaded",
  "network_idle": true,
  "extraction_type": "html"
}
{
  "url": "https://protected.example.com",
  "solve_cloudflare": true,
  "wait": 2000,
  "extraction_type": "markdown"
}
{
  "urls": [
    "https://site1.com",
    "https://site2.com",
    "https://site3.com"
  ],
  "impersonate": "chrome,firefox,safari",
  "stealthy_headers": true,
  "extraction_type": "markdown"
}

Tool Selection Guide

1. Simple HTTP sites: Use get or bulk_get for basic HTML pages without JavaScript
2. JavaScript-heavy sites: Use fetch or bulk_fetch for SPAs and dynamic content
3. Protected sites: Use stealthy_fetch or bulk_stealthy_fetch for Cloudflare and WAF-protected sites
4. Multiple URLs: Use bulk variants (bulk_get, bulk_fetch, bulk_stealthy_fetch) for concurrent operations
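
The guide above can be expressed as a small decision function. This is a sketch of the selection logic only. The function name `choose_tool` and its flags are hypothetical, and judging whether a site is protected or needs JavaScript is left to the caller.

```python
def choose_tool(url_count, needs_javascript=False, protected=False):
    """Pick a tool name following the selection guide (hypothetical helper)."""
    if protected:
        base = "stealthy_fetch"  # Cloudflare and WAF-protected sites
    elif needs_javascript:
        base = "fetch"  # SPAs and dynamic content
    else:
        base = "get"  # basic HTML pages without JavaScript
    # Multiple URLs: switch to the bulk variant of the chosen tool.
    return f"bulk_{base}" if url_count > 1 else base


print(choose_tool(1))
print(choose_tool(3, needs_javascript=True))
print(choose_tool(2, protected=True))
```

Protection is checked first because a protected site calls for the stealth tool whether or not it also relies on JavaScript.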
