Documentation Index
Fetch the complete documentation index at: https://mintlify.com/goetzcj/web-to-markdown/llms.txt
Use this file to discover all available pages before exploring further.
Function Signature
Parameters
Full URL including scheme (e.g.,
https://docs.example.com/api)Skip static fetch and go straight to headless browser. Use this for known JS-heavy targets like:
- Single-page applications (SPAs)
- Swagger UI instances
- Interactive documentation sites
- Pages that require client-side rendering
Return Value
Clean markdown string extracted from the page, or an error message prefixed with
"ERROR:" if the fetch fails.Images are automatically stripped from the output to reduce noise for agent consumption.Behavior
Two-Stage Fetch Strategy
-
Static fetch (~1 second)
- Performs fast HTTP request with browser-like headers
- Applies readability algorithm to strip navigation, ads, sidebars, and footers
- Converts HTML to markdown
- Returns immediately if content contains ≥200 characters after whitespace collapse
-
Playwright fallback (~5-8 seconds)
- Triggered automatically if static fetch returns thin content (<200 chars)
- Launches headless Chromium browser
- Waits for network idle and 3-second JS execution delay
- Applies same readability → markdown pipeline
- If content is still thin after readability extraction, tries raw HTML conversion as final fallback
Error Handling
Errors are returned as strings (not raised as exceptions) to simplify agent integration:- Playwright not installed: Returns
"ERROR: Page appears JavaScript-rendered but Playwright is not installed. Run: pip install playwright && playwright install chromium" - Fetch failure: Returns
"ERROR: Could not fetch {url}. The page may require authentication or block automated access." - Thin content: Returns
"ERROR: Fetched {url} but content appears to be behind a login wall or requires user interaction that cannot be automated."
Examples
Basic Usage
JavaScript-Heavy Site
Error Handling
Common Use Cases
Performance Considerations
- Static fetch: ~1 second for most pages
- Playwright fetch: ~5-8 seconds (includes browser launch, JS execution, rendering)
- Playwright overhead: ~200MB disk space for Chromium binary (one-time install)
playwright_first=True when you know the site requires JavaScript. This skips the initial static fetch attempt and saves 1-2 seconds.
Related Functions
fetch_api_spec- Specialized function for fetching API documentation that returns raw JSON/YAML specs when available
Source Reference
Implemented inscripts/fetch_as_markdown.py:119-165