Quick Start
Learn how to fetch webpages and convert them to clean markdown with three common use cases.
Basic Usage
Import the core functions from the script.
Example 1: Fetch a Static Page
For traditional server-rendered pages, the static fetch (~1s) handles it automatically.
Example 2: Fetch a JavaScript-Heavy Page
For known SPAs, React documentation, or Swagger UI instances, skip straight to the browser.
Setting playwright_first=True skips the static HTTP request entirely and goes directly to the headless Chromium browser. Use this when you know the target is JavaScript-rendered, so you don't waste time on a static request that would only return an empty shell.
Example 3: Fetch an API Specification
For OpenAPI/Swagger specs, use fetch_api_spec(), which checks the Content-Type header first.
If the server responds with application/json or application/yaml in the Content-Type header, you'll get the raw spec directly. This is useful because many agents can parse OpenAPI specs natively without needing a markdown representation.
If the URL points to an HTML documentation page instead of a raw spec file, it falls back to fetch_as_markdown automatically.
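The Content-Type check can be approximated by a small routing function. This is a hedged sketch, not the script's implementation: `is_raw_spec` and `route_spec_response` are hypothetical names, the response is passed in rather than fetched, and `html_fallback` stands in for fetch_as_markdown().

```python
from typing import Callable

# Media types named on this page; real servers may also send variants
# such as application/x-yaml, which this sketch does not recognize.
SPEC_MEDIA_TYPES = {"application/json", "application/yaml"}


def is_raw_spec(content_type: str) -> bool:
    """True if a Content-Type header value names a raw spec format."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in SPEC_MEDIA_TYPES


def route_spec_response(
    url: str,
    content_type: str,
    body: str,
    html_fallback: Callable[[str], str],
) -> str:
    """Return the raw spec body, or hand the URL to an HTML-to-markdown fallback."""
    if is_raw_spec(content_type):
        return body
    return html_fallback(url)
```

Stripping media-type parameters before comparing matters because servers commonly append a charset to JSON responses.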
Error Handling
Errors are returned as strings prefixed with "ERROR:" rather than raised as exceptions, so check the return value before using it.
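If your code prefers exceptions, a thin wrapper can turn the string convention back into one. A minimal sketch: `unwrap` and `FetchError` are hypothetical helpers, not part of the script.

```python
ERROR_PREFIX = "ERROR:"


class FetchError(RuntimeError):
    """Raised when a fetch result carries the "ERROR:" prefix."""


def unwrap(result: str) -> str:
    """Return `result` unchanged, or raise FetchError if it is an error string."""
    if result.startswith(ERROR_PREFIX):
        # Drop the prefix so the exception message is just the description.
        raise FetchError(result[len(ERROR_PREFIX):].strip())
    return result
```

For example, `markdown = unwrap(fetch_as_markdown(url))` keeps the documented return type while letting a try/except block handle failures.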
Common Error Messages
Using the CLI
You can also use the script from the command line without writing any Python code.
How the Two-Stage Fetch Works
Under the hood, fetch_as_markdown() implements this flow:
Static Fetch (Fast Path)
- Sends a standard HTTP request with browser-like headers (~1 second)
- Runs the HTML through readability to strip navigation, ads, and sidebars
- Converts to markdown with html2text
- If the result has ≥200 characters of real text, returns it immediately
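The HTTP step of the fast path can be sketched with the standard library alone. This is an illustration under assumptions: `build_static_request`, `fetch_static_html`, and the exact header values are hypothetical, and the readability and html2text stages are left out.

```python
import urllib.request

# Hypothetical browser-like headers; the script's real header set may differ.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_static_request(url: str) -> urllib.request.Request:
    """Build a GET request that presents itself like a desktop browser."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_static_html(url: str, timeout: float = 10.0) -> str:
    """Fetch raw HTML over plain HTTP (no JavaScript is executed)."""
    with urllib.request.urlopen(build_static_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Because no JavaScript runs here, a single-page app returns only its HTML shell, which is what the next stage is designed to detect.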
Content Validation
Checks whether the markdown is “thin” (fewer than 200 characters after whitespace normalization).
This threshold catches JavaScript-gated shells that return an empty <div id="app"></div> element without falsely flagging legitimately short pages.
Playwright Fallback (Slow Path)
- If the static fetch returned thin content, automatically launches headless Chromium (~5-8 seconds)
- Waits for the networkidle event plus 3 seconds so JavaScript frameworks can finish rendering
- Runs the fully rendered HTML through the same readability → html2text pipeline
- Returns the result if it has enough content
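The rendering step can be sketched with Playwright's synchronous Python API, assuming `playwright` is installed and Chromium has been downloaded with `playwright install chromium`. The function name and the 30-second navigation timeout are assumptions; only the networkidle wait and the extra 3 seconds come from the description above.

```python
# Mirrors the documented "networkidle plus 3 seconds" settle time.
SETTLE_MS = 3000


def render_with_playwright(url: str, timeout_ms: int = 30000) -> str:
    """Return the fully rendered HTML of `url` using headless Chromium."""
    # Imported lazily so this module loads even where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until the network has gone idle, then give frameworks extra time.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            page.wait_for_timeout(SETTLE_MS)
            return page.content()
        finally:
            browser.close()
```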
Just call fetch_as_markdown() and it handles everything automatically.
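The validation check and the overall two-stage flow can be sketched as a small orchestrator that takes both fetchers as plain callables. This is a sketch under assumptions: `is_thin`, `two_stage_fetch`, `static_fetch`, and `browser_fetch` are hypothetical names, and collapsing whitespace runs to a single space is one reading of "whitespace normalization". Only the 200-character threshold and the `playwright_first` behavior come from this page.

```python
import re
from typing import Callable

# Threshold from the docs: fewer than 200 characters counts as "thin".
MIN_CHARS = 200


def is_thin(markdown: str, min_chars: int = MIN_CHARS) -> bool:
    """True if the text is too short once whitespace runs are collapsed."""
    normalized = re.sub(r"\s+", " ", markdown).strip()
    return len(normalized) < min_chars


def two_stage_fetch(
    url: str,
    static_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    playwright_first: bool = False,
) -> str:
    """Try the cheap fetcher first, falling back to the browser on thin output."""
    if not playwright_first:
        markdown = static_fetch(url)
        if not is_thin(markdown):
            return markdown  # Fast path succeeded.
    # Either the caller skipped the static request or it returned a shell.
    markdown = browser_fetch(url)
    if is_thin(markdown):
        # Assumption: this page does not say what happens here; the sketch
        # follows the "ERROR:" string convention described above.
        return f"ERROR: page at {url} produced too little content"
    return markdown
```

In this sketch, passing `playwright_first=True` bypasses the static callable entirely, matching the behavior described in Example 2.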
Next Steps
Framework Integration
Learn how to integrate with Agno, LangChain, CrewAI, and other agent frameworks
API Reference
Detailed documentation of all functions and parameters