Quick Start

Learn how to fetch webpages and convert them to clean markdown with three common use cases.

Basic Usage

Import the core functions from the script:

from scripts.fetch_as_markdown import fetch_as_markdown, fetch_api_spec

Example 1: Fetch a Static Page

For traditional server-rendered pages, the static fetch (~1s) will handle it automatically:

from scripts.fetch_as_markdown import fetch_as_markdown

# Fetch any page — static first, headless browser fallback if needed
markdown = fetch_as_markdown("https://docs.example.com/api")
print(markdown)

The two-stage strategy means you don’t have to know whether a page is static or JavaScript-rendered — it will try the fast path first and automatically fall back if needed.

Example 2: Fetch a JavaScript-Heavy Page

For known SPAs, React documentation, or Swagger UI instances, skip straight to the browser:

from scripts.fetch_as_markdown import fetch_as_markdown

# Known JS-heavy target (SPA, Swagger UI, React docs) — skip straight to browser
markdown = fetch_as_markdown(
    "https://app.example.com/swagger",
    playwright_first=True
)
print(markdown)

Setting playwright_first=True skips the static HTTP request entirely and goes directly to the headless Chromium browser. Use this when you know the target is JavaScript-rendered, so you don't spend time on a static request that would return thin content anyway.

Example 3: Fetch an API Specification

For OpenAPI/Swagger specs, use fetch_api_spec, which checks the Content-Type header first:

from scripts.fetch_as_markdown import fetch_api_spec

# API docs — returns raw JSON/YAML if the server provides it, markdown otherwise
spec = fetch_api_spec("https://api.example.com/openapi.json")
print(spec)

If the server returns application/json or application/yaml in the Content-Type header, you’ll get the raw spec directly. This is useful because many agents can parse OpenAPI specs natively without needing a markdown representation. If the URL points to an HTML documentation page instead of a raw spec file, it falls back to fetch_as_markdown automatically.
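The Content-Type decision described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `is_raw_spec` and `RAW_SPEC_TYPES` are hypothetical names, and stripping header parameters such as `charset` is an assumption about how the comparison might be done.

```python
# Hypothetical helper, shown only to illustrate the Content-Type check.
# The two media types are the ones named in the text above.
RAW_SPEC_TYPES = {"application/json", "application/yaml"}


def is_raw_spec(content_type: str) -> bool:
    """Return True when a Content-Type header names a raw JSON or YAML spec."""
    # Drop parameters such as "; charset=utf-8" and normalize case
    # (assumed behavior, not confirmed by the script).
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RAW_SPEC_TYPES


print(is_raw_spec("application/json; charset=utf-8"))  # True
print(is_raw_spec("text/html"))                        # False
```

When a check like this returns False, the documented behavior is to hand the URL to fetch_as_markdown instead.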

Error Handling

Errors are returned as strings prefixed with "ERROR:" rather than raised as exceptions:

from scripts.fetch_as_markdown import fetch_as_markdown

result = fetch_as_markdown("https://login-required.example.com")

if result.startswith("ERROR:"):
    print(f"Failed to fetch: {result}")
else:
    print(f"Success! Got {len(result)} characters of markdown")

This design means agents can handle errors inline without try/except blocks.
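If you prefer to keep the success and failure paths separate, one option is a small wrapper around the string convention. The sketch below is an assumption-laden example: `split_result` and `ERROR_PREFIX` are hypothetical names, not part of the script.

```python
from typing import Optional, Tuple

ERROR_PREFIX = "ERROR:"  # prefix documented above


def split_result(result: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a fetch result into (markdown, error); exactly one is not None."""
    if result.startswith(ERROR_PREFIX):
        # Strip the prefix and surrounding whitespace to get the bare message.
        return None, result[len(ERROR_PREFIX):].strip()
    return result, None


markdown, error = split_result("ERROR: example failure message")
print(markdown, error)
```

In real use, the argument would be the return value of fetch_as_markdown or fetch_api_spec.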

Common Error Messages

ERROR: Page appears JavaScript-rendered but Playwright is not installed.
Run: pip install playwright && playwright install chromium
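To avoid hitting this error at fetch time, a caller could check up front whether the Playwright Python package is importable. The sketch below uses only the standard library. `playwright_available` is a hypothetical name, and the check covers the Python package only, not whether the Chromium binary from `playwright install chromium` is present.

```python
import importlib.util


def playwright_available() -> bool:
    """Return True if the 'playwright' package can be imported."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("playwright") is not None


if not playwright_available():
    print("Playwright missing: JavaScript-rendered pages will return an error string.")
```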

Using the CLI

You can also use the script from the command line without writing any Python code:

python scripts/fetch_as_markdown.py https://docs.example.com/getting-started

How the Two-Stage Fetch Works

Under the hood, fetch_as_markdown() implements this flow:

Static Fetch (Fast Path)

Sends a standard HTTP request with browser-like headers (~1 second)
Runs the HTML through readability to strip navigation, ads, sidebars
Converts to markdown with html2text
If the result has ≥200 characters of real text, returns it immediately

Content Validation

Checks if the markdown is “thin” (less than 200 characters after whitespace normalization). This threshold catches JavaScript-gated shells that return empty <div id="app"></div> elements without falsely flagging legitimately short pages.

Playwright Fallback (Slow Path)

If static fetch returned thin content, automatically launches headless Chromium (~5-8 seconds)
Waits for networkidle event plus 3 seconds for JavaScript frameworks to finish rendering
Runs the fully-rendered HTML through the same readability → html2text pipeline
Returns the result if it has enough content

Error Detection

If even Playwright returns thin content, returns an error string explaining the page is likely behind a login wall or blocking automated access

You never have to think about this flow — just call fetch_as_markdown() and it handles everything automatically.
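For readers who do want to see the control flow, here is a minimal sketch of the steps above. It is not the script's implementation: `is_thin` and `two_stage_fetch` are hypothetical names, the two fetchers are passed in as plain callables so the example runs without network access or Playwright, and the error text is a placeholder. The "Playwright is not installed" case is left out.

```python
import re
from typing import Callable

THIN_THRESHOLD = 200  # characters, as described above


def is_thin(markdown: str) -> bool:
    """True when the text is under the threshold after collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", markdown).strip()
    return len(normalized) < THIN_THRESHOLD


def two_stage_fetch(
    url: str,
    static_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    playwright_first: bool = False,
) -> str:
    """Try the fast path first, then fall back to the browser path."""
    if not playwright_first:
        markdown = static_fetch(url)
        if not is_thin(markdown):
            return markdown  # fast path succeeded
    markdown = browser_fetch(url)
    if not is_thin(markdown):
        return markdown  # slow path succeeded
    # Placeholder wording; the script's real message may differ.
    return "ERROR: Page returned thin content; it may require login or block automation."


# Stub fetchers standing in for the real HTTP and headless-browser stages.
shell = '<div id="app"></div>'
rendered = "Rendered documentation paragraph. " * 10

result = two_stage_fetch("https://app.example.com", lambda u: shell, lambda u: rendered)
print(result == rendered)  # True: the static shell was thin, so the browser result is used
```

Passing the fetchers as arguments is only a convenience for the sketch; it makes the fallback logic easy to exercise with stubs.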
