The fetch_as_markdown.py script can be run directly from the command line for quick fetches, debugging, or integration into shell scripts and automation workflows.

Basic Syntax

python scripts/fetch_as_markdown.py <url> [options]

Required Arguments

url (string, required)
The full URL to fetch, including the scheme (https:// or http://). Without a scheme, the request will fail.
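Because a missing scheme makes the request fail, a caller may want to check the URL before invoking the script. The following is a minimal sketch using only the standard library; the helper name has_http_scheme is hypothetical and is not part of fetch_as_markdown.py.

```python
from urllib.parse import urlsplit


def has_http_scheme(url: str) -> bool:
    """Return True if url carries an http or https scheme and names a host.

    Hypothetical pre-flight check; the script itself may validate differently.
    """
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(has_http_scheme("https://docs.example.com/intro"))  # True
print(has_http_scheme("docs.example.com/intro"))          # False
```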

Optional Flags

--playwright-first

Skip the static HTTP fetch and go straight to the headless browser.
python scripts/fetch_as_markdown.py https://app.example.com/swagger --playwright-first
Use this flag for known JavaScript-heavy targets:
  • Single Page Applications (React, Vue, Angular)
  • Swagger UI / OpenAPI documentation viewers
  • Sites that render entirely client-side
  • Interactive documentation platforms
When to use:
  • You know the site requires JavaScript to render
  • Static fetch consistently returns thin/empty content
  • You want to save ~1-2 seconds by skipping the static attempt

--api-spec

Treat the target as API documentation. Returns raw JSON or YAML if the server provides it.
python scripts/fetch_as_markdown.py https://api.example.com/openapi.json --api-spec
Behavior:
  1. Sends Accept: application/json,application/yaml,text/yaml,text/html header
  2. Checks response Content-Type
  3. If the Content-Type is application/json, a YAML type, or text/plain → returns the raw response body
  4. Otherwise → falls back to markdown conversion via fetch_as_markdown()
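The content-type decision in steps 2 to 4 can be sketched as a small pure function. This is an illustration of the behavior described above, not the script's actual code: the name is_raw_spec is hypothetical, and how the real script matches "yaml" (exact type or substring) is an assumption.

```python
# Accept header as listed in step 1 above.
ACCEPT_HEADER = "application/json,application/yaml,text/yaml,text/html"


def is_raw_spec(content_type: str) -> bool:
    """Return True if a response with this Content-Type would be returned raw.

    Assumption: any media type containing "yaml" counts as YAML.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return (
        media_type == "application/json"
        or "yaml" in media_type
        or media_type == "text/plain"
    )


for header in ("application/json; charset=utf-8", "text/yaml", "text/html"):
    action = "return raw body" if is_raw_spec(header) else "convert to markdown"
    print(f"{header!r}: {action}")
```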

--output / -o

Write output to a file instead of stdout.
python scripts/fetch_as_markdown.py https://docs.example.com --output docs.md
Short form:
python scripts/fetch_as_markdown.py https://docs.example.com -o docs.md
Output:
  • File is written with UTF-8 encoding
  • Prints confirmation message: Written to docs.md
  • The target directory must already exist: the script relies on Python's open(), which does not create parent directories (run mkdir -p first if needed)
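When saving fetched markdown from your own Python code, it is safest to create the target directory explicitly, since the built-in open() raises FileNotFoundError when a parent directory is missing. A minimal sketch, assuming a hypothetical helper named write_markdown:

```python
from pathlib import Path


def write_markdown(text: str, destination: str) -> Path:
    """Write text to destination as UTF-8, creating parent directories first.

    Hypothetical helper for callers; not part of fetch_as_markdown.py.
    """
    path = Path(destination)
    # open() alone would fail here if the directory did not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, write_markdown(markdown, "context/stripe-api.md") would create the context directory on first use.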

Usage Examples

Fetch a standard documentation page:
python scripts/fetch_as_markdown.py https://docs.python.org/3/library/asyncio.html
Output goes to stdout (the terminal). The fast static fetch is attempted first.

Combining Flags

python scripts/fetch_as_markdown.py \
  https://app.example.com/docs \
  --playwright-first \
  --output docs.md

Shell Integration

Pipe to Other Tools

# Count words in documentation
python scripts/fetch_as_markdown.py https://docs.example.com | wc -w

# Search for specific content
python scripts/fetch_as_markdown.py https://docs.example.com | grep -i "authentication"

# Convert to HTML with pandoc
python scripts/fetch_as_markdown.py https://docs.example.com | pandoc -f markdown -t html -o output.html

Batch Processing Script

#!/bin/bash
# fetch-docs.sh - Fetch multiple documentation pages

URLS=(
  "https://docs.example.com/intro"
  "https://docs.example.com/api-reference"
  "https://docs.example.com/guides"
)

for url in "${URLS[@]}"; do
  filename=$(basename "$url").md
  echo "Fetching $url..."
  python scripts/fetch_as_markdown.py "$url" -o "$filename"
done

echo "Done! Fetched ${#URLS[@]} pages."

Error Handling in Scripts

#!/bin/bash
# fetch-with-retry.sh - Fetch with error handling

URL="$1"
OUTPUT="$2"

if [ -z "$URL" ] || [ -z "$OUTPUT" ]; then
  echo "Usage: $0 <url> <output-file>"
  exit 1
fi

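# Capture stdout only: the script reports errors on stdout, and merging
# stderr (2>&1) would mix unrelated warnings into the saved file
RESULT=$(python scripts/fetch_as_markdown.py "$URL")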

# Check for errors
if echo "$RESULT" | grep -q "^ERROR:"; then
  echo "Failed to fetch $URL:"
  echo "$RESULT"
  exit 1
fi

# Save successful result
printf '%s\n' "$RESULT" > "$OUTPUT"
echo "✓ Saved to $OUTPUT"
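
The shell script above checks for the ERROR: prefix but makes a single attempt. A retry loop with exponential backoff can be sketched in Python as follows. The function name, the attempt count, and the delays are illustrative assumptions; the fetch argument is any callable that returns the script's stdout for a URL.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url) until the result no longer starts with "ERROR:".

    Waits base_delay, then 2x, 4x, ... between attempts, and raises
    RuntimeError carrying the last error message if every attempt fails.
    """
    result = ""
    for attempt in range(attempts):
        result = fetch(url)
        if not result.startswith("ERROR:"):
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(result)
```

Passing the fetch step in as a callable keeps the retry logic independent of how the script is invoked, which also makes it easy to exercise with a stub.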

Makefile Integration

# Makefile example for documentation pipeline

.PHONY: fetch-docs clean

DOCS_DIR := docs/fetched
BASE_URL := https://docs.example.com
URLS := $(BASE_URL)/api $(BASE_URL)/guide

# Fetch one page: `make docs/fetched/api.md` fetches $(BASE_URL)/api
$(DOCS_DIR)/%.md:
	@mkdir -p $(DOCS_DIR)
	python scripts/fetch_as_markdown.py $(BASE_URL)/$* -o $@

fetch-docs:
	@mkdir -p $(DOCS_DIR)
	@for url in $(URLS); do \
		filename=$$(basename $$url).md; \
		python scripts/fetch_as_markdown.py $$url -o $(DOCS_DIR)/$$filename; \
	done

clean:
	rm -rf $(DOCS_DIR)

Output Format

Successful Fetch

With --output flag:
Written to docs.md
Without --output flag:
# Page Title

Clean markdown content with boilerplate stripped...

## Section 1

Content here...

Error Output

Errors are written to stdout (not stderr) with an ERROR: prefix, so agents and scripts can parse them inline without handling a separate error stream:
python scripts/fetch_as_markdown.py https://invalid-url.example.com
ERROR: Could not fetch https://invalid-url.example.com. The page may require authentication or block automated access.

Help Command

View all available options:
python scripts/fetch_as_markdown.py --help
Output:
usage: fetch_as_markdown.py [-h] [--playwright-first] [--api-spec] [--output FILE] url

Fetch a URL and return clean markdown.

positional arguments:
  url                   URL to fetch (include https://)

options:
  -h, --help            show this help message and exit
  --playwright-first    Skip static fetch; use headless browser immediately
  --api-spec            Treat as API docs — return raw JSON/YAML if available
  --output FILE, -o FILE
                        Write output to file instead of stdout
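
The help text above is consistent with an argparse definition along the following lines. This is a reconstruction from the help output only, not a copy of the script's source, so option wording and internal names may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented CLI (reconstructed, not the original)."""
    parser = argparse.ArgumentParser(
        prog="fetch_as_markdown.py",
        description="Fetch a URL and return clean markdown.",
    )
    parser.add_argument("url", help="URL to fetch (include https://)")
    parser.add_argument(
        "--playwright-first",
        action="store_true",
        help="Skip static fetch; use headless browser immediately",
    )
    parser.add_argument(
        "--api-spec",
        action="store_true",
        help="Treat as API docs - return raw JSON/YAML if available",
    )
    parser.add_argument(
        "--output", "-o",
        metavar="FILE",
        help="Write output to file instead of stdout",
    )
    return parser


args = build_parser().parse_args(["https://docs.example.com", "-o", "docs.md"])
print(args.url, args.playwright_first, args.api_spec, args.output)
# https://docs.example.com False False docs.md
```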

Performance Comparison

time python scripts/fetch_as_markdown.py https://docs.example.com/intro
Typical duration: 0.8-1.5 seconds
Used for:
  • Server-rendered HTML pages
  • Static documentation sites
  • Traditional web pages

Common Use Cases

1. Quick Documentation Check

python scripts/fetch_as_markdown.py https://docs.example.com/api | less

2. Save for Agent Context

python scripts/fetch_as_markdown.py https://docs.stripe.com/api -o context/stripe-api.md

3. Compare Documentation Versions

python scripts/fetch_as_markdown.py https://docs.example.com/v2/api -o v2.md
python scripts/fetch_as_markdown.py https://docs.example.com/v3/api -o v3.md
diff v2.md v3.md
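
If the comparison needs to happen inside a Python program rather than the shell, the standard library's difflib produces the same kind of unified diff. A minimal sketch; the function name markdown_diff and the default file labels are illustrative.

```python
import difflib


def markdown_diff(old: str, new: str, old_name: str = "v2.md", new_name: str = "v3.md") -> str:
    """Return a unified diff of two markdown strings, or "" if they are identical."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(lines)


print(markdown_diff("# API\nGET /v2/users\n", "# API\nGET /v3/users\n"))
```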

4. Build Documentation Mirror

#!/bin/bash
BASE_URL="https://docs.example.com"
PAGES=("intro" "quickstart" "api-reference" "guides/authentication")

for page in "${PAGES[@]}"; do
  mkdir -p "mirror/$(dirname "$page")"
  python scripts/fetch_as_markdown.py "$BASE_URL/$page" \
    -o "mirror/$page.md"
done

Environment Variables

The script doesn’t currently use environment variables, but you can create wrapper scripts:
#!/bin/bash
# fetch-wrapper.sh - Add default options via wrapper

DEFAULT_FLAGS="--playwright-first"
python scripts/fetch_as_markdown.py "$@" $DEFAULT_FLAGS
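
A wrapper can also take its default flags from the environment. The sketch below assumes a hypothetical variable named FETCH_MD_FLAGS; the script itself does not read it, and only the wrapper interprets it.

```python
import os
import shlex
import sys
from typing import List, Mapping, Sequence

SCRIPT = "scripts/fetch_as_markdown.py"
ENV_VAR = "FETCH_MD_FLAGS"  # hypothetical name, read only by this wrapper


def build_command(args: Sequence[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the argv for the script, with extra flags taken from ENV_VAR."""
    # shlex.split handles quoting the same way a POSIX shell would.
    extra = shlex.split(env.get(ENV_VAR, ""))
    return [sys.executable, SCRIPT, *args, *extra]


print(build_command(["https://app.example.com"], {ENV_VAR: "--playwright-first"})[1:])
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting problems with URLs that contain characters such as & or ?.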

Exit Codes

The script does not currently use distinct exit codes for different error types. Every error is reported on stdout as a message with the ERROR: prefix. For scripting, check the output content:
OUTPUT=$(python scripts/fetch_as_markdown.py "$URL")
if [[ "$OUTPUT" == ERROR:* ]]; then
  echo "Fetch failed"
  exit 1
fi
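
The same prefix check can be done from Python by running the script as a subprocess and turning an ERROR: message into an exception. This is a sketch under the assumptions stated on this page (errors on stdout, no meaningful exit code); FetchError and fetch_markdown are hypothetical names.

```python
import subprocess
import sys


class FetchError(RuntimeError):
    """Raised when the script's stdout starts with the ERROR: prefix."""


def fetch_markdown(url: str, script: str = "scripts/fetch_as_markdown.py") -> str:
    """Run the script for url and return its stdout, raising FetchError on failure."""
    completed = subprocess.run(
        [sys.executable, script, url],
        stdout=subprocess.PIPE,  # stderr is left attached to the terminal
        text=True,
        check=False,  # the exit code is not relied on; the prefix is checked instead
    )
    output = completed.stdout
    if output.startswith("ERROR:"):
        raise FetchError(output.strip())
    return output
```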
