Web-to-Markdown
An agent skill that fetches any webpage and returns clean, content-focused markdown, handling JavaScript-rendered pages automatically so your agent never has to.
The Problem
Raw HTML is a terrible format for agents. It’s bloated with nav menus, cookie banners, sidebars, and scripts that have nothing to do with the actual content. And on modern sites, a plain HTTP request often returns an empty JavaScript shell, so the agent sees nothing useful and either hallucinates or gives up.
The Solution
This skill solves both problems with a two-stage fetch strategy followed by content extraction:
- Fast static request first (~1s): works for most traditional sites
- Automatic headless browser fallback (~5-8s): handles JavaScript-rendered content when needed
- Intelligent content extraction: strips boilerplate using the same algorithm as Firefox Reader Mode
- Clean markdown conversion: returns only the content that matters
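The fetch order in the list above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: `fetch_with_fallback`, `visible_text_length`, the injected `static_fetch` and `browser_fetch` callables, and the 200-character threshold are all hypothetical names and values chosen for the example. How the real skill decides when to fall back is not described on this page.

```python
from html.parser import HTMLParser
from typing import Callable, Tuple


class _TextCounter(HTMLParser):
    """Counts visible text characters, ignoring <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars += len(data.strip())


def visible_text_length(html: str) -> int:
    """Return how many non-script, non-style text characters the HTML holds."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.chars


def fetch_with_fallback(
    url: str,
    static_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    min_chars: int = 200,  # assumed threshold, purely illustrative
) -> Tuple[str, str]:
    """Try the cheap static fetch first; use the browser only if the page looks empty."""
    html = static_fetch(url)
    if visible_text_length(html) >= min_chars:
        return html, "static"
    return browser_fetch(url), "browser"
```

Passing the two fetchers in as callables keeps the sketch independent of any particular HTTP client or browser driver; in practice they would wrap a plain HTTP request and a headless browser session respectively.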
Key Features
Two-Stage Fetch Strategy
Fast static HTTP first, automatic Playwright fallback for JS-heavy pages. Your agent never has to decide which method to use.
Battle-Tested Content Extraction
Uses the Firefox Reader Mode algorithm (readability-lxml), proven on millions of real-world pages, to strip navigation, ads, sidebars, and other boilerplate.
Massive Token Reduction
60-80% fewer tokens than raw HTML by removing scripts, styles, navigation menus, cookie banners, and other noise.
Framework Agnostic
Core script has zero framework dependencies. Wrap in 5-10 lines for Agno, LangChain, CrewAI, OpenAI Agents SDK, or any other framework.
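As a rough idea of how small such a wrapper can be, the snippet below bundles a fetch function with the metadata an agent framework typically asks for. `fetch_markdown` is a hypothetical stand-in for the skill's entry point (the real name and signature are not shown on this page), and `make_tool` with its dict layout is illustrative only, not the API of any framework listed above.

```python
from typing import Any, Callable, Dict


def fetch_markdown(url: str) -> str:
    """Fetch a webpage and return its main content as markdown."""
    # Hypothetical placeholder body; the real skill would fetch and convert here.
    return f"# Placeholder markdown for {url}"


def make_tool(func: Callable[[str], str]) -> Dict[str, Any]:
    """Bundle a callable with a name, description, and parameter hint."""
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": {"url": "string"},
        "run": func,
    }


web_to_markdown_tool = make_tool(fetch_markdown)
print(web_to_markdown_tool["run"]("https://example.com"))
```

Each framework has its own registration call, so the dict would be replaced by that framework's decorator or tool class; the wrapped function itself stays unchanged.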
Graceful Error Handling
Errors are returned as strings prefixed with “ERROR:” rather than raised as exceptions, so agents can handle them inline without try/except.
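A caller can branch on that prefix with an ordinary string check. The sketch below assumes a hypothetical `fetch_markdown` stand-in whose failure condition and message text are invented for the example; only the “ERROR:” prefix comes from the description above.

```python
def fetch_markdown(url: str) -> str:
    """Hypothetical stand-in: returns markdown, or a string starting with ERROR:."""
    if not url.startswith(("http://", "https://")):
        return f"ERROR: unsupported URL scheme in {url!r}"
    return f"# Placeholder markdown for {url}"


def is_error(result: str) -> bool:
    """True when the result is an error string rather than page content."""
    return result.startswith("ERROR:")


for url in ("https://example.com", "ftp://example.com"):
    result = fetch_markdown(url)
    if is_error(result):
        print("fetch failed:", result)
    else:
        print("fetched", len(result), "characters")
```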
API-Spec Aware
Automatically detects and returns raw JSON/YAML for OpenAPI specs when the server provides them, falling back to markdown conversion only when needed.
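One way such detection could work is sketched below. The rules are assumptions made for illustration (checking the `Content-Type` header and looking for a top-level `openapi` or `swagger` key); the skill's actual detection logic is not described on this page, and `looks_like_openapi` and `choose_output` are hypothetical names.

```python
import json
from typing import Callable


def looks_like_openapi(content_type: str, body: str) -> bool:
    """Heuristic check for an OpenAPI/Swagger document (illustrative only)."""
    ctype = content_type.split(";")[0].strip().lower()
    if ctype.endswith("json"):
        try:
            doc = json.loads(body)
        except ValueError:
            return False
        return isinstance(doc, dict) and ("openapi" in doc or "swagger" in doc)
    if "yaml" in ctype:
        # The standard library has no YAML parser, so look for a top-level key.
        return any(
            line.startswith(("openapi:", "swagger:"))
            for line in body.splitlines()
        )
    return False


def choose_output(
    content_type: str, body: str, to_markdown: Callable[[str], str]
) -> str:
    """Return the raw spec when one is detected, otherwise convert to markdown."""
    if looks_like_openapi(content_type, body):
        return body
    return to_markdown(body)
```

Returning the spec untouched preserves its structure for the agent, which a markdown conversion would flatten.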
How It Works
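The two-stage fetch strategy handles both static and JavaScript-rendered pages: it tries a fast static request first and falls back to a headless browser only when the page needs JavaScript to render.
Next Steps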
Installation
Install Python dependencies and set up Playwright for JavaScript-rendered pages
Quick Start
Get up and running in 2 minutes with real working examples