The Agno integration provides a native Toolkit class that registers two tools for your agents: one for fetching general webpages and one for API documentation.
Installation
Install dependencies
pip install requests readability-lxml html2text playwright
Install Chromium (one-time)
Required only for JavaScript-heavy pages. This is a ~200MB download.
playwright install chromium
If you skip this step, the toolkit still works for static pages. When it encounters a JS-rendered page without Playwright installed, the error message tells you exactly what to run.
Import the toolkit
from scripts.agno_toolkit import WebToMarkdownTools
Basic Usage
from agno.agent import Agent
from scripts.agno_toolkit import WebToMarkdownTools
agent = Agent(tools=[WebToMarkdownTools()])
This registers two tools with your agent:
- fetch_page_as_markdown - Fetch any webpage as clean markdown
- fetch_api_spec_tool - Fetch API docs or OpenAPI/Swagger specs
fetch_page_as_markdown
Fetches a webpage and returns its content as clean markdown. Automatically handles JavaScript-rendered pages using a two-stage strategy:
- Static fetch (~1s) - Fast HTTP request for regular pages
- Headless browser fallback (~5-8s) - Automatically used if static fetch returns insufficient content
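The two-stage strategy above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual implementation: `fetch_with_fallback`, the injected `static_fetch` and `browser_fetch` callables, and the `MIN_CONTENT_CHARS` threshold are hypothetical names and values chosen for the example.

```python
from typing import Callable

# Hypothetical threshold: how the real toolkit decides that content is
# "insufficient" is not documented here.
MIN_CONTENT_CHARS = 200


def fetch_with_fallback(
    url: str,
    static_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    min_chars: int = MIN_CONTENT_CHARS,
) -> str:
    """Try the cheap static fetch first; use the browser if it yields too little."""
    try:
        text = static_fetch(url)
    except Exception:
        # Treat a failed static fetch like an empty result and let the browser try.
        text = ""
    if len(text.strip()) >= min_chars:
        return text
    return browser_fetch(url)


# Stand-in fetchers so the sketch runs without network access.
static_result = fetch_with_fallback(
    "https://example.com/static",
    static_fetch=lambda url: "static content " * 50,
    browser_fetch=lambda url: "rendered by browser",
)
fallback_result = fetch_with_fallback(
    "https://example.com/spa",
    static_fetch=lambda url: "<div id='root'></div>",
    browser_fetch=lambda url: "rendered by browser",
)
```

In the first call the static result is long enough to be returned directly; in the second, the near-empty shell page triggers the browser path.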
Parameters:
url (str) - Full URL of the page to fetch (must include https://)
Returns:
- Clean markdown of the page content, or an error message prefixed with "ERROR:"
fetch_api_spec_tool
Fetches API documentation or an OpenAPI/Swagger spec. Smart about content types:
- If the server returns JSON/YAML (Content-Type: application/json or similar), returns the raw spec directly
- Otherwise, returns clean markdown of the docs page
Parameters:
url (str) - URL of the API docs page or raw spec file
Returns:
- Raw spec (JSON/YAML) or clean markdown of the docs page
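The content-type decision described above might look something like the sketch below. The helper name `is_raw_spec` and the exact set of media types are assumptions made for illustration; the toolkit's own check may differ.

```python
# Assumed set of media types treated as a raw spec; not taken from the toolkit.
RAW_SPEC_MEDIA_TYPES = {
    "application/json",
    "application/yaml",
    "application/x-yaml",
    "text/yaml",
    "text/x-yaml",
}


def is_raw_spec(content_type: str) -> bool:
    """Return True when a Content-Type header indicates a JSON or YAML body."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Also accept structured suffixes such as "application/vnd.oai.openapi+json".
    return media_type in RAW_SPEC_MEDIA_TYPES or media_type.endswith(("+json", "+yaml"))
```

A response for which this returns True would be passed through unchanged; anything else would go through the markdown conversion path.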
Configuration Options
playwright_first Mode
For known JavaScript-heavy targets (SPAs, Swagger UI, React documentation sites), you can skip the static fetch entirely and go straight to the headless browser:
from agno.agent import Agent
from scripts.agno_toolkit import WebToMarkdownTools
# Always use headless browser for reliable rendering
agent = Agent(tools=[WebToMarkdownTools(playwright_first=True)])
When to use playwright_first=True:
- Single-page applications (SPAs)
- Swagger UI instances
- React/Vue/Angular documentation sites
- Any site you know requires JavaScript to render content
Trade-off:
- Slower (~5-8s vs ~1s for static pages)
- More reliable for JS-heavy content
- Avoids the cost of trying static fetch first when you know it will fail
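If only some of your targets are JavaScript-heavy, one option is to decide the flag per host. The sketch below is a hypothetical helper, not part of the toolkit: `JS_HEAVY_HOSTS` and `wants_playwright_first` are illustrative names, and the host list is made up.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts known to render their content with JavaScript.
JS_HEAVY_HOSTS = {"app.example.com", "swagger.example.com"}


def wants_playwright_first(url: str) -> bool:
    """Return True when the URL's host is on the known JS-heavy list."""
    # urlparse lowercases the hostname; it is None for URLs without a host.
    host = urlparse(url).hostname or ""
    return host in JS_HEAVY_HOSTS
```

The result can then be passed as the `playwright_first` argument when constructing `WebToMarkdownTools`.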
Complete Example
from agno.agent import Agent
from scripts.agno_toolkit import WebToMarkdownTools
# Create agent with web-to-markdown tools
agent = Agent(
    name="Documentation Assistant",
    tools=[WebToMarkdownTools()],
    instructions=[
        "You help users understand technical documentation.",
        "When given a URL, fetch it as markdown and summarize the key points.",
    ],
)
# Agent automatically uses the tools when needed
response = agent.run(
    "Read https://docs.example.com/api and explain how authentication works"
)
print(response)
Example with playwright_first
from agno.agent import Agent
from scripts.agno_toolkit import WebToMarkdownTools
# Agent optimized for JavaScript-heavy documentation sites
agent = Agent(
    name="API Explorer",
    tools=[WebToMarkdownTools(playwright_first=True)],
    instructions=[
        "You help users explore API documentation.",
        "Fetch Swagger UI pages and explain available endpoints.",
    ],
)
# Will use headless browser immediately for reliable rendering
response = agent.run(
    "Fetch https://app.example.com/swagger and list all POST endpoints"
)
print(response)
Error Handling
Errors are returned as strings prefixed with "ERROR:" rather than raised as exceptions. The agent sees a failure as ordinary tool output, so your code needs no try/except blocks:
# No try/except needed: errors come back as descriptive strings
result = agent.run("Fetch https://invalid-url")
# The tool output the agent receives will be "ERROR: Invalid URL format" (or similar)
Common error scenarios:
- Invalid URL format
- Network timeouts
- Login walls or bot detection
- Pages that remain empty even after JavaScript execution
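When you call a tool method directly rather than through an agent, the "ERROR:" prefix is the only signal of failure, so it helps to check for it explicitly. The helper below is a hypothetical sketch (`split_result` and `ERROR_PREFIX` are not part of the toolkit) that separates the two cases.

```python
from typing import Tuple

ERROR_PREFIX = "ERROR:"


def split_result(result: str) -> Tuple[bool, str]:
    """Return (ok, payload): markdown on success, the error text on failure."""
    if result.startswith(ERROR_PREFIX):
        # Strip the prefix and surrounding whitespace to leave only the message.
        return False, result[len(ERROR_PREFIX):].strip()
    return True, result


# Sample strings standing in for real tool output.
ok, payload = split_result("ERROR: Invalid URL format")
if not ok:
    print(f"Fetch failed: {payload}")
```

Checking the prefix in one place keeps calling code from accidentally treating an error message as page content.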
Source Code
The complete Agno toolkit implementation:
"""
agno_toolkit.py
===============
Agno-specific wrapper for the web-to-markdown skill.

Usage:
    from scripts.agno_toolkit import WebToMarkdownTools
    agent = Agent(tools=[WebToMarkdownTools()])

    # For known JS-heavy targets (SPAs, Swagger UI):
    agent = Agent(tools=[WebToMarkdownTools(playwright_first=True)])
"""
from agno.tools import Toolkit
from agno.utils.log import logger

from scripts.fetch_as_markdown import fetch_as_markdown, fetch_api_spec


class WebToMarkdownTools(Toolkit):
    """
    Agno Toolkit: fetch any webpage and return clean markdown.
    Handles JS-rendered pages transparently via headless browser fallback.
    """

    def __init__(self, playwright_first: bool = False):
        """
        Args:
            playwright_first: Always use headless browser instead of trying
                a static fetch first. Slower (~5-8s vs ~1s) but
                reliable for SPAs and Swagger UI instances.
        """
        super().__init__(name="web_to_markdown")
        self.playwright_first = playwright_first
        self.register(self.fetch_page_as_markdown)
        self.register(self.fetch_api_spec_tool)

    def fetch_page_as_markdown(self, url: str) -> str:
        """
        Fetch a webpage and return its content as clean markdown.

        Automatically handles JavaScript-rendered pages: if a fast static
        fetch returns insufficient content, a headless browser is used as
        a fallback. The agent never needs to manage this distinction.

        Args:
            url: Full URL of the page to fetch (must include https://)

        Returns:
            Clean markdown of the page content, or an error message.
        """
        logger.info(f"[web-to-markdown] fetch_page_as_markdown: {url}")
        return fetch_as_markdown(url, playwright_first=self.playwright_first)

    def fetch_api_spec_tool(self, url: str) -> str:
        """
        Fetch API documentation or an OpenAPI/Swagger spec.

        Returns raw JSON/YAML if the server provides it directly (useful for
        OpenAPI specs that agents can parse natively). Otherwise returns clean
        markdown of the docs page.

        Args:
            url: URL of the API docs page or raw spec file

        Returns:
            Raw spec (JSON/YAML) or clean markdown of the docs page.
        """
        logger.info(f"[web-to-markdown] fetch_api_spec: {url}")
        return fetch_api_spec(url)