The core fetch logic in scripts/fetch_as_markdown.py has zero framework dependencies. This means you can use it standalone or wrap it in any agent framework with minimal code.

Design Philosophy

The framework-agnostic approach follows a simple principle:

Core Principle

Keep the hard parts framework-free, make the wrapper trivial.
All the complexity (two-stage fetch, readability stripping, JS rendering, error handling) lives in pure Python functions. Framework adapters are just thin wrappers that translate between your framework’s tool format and the core functions.

This design provides:
  • No vendor lock-in - Switch frameworks without rewriting fetch logic
  • Easy testing - Test core logic without framework overhead
  • Simple maintenance - Framework updates don’t break your fetch code
  • Minimal code - Most adapters are 5-10 lines

Adapter Pattern

Every framework adapter follows the same pattern:
# 1. Import the core functions
from scripts.fetch_as_markdown import fetch_as_markdown, fetch_api_spec

# 2. Wrap in your framework's tool decorator/class
@your_framework.tool
def fetch_page_as_markdown(url: str) -> str:
    """Docstring becomes tool description in most frameworks."""
    return fetch_as_markdown(url)

# 3. Register with your agent
agent = YourFramework(tools=[fetch_page_as_markdown])
The docstring is important! Most frameworks use it as the tool description that helps the agent decide when to invoke the tool.
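
To see why the docstring matters, here is a minimal sketch of what a tool decorator might do internally. The `tool` decorator and `TOOL_REGISTRY` are hypothetical stand-ins, not any real framework's API, and `fetch_as_markdown` is stubbed so the snippet runs without the repository or network access.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry; real frameworks keep something comparable internally.
TOOL_REGISTRY: Dict[str, dict] = {}

def tool(func: Callable[[str], str]) -> Callable[[str], str]:
    """Hypothetical decorator: record the function's name and docstring."""
    TOOL_REGISTRY[func.__name__] = {
        "description": inspect.getdoc(func) or "",
        "callable": func,
    }
    return func

def fetch_as_markdown(url: str) -> str:
    """Stub standing in for scripts.fetch_as_markdown.fetch_as_markdown."""
    return f"# Stub page for {url}"

@tool
def fetch_page_as_markdown(url: str) -> str:
    """Fetch a webpage and return clean markdown."""
    return fetch_as_markdown(url)

# The docstring is what an agent would see when choosing a tool.
description = TOOL_REGISTRY["fetch_page_as_markdown"]["description"]
```

A tool without a docstring would register with an empty description in this sketch, leaving the agent with nothing but the function name to go on.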

Framework Examples

LangChain

Installation

pip install langchain

Implementation

from langchain.tools import tool
from scripts.fetch_as_markdown import fetch_as_markdown, fetch_api_spec

@tool
def fetch_page_as_markdown(url: str) -> str:
    """Fetch a webpage and return clean markdown. Handles JS-rendered pages automatically."""
    return fetch_as_markdown(url)

@tool
def fetch_api_spec_tool(url: str) -> str:
    """Fetch API docs or OpenAPI spec. Returns raw JSON/YAML if available, markdown otherwise."""
    return fetch_api_spec(url)

Usage

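# Note: initialize_agent and langchain.llms are legacy LangChain APIs that are
# deprecated or relocated in newer releases; pin a version that still ships them.
# The OpenAI wrapper also needs the openai package and an OPENAI_API_KEY.
from langchain.agents import initialize_agent, AgentType
from langchain.llms import OpenAI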

llm = OpenAI(temperature=0)
tools = [fetch_page_as_markdown, fetch_api_spec_tool]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

result = agent.run("Fetch the Python asyncio documentation and summarize it")

Creating Your Own Adapter

If your framework isn’t listed above, follow these steps:

1. Identify Your Framework’s Tool Format

Find out how your framework defines tools. One common pattern is a decorator on a typed function:
@framework.tool
def my_tool(param: str) -> str:
    """Description"""
    return result
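
Not every framework uses decorators. Some APIs, such as OpenAI-style function calling, take a JSON-schema description of each tool and leave dispatch to you. A minimal sketch of that pattern follows; `fetch_as_markdown` is stubbed, and `TOOLS`, `HANDLERS`, and `dispatch` are hypothetical names chosen for illustration.

```python
import json
from typing import Callable, Dict

def fetch_as_markdown(url: str) -> str:
    """Stub standing in for the real core function."""
    return f"# Stub page for {url}"

# Tool description in the JSON-schema style used by function-calling APIs.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "fetch_page_as_markdown",
        "description": "Fetch a webpage and return clean markdown.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string", "description": "Page URL"}},
            "required": ["url"],
        },
    },
}]

# Map tool names to the functions that implement them.
HANDLERS: Dict[str, Callable[..., str]] = {
    "fetch_page_as_markdown": fetch_as_markdown,
}

def dispatch(name: str, arguments_json: str) -> str:
    """Route a model-issued tool call (name plus JSON arguments) to its handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        return f"ERROR: unknown tool {name!r}"
    return handler(**json.loads(arguments_json))
```

The core function is untouched in both styles; only the description format and the dispatch step differ.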

2. Wrap the Core Functions

from scripts.fetch_as_markdown import fetch_as_markdown, fetch_api_spec
from your_framework import tool_decorator  # or base class, etc.

@tool_decorator  # Use your framework's decorator
def fetch_page_as_markdown(url: str) -> str:
    """
    Fetch a webpage and return clean markdown.
    Handles JavaScript-rendered pages automatically.
    """
    return fetch_as_markdown(url)

@tool_decorator
def fetch_api_spec_tool(url: str) -> str:
    """
    Fetch API documentation or OpenAPI spec.
    Returns raw JSON/YAML if available, markdown otherwise.
    """
    return fetch_api_spec(url)

3. Handle Framework-Specific Requirements

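Some frameworks need additional configuration. For example, if your framework requires specific type hints:
from your_framework import tool, ToolInput, ToolOutput  # placeholder names
from scripts.fetch_as_markdown import fetch_as_markdown

@tool
def fetch_page_as_markdown(url: ToolInput[str]) -> ToolOutput[str]:
    """Fetch a webpage and return clean markdown."""
    return fetch_as_markdown(url)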
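
Another common requirement is an async tool interface. The core functions appear to be synchronous (they are called without `await` throughout this page), so one option is to hand the call to a worker thread with `asyncio.to_thread` (Python 3.9+). In this sketch `fetch_as_markdown` is a stub and `afetch_page_as_markdown` is a hypothetical name.

```python
import asyncio

def fetch_as_markdown(url: str) -> str:
    """Stub standing in for the real, blocking core function."""
    return f"# Stub page for {url}"

async def afetch_page_as_markdown(url: str) -> str:
    """Run the blocking fetch in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fetch_as_markdown, url)

async def main() -> list:
    # Fetch several pages concurrently; gather keeps results in input order.
    urls = ["https://example.com/a", "https://example.com/b"]
    return await asyncio.gather(*(afetch_page_as_markdown(u) for u in urls))

pages = asyncio.run(main())
```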

4. Test Your Adapter

# test_adapter.py
from your_adapter import fetch_page_as_markdown, fetch_api_spec_tool

def test_fetch_page():
    result = fetch_page_as_markdown("https://example.com")
    assert not result.startswith("ERROR:")
    assert len(result) > 100

def test_fetch_api_spec():
    result = fetch_api_spec_tool("https://api.github.com")
    assert not result.startswith("ERROR:")

Advanced Patterns

Configuration Options

Pass configuration through your adapter:
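from scripts.fetch_as_markdown import fetch_as_markdown
from your_framework import BaseTool  # placeholder: your framework's tool base class

class ConfigurableFetchTool(BaseTool):
    def __init__(self, playwright_first: bool = False):
        super().__init__()
        # Some base classes (e.g. Pydantic models) need fields declared
        # on the class instead of being assigned here.
        self.playwright_first = playwright_first

    def _run(self, url: str) -> str:
        return fetch_as_markdown(url, playwright_first=self.playwright_first)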

# Usage
tool_for_spas = ConfigurableFetchTool(playwright_first=True)
tool_for_static = ConfigurableFetchTool(playwright_first=False)

Caching Results

Add caching to reduce redundant fetches:
from functools import lru_cache
from scripts.fetch_as_markdown import fetch_as_markdown

@lru_cache(maxsize=100)
def cached_fetch(url: str, playwright_first: bool = False) -> str:
    return fetch_as_markdown(url, playwright_first=playwright_first)

@tool
def fetch_page_as_markdown(url: str) -> str:
    return cached_fetch(url)

Rate Limiting

Protect against excessive requests:
import time
from threading import Lock
from scripts.fetch_as_markdown import fetch_as_markdown

class RateLimitedFetchTool:
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self.last_fetch = 0
        self.lock = Lock()
    
    def fetch(self, url: str) -> str:
        with self.lock:
            elapsed = time.time() - self.last_fetch
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            
            result = fetch_as_markdown(url)
            self.last_fetch = time.time()
            return result
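
The class above applies one global interval and holds its lock for the duration of each fetch. If an agent crawls several sites, a per-host interval may be a better fit. The sketch below works under that assumption: `PerHostRateLimiter` is a hypothetical name, `fetch_as_markdown` is stubbed, and the lock is held only while reserving a slot so that fetches to different hosts do not block each other.

```python
import threading
import time
from typing import Dict
from urllib.parse import urlsplit

def fetch_as_markdown(url: str) -> str:
    """Stub standing in for the real core function."""
    return f"# Stub page for {url}"

class PerHostRateLimiter:
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._next_allowed: Dict[str, float] = {}  # host -> earliest start time
        self._lock = threading.Lock()

    def _reserve(self, host: str) -> float:
        """Reserve the next slot for host and return how long to wait for it."""
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_allowed.get(host, now))
            self._next_allowed[host] = start + self.min_interval
            return start - now

    def fetch(self, url: str) -> str:
        delay = self._reserve(urlsplit(url).netloc)
        if delay > 0:
            time.sleep(delay)
        return fetch_as_markdown(url)

limiter = PerHostRateLimiter(min_interval=0.2)
t0 = time.monotonic()
limiter.fetch("https://a.example/1")
limiter.fetch("https://b.example/1")  # different host: no wait
limiter.fetch("https://a.example/2")  # same host: waits for the reserved slot
elapsed = time.monotonic() - t0
```

Note the difference in semantics: this version spaces out request start times per host, while the class above spaces each request from the end of the previous one.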

Logging and Telemetry

Add observability to your adapter:
import logging
import time
from scripts.fetch_as_markdown import fetch_as_markdown

logger = logging.getLogger(__name__)

@tool
def fetch_page_as_markdown(url: str) -> str:
    logger.info(f"Fetching {url}")
    
    start = time.time()
    result = fetch_as_markdown(url)
    duration = time.time() - start
    
    if result.startswith("ERROR:"):
        logger.error(f"Failed to fetch {url}: {result}")
    else:
        logger.info(f"Fetched {url} in {duration:.2f}s ({len(result)} chars)")
    
    return result

Why Framework-Agnostic?

The agent framework landscape is fragmented and fast-moving. By keeping the core logic framework-free:
  1. Future-proof - When new frameworks emerge, adaptation is trivial. No need to rewrite fetch logic.
  2. Multi-framework projects - Use the same fetch code across different agents in different frameworks.
  3. Easy testing - Test core logic without framework dependencies. No mocking framework internals.
  4. Portable knowledge - Understanding one adapter helps you create adapters for any framework.

Contributing Adapters

If you create an adapter for a new framework, consider contributing it:
  1. Add your adapter example to references/framework-adapters.md
  2. Follow the existing pattern (5-10 lines of wrapper code)
  3. Include a usage example
  4. Test it with the actual framework
  5. Submit a pull request
See CONTRIBUTING.md for details.
