The website scraper gives agents clean, readable text from any public webpage. It uses requests to fetch the page with a browser-like User-Agent header, then passes the HTML through BeautifulSoup to strip out <script>, <style>, and <noscript> tags before extracting visible text. The result is capped at 10,000 characters so it fits comfortably inside an LLM context window.
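The fetch-strip-extract pipeline can be approximated with a standard-library-only sketch. The names here (extract_text, _TextExtractor) are illustrative, and the real module uses requests and BeautifulSoup rather than urllib and html.parser:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

SKIPPED_TAGS = {"script", "style", "noscript"}

class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring content inside script/style/noscript."""
    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

def extract_text(html: str, limit: int = 10_000) -> str:
    """Whitespace-normalize visible text and cap it at `limit` characters."""
    parser = _TextExtractor()
    parser.feed(html)
    # Roughly mimics soup.get_text(separator=" ", strip=True):
    # single spaces between words, no leading/trailing whitespace.
    return " ".join(" ".join(parser._chunks).split())[:limit]

def scrape_website(url: str) -> str:
    """Fetch `url` and return cleaned text, or "Scraping Error: ..." on failure."""
    try:
        req = Request(url, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        })
        with urlopen(req, timeout=10) as resp:
            return extract_text(resp.read().decode("utf-8", errors="replace"))
    except Exception as exc:
        return f"Scraping Error: {exc}"
```

The sketch preserves the documented contract: browser-like User-Agent, 10-second timeout, skipped tags, 10,000-character cap, and an error string instead of an exception.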
Source code
website_scraper.py
scrape_website()
Fetches a URL and returns cleaned plain text extracted from the page body.
Parameters
The full URL of the page to scrape, including the scheme (e.g. "https://www.acmecorp.com").
Return value
Returns a str containing up to 10,000 characters of whitespace-normalized plain text from the page. Words are joined with a single space separator and leading/trailing whitespace is stripped.
On any exception — network error, timeout, parse failure — the function returns an error string in the format "Scraping Error: <message>" rather than raising an exception.
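Because failures surface as a return value rather than an exception, callers can branch on the documented prefix instead of wrapping calls in try/except. A minimal sketch of such a check (the helper name is illustrative, not part of the module):

```python
def is_scrape_error(result: str) -> bool:
    """True if a scrape_website() return value is the documented error string."""
    return result.startswith("Scraping Error:")
```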
Behavior details
| Aspect | Detail |
|---|---|
| HTTP client | requests.get() |
| User-Agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) |
| Timeout | 10 seconds |
| HTML parser | lxml via BeautifulSoup |
| Tags removed | <script>, <style>, <noscript> |
| Text extraction | soup.get_text(separator=" ", strip=True) |
| Output limit | text[:10000] — first 10,000 characters |
| Error handling | Returns error string, never raises |
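The text-extraction and output-limit rows amount to the following string operations (illustrative Python, not the module's code; the sample string is hypothetical):

```python
raw = "  Acme Corp \n\n  Home of the Widget  "
normalized = " ".join(raw.split())  # collapse whitespace runs, strip ends
capped = normalized[:10_000]        # cap at 10,000 characters, as the scraper does
```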