The `tools` module provides the four core functions that power Deep Research Agent’s information-gathering layer. `search_web` and `fetch_page` are invoked directly by Gemini during the tool-use loop; `score_source` computes the credibility scores used to filter low-quality sources; and `run_tool` is a unified dispatcher that the loop uses to execute any of the three by name. All functions integrate with `ResearchContext` to cache results and deduplicate work across iterations.
Module Constants
These constants govern fetch and search behavior across the module and can serve as reference values when setting up custom pipelines.

| Constant | Value | Description |
|---|---|---|
| `MAX_SEARCH_RESULTS` | `5` | Maximum results returned per `search_web` call |
| `MAX_PAGE_CHARS` | `3000` | Character limit applied by `fetch_page` after cleaning |
| `REQUEST_TIMEOUT` | `15` | HTTP request timeout in seconds |

search_web()
Searches DuckDuckGo, using the `ddgs` library as the primary backend with `duckduckgo_search` as a fallback. Results are automatically scored and stored in the context if one is provided.

Parameters:

- Query: the search query string to send to DuckDuckGo.
- Context: an active `ResearchContext` instance. When provided, the function calls `context.add_query(query)` to log the query and `context.add_search_results(results, question)` to score and store each returned URL in `source_metadata`.
- Progress callback: optional. Fires `("search", query)` immediately before the DuckDuckGo request is made.

Returns `list[dict[str, str]]`: up to 5 result dicts, each containing:

| Key | Type | Description |
|---|---|---|
| `title` | `str` | Page title from the search result |
| `url` | `str` | Full URL of the result |
| `snippet` | `str` | DuckDuckGo-provided text excerpt |

Returns `[]` if the search fails for any reason (the exception is caught and logged).
Usage Example
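The contract described above (progress event first, at most `MAX_SEARCH_RESULTS` normalized dicts, `[]` on any failure) can be sketched as follows. This is an illustration, not the module's implementation: `search_web_sketch`, the injectable `backend` argument, the `on_progress` parameter name, and the `href`/`body` keys on raw results are all assumptions made so the snippet runs without a network connection.

```python
from typing import Callable, Iterable, Optional

MAX_SEARCH_RESULTS = 5  # value from the Module Constants table


def search_web_sketch(
    query: str,
    backend: Callable[[str, int], Iterable[dict]],  # hypothetical injection point
    on_progress: Optional[Callable[[str, str], None]] = None,  # hypothetical name
) -> list[dict[str, str]]:
    """Return up to MAX_SEARCH_RESULTS dicts with title, url, and snippet keys."""
    if on_progress is not None:
        on_progress("search", query)  # fired before the request is made
    try:
        raw = list(backend(query, MAX_SEARCH_RESULTS))
    except Exception:
        return []  # any backend failure yields an empty list
    results = []
    for item in raw[:MAX_SEARCH_RESULTS]:
        results.append(
            {
                "title": str(item.get("title", "")),
                "url": str(item.get("href", "")),      # assumed raw key
                "snippet": str(item.get("body", "")),  # assumed raw key
            }
        )
    return results


def fake_backend(query: str, max_results: int) -> list[dict]:
    """Stand-in for a real search client; returns more rows than the cap."""
    return [
        {"title": f"Result {i}", "href": f"https://example.org/{i}", "body": query}
        for i in range(8)
    ]


def broken_backend(query: str, max_results: int) -> list[dict]:
    """Stand-in for a failing search client."""
    raise RuntimeError("network down")


events: list[tuple[str, str]] = []
hits = search_web_sketch(
    "solar storage", fake_backend, lambda kind, q: events.append((kind, q))
)
print(len(hits), hits[0]["url"])
print(search_web_sketch("solar storage", broken_backend))
```

Passing the backend in as an argument is only a convenience for the sketch; it keeps the cap, the key mapping, and the failure path testable offline.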
fetch_page()
Fetches a web page, reduces it to cleaned plain text, and truncates the result to `MAX_PAGE_CHARS` (3,000 characters). Integrates with `ResearchContext` for caching and credibility filtering.
Processing pipeline:

- Checks `context.is_fetched(url)` and returns the cached content immediately if the URL was already fetched.
- Computes the credibility score for the URL. If the score is at or below `MIN_CREDIBILITY_SCORE` (0.5), the fetch is blocked.
- Issues a `GET` request with the `ResearchAgent/1.0` User-Agent and a 15-second timeout.
- Validates that the `Content-Type` is `text/html` or `application/xhtml`.
- Strips `<script>`, `<style>`, `<nav>`, `<footer>`, `<header>`, and `<aside>` tags via BeautifulSoup.
- Extracts plain text, normalizes whitespace, and truncates at a word boundary.
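
The last two pipeline steps (tag stripping, then whitespace normalization and word-boundary truncation) can be sketched with the standard library alone. Note the substitution: the module is described as using BeautifulSoup, while this sketch uses `html.parser` so it runs without third-party packages. The names `clean_html` and `_TextExtractor` are hypothetical.

```python
import re
from html.parser import HTMLParser

MAX_PAGE_CHARS = 3000  # value from the Module Constants table
SKIPPED_TAGS = {"script", "style", "nav", "footer", "header", "aside"}


class _TextExtractor(HTMLParser):
    """Collects text while ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html: str, limit: int = MAX_PAGE_CHARS) -> str:
    """Strip boilerplate tags, collapse whitespace, truncate at a word boundary."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut landed in the middle of a word, back up to the previous space.
    if text[limit] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip()


sample = (
    "<html><head><style>p{}</style></head><body><nav>Menu</nav>"
    "<p>Hello   world</p><script>var x=1;</script><footer>bye</footer></body></html>"
)
print(clean_html(sample))
print(clean_html("<p>alpha beta gamma</p>", limit=8))
```

Truncating at a word boundary means the returned text can be slightly shorter than the limit, which is the trade-off for never ending on a partial word.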
Parameters:

- URL: the fully qualified URL of the page to fetch.
- Context: an active `ResearchContext` instance. Used for cache lookup (`is_fetched`), credibility filtering (`get_score`), and storing the result (`add_fetched_page`).
- Progress callback: optional. Fires `("fetch", url)` when a fetch begins and `("block", url)` when a URL is rejected by the credibility filter.

Returns `str`: cleaned page text of up to 3,000 characters, or an empty string on failure or if the URL is blocked.

Calling `fetch_page` for a URL that has already been fetched in the same session returns the cached content instantly without making a network request. This prevents redundant fetches across multiple tool-use iterations.

Usage Example
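A minimal sketch of the cache-then-filter control flow, under stated assumptions. `ToyContext` is a hypothetical stand-in for `ResearchContext`: only the method names `is_fetched`, `get_score`, and `add_fetched_page` come from this reference, while their signatures, the `pages` attribute, the `download` argument, and `fetch_with_cache` itself are invented for illustration.

```python
from typing import Callable, Optional

MIN_CREDIBILITY_SCORE = 0.5  # threshold quoted in the processing pipeline


class ToyContext:
    """Hypothetical stand-in for ResearchContext (signatures are assumed)."""

    def __init__(self, scores: dict[str, float]) -> None:
        self.scores = scores
        self.pages: dict[str, str] = {}  # hypothetical storage attribute

    def is_fetched(self, url: str) -> bool:
        return url in self.pages

    def get_score(self, url: str) -> float:
        return self.scores.get(url, 0.0)

    def add_fetched_page(self, url: str, content: str) -> None:
        self.pages[url] = content


def fetch_with_cache(
    url: str,
    context: ToyContext,
    download: Callable[[str], str],  # hypothetical replacement for the HTTP step
    on_progress: Optional[Callable[[str, str], None]] = None,
) -> str:
    """Return cached text, block low-credibility URLs, otherwise download once."""
    if context.is_fetched(url):
        return context.pages[url]  # cache hit: no network call, no event
    if context.get_score(url) <= MIN_CREDIBILITY_SCORE:
        if on_progress is not None:
            on_progress("block", url)
        return ""
    if on_progress is not None:
        on_progress("fetch", url)
    content = download(url)
    context.add_fetched_page(url, content)
    return content


download_calls: list[str] = []


def fake_download(url: str) -> str:
    """Records each call so repeated fetches are visible."""
    download_calls.append(url)
    return "page text"


ctx = ToyContext({"https://a.edu/x": 0.9, "https://spam.example/y": 0.2})
seen: list[tuple[str, str]] = []
log = lambda kind, url: seen.append((kind, url))

first = fetch_with_cache("https://a.edu/x", ctx, fake_download, log)
second = fetch_with_cache("https://a.edu/x", ctx, fake_download, log)
blocked = fetch_with_cache("https://spam.example/y", ctx, fake_download, log)
print(first == second, len(download_calls), repr(blocked))
```

The second call for the same URL never reaches `fake_download`, which is the deduplication behavior the note above describes.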
score_source()
Computes a credibility score between 0.0 and 1.0 for a given source URL and snippet. The score combines domain authority, keyword relevance, and recency signals.
Parameters:

- URL: the URL of the source to score. The domain is extracted and matched against known domain tiers.
- Snippet: the DuckDuckGo snippet or a short excerpt from the page. Used for both relevance and recency scoring.
- Question: the original research question. Content words from the question (excluding common stopwords) are compared against the snippet to compute relevance.

Returns `float`: a score in [0.0, 1.0] computed as:

| Component | Weight | Calculation |
|---|---|---|
| `domain_score` | 0.4 | `.edu`/`.gov` → 0.9 · `.org` → 0.7 · major domains → 0.8 · social media → 0.3 · other → 0.4 |
| `relevance_score` | 0.5 | Fraction of non-stopword question words present in the snippet |
| `recency_score` | 0.1 | +0.1 if snippet contains a 202x year or a relative time expression |

Sources with a score at or below 0.5 are blocked by `fetch_page`.
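
One way to read the table is as a weighted sum, sketched below. Several details are assumptions rather than documented behavior: the domain and stopword lists are placeholders, the tier checks follow the table's left-to-right order, and the recency row is interpreted as a 0-or-1 signal multiplied by its 0.1 weight. `score_source_sketch` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Placeholder lists; the module's real tiers and stopwords are not shown here.
MAJOR_DOMAINS = {"nature.com", "reuters.com"}
SOCIAL_DOMAINS = {"twitter.com", "reddit.com", "facebook.com"}
STOPWORDS = {"the", "a", "an", "of", "in", "is", "are", "what", "how", "and", "to", "for"}
RELATIVE_TIME = re.compile(
    r"\b(?:\d+\s+(?:day|week|month)s?\s+ago|yesterday|today)\b", re.IGNORECASE
)


def domain_score(url: str) -> float:
    """Map the URL's host to a tier value from the table."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host.endswith((".edu", ".gov")):
        return 0.9
    if host.endswith(".org"):
        return 0.7
    if host in MAJOR_DOMAINS:
        return 0.8
    if host in SOCIAL_DOMAINS:
        return 0.3
    return 0.4


def relevance_score(snippet: str, question: str) -> float:
    """Fraction of non-stopword question words that appear in the snippet."""
    wanted = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if w not in STOPWORDS}
    if not wanted:
        return 0.0
    present = set(re.findall(r"[a-z0-9]+", snippet.lower()))
    return len(wanted & present) / len(wanted)


def recency_score(snippet: str) -> float:
    """1.0 if the snippet mentions a 202x year or a relative time, else 0.0."""
    if re.search(r"\b202\d\b", snippet) or RELATIVE_TIME.search(snippet):
        return 1.0
    return 0.0


def score_source_sketch(url: str, snippet: str, question: str) -> float:
    """Weighted sum of the three components, clamped to [0.0, 1.0]."""
    total = (
        0.4 * domain_score(url)
        + 0.5 * relevance_score(snippet, question)
        + 0.1 * recency_score(snippet)
    )
    return min(1.0, max(0.0, total))


strong = score_source_sketch(
    "https://www.mit.edu/energy",
    "Battery storage costs fell in 2023",
    "How are battery storage costs changing?",
)
weak = score_source_sketch("https://twitter.com/someone", "lol", "battery storage")
print(round(strong, 3), round(weak, 3))
```

Under this reading the first source scores 0.4 × 0.9 + 0.5 × 0.75 + 0.1 = 0.835 and would be fetched, while the second scores 0.4 × 0.3 = 0.12 and would be blocked.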
run_tool()
Parameters:

- Tool name: the tool to execute. Supported values: `"search_web"`, `"fetch_page"`, `"score_source"`.
- Input arguments: a dict whose required keys depend on the tool:
  - `"search_web"` → `{"query": str}`
  - `"fetch_page"` → `{"url": str}`
  - `"score_source"` → `{"url": str, "snippet": str}`
- Context: passed through to the underlying tool function for caching and context tracking.
- Progress callback: passed through to the underlying tool function for progress reporting.

Returns `str`: a JSON-serialized string. Shape depends on the tool:

| Tool | Success shape | Error shape |
|---|---|---|
| `"search_web"` | `{"results": [{title, url, snippet}, ...]}` | `{"error": "..."}` |
| `"fetch_page"` | `{"url": str, "content": str}` | `{"error": "..."}` |
| `"score_source"` | `{"url": str, "credibility_score": float}` | `{"error": "..."}` |
| Unknown name | — | {"error": "Unknown tool: <name>"} |
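
The dispatch-by-name pattern and the JSON shapes in the table can be sketched as below. This is an illustration rather than the module's code: `run_tool_sketch`, the `tools` registry argument, the stub functions, and the exact wording of non-"Unknown tool" error messages are assumptions.

```python
import json
from typing import Any, Callable


def run_tool_sketch(
    name: str,
    args: dict[str, Any],
    tools: dict[str, Callable[..., Any]],  # hypothetical registry of callables
) -> str:
    """Dispatch to a tool by name and always return a JSON string."""
    handler = tools.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        if name == "search_web":
            payload = {"results": handler(args["query"])}
        elif name == "fetch_page":
            payload = {"url": args["url"], "content": handler(args["url"])}
        elif name == "score_source":
            payload = {
                "url": args["url"],
                "credibility_score": handler(args["url"], args["snippet"]),
            }
        else:
            payload = {"error": f"Unknown tool: {name}"}
    except Exception as exc:
        # Missing keys and tool failures are reported as data, never raised.
        payload = {"error": f"{type(exc).__name__}: {exc}"}
    return json.dumps(payload)


# Stub tools so the dispatcher can be exercised without network access.
STUB_TOOLS: dict[str, Callable[..., Any]] = {
    "search_web": lambda query: [
        {"title": "T", "url": "https://example.org", "snippet": query}
    ],
    "fetch_page": lambda url: "cleaned text",
    "score_source": lambda url, snippet: 0.75,
}

print(run_tool_sketch("fetch_page", {"url": "https://example.org"}, STUB_TOOLS))
print(run_tool_sketch("summarize", {}, STUB_TOOLS))
print(run_tool_sketch("search_web", {}, STUB_TOOLS))
```

Returning errors as JSON instead of raising keeps the tool-use loop simple, since every call yields a string that can be handed back to the model.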