tools.py: Web Search and Page Fetch Functions

The tools module provides the four core functions that power Deep Research Agent’s information-gathering layer. search_web and fetch_page are the tools Gemini requests during the tool-use loop; score_source computes the credibility scores used to filter out low-quality sources; and run_tool is the unified dispatcher that the loop calls to execute any of the three by name. All functions integrate with ResearchContext to cache results and deduplicate work across iterations.

from tools import search_web, fetch_page, score_source, run_tool

Module Constants

These constants govern fetch and search behavior across the module and can be used as reference values when setting up custom pipelines.

Constant	Value	Description
`MAX_SEARCH_RESULTS`	`5`	Maximum results returned per `search_web` call
`MAX_PAGE_CHARS`	`3000`	Character limit applied by `fetch_page` after cleaning
`REQUEST_TIMEOUT`	`15`	HTTP request timeout in seconds

`search_web()`

def search_web(
    query: str,
    context: Any = None,
    on_progress: Callable[[str, str], None] | None = None,
) -> list[dict[str, str]]

Searches the web via DuckDuckGo and returns up to five results. Uses the ddgs library as the primary backend with duckduckgo_search as a fallback. Results are automatically scored and stored in context if one is provided.

query (str, required)
The search query string to send to DuckDuckGo.

context (ResearchContext, optional)
An active ResearchContext instance. When provided, the function calls context.add_query(query) to log the query and context.add_search_results(results, question) to score and store each returned URL in source_metadata.

on_progress (Callable[[str, str], None], optional)
Progress callback. Fires ("search", query) immediately before the DuckDuckGo request is made.

Returns: list[dict[str, str]] — Up to 5 result dicts, each containing:

Key	Type	Description
`title`	`str`	Page title from the search result
`url`	`str`	Full URL of the result
`snippet`	`str`	DuckDuckGo-provided text excerpt

Returns an empty list [] if the search fails for any reason (exception is caught and logged).
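
The backend fallback, result shape, and empty-list-on-failure behavior described above can be sketched roughly as follows. This is not the module's actual code: load_backend, normalize, and safe_search are hypothetical names, and the raw key names href and body are an assumption about what the DuckDuckGo client returns.

```python
from __future__ import annotations

import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)
MAX_SEARCH_RESULTS = 5  # value from the constants table


def load_backend() -> Callable[[str], list[dict[str, Any]]] | None:
    """Return a search callable, preferring ddgs over duckduckgo_search."""
    try:
        from ddgs import DDGS  # primary backend
    except ImportError:
        try:
            from duckduckgo_search import DDGS  # fallback backend
        except ImportError:
            return None  # neither library is installed
    return lambda query: list(DDGS().text(query, max_results=MAX_SEARCH_RESULTS))


def normalize(raw: dict[str, Any]) -> dict[str, str]:
    """Map one raw hit (assumed keys: title, href, body) to the documented shape."""
    return {
        "title": str(raw.get("title", "")),
        "url": str(raw.get("href", "")),
        "snippet": str(raw.get("body", "")),
    }


def safe_search(query: str, backend: Callable[[str], list[dict[str, Any]]]) -> list[dict[str, str]]:
    """Run backend(query); log and return [] on any failure."""
    try:
        return [normalize(r) for r in backend(query)][:MAX_SEARCH_RESULTS]
    except Exception:
        logger.exception("search failed for %r", query)
        return []
```

Passing the backend in as a callable keeps the error handling testable without network access; a caller would typically use the result of load_backend() when it is not None.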

Usage Example

from tools import search_web
from context import ResearchContext

ctx = ResearchContext(question="Effects of microplastics on marine life")

results = search_web(
    query="microplastics ocean fish toxicity 2024",
    context=ctx,
    on_progress=lambda e, d: print(f"[{e}] {d}"),
)

for r in results:
    print(r["title"], r["url"])

`fetch_page()`

def fetch_page(
    url: str,
    context: Any = None,
    on_progress: Callable[[str, str], None] | None = None,
) -> str

Fetches a web page, strips navigational and decorative HTML, and returns clean readable text truncated to MAX_PAGE_CHARS (3 000 characters). Integrates with ResearchContext for caching and credibility filtering.

Processing pipeline:

1. Checks context.is_fetched(url) and returns the cached content immediately if the URL was already fetched.
2. Computes the credibility score for the URL. If the score is at or below MIN_CREDIBILITY_SCORE (0.5), the fetch is blocked.
3. Issues a GET request with the ResearchAgent/1.0 User-Agent and a 15-second timeout.
4. Validates that the Content-Type is text/html or application/xhtml.
5. Strips <script>, <style>, <nav>, <footer>, <header>, and <aside> tags via BeautifulSoup.
6. Extracts plain text, normalizes whitespace, and truncates at a word boundary.
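
The Content-Type check and the final normalize-and-truncate step can be illustrated with the standard library alone. The sketch below is an assumption about how those steps might look, not the module's implementation: is_html and normalize_and_truncate are hypothetical names, and the BeautifulSoup tag-stripping step is left out.

```python
import re

MAX_PAGE_CHARS = 3000  # value from the constants table


def is_html(content_type: str) -> bool:
    """Accept text/html and application/xhtml variants, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html" or media_type.startswith("application/xhtml")


def normalize_and_truncate(text: str, limit: int = MAX_PAGE_CHARS) -> str:
    """Collapse whitespace runs, then cut at the last space at or before the limit."""
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit + 1)
    # Fall back to a hard cut when the first `limit` characters contain no space.
    return text[:cut] if cut > 0 else text[:limit]


print(is_html("text/html; charset=utf-8"))               # True
print(normalize_and_truncate("alpha  beta\n gamma", 10))  # alpha beta
```

Cutting at the last space keeps the final word intact, so the returned text can be slightly shorter than the limit.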

url (str, required)
The fully-qualified URL of the page to fetch.

context (ResearchContext, optional)
An active ResearchContext instance. Used for cache lookup (is_fetched), credibility filtering (get_score), and storing the result (add_fetched_page).

on_progress (Callable[[str, str], None], optional)
Progress callback. Fires ("fetch", url) when a fetch begins and ("block", url) when a URL is rejected by the credibility filter.

Returns: str — Cleaned page text up to 3 000 characters, or an empty string on failure or if the URL is blocked.

Calling fetch_page for a URL that has already been fetched in the same session returns the cached content instantly without making a network request. This prevents redundant fetches across multiple tool-use iterations.
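
The cache short-circuit and the credibility gate can be pictured with a stand-in context. Everything below is a sketch under stated assumptions: StubContext, get_fetched, the add_fetched_page(url, content) signature, and the download callable are hypothetical, and the real ResearchContext interface may differ.

```python
MIN_CREDIBILITY_SCORE = 0.5  # threshold quoted in the pipeline description


class StubContext:
    """Hypothetical stand-in for ResearchContext with only the pieces this sketch needs."""

    def __init__(self, scores):
        self._scores = scores  # url -> credibility score
        self._pages = {}       # url -> cached page text

    def is_fetched(self, url):
        return url in self._pages

    def get_fetched(self, url):  # assumed accessor, not documented above
        return self._pages[url]

    def get_score(self, url):
        return self._scores.get(url, 0.0)

    def add_fetched_page(self, url, content):  # assumed signature
        self._pages[url] = content


def fetch_with_cache(url, context, download, on_progress=None):
    """Cache check first, then the credibility gate, then a single download."""
    if context.is_fetched(url):
        return context.get_fetched(url)  # no network request
    if context.get_score(url) <= MIN_CREDIBILITY_SCORE:
        if on_progress:
            on_progress("block", url)
        return ""  # blocked sources yield an empty string
    if on_progress:
        on_progress("fetch", url)
    content = download(url)
    context.add_fetched_page(url, content)
    return content


calls = []

def fake_download(url):
    calls.append(url)
    return "page text"

ctx = StubContext({"https://example.edu/a": 0.8, "https://example.com/b": 0.5})
first = fetch_with_cache("https://example.edu/a", ctx, fake_download)
second = fetch_with_cache("https://example.edu/a", ctx, fake_download)  # served from cache
blocked = fetch_with_cache("https://example.com/b", ctx, fake_download)
print(len(calls), repr(blocked))  # 1 ''
```

Note that a score of exactly 0.5 is blocked, because the threshold comparison is "at or below".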

Usage Example

from tools import fetch_page
from context import ResearchContext

ctx = ResearchContext(question="Deep-sea lithium mining")

content = fetch_page(
    url="https://example.edu/deep-sea-mining-report",
    context=ctx,
    on_progress=lambda e, d: print(f"[{e}] {d}"),
)

print(content[:500])
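
Because search_web and fetch_page share the same callback shape, one callable can collect events from both. The following is a small sketch; ProgressLog is a hypothetical helper, and the emissions are simulated by hand rather than produced by the real functions.

```python
from collections import Counter


class ProgressLog:
    """Collects (event, detail) pairs emitted through on_progress."""

    def __init__(self):
        self.events = []

    def __call__(self, event, detail):
        self.events.append((event, detail))

    def summary(self):
        """Count how many times each event name was seen."""
        return Counter(event for event, _ in self.events)


log = ProgressLog()
# Simulated emissions using the three documented event names.
log("search", "deep-sea lithium mining")
log("fetch", "https://example.edu/deep-sea-mining-report")
log("block", "https://example.com/low-quality")
print(dict(log.summary()))  # {'search': 1, 'fetch': 1, 'block': 1}
```

In real use the instance would be passed as on_progress=log to either function.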

`score_source()`

def score_source(url: str, snippet: str, question: str) -> float

Computes a credibility score between 0.0 and 1.0 for a given source URL and snippet. The score combines domain authority, keyword relevance, and recency signals.

url (str, required)
The URL of the source to score. The domain is extracted and matched against known domain tiers.

snippet (str, required)
The DuckDuckGo snippet or a short excerpt from the page. Used for both relevance and recency scoring.

question (str, required)
The original research question. Content words from the question (excluding common stopwords) are compared against the snippet to compute relevance.

Returns: float — A score in [0.0, 1.0] computed as:

score = (domain_score × 0.4) + (relevance_score × 0.5) + recency_score

Score components:

Component	Weight	Calculation
`domain_score`	0.4	`.edu`/`.gov` → 0.9 · `.org` → 0.7 · major domains → 0.8 · social media → 0.3 · other → 0.4
`relevance_score`	0.5	Fraction of non-stopword question words present in the snippet
`recency_score`	+0.1 (flat bonus, not multiplied)	0.1 if the snippet contains a `202x` year or a relative time expression, otherwise 0

Sources with a final score ≤ 0.5 are blocked by fetch_page.
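
A worked sketch of the formula may help when sanity-checking scores. The domain lists, stopword set, and recency patterns below are placeholders chosen for illustration, since the module's real lists are not documented here; sketch_score and the three helpers are hypothetical names.

```python
import re
from urllib.parse import urlparse

# Placeholder lists (assumptions); the module's actual tiers are not shown in this reference.
MAJOR_DOMAINS = {"nature.com", "reuters.com"}
SOCIAL_DOMAINS = {"twitter.com", "facebook.com", "reddit.com"}
STOPWORDS = {"the", "of", "on", "in", "a", "an", "and", "to", "is", "for"}


def domain_score(url):
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host.endswith((".edu", ".gov")):
        return 0.9
    if host in MAJOR_DOMAINS:
        return 0.8
    if host.endswith(".org"):
        return 0.7
    if host in SOCIAL_DOMAINS:
        return 0.3
    return 0.4


def relevance_score(snippet, question):
    """Fraction of non-stopword question words that also appear in the snippet."""
    wanted = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if w not in STOPWORDS}
    if not wanted:
        return 0.0
    present = set(re.findall(r"[a-z0-9]+", snippet.lower()))
    return len(wanted & present) / len(wanted)


def recency_score(snippet):
    """Flat 0.1 bonus for a 202x year or a relative time expression (assumed patterns)."""
    has_year = re.search(r"\b202\d\b", snippet)
    has_relative = re.search(
        r"\b(today|yesterday|last (week|month|year)|\d+ (day|week|month)s? ago)\b",
        snippet.lower(),
    )
    return 0.1 if (has_year or has_relative) else 0.0


def sketch_score(url, snippet, question):
    total = domain_score(url) * 0.4 + relevance_score(snippet, question) * 0.5 + recency_score(snippet)
    return round(min(max(total, 0.0), 1.0), 3)


question = "Effects of microplastics on marine life"
strong = sketch_score("https://example.edu/report", "A 2024 study of microplastics in marine fish", question)
weak = sketch_score("https://twitter.com/someone/status/1", "lol", question)
print(strong)  # 0.71 = 0.9*0.4 + (2/4)*0.5 + 0.1
print(weak)    # 0.12 = 0.3*0.4 + 0 + 0, which is at or below 0.5 and would be blocked
```

With the tiers in the table, the highest reachable score is 0.9 × 0.4 + 1.0 × 0.5 + 0.1 = 0.96.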

`run_tool()`

def run_tool(
    name: str,
    tool_input: dict[str, Any],
    context: Any = None,
    on_progress: Callable[[str, str], None] | None = None,
) -> str

Dispatches a tool call by name and returns a JSON-serialized string result. This is the function Gemini’s tool-use loop calls directly — it maps tool names to their implementations and handles serialization and error formatting uniformly.

name (str, required)
The tool name to execute. Supported values: "search_web", "fetch_page", "score_source".

tool_input (dict[str, Any], required)
Input arguments for the tool. Required keys depend on the tool:

"search_web" → {"query": str}
"fetch_page" → {"url": str}
"score_source" → {"url": str, "snippet": str}

context (ResearchContext, optional)
Passed through to the underlying tool function for caching and context tracking.

on_progress (Callable[[str, str], None], optional)
Passed through to the underlying tool function for progress reporting.

Returns: str — A JSON-serialized string. Shape depends on the tool:

Tool	Success shape	Error shape
`"search_web"`	`{"results": [{title, url, snippet}, ...]}`	`{"error": "..."}`
`"fetch_page"`	`{"url": str, "content": str}`	`{"error": "..."}`
`"score_source"`	`{"url": str, "credibility_score": float}`	`{"error": "..."}`
Unknown name	—	`{"error": "Unknown tool: <name>"}`
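
The name-to-function dispatch and the uniform error shape can be sketched with stub handlers. This illustrates the general pattern only and is not the module's code: make_dispatcher and the stub handlers are hypothetical, and the real run_tool also threads context and on_progress through.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def make_dispatcher(handlers: dict[str, Callable[..., Any]]) -> Callable[[str, dict[str, Any]], str]:
    """Build a dispatch(name, tool_input) function that always returns a JSON string."""

    def dispatch(name: str, tool_input: dict[str, Any]) -> str:
        handler = handlers.get(name)
        if handler is None:
            return json.dumps({"error": f"Unknown tool: {name}"})
        try:
            return json.dumps(handler(**tool_input))
        except Exception as exc:  # report any failure in the documented error shape
            return json.dumps({"error": str(exc)})

    return dispatch


# Stub handlers that mimic the documented success shapes.
handlers = {
    "search_web": lambda query: {
        "results": [{"title": "Stub", "url": "https://example.org", "snippet": query}]
    },
    "fetch_page": lambda url: {"url": url, "content": "stub content"},
}

dispatch = make_dispatcher(handlers)
ok = json.loads(dispatch("search_web", {"query": "fusion"}))
unknown = json.loads(dispatch("translate", {}))
bad_input = json.loads(dispatch("fetch_page", {}))  # missing "url" becomes an error payload

print(ok["results"][0]["snippet"])  # fusion
print(unknown["error"])             # Unknown tool: translate
print("error" in bad_input)         # True
```

Returning errors as JSON instead of raising keeps every tool result in a form that can be handed straight back to the model.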

Use run_tool whenever you need to invoke tools programmatically without knowing the specific function at call time, for example when replaying a recorded sequence of Gemini tool calls or building a custom tool-use harness.

import json
from tools import run_tool
from context import ResearchContext

ctx = ResearchContext(question="Fusion energy progress 2024")

# Search: run_tool returns a JSON string, so decode it before use
result_json = run_tool("search_web", {"query": "fusion energy 2024"}, context=ctx)
data = json.loads(result_json)
results = data.get("results", [])  # key is absent when the tool returned {"error": ...}
for r in results:
    print(r["title"], r["url"])

if results:
    top = results[0]

    # Fetch the top result
    page_json = run_tool("fetch_page", {"url": top["url"]}, context=ctx)
    page = json.loads(page_json)
    print(page.get("content", "")[:300])

    # Score the top result
    score_json = run_tool(
        "score_source",
        {"url": top["url"], "snippet": top["snippet"]},
    )
    print(json.loads(score_json).get("credibility_score"))
