Language models have a training cutoff. When your agent needs current prices, recent news, or the contents of a specific URL, it has to reach out to the live web. Tavily provides three complementary APIs—search, extract, and crawl—purpose-built for agents. This tutorial shows you how to configure each tool with the LangChain integration, build a ReAct research agent, and extend it into a hybrid agent that blends public web data with your own internal documents.

Search

Semantically ranked results with title, URL, and content snippets—up to 10 per call.

Extract

Full page content from up to 20 URLs at once, including advanced mode for dynamic content.

Crawl

Explore a website’s link graph and gather content from linked pages in a single call.

Prerequisites

pip install -U tavily-python langchain-openai langchain langchain-tavily langgraph langchain-chroma python-dotenv
Set your API keys:
import os
import getpass
from dotenv import load_dotenv

load_dotenv()

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY:\n")

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY:\n")

Explore the Tavily API directly

Before building an agent, run the three endpoints manually to understand what each one returns.
raw_content from the extract endpoint can be large. Keep your model’s context window in mind when passing extracted content directly to an LLM.
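The calls for this step are not shown on this page, so the following is only a minimal sketch. It assumes a client that behaves like `TavilyClient` from `tavily-python`, with `search`, `extract`, and `crawl` methods that each return a dictionary containing a `results` list. The helper names `truncate` and `preview_endpoints`, and the parameter values chosen here, are hypothetical and used for illustration only.

```python
def truncate(text, max_chars=500):
    """Shorten long page content so a preview stays small."""
    text = text or ""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "..."


def preview_endpoints(client, query, url, max_chars=500):
    """Call search, extract, and crawl once each and return a compact summary.

    Assumption: `client` mirrors tavily.TavilyClient, and every response is a
    dict whose "results" key holds a list of dicts.
    """
    search_response = client.search(query=query, max_results=3)
    extract_response = client.extract(urls=[url])
    crawl_response = client.crawl(url=url, max_depth=1, limit=5)

    def shorten(results, key):
        # Keep the URL plus a truncated copy of the text field.
        return [
            {"url": r.get("url"), key: truncate(r.get(key), max_chars)}
            for r in results
        ]

    return {
        "search": shorten(search_response.get("results", []), "content"),
        "extract": shorten(extract_response.get("results", []), "raw_content"),
        "crawl": shorten(crawl_response.get("results", []), "raw_content"),
    }
```

With a real key, the client could be created with `from tavily import TavilyClient` and `TavilyClient(api_key=os.environ["TAVILY_API_KEY"])`, then passed in as `client`. Truncating `raw_content` before printing or forwarding it is one simple way to respect the context-window caveat above.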

Define the LangChain tool wrappers

The langchain_tavily package exposes the three endpoints as LangChain tools with configurable defaults. The agent overrides these defaults at runtime based on the query context.
from langchain_tavily import TavilySearch, TavilyExtract, TavilyCrawl

# Search — up to 10 results, general topic
search = TavilySearch(max_results=10, topic="general")

# Extract — advanced depth for complex pages
extract = TavilyExtract(extract_depth="advanced")

# Crawl — explore a site's link graph
crawl = TavilyCrawl()
Set up your language models:
from langchain_openai import ChatOpenAI

o3_mini = ChatOpenAI(model="o3-mini-2025-01-31", api_key=os.getenv("OPENAI_API_KEY"))
gpt_4_1 = ChatOpenAI(model="gpt-4.1", api_key=os.getenv("OPENAI_API_KEY"))

Build the web research agent

The agent is a LangGraph ReAct graph. The system prompt explains when to use each tool and how to cite sources.
import datetime
from langgraph.prebuilt import create_react_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

today = datetime.datetime.today().strftime("%A, %B %d, %Y")

web_agent = create_react_agent(
    model=gpt_4_1,
    tools=[search, extract, crawl],
    prompt=ChatPromptTemplate.from_messages(
        [
            (
                "system",
                f"""
You are a research agent equipped with advanced web tools: Tavily Web Search,
Web Crawl, and Web Extract. Your mission is to conduct comprehensive, accurate,
and up-to-date research, grounding your findings in credible web sources.

**Today's Date:** {today}

**Available Tools:**

1. **Tavily Web Search**
   - Retrieve relevant web pages based on a query.
   - Use parameters such as `search_depth`, `time_range`, `include_domains`,
     and `include_raw_content`.
   - Break complex queries into focused sub-queries.

2. **Tavily Web Crawl**
   - Explore a website's link graph and gather content from linked pages.
   - Specify `max_depth`, `max_breadth`, and `extract_depth`.
   - Use `select_paths` or `exclude_paths` to focus the crawl.

3. **Tavily Web Extract**
   - Extract full content from specific URLs.
   - Set `extract_depth` to "advanced" for tables and embedded media.

**Research methodology:**
- Thought → Action → Observation, repeated as needed.
- Always cite source URLs inline.
- Never fabricate information.
- Present the final answer in markdown with citations.
""",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ]
    ),
    name="web_agent",
)

Run example queries

from langchain_core.messages import HumanMessage

inputs = {
    "messages": [
        HumanMessage(
            content="find all the iphone models currently available on apple.com and their prices"
        )
    ]
}

for s in web_agent.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()
Watch the intermediate steps in the streamed output to see how the agent decides between search, extract, and crawl for each query.
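If the streamed output is too verbose, the tool decisions can also be summarized after the run. The sketch below assumes that AI messages expose a `tool_calls` list of dictionaries with `name` and `args` keys, as LangChain's `AIMessage` does. `tool_call_trace` is a hypothetical helper, not part of LangGraph.

```python
def tool_call_trace(messages):
    """Return (tool_name, args) pairs in the order the agent requested them."""
    trace = []
    for message in messages:
        # Only AI messages carry tool calls; other message types are skipped.
        for call in getattr(message, "tool_calls", None) or []:
            trace.append((call["name"], call["args"]))
    return trace
```

For example, `tool_call_trace(web_agent.invoke(inputs)["messages"])` would list which tools were called and with which arguments, which makes it easier to compare runs across different queries.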

Tool selection patterns

The agent adapts its tool strategy to the query type. Here are the three main patterns:
Use when: you need a quick overview from multiple sources.
Example: “What are recent AI news headlines?”
The agent calls TavilySearch with time_range="week" and synthesizes the snippets into a summary with source links.

Build a hybrid agent: web + private knowledge

For enterprise use cases, combine Tavily’s live web access with a private vector store. This lets the agent compare public information against your internal CRM data, meeting notes, or documentation.

Set up the vector store

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vector_store = Chroma(
    collection_name="crm",
    embedding_function=embeddings,
    persist_directory="supplemental/db",
)

retriever = vector_store.as_retriever()
Test the retriever independently:
results = retriever.invoke("robotics use case")
for doc in results:
    print(doc.page_content)
    print()

Expose the retriever as a tool

vector_search_tool = retriever.as_tool(
    name="vector_search",
    description="Perform a vector search on our company's CRM data.",
)

Build the hybrid agent

Pass all four tools—search, crawl, extract, and vector search—to the same ReAct agent.
hybrid_agent = create_react_agent(
    model=gpt_4_1,
    tools=[search, crawl, extract, vector_search_tool],
    prompt=ChatPromptTemplate.from_messages(
        [
            (
                "system",
                f"""
You are a ReAct-style research agent with access to:
- Tavily Web Search, Tavily Web Extract, Tavily Web Crawl (public web)
- Internal Vector Search (proprietary CRM data: Meta, Apple, Google, Amazon,
  Microsoft, Tesla accounts)

**Today's Date:** {today}

All answers must be grounded in retrieved information. You may not use prior
knowledge or fabricate data. If tools return nothing useful, say so.

When a question involves a company, check both the public web and the CRM
vector store. Cite source URLs for web content; note the internal source for
CRM data.

Workflow: Thought → Action → Observation. Repeat as needed. Respond only after
gathering all required information.
""",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ]
    ),
    name="hybrid_agent",
)

Run a hybrid query

inputs = {
    "messages": [
        HumanMessage(
            content=(
                "Search for the latest news on Google relevant to our "
                "current CRM data on them"
            )
        )
    ]
}

for s in hybrid_agent.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()
In a typical run, the agent calls TavilySearch to find recent Google news and vector_search to retrieve your CRM notes on Google, then synthesizes both into a single report.
The vector database included in the tutorial uses synthetic CRM data to demonstrate the pattern. To use your own data, point persist_directory and collection_name at your own Chroma database, or swap Chroma for any LangChain-compatible vector store.

Monitoring and research workflows

Competitive intelligence

Combine time_range="week" on web search with CRM data to track competitor activity against your current accounts.

Trend monitoring

Schedule the agent with a cron job. Pass the same fixed query on each run and diff the results against the previous run to surface emerging trends.
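One way to implement the diff step is to keep a snapshot of the previous run on disk and report only results with unseen URLs. This is a sketch under assumptions: results are taken to be a list of dictionaries with a `url` key (the shape of Tavily search results), and `diff_results`, `record_run`, and the snapshot file are hypothetical.

```python
import json
from pathlib import Path


def diff_results(previous, current):
    """Return the results in `current` whose URL did not appear in `previous`."""
    seen = {r["url"] for r in previous}
    return [r for r in current if r["url"] not in seen]


def record_run(snapshot_path, current):
    """Compare `current` with the last snapshot on disk, then overwrite it."""
    path = Path(snapshot_path)
    previous = json.loads(path.read_text()) if path.exists() else []
    fresh = diff_results(previous, current)
    path.write_text(json.dumps(current))
    return fresh
```

Each scheduled run would call `record_run` with that run's results. The first run reports everything as new, and later runs report only URLs that were absent from the previous snapshot.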

Deep-dive research

Use search to discover relevant sites, then crawl each one at max_depth=2 for comprehensive coverage of a topic.
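Outside the agent loop, the same search-then-crawl pattern can be scripted directly. The sketch below assumes a client shaped like `TavilyClient` from `tavily-python`, whose `search` and `crawl` methods return dictionaries with a `results` list. `deep_dive` and its `max_sites` parameter are hypothetical.

```python
from urllib.parse import urlparse


def deep_dive(client, query, max_sites=3, max_depth=2):
    """Search for a topic, then crawl each distinct site that was found."""
    search_response = client.search(query=query, max_results=10)

    # Keep one root URL per site so the same site is not crawled twice.
    roots = []
    for result in search_response.get("results", []):
        parsed = urlparse(result["url"])
        root = f"{parsed.scheme}://{parsed.netloc}"
        if root not in roots:
            roots.append(root)

    # Crawl each site and collect its pages, keyed by the site's root URL.
    pages = {}
    for root in roots[:max_sites]:
        crawl_response = client.crawl(url=root, max_depth=max_depth)
        pages[root] = crawl_response.get("results", [])
    return pages
```

Capping the number of sites keeps the cost of a run predictable, since every crawl can return many pages.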

Document enrichment

Run TavilyExtract on URLs found in your CRM records to pull the latest public information about a company.
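Before calling the extract tool, the URLs have to be pulled out of the CRM text and grouped to respect the 20-URL limit mentioned at the top of this page. The sketch below uses a deliberately simple regular expression. `urls_from_records` and `batches` are hypothetical helpers, and real CRM records may need stricter URL parsing.

```python
import re

# Simplified pattern: stop at whitespace, angle brackets, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>)\]]+")


def urls_from_records(records):
    """Collect unique URLs from a list of text records, preserving order."""
    urls = []
    for text in records:
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls


def batches(items, size=20):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each batch could then be passed to the tool, for example `extract.invoke({"urls": batch})`, assuming the `urls` input field of `TavilyExtract`.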
