Language models have a training cutoff. When your agent needs current prices, recent news, or the contents of a specific URL, it has to reach out to the live web. Tavily provides three complementary APIs—search, extract, and crawl—purpose-built for agents. This tutorial shows you how to configure each tool with the LangChain integration, build a ReAct research agent, and extend it into a hybrid agent that blends public web data with your own internal documents.
Search: semantically ranked results with title, URL, and content snippets, up to 10 per call.
Extract: full page content from up to 20 URLs at once, including advanced mode for dynamic content.
Crawl: explore a website's link graph and gather content from linked pages in a single call.
Before building an agent, run the three endpoints manually to understand what each one returns.
```python
import os

from tavily import TavilyClient

tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

# Basic search: 5 results
results = tavily_client.search(
    query="What happened in NYC today?",
    max_results=5,
)

for r in results["results"]:
    print(r["title"])
    print(r["url"])
    print(r["content"])
    print(r["score"])
    print()
```
Each result has a semantic relevance score. Use it to decide which URLs are worth extracting in full.
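One way to act on the score is a small filter that keeps only results above a threshold and caps the list at the extract limit. The sketch below assumes the response shape printed above (a `results` list of dicts with `url` and `score`). The `select_urls` helper, the 0.5 threshold, and the sample data are illustrative assumptions, not part of the Tavily SDK.

```python
def select_urls(response, min_score=0.5, limit=20):
    """Return URLs of results scoring at least min_score, best first.

    min_score=0.5 is an arbitrary illustrative threshold; tune it per use case.
    limit=20 mirrors the per-call URL cap of the extract endpoint.
    """
    ranked = sorted(response["results"], key=lambda r: r["score"], reverse=True)
    return [r["url"] for r in ranked if r["score"] >= min_score][:limit]


# Hypothetical response in the same shape as tavily_client.search(...) output
sample = {
    "results": [
        {"url": "https://example.com/a", "score": 0.91},
        {"url": "https://example.com/b", "score": 0.32},
        {"url": "https://example.com/c", "score": 0.67},
    ]
}

print(select_urls(sample))
```

With a real response, `select_urls(results)` would produce the list to hand to the extract endpoint.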
```python
# Filter by time range, domain, and topic
results = tavily_client.search(
    query="Anthropic model release?",
    max_results=5,
    time_range="month",
    include_domains=["techcrunch.com"],
    topic="news",
)

for r in results["results"]:
    print(r["title"])
    print(r["url"])
    print(r["content"])
    print()
```
topic="news" focuses results on trusted third-party news sources. All results will be from techcrunch.com and dated within the last month.
```python
# Extract full page content from the URLs returned by search
extract_results = tavily_client.extract(
    urls=[r["url"] for r in results["results"]],
    # extract_depth="advanced",  # uncomment for dynamic pages, tables, and embedded media
)

for r in extract_results["results"]:
    print(r["url"])
    print(r["raw_content"])
    print()
```
The extract endpoint accepts up to 20 URLs per call. raw_content contains the full text—much more detail than the search snippet.
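If a workflow collects more than 20 URLs, they need to be split across several extract calls. A minimal sketch of that batching follows; `batch_urls` and the example URLs are hypothetical names introduced here for illustration.

```python
def batch_urls(urls, batch_size=20):
    """Split urls into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical URLs; in practice these would come from search results
urls = [f"https://example.com/page/{n}" for n in range(45)]
batches = batch_urls(urls)

print([len(b) for b in batches])  # [20, 20, 5]
```

Each batch could then be passed to `tavily_client.extract(urls=batch)` in turn.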
raw_content from the extract endpoint can be large. Keep your model’s context window in mind when passing extracted content directly to an LLM.
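A simple way to respect that limit is to cap how much of each page is forwarded. The sketch below splits a character budget evenly across pages. Character counts are only a rough proxy for tokens, and `fit_to_budget`, the 8000-character default, and the sample data are assumptions made for illustration; a tokenizer-based count would be more precise.

```python
def fit_to_budget(extract_response, max_chars=8000):
    """Truncate each page's raw_content so the combined text stays within max_chars.

    The budget is divided evenly across pages; max_chars=8000 is an arbitrary example.
    """
    pages = extract_response["results"]
    if not pages:
        return []
    per_page = max_chars // len(pages)
    return [
        # Treat a missing or None raw_content as empty text
        {"url": p["url"], "content": (p.get("raw_content") or "")[:per_page]}
        for p in pages
    ]


# Hypothetical response in the same shape as tavily_client.extract(...) output
sample = {
    "results": [
        {"url": "https://example.com/long", "raw_content": "x" * 10_000},
        {"url": "https://example.com/short", "raw_content": "y" * 100},
    ]
}

for page in fit_to_budget(sample, max_chars=1000):
    print(page["url"], len(page["content"]))
```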
The langchain_tavily package exposes the three endpoints as LangChain tools with configurable defaults. The agent overrides these defaults at runtime based on the query context.
```python
from langchain_tavily import TavilySearch, TavilyExtract, TavilyCrawl

# Search: up to 10 results, general topic
search = TavilySearch(max_results=10, topic="general")

# Extract: advanced depth for complex pages
extract = TavilyExtract(extract_depth="advanced")

# Crawl: explore a site's link graph
crawl = TavilyCrawl()
```
Set up your language models:
```python
import os

from langchain_openai import ChatOpenAI

o3_mini = ChatOpenAI(model="o3-mini-2025-01-31", api_key=os.getenv("OPENAI_API_KEY"))
gpt_4_1 = ChatOpenAI(model="gpt-4.1", api_key=os.getenv("OPENAI_API_KEY"))
```
The agent is a LangGraph ReAct graph. The system prompt explains when to use each tool and how to cite sources.
```python
import datetime

from langgraph.prebuilt import create_react_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

today = datetime.datetime.today().strftime("%A, %B %d, %Y")

web_agent = create_react_agent(
    model=gpt_4_1,
    tools=[search, extract, crawl],
    prompt=ChatPromptTemplate.from_messages(
        [
            (
                "system",
                f"""You are a research agent equipped with advanced web tools: Tavily Web Search,
Web Crawl, and Web Extract. Your mission is to conduct comprehensive, accurate,
and up-to-date research, grounding your findings in credible web sources.

**Today's Date:** {today}

**Available Tools:**

1. **Tavily Web Search**
   - Retrieve relevant web pages based on a query.
   - Use parameters such as `search_depth`, `time_range`, `include_domains`, and `include_raw_content`.
   - Break complex queries into focused sub-queries.

2. **Tavily Web Crawl**
   - Explore a website's link graph and gather content from linked pages.
   - Specify `max_depth`, `max_breadth`, and `extract_depth`.
   - Use `select_paths` or `exclude_paths` to focus the crawl.

3. **Tavily Web Extract**
   - Extract full content from specific URLs.
   - Set `extract_depth` to "advanced" for tables and embedded media.

**Research methodology:**
- Thought → Action → Observation, repeated as needed.
- Always cite source URLs inline.
- Never fabricate information.
- Present the final answer in markdown with citations.""",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ]
    ),
    name="web_agent",
)
```
```python
from langchain_core.messages import HumanMessage

inputs = {
    "messages": [
        HumanMessage(
            content="find all the iphone models currently available on apple.com and their prices"
        )
    ]
}

for s in web_agent.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()
```
Watch the intermediate steps in the streamed output to see how the agent decides between search, extract, and crawl for each query.
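To review those decisions after a run, the tool calls can be pulled out of the message history. The sketch below relies on LangChain AI messages exposing a `tool_calls` list of dicts with `name` and `args` keys. The `tool_call_trace` helper is hypothetical, and the stand-in objects, tool names, and arguments are invented here so the example runs without an API key.

```python
from types import SimpleNamespace


def tool_call_trace(messages):
    """Return (tool_name, args) pairs in the order the agent requested them."""
    trace = []
    for message in messages:
        # Only AI messages carry tool calls; other message types lack the attribute
        for call in getattr(message, "tool_calls", None) or []:
            trace.append((call["name"], call["args"]))
    return trace


# Stand-in objects mimicking a message history (illustrative names and arguments)
history = [
    SimpleNamespace(content="find iphone prices"),
    SimpleNamespace(
        content="",
        tool_calls=[{"name": "tavily_search", "args": {"query": "iphone models"}}],
    ),
    SimpleNamespace(content="...search results..."),
    SimpleNamespace(
        content="",
        tool_calls=[{"name": "tavily_extract", "args": {"urls": ["https://www.apple.com/iphone/"]}}],
    ),
]

for name, args in tool_call_trace(history):
    print(name, args)
```

In a real run, the list to inspect would be `s["messages"]` from the last streamed state.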
The agent adapts its tool strategy to the query type. Here are the three main patterns:
Search only
Use when: you need a quick overview from multiple sources.
Example: “What are recent AI news headlines?”
The agent calls TavilySearch with time_range="week" and synthesizes the snippets into a summary with source links.

Search then extract
Use when: a search result looks highly relevant but the snippet is too short.
Example: “Provide detailed insights into quantum computing advancements.”
1. TavilySearch finds 10 relevant articles.
2. TavilyExtract retrieves the full text of the top result.
3. The agent synthesizes the detailed content with citations.

Search then crawl
Use when: you need deep coverage of a single authoritative source.
Example: “What are the latest renewable energy technologies?”
1. TavilySearch identifies a leading industry site.
2. TavilyCrawl with max_depth=2 explores that site’s linked pages.
3. The agent synthesizes findings from across the crawled pages.
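The data flow of the search-then-extract pattern can be sketched as a fixed-order function. This is only an approximation: in the agent, the model chooses the tools and their order at runtime. `search_then_extract` and the stub functions are hypothetical; the stubs stand in for the search and extract calls so the example runs offline.

```python
def search_then_extract(query, search_fn, extract_fn, top_k=1):
    """Run a search, then fetch full text for the top_k highest-scoring results."""
    hits = search_fn(query)["results"]
    top = sorted(hits, key=lambda r: r["score"], reverse=True)[:top_k]
    if not top:
        return []
    pages = extract_fn([r["url"] for r in top])["results"]
    return [{"url": p["url"], "text": p["raw_content"]} for p in pages]


# Stub standing in for a search call; returns invented results
def fake_search(query):
    return {
        "results": [
            {"url": "https://example.com/qc-overview", "score": 0.62},
            {"url": "https://example.com/qc-deep-dive", "score": 0.88},
        ]
    }


# Stub standing in for an extract call; returns placeholder text per URL
def fake_extract(urls):
    return {"results": [{"url": u, "raw_content": f"full text of {u}"} for u in urls]}


print(search_then_extract("quantum computing advancements", fake_search, fake_extract))
```

With the real client, the two callables could be `lambda q: tavily_client.search(query=q, max_results=10)` and `lambda urls: tavily_client.extract(urls=urls)`.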
For enterprise use cases, combine Tavily’s live web access with a private vector store. This lets the agent compare public information against your internal CRM data, meeting notes, or documentation.
Pass all four tools—search, crawl, extract, and vector search—to the same ReAct agent.
```python
hybrid_agent = create_react_agent(
    model=gpt_4_1,
    tools=[search, crawl, extract, vector_search_tool],
    prompt=ChatPromptTemplate.from_messages(
        [
            (
                "system",
                f"""You are a ReAct-style research agent with access to:
- Tavily Web Search, Tavily Web Extract, Tavily Web Crawl (public web)
- Internal Vector Search (proprietary CRM data: Meta, Apple, Google, Amazon, Microsoft, Tesla accounts)

**Today's Date:** {today}

All answers must be grounded in retrieved information. You may not use prior
knowledge or fabricate data. If tools return nothing useful, say so.

When a question involves a company, check both the public web and the CRM
vector store. Cite source URLs for web content; note the internal source for
CRM data.

Workflow: Thought → Action → Observation. Repeat as needed. Respond only after
gathering all required information.""",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ]
    ),
    name="hybrid_agent",
)
```
```python
inputs = {
    "messages": [
        HumanMessage(
            content=(
                "Search for the latest news on Google relevant to our "
                "current CRM data on them"
            )
        )
    ]
}

for s in hybrid_agent.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()
```
The agent runs TavilySearch to find recent Google news, then vector_search to retrieve your CRM notes on Google, and synthesizes both into a single report.
The vector database included in the tutorial uses synthetic CRM data to demonstrate the pattern. Replace persist_directory and collection_name with your own Chroma database, or swap Chroma for any LangChain-compatible vector store.