Standard HTTP requests often fail against modern anti-bot systems. Rate limits, CAPTCHAs, and geo-restrictions block naive scrapers before they collect meaningful data. Bright Data’s infrastructure handles these obstacles. Its global proxy network and built-in bypass mechanisms give your agent reliable access to public web sources. This tutorial shows you two integration paths: the langchain-brightdata package for quick setup, and the Bright Data MCP server for access to 60+ tools, including specialized platform extractors.
Global proxy network
Route requests through Bright Data’s residential and datacenter IPs to avoid blocks.
CAPTCHA bypass
Bright Data’s Unlocker handles bot detection automatically, including JS rendering.
Structured extraction
Platform-specific parsers for Amazon, LinkedIn, and more return clean JSON.
The langchain-brightdata package provides a BrightDataSERP tool that slots directly into any LangChain or LangGraph agent. Use this path when you want quick setup and standard web search. Best for: search-first workflows, rapid prototyping, Google or Bing SERP data.
The Bright Data MCP server exposes 60+ tools including platform-specific extractors for Amazon, LinkedIn, Instagram, and more, plus universal web scraping with JS rendering. Best for: structured data extraction from major platforms, advanced browser automation, production scraping pipelines.
agent = create_react_agent(
    model=llm,
    tools=[serp_tool],
    prompt=(
        "You are a web researcher agent with access to a SERP tool. "
        "You MUST use the tool to answer user queries. If no specific country, "
        "language, search engine, or vertical is specified, choose what best fits "
        "the user's question."
    ),
)

user_query = "What are the latest developments and news in AI technology in the US?"

for step in agent.stream(
    {"messages": [("human", user_query)]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
The streaming output lets you observe the agent’s reasoning process: query analysis, tool invocation, result processing, and final synthesis.
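To make those phases easier to spot in a long run, each streamed snapshot can be labeled by the type of its last message. The following is a minimal sketch, not part of the original tutorial: `FakeMessage` is a hypothetical stand-in that mimics only the `type` and `tool_calls` fields of a LangChain message, and the tool name `brightdata_serp` is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message; only the fields used below."""
    type: str                      # "human", "ai", or "tool"
    content: str = ""
    tool_calls: list = field(default_factory=list)


def describe_step(step: dict) -> str:
    """Label a streamed state snapshot by the phase its last message represents."""
    last = step["messages"][-1]
    if last.type == "human":
        return "query received"
    if last.type == "ai" and getattr(last, "tool_calls", None):
        names = ", ".join(call["name"] for call in last.tool_calls)
        return f"tool invocation: {names}"
    if last.type == "tool":
        return "tool result returned"
    return "final synthesis"


# Simulated stream; with a real agent, iterate over agent.stream(...) instead.
fake_stream = [
    {"messages": [FakeMessage("human", "latest AI news?")]},
    {"messages": [FakeMessage("ai", tool_calls=[{"name": "brightdata_serp", "args": {}}])]},
    {"messages": [FakeMessage("tool", "...search results...")]},
    {"messages": [FakeMessage("ai", "Here is a summary.")]},
]
phases = [describe_step(step) for step in fake_stream]
for phase in phases:
    print(phase)
```

With a real agent, the same function could be called on each `step` inside the streaming loop, alongside or instead of `pretty_print()`.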
research_query = """Please research the renewable energy market trends for 2024-2025.
I need information about:
1. Market growth predictions
2. Leading companies and their strategies
3. Recent technological breakthroughs
4. Government policies affecting the sector"""

for step in research_assistant.stream(
    {"messages": [("human", research_query)]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
The MCP path gives you access to Bright Data’s full tool suite, including platform-specific extractors and browser automation. It requires Node.js to run the @brightdata/mcp package.
The npx @brightdata/mcp command downloads and runs the Bright Data MCP server. You need Node.js installed on the machine running the agent. The server exposes 60+ tools including search engines, platform-specific scrapers, and a universal web unlocker.
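Because the server is launched through `npx`, it can help to check for Node.js before the agent starts. The sketch below is an illustration under stated assumptions: the `command`/`args`/`env`/`transport` layout follows a common stdio MCP client convention, and the `API_TOKEN` and `BRIGHT_DATA_API_TOKEN` variable names are assumptions to verify against your MCP client and the Bright Data MCP documentation.

```python
import os
import shutil


def npx_available() -> bool:
    """Return True if the npx launcher that ships with Node.js is on PATH."""
    return shutil.which("npx") is not None


def build_mcp_server_config(api_token: str) -> dict:
    """Build a stdio server entry that launches `npx @brightdata/mcp`.

    The key layout and the API_TOKEN variable name are assumptions,
    not confirmed by this tutorial.
    """
    if not api_token:
        raise ValueError("A Bright Data API token is required")
    return {
        "bright_data": {
            "command": "npx",
            "args": ["@brightdata/mcp"],
            "env": {"API_TOKEN": api_token},
            "transport": "stdio",
        }
    }


# Falls back to a placeholder so the sketch runs without credentials.
token = os.getenv("BRIGHT_DATA_API_TOKEN") or "demo-token"
config = build_mcp_server_config(token)

if not npx_available():
    print("npx not found: install Node.js before starting the MCP server")
```

Failing early with a clear message tends to be easier to debug than a subprocess error raised from inside the MCP client.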
import datetime


async def create_web_scraper_agent():
    """Create a ReAct agent with full Bright Data MCP tool access."""
    # Load every tool exposed by the Bright Data MCP server.
    tools = await setup_bright_data_tools()
    # Give the model today's date so it can reason about "latest" and "this week".
    current_date = datetime.datetime.now().strftime("%B %d, %Y")

    # OpenRouter exposes an OpenAI-compatible endpoint, so ChatOpenAI works here.
    llm = ChatOpenAI(
        openai_api_key=os.getenv("OPENROUTER_API_KEY"),
        openai_api_base="https://openrouter.ai/api/v1",
        model_name="google/gemini-2.5-flash-lite-preview-06-17",
        temperature=0.1,
    )

    agent = create_react_agent(
        model=llm,
        tools=tools,
        prompt=(
            f"You are a web data extraction specialist. Today is {current_date}. "
            f"You have access to {len(tools)} Bright Data tools including search engines, "
            "platform-specific extractors, and a universal web unlocker. "
            "Always use a tool to answer user requests; do not rely on training data. "
            "Follow this process: 1) Understand the request. 2) Select the best tool. "
            "3) Execute and review results. 4) Return a structured response with sources."
        ),
    )
    return agent


async def test_basic_search(agent):
    print("Testing basic search...")
    print("=" * 50)
    result = await agent.ainvoke({
        "messages": [("human", "Give me the latest AI news from this week. Include full URLs to sources.")],
    })
    print("\nSearch results:")
    print(result["messages"][-1].content)
    return result


# Top-level await works in a notebook; in a script, wrap these two calls
# in an async main() and run it with asyncio.run().
agent = await create_web_scraper_agent()
basic_result = await test_basic_search(agent)
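The ReAct agent follows a systematic decision loop for each query: it analyzes the request, selects the best tool, executes it and reviews the results, then returns a structured response with sources.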
For competitive intelligence workflows, combine the SERP tool to discover relevant URLs with the universal scraper to extract full page content from the top results. The agent handles this chaining automatically when given a research-style prompt.
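The chaining the agent performs can be pictured as a plain two-stage pipeline. This is a rough sketch for intuition only, not how the agent is implemented: `search` and `scrape` are hypothetical callables standing in for the SERP tool and the universal scraper, and the URLs are placeholders.

```python
from typing import Callable


def research(
    query: str,
    search: Callable[[str], list],
    scrape: Callable[[str], str],
    top_n: int = 3,
) -> list:
    """Discover URLs with `search`, then fetch full content for the top results."""
    # Drop duplicate URLs while preserving ranking order, then keep the top N.
    urls = list(dict.fromkeys(search(query)))[:top_n]
    return [{"url": url, "content": scrape(url)} for url in urls]


# Stub tools so the sketch runs offline; replace with real tool calls.
def fake_search(query: str) -> list:
    return [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/a",
        "https://example.com/c",
    ]


def fake_scrape(url: str) -> str:
    return f"<content of {url}>"


pages = research("competitor pricing", fake_search, fake_scrape, top_n=2)
for page in pages:
    print(page["url"], "->", page["content"])
```

Capping `top_n` keeps the number of scraper calls per query bounded, which matters once usage is metered.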
Monitor your Bright Data API usage. The free tier provides 5,000 unlocker requests per month. Each BrightDataSERP call with results_count=10 consumes one request. High-volume research agents can exhaust the free tier quickly.
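A small local counter is one way to see how quickly a workload approaches that allowance. The sketch below assumes you increment it yourself after each tool call; `RequestBudget` is a hypothetical helper, not a Bright Data API, and the load figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Counts requests against a fixed monthly allowance."""
    monthly_limit: int = 5_000   # free-tier figure quoted above
    used: int = 0

    def record(self, requests: int = 1) -> None:
        """Register completed requests."""
        self.used += requests

    @property
    def remaining(self) -> int:
        """Requests left this month, never negative."""
        return max(self.monthly_limit - self.used, 0)

    def should_warn(self, threshold: float = 0.8) -> bool:
        """True once usage reaches the given fraction of the allowance."""
        return self.used >= self.monthly_limit * threshold


budget = RequestBudget()
# Hypothetical load: 40 agent runs per day, 5 SERP calls per run, for 20 days.
budget.record(40 * 5 * 20)
print(budget.remaining, budget.should_warn())
```

Under these assumed numbers, 4,000 of the 5,000 requests are consumed in 20 days, which is why the counter already reports a warning at the default 80% threshold.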
Consider these patterns when moving to production:
Rate limiting: Add delays between agent runs that trigger many tool calls in rapid succession.
Result caching: Cache SERP results with a short TTL (minutes to hours) for queries that repeat across users.
Error handling: Wrap agent invocations in try/except to handle network failures from the proxy layer gracefully.
Monitoring: Log which tools the agent selects and how often to identify optimization opportunities.
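Several of these patterns can be combined in one thin wrapper around a tool or agent call. The following is a minimal sketch under stated assumptions: `TTLCache` and `run_with_retry` are hypothetical helpers, the wrapped function is a stub, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever exceptions your HTTP or proxy layer actually raises.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]   # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock(), value)


def run_with_retry(fn, query, cache, log, retries=3, base_delay=1.0, sleep=time.sleep):
    """Serve from cache when possible; otherwise call fn with exponential backoff."""
    cached = cache.get(query)
    if cached is not None:
        log.append(("cache_hit", query))
        return cached
    for attempt in range(1, retries + 1):
        try:
            result = fn(query)
        except (ConnectionError, TimeoutError) as exc:
            log.append(("error", type(exc).__name__))
            if attempt == retries:
                raise                      # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            cache.set(query, result)
            log.append(("success", query))
            return result


# Stub search that fails once, then succeeds, to exercise the retry path.
calls = {"n": 0}


def flaky_search(query):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("proxy reset")
    return f"results for {query}"


log = []
cache = TTLCache(ttl_seconds=300)
first = run_with_retry(flaky_search, "ai news", cache, log, sleep=lambda s: None)
second = run_with_retry(flaky_search, "ai news", cache, log, sleep=lambda s: None)
print(log)
```

The `log` list doubles as a crude monitoring record. In production you would more likely send these events to your logging or metrics system and share the cache across processes.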