Discovery agent: finding B2B leads with web search - Agentic Sales & Marketing

The discovery agent is the entry point of the sales pipeline. It converts your raw client requirements into a focused web search query, retrieves live results, and asks the LLM to extract 3–5 companies that are strong fits for your services. The extracted leads are placed on the pending_leads queue, where the supervisor picks them up one at a time and routes each one through the rest of the pipeline.
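The queue hand-off can be pictured with a small sketch. The supervisor's code is not shown on this page, so pop_next_lead below is a hypothetical helper that illustrates one way to take leads off pending_leads one at a time. Only the state keys pending_leads, current_lead, and company_name come from this page; the ordering (first in, first out) is an assumption.

```python
def pop_next_lead(state: dict) -> dict:
    """Hypothetical supervisor step: move the first pending lead into current_lead.

    Returns a partial state update, the same way discovery_agent returns a dict.
    """
    pending = list(state.get("pending_leads") or [])
    if not pending:
        # Nothing left to process; keep the tracking fields empty.
        return {"pending_leads": [], "current_lead": None, "company_name": None}

    lead = pending.pop(0)  # assumption: leads are handled in discovery order
    return {
        "pending_leads": pending,
        "current_lead": lead,
        "company_name": lead.get("company_name"),
    }


state = {
    "pending_leads": [
        {"company_name": "Acme Manufacturing Co.", "industry": "Manufacturing", "reason": "..."},
        {"company_name": "Example Logistics", "industry": "Logistics", "reason": "..."},
    ]
}
update = pop_next_lead(state)
print(update["company_name"])        # Acme Manufacturing Co.
print(len(update["pending_leads"]))  # 1
```

Working on a copy of the list keeps the incoming state untouched, which fits a design where each step returns an update rather than mutating shared state.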

State inputs

client_requirements

string

required

A free-text description of the services you offer and the kinds of companies you want to reach. The agent truncates this to 1,000 characters when generating the search query; the lead-extraction prompt receives the full text.

State outputs

pending_leads

object[]

required

A list of discovered companies. Each item contains company_name, industry, and reason.

Lead object properties

company_name

string

The name of the discovered company.

industry

string

The industry the company operates in.

reason

string

One sentence explaining why this company is a good fit.

current_lead

null

Reset to null after discovery so the supervisor starts fresh.

company_name

null

Reset to null after discovery.

completed_leads

array

Initialized to an empty list [] to track processed leads across the run.

lead_research

null

Cleared to prevent stale data from a previous run.

icp_analysis

null

Cleared to prevent stale data from a previous run.

competitor_analysis

null

Cleared to prevent stale data from a previous run.

outreach_content

null

Cleared to prevent stale data from a previous run.

proposal_document

null

Cleared to prevent stale data from a previous run.

crm_update

null

Cleared to prevent stale data from a previous run.
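Because pending_leads comes from LLM output, it may help to check each item against the three documented fields before the supervisor sees it. The sketch below is not part of the agent; the Lead type and the filter_valid_leads helper are hypothetical names introduced for illustration, and the rule "all three fields must be non-empty strings" is an assumption.

```python
from typing import Any, List, TypedDict


class Lead(TypedDict):
    """Shape of one pending_leads item, as documented above."""
    company_name: str
    industry: str
    reason: str


REQUIRED_KEYS = ("company_name", "industry", "reason")


def filter_valid_leads(items: Any) -> List[Lead]:
    """Keep only dict items whose three documented fields are non-empty strings."""
    if not isinstance(items, list):
        return []
    valid: List[Lead] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        if all(isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED_KEYS):
            valid.append(
                Lead(
                    company_name=item["company_name"].strip(),
                    industry=item["industry"].strip(),
                    reason=item["reason"].strip(),
                )
            )
    return valid


raw = [
    {"company_name": " Acme ", "industry": "Manufacturing", "reason": "Scaling."},
    {"company_name": "NoReason", "industry": "X"},  # missing "reason": dropped
    "junk",                                         # not a dict: dropped
]
print(filter_valid_leads(raw))
```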

What it does

1

Generate a search query

The agent sends client_requirements (up to 1,000 characters) to the LLM and asks it to produce a short 5–8 word search query. The prompt instructs the LLM to return only the query string, with no preamble and no quotes. As a safeguard, the agent still strips surrounding double quotes and removes any residual Query: prefix with a regex.

2

Search the web

The cleaned query is passed to web_search with max_results=10. This returns a list of web results — titles, URLs, and snippets — that the LLM will use as raw lead candidates.

3

Extract structured leads

The agent sends both client_requirements (untruncated) and the raw search results to the LLM, asking it to identify 3–5 companies from the results that Webanix Solutions could approach. The LLM is prompted to respond with strict JSON. The agent strips any accidental markdown fences from the response, parses it, and loads the lead list. On any parse error, pending_leads is set to an empty list and the error is printed to the console.

4

Initialize the pipeline state

The agent writes pending_leads to state alongside reset values for current_lead, company_name, completed_leads, and all downstream output fields. This guarantees the supervisor and later agents never see data from a previous run.
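The query clean-up in step 1 can be sketched in isolation. clean_query is a hypothetical helper name, not a function from the agent. This version removes the Query: label before the quotes, so a reply such as Query: "..." is fully cleaned.

```python
import re


def clean_query(raw: str) -> str:
    """Normalize an LLM reply into a bare search query string."""
    query = raw.strip()
    # Drop a leading "Query:" label first, so quotes that follow it can be stripped too.
    query = re.sub(r"^query:\s*", "", query, flags=re.IGNORECASE)
    # Remove wrapping double quotes and any whitespace left behind.
    return query.strip('"').strip()


print(clean_query('Query: "manufacturers needing ERP integration"'))
# manufacturers needing ERP integration
```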

Tools used

web_search

Performs a live web search and returns up to 10 result objects containing title, URL, and snippet.
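The implementation of web_search is not shown on this page. For testing the agent offline, a stand-in with the same call shape could look like the sketch below. fake_web_search is a hypothetical name, and the key names title, url, and snippet are assumptions based on the description above rather than the tool's confirmed output.

```python
import json
from typing import Dict, List


def fake_web_search(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    """Hypothetical offline stand-in for web_search; returns canned results."""
    canned = [
        {
            "title": f"Example result {i} for {query}",
            "url": f"https://example.com/result-{i}",
            "snippet": "Placeholder snippet text.",
        }
        for i in range(1, 21)
    ]
    return canned[:max_results]  # never return more than max_results items


results = fake_web_search("manufacturers needing ERP integration", max_results=10)
print(len(results))  # 10
# The agent serializes results with json.dumps(..., indent=2) before prompting the LLM.
print(json.dumps(results[:1], indent=2))
```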

Output schema

The LLM is prompted to return strict JSON with no markdown and no preamble. The expected structure is:

{
  "leads": [
    {
      "company_name": "Acme Manufacturing Co.",
      "industry": "Manufacturing",
      "reason": "Rapidly scaling operations with no existing ERP integration."
    }
  ]
}

The agent strips ```json and ``` fences automatically before parsing, so minor LLM formatting deviations are handled gracefully.
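A defensive version of the fence stripping and parsing might look like the following sketch. parse_leads is a hypothetical helper, not the agent's own function. It removes the opening and closing fences independently, so a reply with an opening fence but no closing fence still parses, and it returns an empty list whenever the payload is not the expected shape.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # three backticks, the markdown code-fence marker


def parse_leads(content: str) -> List[dict]:
    """Parse the LLM reply into a list of lead dicts; return [] on any problem."""
    text = content.strip()
    # Remove an opening fence (with or without a "json" tag) and, separately,
    # a closing fence, so a missing closing fence does not eat real characters.
    text = re.sub(rf"^{FENCE}(?:json)?\s*", "", text, flags=re.IGNORECASE)
    text = re.sub(rf"\s*{FENCE}$", "", text)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    leads = data.get("leads") if isinstance(data, dict) else None
    return leads if isinstance(leads, list) else []


fenced = FENCE + 'json\n{"leads": [{"company_name": "Acme Manufacturing Co."}]}\n' + FENCE
print(parse_leads(fenced))      # [{'company_name': 'Acme Manufacturing Co.'}]
print(parse_leads("not json"))  # []
```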

Source code

import json
import re
from llm import llm
from tools.search_tool import web_search


def discovery_agent(state: dict) -> dict:
    """
    Finds potential leads and stores them as pending_leads.
    The supervisor will pop from this list one at a time.
    """
    client_requirements = state.get("client_requirements", "")

    # ── Step 1: Generate a clean search query ─────────────────────────────────
    query_response = llm.invoke(
        f"""Based on these requirements, generate a short 5-8 word search query
        to find companies that need these services:
        {client_requirements[:1000]}

        Return ONLY the search query string, no quotes, no preamble."""
    )
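    # Remove a leading "Query:" label first, then any wrapping double quotes,
    # so a reply like: Query: "..." does not keep a stray leading quote.
    search_query = re.sub(r'^Query:\s*', '', query_response.content.strip(), flags=re.IGNORECASE)
    search_query = search_query.strip('"').strip()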

    print(f"[DISCOVERY AGENT] Searching for: {search_query}")

    # ── Step 2: Search ─────────────────────────────────────────────────────────
    search_results = web_search(search_query, max_results=10)

    # ── Step 3: Extract structured leads ──────────────────────────────────────
    response = llm.invoke(
        f"""You are a Lead Discovery Agent.

        Identify 3-5 companies from the search results that Webanix Solutions
        could approach based on the requirements below.

        Requirements:
        {client_requirements}

        Search Results:
        {json.dumps(search_results, indent=2)}

        Return STRICT JSON only — no markdown, no preamble:
        {{
            "leads": [
                {{
                    "company_name": "...",
                    "industry":     "...",
                    "reason":       "one sentence why they are a good fit"
                }}
            ]
        }}"""
    )

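    try:
        content = response.content.strip()
        # Strip an opening fence, and a closing fence only if one is present,
        # so an unclosed fence does not cost the last three characters of JSON.
        if content.startswith("```"):
            content = re.sub(r"^```(?:json)?\s*", "", content, flags=re.IGNORECASE)
            content = re.sub(r"\s*```$", "", content)

        data          = json.loads(content)
        # Treat a missing or null "leads" value as an empty list.
        pending_leads = data.get("leads") or []

    except Exception as e:
        print(f"[DISCOVERY AGENT] Parse error: {e}")
        pending_leads = []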

    print(f"[DISCOVERY AGENT] Found {len(pending_leads)} leads: "
          f"{[l.get('company_name') for l in pending_leads]}")

    return {
        # ── Feed the supervisor's queue ────────────────────────────────────────
        "pending_leads":       pending_leads,

        # ── Initialise tracking fields ─────────────────────────────────────────
        "current_lead":        None,
        "company_name":        None,
        "completed_leads":     [],

        # ── Clear any stale pipeline outputs ──────────────────────────────────
        "lead_research":       None,
        "icp_analysis":        None,
        "competitor_analysis": None,
        "outreach_content":    None,
        "proposal_document":   None,
        "crm_update":          None,
    }
