Generate Long-Form Research Reports with Spy Search

The Reporter agent is Spy Search’s core output mechanism for deep research. Rather than returning a single LLM response, it orchestrates a multi-step pipeline that gathers live web content, plans a structured outline, and writes each section with proper citations — producing a cohesive, long-form report.

How It Works

Planner decomposes the query

The Planner agent receives the raw query and breaks it into a set of focused subtasks — for example, splitting “future of AI in healthcare” into subtasks covering diagnostics, drug discovery, regulatory challenges, and patient outcomes.

Searcher retrieves web content

The Search_agent (or Quick_searcher) processes each subtask. It searches the web, crawls the resulting URLs, and generates structured summaries for every article — capturing a title, summary, brief_summary, keywords, and source url.
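One way to picture the per-article summary is as a small record. The sketch below is illustrative only: the `ArticleSummary` class name is hypothetical, the field names come from the list above, and the field types and comments about how each field is used are assumptions rather than Spy Search's actual definitions.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ArticleSummary:
    """Hypothetical container for one crawled article (not Spy Search's own class)."""
    title: str
    summary: str        # longer summary (assumed to feed the writing step)
    brief_summary: str  # short summary (assumed to feed the planning step)
    url: str
    keywords: List[str] = field(default_factory=list)


example = ArticleSummary(
    title="AI in diagnostic imaging",
    summary="A longer, multi-sentence summary of the article...",
    brief_summary="AI models assist radiologists in reading scans.",
    url="https://example.com/ai-imaging",  # placeholder URL
    keywords=["AI", "radiology"],
)

# Convert to a plain dict, e.g. before serializing it into a prompt.
record = asdict(example)
```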

Reporter assigns random IDs to sources

Before writing, the Reporter agent calls its data_handler, which iterates over all gathered articles and assigns each one a random 4-character alphanumeric ID (e.g. aB3x). These IDs are used throughout the planning and writing steps to reference specific sources.

rand_id = "".join(secrets.choice(alphabet) for _ in range(self.length))  # length = 4
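The line above can be expanded into a self-contained sketch. The `assign_ids` helper and the `ALPHABET` constant are hypothetical names, and the collision retry is an addition for illustration: the source does not say whether the real data_handler checks for duplicate IDs.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # [A-Za-z0-9], 62 symbols


def assign_ids(articles, length=4):
    """Map a fresh random ID to each article.

    Retrying on collision is an assumption made for this sketch.
    """
    by_id = {}
    for article in articles:
        rand_id = "".join(secrets.choice(ALPHABET) for _ in range(length))
        while rand_id in by_id:  # regenerate in the unlikely event of a clash
            rand_id = "".join(secrets.choice(ALPHABET) for _ in range(length))
        by_id[rand_id] = article
    return by_id


sources = assign_ids([{"title": "A"}, {"title": "B"}, {"title": "C"}])
```

With 62 symbols and 4 positions there are 62**4 = 14,776,336 possible IDs, so clashes are rare for the handful of sources gathered in a single report.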

Reporter plans the report structure

The Reporter calls its _planner method, which sends the query and a list of short summaries (keyed by their random IDs) to the LLM. The LLM returns a JSON array describing each section of the report — the section task description and the list of source IDs it should draw from:

[
  {
    "task": "Introduction: What is AI in healthcare?",
    "data": ["aB3x", "Kp9q"],
    "content": ""
  },
  {
    "task": "AI in diagnostic imaging",
    "data": ["Kp9q", "mZ2r"],
    "content": ""
  }
]
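A plan in this shape is straightforward to parse and sanity-check before writing begins. The sketch below is an assumption about how a caller might do that, not the Reporter's actual code; `parse_plan` is a hypothetical helper, and dropping unknown IDs is one possible policy for handling IDs the LLM made up.

```python
import json


def parse_plan(raw, known_ids):
    """Parse the planner's JSON array and keep only source IDs that exist."""
    sections = json.loads(raw)
    if not isinstance(sections, list):
        raise ValueError("expected a JSON array of sections")
    cleaned = []
    for section in sections:
        cleaned.append({
            "task": section["task"],
            # Drop any ID that does not match a gathered source.
            "data": [i for i in section.get("data", []) if i in known_ids],
            "content": section.get("content", ""),
        })
    return cleaned


raw_plan = """[
  {"task": "Introduction: What is AI in healthcare?", "data": ["aB3x", "Kp9q"], "content": ""},
  {"task": "AI in diagnostic imaging", "data": ["Kp9q", "zzzz"], "content": ""}
]"""

# "zzzz" is not a known source, so it is filtered out of the second section.
plan = parse_plan(raw_plan, known_ids={"aB3x", "Kp9q", "mZ2r"})
```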

Reporter writes each section independently

For every planned section, the _task_handler method fetches the full source objects matching the section’s data IDs, constructs a report_task prompt, and calls the LLM. Each LLM call returns a JSON object containing the section content (~400–500 words in Markdown) and a short_summary. The section content is appended sequentially to build the final report.
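The per-section loop can be sketched as follows. Everything here is illustrative: `write_report` and `fake_llm` are hypothetical names, the prompt layout is invented for the example, and the `content` key in the model's JSON reply is an assumed name (only `short_summary` is named in the text above).

```python
import json


def fake_llm(prompt):
    """Stand-in for a real model call. The 'content' key is an assumed name."""
    return json.dumps({
        "content": f"## Section\n\nDraft written from a {len(prompt)}-character prompt.",
        "short_summary": "stub summary",
    })


def write_report(plan, sources_by_id, llm=fake_llm):
    """Write each planned section in order and join the results."""
    parts = []
    for section in plan:
        # Resolve the section's IDs back to full source objects.
        sources = [sources_by_id[i] for i in section["data"] if i in sources_by_id]
        prompt = json.dumps({"task": section["task"], "sources": sources})
        result = json.loads(llm(prompt))
        section["content"] = result["content"]
        parts.append(result["content"])
    return "\n\n".join(parts)


sources_by_id = {
    "aB3x": {"title": "Overview", "url": "https://example.com/a", "summary": "..."},
    "Kp9q": {"title": "Imaging", "url": "https://example.com/b", "summary": "..."},
}
plan = [
    {"task": "Introduction: What is AI in healthcare?", "data": ["aB3x", "Kp9q"], "content": ""},
    {"task": "AI in diagnostic imaging", "data": ["Kp9q"], "content": ""},
]
report = write_report(plan, sources_by_id)
```

Because each section is written by its own call, a failure in one section can in principle be retried without redoing the others, though whether Spy Search does so is not stated here.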

The Report Endpoint

curl -X POST "http://localhost:8000/report/future+of+AI+in+healthcare" \
  -F 'messages=[{"role":"user","content":"future of AI in healthcare"}]'

Response shape:

{
  "report": "## Introduction\n\nArtificial intelligence is transforming healthcare...",
  "files_received": [],
  "messages_received": [{"role": "user", "content": "future of AI in healthcare"}]
}

The report field contains the full Markdown-formatted report, ready to render. files_received lists any uploaded files (multipart) that were attached to the request. The frontend’s useStreamingChat hook selects the /report/ endpoint automatically when the user enables Deep Research mode, then awaits the full JSON response (rather than streaming chunks):

const endpoint = isDeepResearch ? 'report' : 'stream_completion';
// ...
const data = await response.json();
finalContent = data.report;
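The same request can be issued from Python using only the standard library. This is a minimal sketch, assuming the server from the curl example is running on localhost:8000; `build_report_request` and `fetch_report` are hypothetical helper names, and the 600-second timeout is an arbitrary guess rather than a documented value.

```python
import json
import uuid
from urllib.parse import quote_plus
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # host and port taken from the curl example


def build_report_request(query):
    """Build a multipart POST to /report/{query} with a single 'messages' field."""
    messages = json.dumps([{"role": "user", "content": query}])
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="messages"\r\n\r\n'
        f"{messages}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/report/{quote_plus(query)}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def fetch_report(query, timeout=600):
    """Send the request and return the Markdown in the 'report' field."""
    with urlopen(build_report_request(query), timeout=timeout) as response:
        return json.load(response)["report"]


# Building the request does not touch the network; fetch_report() does.
request = build_report_request("future of AI in healthcare")
```

As with the frontend, the caller waits for the complete JSON body instead of reading streamed chunks.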

Source Handling

Each gathered article receives a random 4-character ID generated with secrets.choice over the set [A-Za-z0-9]. This ID is used in two places:

Planning prompt — the LLM sees a list of { id, short_summary } pairs and cites IDs in the section plan
Writing prompt — the Reporter resolves IDs back to full source objects (url, title, summary) before constructing the per-section prompt

This two-pass approach keeps the planning prompt short (only summaries) while giving the writing step access to full article content.
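The difference between the two passes can be shown with a pair of small helpers. This is a sketch under assumptions: `planning_view` and `writing_view` are hypothetical names, and mapping each article's `brief_summary` to the `short_summary` shown to the planner is a guess about how the fields line up.

```python
def planning_view(sources_by_id):
    """What the planning prompt sees: only IDs and short summaries."""
    return [
        {"id": sid, "short_summary": src["brief_summary"]}  # field mapping is assumed
        for sid, src in sources_by_id.items()
    ]


def writing_view(sources_by_id, ids):
    """What a section prompt sees: full objects for the cited IDs only."""
    return [
        {key: sources_by_id[sid][key] for key in ("url", "title", "summary")}
        for sid in ids
        if sid in sources_by_id
    ]


sources_by_id = {
    "aB3x": {
        "title": "Overview of AI in healthcare",
        "url": "https://example.com/a",
        "summary": "A long summary spanning several paragraphs...",
        "brief_summary": "Survey of clinical AI uses.",
    },
    "Kp9q": {
        "title": "AI in diagnostic imaging",
        "url": "https://example.com/b",
        "summary": "Another long summary...",
        "brief_summary": "AI assists radiologists.",
    },
}

plan_input = planning_view(sources_by_id)
section_input = writing_view(sources_by_id, ["Kp9q", "missing"])
```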

Recommended Agent Configuration

In config.json, set the agents array to control which agents participate in report generation:

{
  "agents": ["quick-searcher", "reporter"]
}

Configuration	Use case
`["reporter"]`	Reporter only — uses whatever data the Planner already has
`["quick-searcher", "reporter"]`	Fast web search + report writing (recommended)
`["searcher", "reporter"]`	Deep web crawl + report writing (slower, more thorough)
`["local-retrieval", "reporter"]`	Local document search + report writing
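A configuration like those in the table can be checked before starting a run. The sketch below is not part of Spy Search: `load_agents` is a hypothetical helper, the set of names is limited to the four that appear in the table (the project may accept others), and requiring `reporter` is an assumption based on every listed configuration including it.

```python
import json

# Agent names that appear in the table above; the real list may be longer.
KNOWN_AGENTS = {"reporter", "quick-searcher", "searcher", "local-retrieval"}


def load_agents(config_text):
    """Read the 'agents' array from config.json text and validate it."""
    agents = json.loads(config_text).get("agents", [])
    unknown = [name for name in agents if name not in KNOWN_AGENTS]
    if unknown:
        raise ValueError(f"unknown agents: {unknown}")
    if "reporter" not in agents:
        raise ValueError("report generation needs the 'reporter' agent")
    return agents


agents = load_agents('{"agents": ["quick-searcher", "reporter"]}')
```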

For quick factual answers, use POST /quick/{query} instead of POST /report/{query}. Quick Search returns a single LLM response from DuckDuckGo results in well under two seconds, whereas report generation takes considerably longer.

Report generation is intentionally slower than Quick Search. The Reporter makes at least two LLM calls (one to plan the report, then one per section), and the pipeline adds one or more web-search passes on top. A five-section report therefore needs six Reporter calls, and roughly seven or more LLM calls in total once upstream steps such as the Planner's query decomposition are counted.
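The call count can be written out as simple arithmetic. The function names are hypothetical, and the default of one upstream call (for query decomposition) is an assumption; the real number depends on how many LLM calls the Planner and Searcher make.

```python
def reporter_llm_calls(sections: int) -> int:
    """One planning call plus one writing call per section."""
    return 1 + sections


def total_llm_calls(sections: int, upstream_calls: int = 1) -> int:
    """Add calls made before the Reporter runs (assumed: at least one)."""
    return upstream_calls + reporter_llm_calls(sections)


five_section_total = total_llm_calls(5)  # 1 upstream + 1 plan + 5 sections
```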
