Spy Search Architecture: How the System Works

Spy Search is built on a two-tier architecture: a React frontend that provides the user interface, and a FastAPI backend that houses the entire agentic search pipeline. When a query arrives at the backend, it moves through a chain of coordinated agents — Planner, Searcher, and Reporter — each fulfilling a discrete role before the final synthesized report is returned to the client.

Component Breakdown

FastAPI Backend (`main.py`)

The application's entry point is main.py, which creates the FastAPI app instance, mounts the aggregated API router from src/api/app.py, and applies CORS middleware restricted to the origin http://localhost:8080, where the React dev server and production build are served.

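from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

# Import path assumed from the description above: the aggregated router lives in src/api/app.py.
from src.api.app import router

app = FastAPI(title="Your API", description="API Documentation", version="1.0.0")
app.include_router(router)

# Allow cross-origin requests only from the frontend origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8080"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)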

src/api/app.py aggregates five route modules — files, messages, agents, streaming, and misc — into a single APIRouter that is mounted on the app.

React Frontend (`frontend/`)

The frontend is a TypeScript + Vite application built with the shadcn/ui component library. It communicates with the FastAPI backend over HTTP and Server-Sent Events (SSE) for streaming responses.
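
As a rough illustration of the streaming side, the sketch below frames text chunks as Server-Sent Events on the backend. The `to_sse` helper and the `{"text": ...}` payload shape are hypothetical and are not taken from the Spy Search source; only the `data: ...` line followed by a blank line is the standard SSE wire format.

```python
import json
from typing import Iterable, Iterator


def to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Frame each text chunk as one SSE event (hypothetical helper)."""
    for chunk in chunks:
        # JSON-encoding keeps embedded newlines from breaking the event framing.
        payload = json.dumps({"text": chunk})
        # An SSE event is a "data:" line terminated by a blank line.
        yield f"data: {payload}\n\n"
```

A generator like this could be handed to a streaming HTTP response served with the `text/event-stream` media type, which the browser's `EventSource` API consumes.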

Agent Pipeline

The search pipeline is composed of five layers that work together:

Router — wraps each agent and handles message passing to and from the Server
Server — holds the registry of all Routers, dispatches messages, and drives the pipeline loop
Planner — breaks the user query into an ordered task queue and assigns each task to the correct agent
Agents — execute domain-specific workflows (web search, RAG retrieval, report writing)
Reporter — collects all gathered data and synthesizes the final structured report

Browser Engine

The web retrieval layer offers two modes:

DuckDuckGo (src/browser/duckduckgo.py) — fast keyword search via langchain_community, with async content extraction capped at a 1.5-second total budget
crawl4ai / Playwright (src/browser/crawl_ai.py) — optional deep-crawl mode using a headless Chromium browser for JavaScript-rendered pages and LLM-guided content extraction
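
One way to enforce a shared time budget across concurrent page fetches is sketched below, using only the standard library. The function name `fetch_with_budget` and the injected `fetch` coroutine are hypothetical; the actual logic in src/browser/duckduckgo.py may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch_with_budget(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    budget_s: float = 1.5,
) -> Dict[str, str]:
    """Fetch all URLs concurrently and keep only results that finish within budget_s."""
    tasks = {asyncio.create_task(fetch(url)): url for url in urls}
    if not tasks:
        return {}

    # The timeout applies to the whole batch, not to each URL individually.
    done, pending = await asyncio.wait(list(tasks), timeout=budget_s)

    # Cancel anything that overran the budget and wait for the cancellations to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results: Dict[str, str] = {}
    for task in done:
        if task.exception() is None:  # skip fetches that failed
            results[tasks[task]] = task.result()
    return results
```

Slow pages are dropped rather than awaited, so the search step returns whatever content arrived in time.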

Vector Store

src/RAG/chrome.py wraps ChromaDB for local retrieval-augmented generation (RAG). The RAG_agent walks a configured file directory, converts each document to Markdown via markitdown, chunks text into 1,500-character segments, and indexes them in ChromaDB. At query time it retrieves the top-k most relevant chunks.
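
The chunking step can be pictured with the minimal sketch below. It assumes fixed-width chunks with no overlap, which the description above does not specify, and `chunk_text` is a hypothetical name rather than a function from the repository.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1500) -> List[str]:
    """Split text into consecutive segments of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: no overlap between neighbouring chunks.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Each resulting segment would then be embedded and stored in the vector index; the last segment is simply shorter when the document length is not a multiple of the chunk size.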

Model Layer

src/model/model.py defines the abstract Model base class with a unified interface (completion, completion_stream, get_llm_config, etc.). Concrete implementations for Gemini, Ollama, Deepseek, Grok, and OpenAI all satisfy this contract, so any agent can switch providers without code changes. The Factory class instantiates the correct implementation at runtime based on configuration.
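
The provider-agnostic contract could look roughly like the sketch below. The method names `completion`, `completion_stream`, and `get_llm_config` come from the description above, but their parameters and return types, the `EchoModel` stand-in, and the registry-based `Factory.get_model` signature are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type


class Model(ABC):
    """Unified interface every provider implementation must satisfy."""

    @abstractmethod
    def completion(self, prompt: str) -> str: ...

    @abstractmethod
    def completion_stream(self, prompt: str) -> Iterator[str]: ...

    @abstractmethod
    def get_llm_config(self) -> dict: ...


class EchoModel(Model):
    """Hypothetical stand-in provider so the sketch runs without network access."""

    def completion(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def completion_stream(self, prompt: str) -> Iterator[str]:
        yield from prompt.split()

    def get_llm_config(self) -> dict:
        return {"provider": "echo"}


class Factory:
    # Assumed registry mapping a provider name to its implementation class.
    _models: Dict[str, Type[Model]] = {"echo": EchoModel}

    @staticmethod
    def get_model(provider: str) -> Model:
        try:
            return Factory._models[provider]()
        except KeyError:
            raise ValueError(f"unknown provider: {provider}") from None
```

Because agents depend only on `Model`, swapping providers in this sketch means changing the name passed to `Factory.get_model`, not the agent code.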

Request Flow

User sends a query to a FastAPI endpoint (e.g., /report/{query} or a streaming endpoint).
Factory.get_model() instantiates the configured Model; Factory.get_agent() constructs the required Agent objects (Planner, one or more searchers, Reporter).
generate_report(query, planner, agents) wires each agent into a Router, registers all Routers with a Server, and calls server.start(query).
Planner receives the query, calls the LLM with a planning prompt that lists available agents and their descriptions, and returns an ordered task queue.
Server dispatches each task by following the "agent" field in every response — routing to quick-searcher, searcher, or local-retrieval as instructed.
Each agent executes its workflow (DuckDuckGo search, Playwright crawl, or ChromaDB retrieval) and returns {"agent": "planner", "task": "", "data": [...]} so the Server routes back to the Planner.
Planner pops the next task from its queue; once the queue is empty it routes to Reporter.
Reporter synthesizes a section-by-section report using targeted LLM calls, concatenates the sections, and returns {"agent": "TERMINATE", "data": <report>, "task": ""}.
Server detects TERMINATE and returns the final response dict to generate_report(), which extracts the "data" field (the finished report string) and returns it to the calling API handler.
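
The flow above can be sketched as a small message loop. This is a simplified illustration, not the project's code: the Router wrapper is omitted, the planner uses a fixed task list instead of an LLM call, and the agent names "searcher" and "reporter", the `run` method, the `register` method, and the `max_steps` guard are all assumptions.

```python
from collections import deque


class Planner:
    """Toy planner holding a fixed queue of (agent_name, task) pairs."""

    def __init__(self, plan):
        self.queue = deque(plan)
        self.collected = []

    def run(self, message):
        self.collected.extend(message["data"])  # keep data returned by agents
        if self.queue:
            agent, task = self.queue.popleft()
            return {"agent": agent, "task": task, "data": []}
        # Queue is empty: hand everything gathered so far to the reporter.
        return {"agent": "reporter", "task": "", "data": self.collected}


class Searcher:
    def run(self, message):
        # A real agent would search or crawl here; results route back to the planner.
        return {"agent": "planner", "task": "", "data": [f"result for {message['task']}"]}


class Reporter:
    def run(self, message):
        return {"agent": "TERMINATE", "data": "\n".join(message["data"]), "task": ""}


class Server:
    def __init__(self):
        self.agents = {}

    def register(self, name, agent):
        self.agents[name] = agent

    def start(self, query, max_steps=50):
        message = {"agent": "planner", "task": query, "data": []}
        for _ in range(max_steps):
            # Follow the "agent" field of each response to pick the next hop.
            message = self.agents[message["agent"]].run(message)
            if message["agent"] == "TERMINATE":
                return message
        raise RuntimeError("pipeline did not terminate")


def generate_report(query, planner, agents):
    server = Server()
    server.register("planner", planner)
    for name, agent in agents.items():
        server.register(name, agent)
    return server.start(query)["data"]
```

Calling `generate_report("q", Planner([("searcher", "a")]), {"searcher": Searcher(), "reporter": Reporter()})` walks planner, searcher, planner, reporter in turn and returns the report string.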

Key Source Modules

Module	Path	Role
API App	`src/api/app.py`	FastAPI router aggregation across five route modules
Factory	`src/factory/factory.py`	Agent and Model instantiation by name/provider
Server	`src/router/server.py`	Agent pipeline orchestration and termination detection
Router	`src/router/router.py`	Per-agent message routing between Server and Agent
DuckSearch	`src/browser/duckduckgo.py`	DuckDuckGo-backed web search engine
VectorSearch	`src/RAG/chrome.py`	ChromaDB wrapper for local RAG

Server and Router are internal coordination primitives, not HTTP routers. They have no relationship to FastAPI’s APIRouter — they exist solely to pass messages between agents inside the pipeline.
