Search Modes: Quick, Deep, and Academic Search

Spy Search supports three distinct search modes, each accessible through dedicated API endpoints and the React frontend. Choose the mode that best fits your use case — from sub-second streaming answers to comprehensive multi-page research reports.

Quick Search

Quick Search is the default mode for fast, conversational queries. It retrieves the top results from DuckDuckGo, extracts page content, constructs a prompt, and streams the LLM response back to the client in real time.

Endpoints:

POST /stream_completion/{query} — streams the response as text/plain chunks
POST /quick/{query} — returns a complete JSON response (non-streaming)

Triggering search in the streaming endpoint

The streaming endpoint supports both plain chat completions and search-augmented completions. To activate DuckDuckGo search, prefix your query with search: as in this example:

search: what is the Rust programming language

When the backend detects the search: prefix (via the substring check "search:" in query), it runs the search pipeline concurrently with model loading using asyncio.gather, then builds a prompt from the retrieved content before streaming the response. Without the prefix, the query is sent directly to the LLM without any web lookup.

The search: prefix is the only switch that controls whether DuckDuckGo search runs inside /stream_completion/{query}. The /quick/{query} endpoint always performs a search regardless of prefix.
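
The branching described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: run_search, load_model, and prepare are hypothetical names, the stubs stand in for the real search pipeline and model loader, and the prompt template is an assumption.

```python
import asyncio

SEARCH_PREFIX = "search:"  # marker described in the text


async def run_search(terms: str) -> list[str]:
    """Hypothetical stand-in for the DuckDuckGo search pipeline."""
    await asyncio.sleep(0)
    return [f"result for {terms}"]


async def load_model() -> str:
    """Hypothetical stand-in for model loading."""
    await asyncio.sleep(0)
    return "model"


async def prepare(query: str) -> tuple[str, str]:
    """Return (model, prompt), searching only when the marker is present."""
    if SEARCH_PREFIX in query:  # same substring check the text mentions
        terms = query.split(SEARCH_PREFIX, 1)[1].strip()
        # Search and model loading run concurrently.
        results, model = await asyncio.gather(run_search(terms), load_model())
        context = "\n".join(results)
        prompt = f"Context:\n{context}\n\nQuestion: {terms}"  # assumed template
    else:
        # No marker: the query goes to the LLM without any web lookup.
        model = await load_model()
        prompt = query
    return model, prompt
```

Calling asyncio.run(prepare("search: what is rust")) would take the search branch, while asyncio.run(prepare("hello")) would pass the query through unchanged.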

Streaming example

curl -X POST http://localhost:8000/stream_completion/search:+what+is+rust+programming \
  -F 'messages=[{"role":"user","content":"what is rust"}]'

The response is a text/plain chunked stream. Each chunk is a raw text fragment that the frontend accumulates and renders progressively.

Non-streaming example

curl -X POST http://localhost:8000/quick/what+is+rust+programming \
  -F 'messages=[{"role":"user","content":"what is rust"}]'

Response shape:

{
  "report": "Rust is a systems programming language focused on safety...",
  "files_received": [],
  "messages_received": [{"role": "user", "content": "what is rust"}]
}

Performance target

The DuckDuckGo integration is tuned for a sub-1.5-second total search time. Key design decisions that make this possible:

Setting	Value	Purpose
Per-request timeout	400 ms	Prevents slow sites from blocking the pipeline
TCP connect timeout	100 ms	Fails fast on unreachable hosts
Socket read timeout	300 ms	Caps time spent reading response bodies
Content extraction budget	1.2 s	Hard deadline across all concurrent fetches
Max content read per URL	20 KB	Avoids downloading full pages
Paragraphs extracted	First 15	Enough for article summaries

If the total time exceeds 1.5 seconds, the engine returns an empty result set rather than delivering stale or incomplete data.
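
One way to enforce budgets like these with asyncio is sketched below. It is an illustration under stated assumptions rather than the engine's implementation: fetch_all and search_with_budget are hypothetical helpers, and only the 1.2 s and 1.5 s values come from the text above.

```python
import asyncio

EXTRACTION_BUDGET_S = 1.2  # content extraction budget from the table
TOTAL_BUDGET_S = 1.5       # overall search target from the text


async def fetch_all(fetches, budget_s=EXTRACTION_BUDGET_S):
    """Run fetch coroutines concurrently and keep whatever finished in time."""
    tasks = [asyncio.ensure_future(f) for f in fetches]
    if not tasks:
        return []
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()  # hard deadline: abandon slow fetches
    # Preserve input order; skip fetches that were cancelled or raised.
    return [t.result() for t in tasks
            if t in done and not t.cancelled() and t.exception() is None]


async def search_with_budget(search, budget_s=TOTAL_BUDGET_S):
    """Return the search results, or an empty list if the budget is exceeded."""
    try:
        return await asyncio.wait_for(search, timeout=budget_s)
    except asyncio.TimeoutError:
        return []
```

The two helpers separate the concerns: fetch_all keeps partial results when individual pages are slow, while search_with_budget discards everything if the search as a whole overruns.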

Deep Search (Report Generation)

Deep Search triggers the full multi-agent pipeline and produces a structured research report of roughly 2,000 words with citations.

Endpoint: POST /report/{query}

The pipeline runs as follows:

Planner decomposes the query into subtasks
Searcher retrieves and summarises web content for each subtask
Reporter plans the report structure, then writes each section independently
Sections are concatenated into the final report and returned

curl -X POST "http://localhost:8000/report/future+of+AI+in+healthcare" \
  -F 'messages=[{"role":"user","content":"future of AI in healthcare"}]'

Deep Search is significantly slower than Quick Search because it makes multiple sequential LLM calls. See Report Generation for a full breakdown.
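
The four pipeline steps listed above can be sketched as a plain function. This is a simplified illustration only: deep_search and its plan, search, outline, and write_section parameters are hypothetical callables standing in for the Planner, Searcher, and Reporter agents, whose real interfaces are not described here.

```python
from typing import Callable, List


def deep_search(
    query: str,
    plan: Callable[[str], List[str]],                  # Planner (hypothetical)
    search: Callable[[str], str],                      # Searcher (hypothetical)
    outline: Callable[[str, List[str]], List[str]],    # Reporter: structure
    write_section: Callable[[str, List[str]], str],    # Reporter: one section
) -> str:
    """Run the steps in order and return the concatenated report."""
    subtasks = plan(query)                        # 1. decompose the query
    notes = [search(task) for task in subtasks]   # 2. retrieve and summarise
    headings = outline(query, notes)              # 3. plan the report structure
    sections = [write_section(h, notes) for h in headings]  # write each section
    return "\n\n".join(sections)                  # 4. concatenate
```

Each call in the sketch happens one after another, which mirrors why this mode is slower than Quick Search: every stage waits on the previous one.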

Academic Search

Academic Search is a specialised variant of Quick Search that automatically scopes results to arXiv.org. It prepends site:arxiv.org to your query before passing it to DuckDuckGo, so results are restricted to arxiv.org pages.

Endpoint: POST /stream_completion_academic/{query}

The response is a text/plain chunked stream, identical in format to the standard streaming endpoint.

Unlike /stream_completion/{query}, the academic endpoint always runs the search pipeline. No search: prefix is required or checked; whatever query you supply is searched with site:arxiv.org prepended.
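
The query rewrite amounts to a one-line transformation. In this sketch, to_academic_query is a hypothetical name, and the single space between the site filter and the query is an assumption about the separator.

```python
ARXIV_FILTER = "site:arxiv.org"  # filter named in the text


def to_academic_query(query: str) -> str:
    """Prepend the arXiv site filter unconditionally (no prefix check)."""
    return f"{ARXIV_FILTER} {query.strip()}"
```

For example, to_academic_query("transformer attention mechanisms") yields "site:arxiv.org transformer attention mechanisms".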

Example

curl -X POST "http://localhost:8000/stream_completion_academic/transformer+attention+mechanisms" \
  -F 'messages=[{"role":"user","content":"transformer attention mechanisms"}]'

Suggested categories (frontend shortcuts)

The React frontend exposes one-click category buttons for:

Medical Research
Legal Studies
Computer Science
Physics
Literature

Each button pre-fills the query input with "<category> recent research", which is then sent to /stream_completion_academic/{query}.

How Streaming Works

All streaming endpoints return a StreamingResponse with media_type="text/plain". The response body is an async generator that yields text chunks as they are produced by the LLM. The frontend consumes these streams using the Fetch API’s ReadableStream:

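// `response` is the result of a fetch() call to a streaming endpoint
const reader = response.body?.getReader();
if (!reader) throw new Error("Response has no readable body");
const decoder = new TextDecoder();
let accumulatedContent = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters that span chunks intact
  const chunk = decoder.decode(value, { stream: true });
  accumulatedContent += chunk;
  setMessages(/* update UI */);
}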

The server drops duplicate chunks by hashing each chunk before yielding it. If no chunks are produced at all, the server yields a fallback "No response generated" message.
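
Hash-based deduplication with a fallback can be sketched as an async generator wrapper. The details here are assumptions: dedupe_chunks is a hypothetical name, SHA-256 is an arbitrary choice of hash, and the sketch remembers every chunk seen so far, whereas the server may track duplicates differently.

```python
import asyncio
import hashlib
from typing import AsyncIterator

FALLBACK_MESSAGE = "No response generated"  # fallback text from the docs


async def dedupe_chunks(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield each distinct chunk once; yield a fallback if nothing arrived."""
    seen = set()
    yielded = False
    async for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # identical chunk already sent
        seen.add(digest)
        yielded = True
        yield chunk
    if not yielded:
        yield FALLBACK_MESSAGE
```

A generator like this could be passed as the body of a streaming response. Note that remembering every hash also drops legitimately repeated fragments, so a production version might compare only against the most recent chunk.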
