Spy Search supports three distinct search modes, each accessible through dedicated API endpoints and the React frontend. Choose the mode that best fits your use case, from sub-second streaming answers to comprehensive multi-page research reports.

Quick Search

Quick Search is the default mode for fast, conversational queries. It retrieves the top results from DuckDuckGo, extracts page content, constructs a prompt, and streams the LLM response back to the client in real time.

Endpoints:
  • POST /stream_completion/{query} — streams the response as text/plain chunks
  • POST /quick/{query} — returns a complete JSON response (non-streaming)

Triggering search in the streaming endpoint

The streaming endpoint supports both plain chat completions and search-augmented completions. To activate DuckDuckGo search, prefix your query with search: as in this example:
search: what is the Rust programming language
When the backend detects the search: prefix (via "search:" in query), it runs the search pipeline in parallel with model loading using asyncio.gather, then builds a prompt from the retrieved content before streaming the response. Without the prefix, the query is sent directly to the LLM without any web lookup.
The search: prefix is the only switch that controls whether DuckDuckGo search runs inside /stream_completion/{query}. The /quick/{query} endpoint always performs a search regardless of prefix.
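
The branching described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: run_search, load_model, and build_prompt are hypothetical stand-ins, and stripping the marker from the query before searching is an assumption the documentation does not confirm.

```python
import asyncio

SEARCH_PREFIX = "search:"  # the marker described above


async def run_search(query: str) -> list[str]:
    """Hypothetical stand-in for the DuckDuckGo search pipeline."""
    await asyncio.sleep(0)
    return [f"snippet about {query}"]


async def load_model() -> str:
    """Hypothetical stand-in for model loading."""
    await asyncio.sleep(0)
    return "model"


def build_prompt(query: str, snippets: list[str]) -> str:
    """Hypothetical prompt template combining retrieved content and the question."""
    context = "\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {query}"


async def prepare(query: str) -> tuple[str, str]:
    """Return (model, prompt), searching only when the marker is present."""
    if SEARCH_PREFIX in query:
        # Assumption: the marker is removed before the query reaches the search engine.
        question = query.replace(SEARCH_PREFIX, "", 1).strip()
        # Search and model loading run concurrently rather than one after the other.
        snippets, model = await asyncio.gather(run_search(question), load_model())
        return model, build_prompt(question, snippets)
    # No marker: the query goes to the LLM unchanged, with no web lookup.
    model = await load_model()
    return model, query
```

Because asyncio.gather awaits both coroutines together, the wait is bounded by the slower of the two instead of their sum.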

Streaming example

curl -X POST http://localhost:8000/stream_completion/search:+what+is+rust+programming \
  -F 'messages=[{"role":"user","content":"what is rust"}]'
The response is a text/plain chunked stream. Each chunk is a raw text fragment that the frontend accumulates and renders progressively.

Non-streaming example

curl -X POST http://localhost:8000/quick/what+is+rust+programming \
  -F 'messages=[{"role":"user","content":"what is rust"}]'
Response shape:
{
  "report": "Rust is a systems programming language focused on safety...",
  "files_received": [],
  "messages_received": [{"role": "user", "content": "what is rust"}]
}
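
A client might load this JSON into a small typed structure. The sketch below assumes only the three fields shown in the response shape; the QuickSearchResponse class and parse_quick_response function are illustrative names, not part of Spy Search.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QuickSearchResponse:
    """Illustrative container mirroring the response shape shown above."""
    report: str
    files_received: list = field(default_factory=list)
    messages_received: list = field(default_factory=list)


def parse_quick_response(body: str) -> QuickSearchResponse:
    """Parse the JSON body returned by the non-streaming endpoint."""
    data = json.loads(body)
    return QuickSearchResponse(
        report=data["report"],
        # Assumption: the two list fields are treated as optional here.
        files_received=data.get("files_received", []),
        messages_received=data.get("messages_received", []),
    )
```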

Performance target

The DuckDuckGo integration is tuned for a sub-1.5-second total search time. Key design decisions that make this possible:
| Setting | Value | Purpose |
| --- | --- | --- |
| Per-request timeout | 400 ms | Prevents slow sites from blocking the pipeline |
| TCP connect timeout | 100 ms | Fails fast on unreachable hosts |
| Socket read timeout | 300 ms | Caps time spent reading response bodies |
| Content extraction budget | 1.2 s | Hard deadline across all concurrent fetches |
| Max content read per URL | 20 KB | Avoids downloading full pages |
| Paragraphs extracted | First 15 | Enough for article summaries |
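
The last two limits (20 KB per URL, first 15 paragraphs) could be applied roughly as follows. This sketch uses Python's standard html.parser and is only an assumption about the approach; the documentation does not say which HTML library the project uses, and ParagraphCollector and extract_paragraphs are hypothetical names.

```python
from html.parser import HTMLParser

MAX_BYTES = 20 * 1024   # max content read per URL
MAX_PARAGRAPHS = 15     # paragraphs extracted


class ParagraphCollector(HTMLParser):
    """Collects the text of the first `limit` <p> elements."""

    def __init__(self, limit: int) -> None:
        super().__init__()
        self.limit = limit
        self.paragraphs: list[str] = []
        self._buf = None  # None while outside a <p> element

    def _flush(self) -> None:
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate an unclosed previous <p>
            if len(self.paragraphs) < self.limit:
                self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()


def extract_paragraphs(raw: bytes, max_bytes: int = MAX_BYTES,
                       max_paragraphs: int = MAX_PARAGRAPHS) -> list[str]:
    """Parse only the first max_bytes of a page and keep the leading paragraphs."""
    html = raw[:max_bytes].decode("utf-8", errors="ignore")
    parser = ParagraphCollector(max_paragraphs)
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Truncating the raw bytes before parsing keeps the cost per page bounded regardless of how large the page is.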
If the total time exceeds 1.5 seconds, the engine returns an empty result set rather than delivering stale or incomplete data.
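
One way to enforce these budgets with asyncio is sketched below. It is an illustration under stated assumptions, not the engine's implementation: the fetch callable is supplied by the caller, the function names are hypothetical, and the separate TCP connect and socket read timeouts are left to whatever HTTP client performs the fetch.

```python
import asyncio
import time

PER_REQUEST_TIMEOUT = 0.4   # seconds, per URL
EXTRACTION_BUDGET = 1.2     # seconds, across all concurrent fetches
TOTAL_BUDGET = 1.5          # seconds, for the whole search


async def fetch_one(fetch, url, timeout):
    """Run one fetch; a timeout or fetch error yields None instead of raising."""
    try:
        return await asyncio.wait_for(fetch(url), timeout)
    except Exception:
        return None


async def fetch_all(fetch, urls, per_request=PER_REQUEST_TIMEOUT,
                    budget=EXTRACTION_BUDGET):
    """Fetch all URLs concurrently and drop anything unfinished at the deadline."""
    if not urls:
        return []
    tasks = [asyncio.create_task(fetch_one(fetch, u, per_request)) for u in urls]
    done, pending = await asyncio.wait(tasks, timeout=budget)
    for task in pending:
        task.cancel()  # hard deadline: abandon slow fetches
    await asyncio.gather(*pending, return_exceptions=True)
    # Preserve the input order of the URLs that completed successfully.
    return [t.result() for t in tasks if t in done and t.result() is not None]


async def search(fetch, urls, total_budget=TOTAL_BUDGET, **limits):
    """Return fetched pages, or an empty list if the total budget was exceeded."""
    start = time.monotonic()
    pages = await fetch_all(fetch, urls, **limits)
    if time.monotonic() - start > total_budget:
        return []
    return pages
```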

Deep Search (Report Generation)

Deep Search triggers the full multi-agent pipeline and produces a structured, ~2000-word research report with citations.

Endpoint: POST /report/{query}

The pipeline runs as follows:
  1. Planner decomposes the query into subtasks
  2. Searcher retrieves and summarises web content for each subtask
  3. Reporter plans the report structure, then writes each section independently
  4. Sections are concatenated into the final report and returned
curl -X POST "http://localhost:8000/report/future+of+AI+in+healthcare" \
  -F 'messages=[{"role":"user","content":"future of AI in healthcare"}]'
Deep Search is significantly slower than Quick Search because it makes multiple sequential LLM calls. See Report Generation for a full breakdown.
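
The four pipeline steps can be pictured as a chain of sequential calls. In the sketch below each agent is passed in as a plain callable; the parameter names (plan, search, outline, write_section) are hypothetical and chosen only to mirror the Planner, Searcher, and Reporter roles, and joining sections with blank lines is an assumption.

```python
from typing import Callable, List


def generate_report(
    query: str,
    plan: Callable[[str], List[str]],                  # Planner: query -> subtasks
    search: Callable[[str], str],                      # Searcher: subtask -> summary
    outline: Callable[[str, List[str]], List[str]],    # Reporter: plan section headings
    write_section: Callable[[str, List[str]], str],    # Reporter: write one section
) -> str:
    """Run the pipeline steps in order and return the concatenated report."""
    subtasks = plan(query)
    summaries = [search(task) for task in subtasks]
    headings = outline(query, summaries)
    # Each section is written independently, then the results are joined.
    sections = [write_section(heading, summaries) for heading in headings]
    return "\n\n".join(sections)
```

Every step waits for the previous one, and each step may involve one or more LLM calls, which is why the total latency grows with the number of subtasks and sections.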

Academic Search

Academic Search is a specialised variant of Quick Search that automatically scopes results to arXiv.org. It prepends site:arxiv.org to your query before passing it to DuckDuckGo, so every result comes from an arXiv paper.

Endpoint: POST /stream_completion_academic/{query}

The response is a text/plain chunked stream, identical in format to the standard streaming endpoint.
Unlike /stream_completion/{query}, the academic endpoint always performs a DuckDuckGo search; no search: prefix is required or checked. The endpoint unconditionally prepends site:arxiv.org to whatever query you supply and runs the search pipeline.
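
The query rewrite amounts to a one-line transformation. The helper below is a hypothetical illustration; the function name and the single space used as a separator are assumptions.

```python
ARXIV_FILTER = "site:arxiv.org"


def scope_to_arxiv(query: str) -> str:
    """Prepend the arXiv site filter unconditionally, as the endpoint is described to do."""
    return f"{ARXIV_FILTER} {query.strip()}"
```

For example, scope_to_arxiv("transformer attention mechanisms") produces the search string "site:arxiv.org transformer attention mechanisms".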

Example

curl -X POST "http://localhost:8000/stream_completion_academic/transformer+attention+mechanisms" \
  -F 'messages=[{"role":"user","content":"transformer attention mechanisms"}]'

Suggested categories (frontend shortcuts)

The React frontend exposes one-click category buttons for:
  • Medical Research
  • Legal Studies
  • Computer Science
  • Physics
  • Literature
Each button pre-fills the query input with <category> recent research, which is then sent to /stream_completion_academic/.

How Streaming Works

All streaming endpoints return a StreamingResponse with media_type="text/plain". The response body is an async generator that yields text chunks as they are produced by the LLM. The frontend consumes these streams using the Fetch API’s ReadableStream:
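// `response` comes from fetch(); `setMessages` is the React state setter.
const reader = response.body?.getReader();
if (!reader) throw new Error("Response has no readable body");
const decoder = new TextDecoder();
let accumulatedContent = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  accumulatedContent += chunk;
  setMessages(/* update UI */);
}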
The server drops duplicate chunks by hashing each chunk before yielding it. If no chunks are produced at all, the server yields a fallback "No response generated" message.
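
A deduplicating async generator of this kind might look like the sketch below. It is an assumption-laden illustration rather than the server's code: the hash function (SHA-256) is a guess, and the documentation does not say whether a chunk is compared against every earlier chunk or only the previous one; this version remembers all of them.

```python
import asyncio
import hashlib
from typing import AsyncIterator

FALLBACK_MESSAGE = "No response generated"


async def dedupe_chunks(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield each distinct chunk once; yield a fallback if nothing was produced."""
    seen = set()
    yielded_any = False
    async for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # identical chunk already sent
        seen.add(digest)
        yielded_any = True
        yield chunk
    if not yielded_any:
        yield FALLBACK_MESSAGE
```

A generator like this can be passed as the body of a streaming response, so the client only ever sees the filtered chunks.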