Spy Search supports three distinct search modes, each accessible through dedicated API endpoints and the React frontend. Choose the mode that best fits your use case, from sub-second streaming answers to comprehensive multi-page research reports.
Quick Search
Quick Search is the default mode for fast, conversational queries. It retrieves the top results from DuckDuckGo, extracts page content, constructs a prompt, and streams the LLM response back to the client in real time.
Endpoints:
- POST /stream_completion/{query}: streams the response as text/plain chunks
- POST /quick/{query}: returns a complete JSON response (non-streaming)
Triggering search in the streaming endpoint
The streaming endpoint supports both plain chat completions and search-augmented completions. To activate DuckDuckGo search, prefix your query with "search:".
When the backend detects the search: prefix (via "search:" in query), it runs the search pipeline in parallel with model loading using asyncio.gather, then builds a prompt from the retrieved content before streaming the response. Without the prefix, the query is sent directly to the LLM without any web lookup.
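The branching described above can be sketched as follows. This is a minimal illustration, not the project's actual code: run_search, load_model, and prepare are hypothetical stand-ins, and stripping the marker from the query before searching is an assumption.

```python
import asyncio

SEARCH_MARKER = "search:"  # the marker named in the text above


async def run_search(query: str) -> list[str]:
    """Hypothetical stand-in for the DuckDuckGo search pipeline."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"snippet about {query}"]


async def load_model() -> str:
    """Hypothetical stand-in for model loading."""
    await asyncio.sleep(0.01)
    return "model"


async def prepare(query: str) -> tuple[str, str]:
    """Return (model, prompt) for a raw query string."""
    if SEARCH_MARKER not in query:
        # No marker: the query goes to the LLM unchanged, with no web lookup.
        return await load_model(), query

    # Assumption: the marker is removed before the query is searched.
    cleaned = query.replace(SEARCH_MARKER, "", 1).strip()
    # Search and model loading run concurrently, not one after the other.
    results, model = await asyncio.gather(run_search(cleaned), load_model())
    context = "\n".join(results)
    return model, f"Context:\n{context}\n\nQuestion: {cleaned}"
```

Because asyncio.gather awaits both coroutines at once, the wait is roughly the slower of the two operations and not their sum.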
The search: prefix is the only switch that controls whether DuckDuckGo search runs inside /stream_completion/{query}. The /quick/{query} endpoint always performs a search regardless of prefix.
Streaming example
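A minimal Python client for the streaming endpoint might look like the sketch below. It uses only the standard library. BASE_URL (host and port) is an assumption, as is the space after the search: marker; stream_url and stream_completion are illustrative helper names, not part of Spy Search.

```python
import codecs
from typing import Iterator
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def stream_url(query: str, search: bool = True) -> str:
    """Build the /stream_completion/{query} URL, optionally adding search:."""
    text = f"search: {query}" if search else query
    # The query travels in the URL path, so it must be percent-encoded.
    return f"{BASE_URL}/stream_completion/{quote(text, safe='')}"


def stream_completion(query: str, search: bool = True) -> Iterator[str]:
    """Yield decoded text fragments as they arrive from the server."""
    # An incremental decoder copes with multi-byte characters split across reads.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    request = Request(stream_url(query, search), method="POST")
    with urlopen(request) as response:
        while chunk := response.read1(4096):
            yield decoder.decode(chunk)
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Iterating over stream_completion("latest AI news") and printing each fragment with end="" reproduces the progressive rendering on a terminal.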
The response is a text/plain chunked stream. Each chunk is a raw text fragment that the frontend accumulates and renders progressively.
Non-streaming example
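The non-streaming endpoint can be called in the same way. The sketch below assumes the same BASE_URL as a local deployment and makes no assumption about the fields of the JSON body, which it returns as parsed; quick_request and quick_search are illustrative names.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def quick_request(query: str) -> Request:
    """Build a POST request for /quick/{query} with the query percent-encoded."""
    return Request(f"{BASE_URL}/quick/{quote(query, safe='')}", method="POST")


def quick_search(query: str):
    """Send the request and return the parsed JSON response."""
    with urlopen(quick_request(query)) as response:
        return json.loads(response.read().decode("utf-8"))
```

No search: prefix is needed here, since this endpoint always performs a search.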
Performance target
The DuckDuckGo integration is tuned for a sub-1.5-second total search time. Key design decisions that make this possible:
| Setting | Value | Purpose |
|---|---|---|
| Per-request timeout | 400 ms | Prevents slow sites from blocking the pipeline |
| TCP connect timeout | 100 ms | Fails fast on unreachable hosts |
| Socket read timeout | 300 ms | Caps time spent reading response bodies |
| Content extraction budget | 1.2 s | Hard deadline across all concurrent fetches |
| Max content read per URL | 20 KB | Avoids downloading full pages |
| Paragraphs extracted | First 15 | Enough for article summaries |
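The way these limits combine can be illustrated with a standard-library sketch. It is not the project's implementation: fetch_within_budget and first_paragraphs are hypothetical names, the per-request timeout is modelled with asyncio.wait_for instead of separate connect and read timeouts on an HTTP client, the 20 KB cap is approximated by characters, and the paragraph extraction is a naive regex.

```python
import asyncio
import re

PER_REQUEST_TIMEOUT_S = 0.4  # per-request timeout from the table
CONTENT_BUDGET_S = 1.2       # hard deadline across all concurrent fetches
MAX_CHARS = 20 * 1024        # approximates the 20 KB read cap per URL
MAX_PARAGRAPHS = 15          # paragraphs kept per page


def first_paragraphs(html: str, limit: int = MAX_PARAGRAPHS) -> list[str]:
    """Return the text of the first `limit` non-empty <p> elements."""
    bodies = re.findall(r"<p\b[^>]*>(.*?)</p>", html[:MAX_CHARS], flags=re.S | re.I)
    cleaned = (re.sub(r"<[^>]+>", "", body).strip() for body in bodies)
    return [text for text in cleaned if text][:limit]


async def fetch_within_budget(fetchers, budget: float = CONTENT_BUDGET_S,
                              per_request: float = PER_REQUEST_TIMEOUT_S) -> list[str]:
    """Run fetch coroutines concurrently and keep what finished in time."""
    tasks = [asyncio.ensure_future(asyncio.wait_for(f, per_request)) for f in fetchers]
    if not tasks:
        return []
    done, pending = await asyncio.wait(tasks, timeout=budget)
    for task in pending:
        task.cancel()  # abandon slow sites instead of blocking the pipeline
    # Keep results in request order; skip fetches that failed or timed out.
    return [t.result() for t in tasks
            if t in done and not t.cancelled() and t.exception() is None]
```

The shared deadline means one slow site costs at most the remaining budget; the results that did arrive are still used.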
Deep Search (Report Generation)
Deep Search triggers the full multi-agent pipeline and produces a structured, ~2000-word research report with citations.
Endpoint: POST /report/{query}
The pipeline runs as follows:
- Planner decomposes the query into subtasks
- Searcher retrieves and summarises web content for each subtask
- Reporter plans the report structure, then writes each section independently
- Sections are concatenated into the final report and returned
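The control flow of these four steps can be sketched with stub functions. Every name below (plan, search, outline, write_section, generate_report) is hypothetical, and the stubs return placeholder strings where the real agents would call an LLM or the web; only the ordering of the steps is taken from the list above.

```python
def plan(query: str) -> list[str]:
    """Hypothetical planner: decompose the query into subtasks."""
    return [f"{query}: background", f"{query}: current state"]


def search(subtask: str) -> str:
    """Hypothetical searcher: summarise web content for one subtask."""
    return f"summary of findings for '{subtask}'"


def outline(summaries: dict[str, str]) -> list[str]:
    """Hypothetical reporter, step 1: decide the section titles."""
    return list(summaries)


def write_section(title: str, summaries: dict[str, str]) -> str:
    """Hypothetical reporter, step 2: write one section on its own."""
    return f"{title}\n{summaries[title]}"


def generate_report(query: str) -> str:
    """Run planner, searcher, and reporter in order, then join the sections."""
    subtasks = plan(query)
    summaries = {task: search(task) for task in subtasks}
    sections = [write_section(title, summaries) for title in outline(summaries)]
    return "\n\n".join(sections)
```

Writing each section independently keeps every LLM call small, at the cost of a final concatenation step that does not smooth transitions between sections.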
Academic Search
Academic Search is a specialised variant of Quick Search that automatically scopes results to arXiv.org. It prepends site:arxiv.org to your query before passing it to DuckDuckGo, so every result comes from an arXiv paper.
Endpoint: POST /stream_completion_academic/{query}
The response is a text/plain chunked stream, identical in format to the standard streaming endpoint.
Unlike /stream_completion/{query}, the academic endpoint always performs a DuckDuckGo search; no search: prefix is required or checked. The endpoint unconditionally prepends site:arxiv.org to whatever query you supply and runs the search pipeline.
Example
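The sketch below shows how a client might build the academic request, and what the scoped query looks like once site:arxiv.org has been prepended on the server. BASE_URL is an assumption, and scoped_query only mirrors the behaviour described above for illustration; clients send the plain query.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def academic_url(query: str) -> str:
    """Build the /stream_completion_academic/{query} URL (send it with POST)."""
    return f"{BASE_URL}/stream_completion_academic/{quote(query, safe='')}"


def scoped_query(query: str) -> str:
    """Illustrates the server-side scoping; clients do not need to call this."""
    return f"site:arxiv.org {query}"
```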
Suggested categories (frontend shortcuts)
The React frontend exposes one-click category buttons for:
- Medical Research
- Legal Studies
- Computer Science
- Physics
- Literature
Each button generates a query of the form <category> recent research, which is then sent to /stream_completion_academic/.
How Streaming Works
All streaming endpoints return a StreamingResponse with media_type="text/plain". The response body is an async generator that yields text chunks as they are produced by the LLM.
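The producer and consumer sides of this flow can be sketched in plain Python. fake_llm_stream is a hypothetical stand-in for the generator an endpoint would wrap, and consume imitates the accumulate-and-render loop a client performs. Showing the "No response generated" fallback when the accumulated text is empty is an assumption about when that message appears.

```python
import asyncio
from typing import AsyncIterator, Callable

FALLBACK = "No response generated"  # fallback message mentioned on this page


async def fake_llm_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for the async generator behind a streaming endpoint."""
    for piece in ("Spy ", "Search ", "streams."):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield piece


async def consume(stream: AsyncIterator[str], render: Callable[[str], None]) -> str:
    """Accumulate chunks, re-rendering the partial answer after each one."""
    answer = ""
    async for chunk in stream:
        answer += chunk
        render(answer)
    if not answer.strip():
        # Assumption: an empty stream triggers the fallback message.
        answer = FALLBACK
        render(answer)
    return answer
```

Calling asyncio.run(consume(fake_llm_stream(), print)) prints the answer growing one chunk at a time.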
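The frontend consumes these streams using the Fetch API’s ReadableStream, appending each decoded chunk to the answer as it arrives. If the stream ends without producing any text, the frontend displays a "No response generated" message.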