The Local RAG feature lets you search your own documents alongside live web content. Instead of (or in addition to) querying the internet, Spy Search converts your files to Markdown, indexes them in a local ChromaDB vector store, retrieves the most relevant passages at query time, and passes them to the LLM as context.
How It Works
- Document conversion: Every file in your configured directory is converted to Markdown using MarkItDown. MarkItDown handles PDF, Word, Excel, PowerPoint, plain text, and Markdown natively, preserving structure and formatting as plain text.
- Chunking: The Markdown text is indexed using a 1500-character boundary. The `_file_handler` method iterates over every character and stores a snapshot of the accumulated text into ChromaDB at every 1500th character position. Each stored document includes metadata recording the source file path.
- Embedding and indexing: ChromaDB uses an Ollama embedding function (`nomic-embed-text:latest` by default, served at `http://localhost:11434`) to convert each chunk into a vector and persist it under `./local_db`.
- Retrieval at query time: When the `local-retrieval` agent runs, it calls `db.query(task, 2)` to fetch the top-2 most relevant chunks via vector similarity search.
- LLM synthesis: Each retrieved chunk is passed through a `retrieval_prompt` and sent to the LLM, which generates a structured summary. The summaries are appended to the shared `data` array and forwarded to the next agent in the pipeline (typically the Reporter).
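The chunking and retrieval steps above can be sketched in plain Python. This is only an illustration, not the project's code: `chunk_text`, `toy_embed`, `cosine`, and `top_k` are hypothetical names, the bag-of-words vector stands in for the Ollama embedding model, and the sketch assumes the buffer is cleared after each stored chunk and that a trailing remainder is also stored. The real `_file_handler` may behave differently on both points.

```python
import math
from collections import Counter

CHUNK_SIZE = 1500  # chunk boundary stated on this page
TOP_K = 2          # number of chunks returned per query


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Accumulate characters and emit the buffer at every `size`-th position.

    Assumption: the buffer is cleared after each emit and any trailing
    remainder is emitted as a final, shorter chunk.
    """
    chunks, buffer = [], []
    for position, char in enumerate(text, start=1):
        buffer.append(char)
        if position % size == 0:
            chunks.append("".join(buffer))
            buffer = []
    if buffer:
        chunks.append("".join(buffer))
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = TOP_K) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:k]
```

In the actual pipeline the similarity search is delegated to ChromaDB, so only the overall shape (fixed-size chunks in, the two closest chunks out) carries over from this sketch.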
The RAG agent calls `db.reset()` at initialisation on every run. This clears and re-creates the ChromaDB collection, ensuring the index always reflects the current contents of your files directory rather than a stale snapshot.

Enabling Local RAG
Add `"local-retrieval"` to the `agents` array in your `config.json` and set the `db` field to the directory containing your documents.
If the `db` key is absent, the agent defaults to `./local_files`.
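One way to apply that change programmatically is sketched below. The helper name `enable_local_rag` is hypothetical, and the sketch assumes `config.json` is a flat JSON object whose `agents` and `db` keys sit at the top level, as the text above suggests. Other keys in the file are left untouched.

```python
import json
from pathlib import Path


def enable_local_rag(config_path: str, docs_dir: str = "./local_files") -> dict:
    """Add "local-retrieval" to `agents` and point `db` at `docs_dir`."""
    path = Path(config_path)
    # Start from the existing config if present; assumption: it is a JSON object.
    config = json.loads(path.read_text()) if path.exists() else {}
    agents = config.setdefault("agents", [])
    if "local-retrieval" not in agents:
        agents.append("local-retrieval")
    config["db"] = docs_dir
    path.write_text(json.dumps(config, indent=2))
    return config
```

The helper appends the agent to the end of the array. Whether the position within `agents` matters is not stated here, so reorder the entries by hand if your pipeline depends on it.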
Managing Files via the API
Use the following endpoints to manage your document library without touching the filesystem directly.

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/folder_list` | List all available folders |
| `POST` | `/create_folder` | Create a new folder |
| `GET` | `/select_folder?folder_name=<name>` | Set the active folder for indexing |
| `POST` | `/upload_file` | Upload a file (multipart form) |
| `POST` | `/delete_file` | Delete a file by path |
Upload a file
Create a folder
Select a folder
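As a rough illustration of calling the two GET endpoints from the table above, the sketch below builds the requests with the Python standard library only. The base URL `http://localhost:8000` and the helper names are assumptions, so adjust them to wherever your Spy Search API is served. The POST endpoints are left out because their request bodies and field names are not specified on this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: replace with your API address


def folder_list_request(base_url: str = BASE_URL) -> Request:
    """Build the GET request that lists all available folders."""
    return Request(f"{base_url}/folder_list", method="GET")


def select_folder_request(folder_name: str, base_url: str = BASE_URL) -> Request:
    """Build the GET request that sets the active folder for indexing."""
    # urlencode escapes spaces and other special characters in the folder name.
    query = urlencode({"folder_name": folder_name})
    return Request(f"{base_url}/select_folder?{query}", method="GET")
```

Passing either object to `urllib.request.urlopen` sends the request. Building the request separately makes it easy to inspect the final URL before anything goes over the network.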
Supported File Types
MarkItDown handles any format it can meaningfully convert to Markdown text. Confirmed supported types include:

| File type | Extension(s) |
|---|---|
| PDF | .pdf |
| Word document | .docx, .doc |
| Excel spreadsheet | .xlsx, .xls |
| PowerPoint presentation | .pptx, .ppt |
| Plain text | .txt |
| Markdown | .md, .mdx |
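Before pointing the agent at a directory, it can help to see which files fall outside the confirmed list. The sketch below is a hypothetical pre-flight check, not part of Spy Search. It only inspects the top level of the directory (whether the indexer recurses into subfolders is not stated here), and a file it reports as unconfirmed may still convert correctly.

```python
from pathlib import Path

# Extensions confirmed in the table above; MarkItDown may accept others too.
CONFIRMED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls",
    ".pptx", ".ppt", ".txt", ".md", ".mdx",
}


def split_by_support(directory: str) -> tuple[list[Path], list[Path]]:
    """Return (confirmed, unconfirmed) files found directly inside `directory`."""
    confirmed, unconfirmed = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        if path.suffix.lower() in CONFIRMED_EXTENSIONS:
            confirmed.append(path)
        else:
            unconfirmed.append(path)
    return confirmed, unconfirmed
```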
ChromaDB Configuration
| Setting | Value |
|---|---|
| Database path | ./local_db |
| Embedding model | nomic-embed-text:latest |
| Embedding server | http://localhost:11434 (Ollama) |
| Collection name | local_search |
| Chunk boundary | Every 1500 characters |
| Results returned per query | 2 chunks |
The `VectorSearch` class wraps ChromaDB’s `PersistentClient` with `allow_reset=True`, so the full database can be wiped and rebuilt in a single `client.reset()` call.
Example: Local RAG + Report Pipeline
In this example, Spy Search indexes the documents in `./local_files/research_papers`, retrieves the most relevant passages, and the Reporter agent writes a structured report grounded in your own files.