What is MCP?
Model Context Protocol (MCP) is an open standard for connecting LLMs to external data sources. QMD’s MCP server exposes search and document retrieval as:
- Tools - Functions the LLM can call (search, get documents, etc.)
- Resources - Documents accessible via qmd:// URIs
- Instructions - Auto-injected context about your indexed content
Transport Options
QMD supports two MCP transport modes:
Stdio (default)
Launched as a subprocess by each client. Simple, but loads models on every connection.
HTTP (recommended)
Long-lived server that keeps models loaded in VRAM. Much faster for repeated queries.
HTTP Server Endpoints
The HTTP server exposes:
- POST /mcp - MCP Streamable HTTP (JSON responses, stateless)
- POST /query - Direct structured search (no MCP wrapper)
- GET /health - Liveness check with uptime
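A minimal sketch of calling the direct query endpoint with Python's standard library. The base URL is a placeholder because this page does not state the port, and the request body is assumed to mirror the parameters of the query tool (searches, limit, minScore, collections). Both are assumptions to check against your running server.

```python
import json
import urllib.request


def build_query_body(searches, limit=10, min_score=0.0, collections=None):
    """Build a JSON-serializable body for POST /query.

    Assumption: the endpoint accepts the same fields as the `query` MCP tool.
    """
    body = {"searches": list(searches), "limit": limit, "minScore": min_score}
    if collections:
        body["collections"] = list(collections)
    return body


def post_query(base_url, body, timeout=10.0):
    """POST the body to <base_url>/query and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To use it, pass the address your daemon listens on as base_url, for example post_query(base_url, build_query_body([{"type": "lex", "query": "authentication"}])).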
Direct Query Endpoint
Skip MCP protocol overhead for simple searches by posting directly to /query.
Claude Desktop Setup
macOS Configuration
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
- Stdio (simple)
- HTTP (recommended)
Stdio mode loads models on every Claude restart (~1-3s startup time).
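Claude Desktop reads MCP servers from an mcpServers object in this file. The sketch below adds a stdio entry for QMD without disturbing other entries. The server name "qmd" is an arbitrary label, and the command assumes qmd is on your PATH and that qmd mcp with no flags starts the default stdio transport.

```python
import json


def add_qmd_stdio_server(config_text):
    """Return config JSON text with a stdio QMD entry under "mcpServers"."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Assumption: `qmd mcp` with no flags starts the default stdio transport.
    servers["qmd"] = {"command": "qmd", "args": ["mcp"]}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Print the entry as it would appear in an otherwise empty config file.
    print(add_qmd_stdio_server("{}"))
```

The same entry applies to the Windows and Linux config files; only the file location differs.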
Windows Configuration
Edit %APPDATA%\Claude\claude_desktop_config.json.
Linux Configuration
Edit ~/.config/Claude/claude_desktop_config.json.
Restart Claude Desktop after editing the configuration.
Claude Code Setup
Using the Plugin (Recommended)
Install from the marketplace.
Manual MCP Configuration
Edit ~/.claude/settings.json.
MCP Tools
The QMD MCP server exposes these tools to LLMs:
query - Structured Search
Search using typed sub-queries (lex/vec/hyde):
Parameters:
- searches (required) - Array of {type: 'lex'|'vec'|'hyde', query: string}
- limit (optional) - Max results (default: 10)
- minScore (optional) - Min relevance 0-1 (default: 0)
- collections (optional) - Filter to specific collections
Example prompt: “Search for authentication patterns”. The LLM responds by calling the query tool.
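A sketch of what such a call might look like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with method tools/call; the arguments follow the parameter list above. The sub-query strings are illustrative, and passing collections as an array of names is an assumption.

```python
import json

SEARCH_TYPES = {"lex", "vec", "hyde"}


def query_tool_call(request_id, searches, limit=10, min_score=0.0, collections=None):
    """Build an MCP `tools/call` JSON-RPC request for the `query` tool."""
    for s in searches:
        if s.get("type") not in SEARCH_TYPES:
            raise ValueError(f"unsupported search type: {s.get('type')!r}")
    arguments = {"searches": searches, "limit": limit, "minScore": min_score}
    if collections:
        # Assumption: collections is passed as a list of collection names.
        arguments["collections"] = collections
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query", "arguments": arguments},
    }


if __name__ == "__main__":
    request = query_tool_call(
        1,
        [
            {"type": "lex", "query": "authentication"},
            {"type": "vec", "query": "how do we authenticate users?"},
        ],
        limit=5,
    )
    print(json.dumps(request, indent=2))
```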
get - Retrieve Document
Get a single document by path or docid:
Parameters:
- file (required) - Path or docid (e.g., notes/meeting.md, #abc123, notes/meeting.md:100)
- fromLine (optional) - Start from line number
- maxLines (optional) - Maximum lines to return
- lineNumbers (optional) - Add line numbers to output
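To make the interaction of fromLine, maxLines, and lineNumbers concrete, here is a small sketch of the windowing they describe. It is not QMD's implementation: it assumes fromLine is 1-based, and the "N: " prefix format is purely illustrative.

```python
def select_lines(text, from_line=1, max_lines=None, line_numbers=False):
    """Return a window of `text`, optionally prefixed with line numbers.

    Assumptions: `from_line` is 1-based, and the numbers shown are the
    original line numbers in the document.
    """
    lines = text.splitlines()
    start = max(from_line, 1) - 1
    end = len(lines) if max_lines is None else start + max_lines
    window = lines[start:end]
    if line_numbers:
        window = [f"{start + i + 1}: {line}" for i, line in enumerate(window)]
    return "\n".join(window)
```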
multi_get - Retrieve Multiple Documents
Batch retrieve documents by glob or list:
Parameters:
- pattern (required) - Glob pattern or comma-separated list
- maxLines (optional) - Max lines per file
- maxBytes (optional) - Skip files larger than this (default: 10KB)
- lineNumbers (optional) - Add line numbers
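The following sketch illustrates how a pattern and maxBytes could select files. It is an illustration under stated assumptions, not QMD's actual matching logic: a pattern containing a comma is treated as a list of exact paths, anything else as a glob, and 10KB is read as 10 * 1024 bytes.

```python
from fnmatch import fnmatchcase

DEFAULT_MAX_BYTES = 10 * 1024  # the documented 10KB default, read as 10 * 1024 bytes


def resolve_pattern(pattern, files, max_bytes=DEFAULT_MAX_BYTES):
    """Split `files` (path -> content) into (selected, skipped) for a pattern.

    Assumption: a pattern containing a comma is a list of exact paths;
    anything else is a glob.
    """
    if "," in pattern:
        wanted = [p.strip() for p in pattern.split(",") if p.strip()]
        matches = [p for p in wanted if p in files]
    else:
        matches = sorted(p for p in files if fnmatchcase(p, pattern))
    selected, skipped = [], []
    for path in matches:
        size = len(files[path].encode("utf-8"))
        # Files over the byte limit are skipped rather than truncated.
        (skipped if size > max_bytes else selected).append(path)
    return selected, skipped
```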
status - Index Status
Get information about the QMD index:
Example:
MCP Resources
Documents are exposed as qmd:// resources:
qmd://notes/meeting.md
qmd://docs/api-guide.md
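A client that handles these URIs might split them as sketched below. Reading the first segment as the collection name is an assumption drawn from the examples above, not a documented rule.

```python
QMD_SCHEME = "qmd://"


def parse_qmd_uri(uri):
    """Split a qmd:// URI into (collection, relative_path).

    Assumption: the first path segment names the collection.
    """
    if not uri.startswith(QMD_SCHEME):
        raise ValueError(f"not a qmd:// URI: {uri!r}")
    collection, _, path = uri[len(QMD_SCHEME):].partition("/")
    if not collection or not path:
        raise ValueError(f"expected qmd://<collection>/<path>, got {uri!r}")
    return collection, path
```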
Auto-Injected Instructions
When Claude connects to QMD’s MCP server, it receives instructions about your indexed content.
Performance: HTTP vs Stdio
Stdio Mode
Pros:
- Simple configuration
- No background process
Cons:
- Models load on every client connection (~1-3s startup)
- GPU memory released between sessions
HTTP Mode
Pros:
- Models stay loaded in VRAM across requests
- First query: ~1-3s (model loading)
- Subsequent queries: ~200-500ms (cached)
- Embedding/reranking contexts disposed after 5 min idle
Cons:
- Requires background daemon
- Uses VRAM even when idle
For best performance, use HTTP mode:
qmd mcp --http --daemon
Usage Examples in Claude
Example 1: Ask a Question
You: “What are our authentication best practices?”
Claude:
- Calls the query tool
- Gets results with docids
- Calls get to retrieve full documents
- Synthesizes an answer from the retrieved documents
Example 2: Find Related Documents
You: “Find all meeting notes from January 2025”
Claude:
- Calls multi_get
- Lists files and summarizes content
Example 3: Filter by Collection
You: “Search only work docs for API patterns”
Claude:
- Calls query
- Returns results only from the docs collection
Troubleshooting
Claude doesn’t see QMD tools
- Check config file location:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
  - Linux: ~/.config/Claude/claude_desktop_config.json
- Verify JSON syntax is valid
- Restart Claude Desktop
- Check Claude’s MCP logs (usually in app settings)
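The JSON check in the list above can be scripted. This sketch parses the config text and reports the most common problems; the server name "qmd" is an assumption, so pass whatever key you used under mcpServers.

```python
import json


def check_config(config_text, server_name="qmd"):
    """Return a list of problems found in Claude Desktop config text."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ['missing "mcpServers" object']
    if server_name not in servers:
        return [f'no "{server_name}" entry under "mcpServers"']
    return []
```

Read the file for your platform from the locations listed above and pass its contents to check_config; an empty list means none of these problems were found.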
HTTP server not responding
- Check if the daemon is running
- Test the health endpoint
- Check logs
- Restart the daemon
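The health check can be scripted against GET /health. In this sketch the base URL is a placeholder for your daemon's address (the port is not stated on this page), and the response body is returned as raw text because its exact shape is not documented here.

```python
import urllib.error
import urllib.request


def check_health(base_url, timeout=2.0):
    """Return (ok, detail) for GET <base_url>/health."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing is listening, or the connection timed out.
        return False, f"unreachable: {exc}"
```

An "unreachable" result suggests the daemon is not running; an "HTTP" result suggests something else is answering on that port.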
Slow first query
This is normal. Models load on first use (~1-3s). Use HTTP mode (qmd mcp --http --daemon) to keep them loaded.
Port already in use
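Before restarting the daemon, it can help to confirm whether something is already listening on its port. A minimal sketch; the port number is whatever your QMD daemon is configured to use, which this page does not state.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

If this returns True while no QMD daemon is running, another process holds the port.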
Security Notes
- HTTP server binds to localhost only (not accessible from network)
- No authentication required (local-only by design)
- Documents are read-only (MCP tools cannot modify files)