QMD is an on-device hybrid search engine combining BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.
System Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│                         QMD Hybrid Search Pipeline                          │
└─────────────────────────────────────────────────────────────────────────────┘
                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘
                                       │
                        ┌──────────────┴──────────────┐
                        ▼                             ▼
                ┌────────────────┐            ┌────────────────┐
                │ Query Expansion│            │ Original Query │
                │  (fine-tuned)  │            │  (×2 weight)   │
                └───────┬────────┘            └───────┬────────┘
                        │                             │
                        │ 2 alternative queries       │
                        └──────────────┬──────────────┘
                                       │
             ┌─────────────────────────┼─────────────────────────┐
             ▼                         ▼                         ▼
    ┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
    │ Original Query  │       │ Expanded Query 1│       │ Expanded Query 2│
    └────────┬────────┘       └────────┬────────┘       └────────┬────────┘
             │                         │                         │
     ┌───────┴───────┐         ┌───────┴───────┐         ┌───────┴───────┐
     ▼               ▼         ▼               ▼         ▼               ▼
 ┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐
 │ BM25  │       │Vector │ │ BM25  │       │Vector │ │ BM25  │       │Vector │
 │(FTS5) │       │Search │ │(FTS5) │       │Search │ │(FTS5) │       │Search │
 └───┬───┘       └───┬───┘ └───┬───┘       └───┬───┘ └───┬───┘       └───┬───┘
     │               │         │               │         │               │
     └───────┬───────┘         └───────┬───────┘         └───────┬───────┘
             │                         │                         │
             └─────────────────────────┼─────────────────────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │  RRF Fusion + Bonus   │
                           │  Original query: ×2   │
                           │ Top-rank bonus: +0.05 │
                           │      Top 30 Kept      │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │    LLM Re-ranking     │
                           │   (qwen3-reranker)    │
                           │   Yes/No + logprobs   │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │ Position-Aware Blend  │
                           │  Top 1-3:  75% RRF    │
                           │  Top 4-10: 60% RRF    │
                           │  Top 11+:  40% RRF    │
                           └───────────────────────┘
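The fusion and blending stages in the diagram can be sketched as follows. This is an illustration under stated assumptions, not QMD's actual code: the RRF constant `k = 60`, the rule that the +0.05 bonus goes to any document ranked first in at least one list, and the names `fuseRankings`, `rrfShare`, and `blendScores` are all hypothetical. Only the ×2 weight, the +0.05 bonus, the top-30 cutoff, and the 75/60/40 percentages come from the diagram.

```typescript
// Sketch only: k = 60 and the bonus rule are assumptions, not confirmed QMD values.
interface RankedList {
  docIds: string[]; // best match first
  weight: number;   // 2 for lists produced by the original query, 1 for expansions
}

function fuseRankings(
  lists: RankedList[],
  k = 60,
  topRankBonus = 0.05,
  keep = 30,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  const rankedFirst = new Set<string>();
  for (const { docIds, weight } of lists) {
    docIds.forEach((id, index) => {
      // Standard RRF term: weight / (k + rank), with rank starting at 1.
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + index + 1));
      if (index === 0) rankedFirst.add(id);
    });
  }
  // Assumed bonus rule: reward documents that topped at least one list.
  rankedFirst.forEach((id) => {
    scores.set(id, (scores.get(id) ?? 0) + topRankBonus);
  });
  // Sort by fused score, highest first, and keep the top candidates.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, keep);
}

// Share of the final score taken from the fused (RRF) score at a 1-based rank.
function rrfShare(rank: number): number {
  if (rank <= 3) return 0.75;
  if (rank <= 10) return 0.6;
  return 0.4;
}

// The remaining share comes from the reranker score.
function blendScores(rank: number, rrfScore: number, rerankScore: number): number {
  const share = rrfShare(rank);
  return share * rrfScore + (1 - share) * rerankScore;
}
```

The blend assumes both inputs are already on a comparable scale. Giving the fused score more weight near the top means the reranker cannot easily displace a document that several retrieval lists agreed on.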
Core Components
Storage Layer
QMD uses SQLite as its storage backend with two key extensions:
- FTS5 (full-text search) for BM25 keyword matching
- sqlite-vec for vector similarity search
The index is stored at ~/.cache/qmd/index.sqlite.
Search Backends
| Backend | Raw Score | Conversion | Range |
|---|---|---|---|
| FTS (BM25) | SQLite FTS5 BM25 | Math.abs(score) | 0 to ~25+ |
| Vector | Cosine distance | 1 / (1 + distance) | 0.0 to 1.0 |
| Reranker | LLM 0-10 rating | score / 10 | 0.0 to 1.0 |
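The conversions in the table can be written as one small function. This is a minimal sketch: the `Backend` type and the name `toDisplayScore` are hypothetical, and the formulas are copied from the Conversion column.

```typescript
type Backend = "fts" | "vector" | "reranker";

// Converts a backend's raw score using the formulas from the table above.
function toDisplayScore(backend: Backend, raw: number): number {
  switch (backend) {
    case "fts":
      // SQLite's FTS5 bm25() returns lower (more negative) values for better
      // matches, so the absolute value makes larger mean better.
      return Math.abs(raw);
    case "vector":
      // Distance 0 maps to 1.0; larger distances approach 0.
      return 1 / (1 + raw);
    case "reranker":
      // A 0-10 rating scaled down to 0.0-1.0.
      return raw / 10;
  }
}
```

Note that the FTS conversion is unbounded above, which matches the "0 to ~25+" range in the table, while the other two backends stay within 0.0 to 1.0.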
LLM Models
QMD uses three local GGUF models (auto-downloaded on first use):
| Model | Purpose | Size | URI |
|---|---|---|---|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB | hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf |
| qwen3-reranker-0.6b-q8_0 | Re-ranking | ~640MB | hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion (fine-tuned) | ~1.1GB | hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf |
Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.
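The URIs in the table follow an `hf:<owner>/<repo>/<file>` shape. The sketch below splits such a URI into its parts and builds a matching HuggingFace download URL. It is an illustration only: `parseHfUri` and `downloadUrl` are hypothetical helpers, the `main` branch is an assumption, and how node-llama-cpp resolves these URIs internally is not covered here.

```typescript
interface HfModelRef {
  owner: string;
  repo: string;
  file: string;
}

// Splits "hf:<owner>/<repo>/<file>" into its three components.
function parseHfUri(uri: string): HfModelRef {
  const match = /^hf:([^/]+)\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) throw new Error(`Not an hf: model URI: ${uri}`);
  const [, owner, repo, file] = match;
  return { owner, repo, file };
}

// Assumption: the file lives on the repository's "main" branch.
function downloadUrl(ref: HfModelRef): string {
  return `https://huggingface.co/${ref.owner}/${ref.repo}/resolve/main/${ref.file}`;
}
```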
Data Flow
Indexing Flow
Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
    │                                                   │               │
    │                                                   │               ▼
    │                                                   │        Generate docid
    │                                                   │         (6-char hash)
    │                                                   │               │
    └──────────────────────────────────────────────────►└──► Store in SQLite
                                                                    │
                                                                    ▼
                                                               FTS5 Index
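The title, hash, and docid steps of the indexing flow might look like the sketch below. The hash algorithm (SHA-256), the choice of the first six hex characters as the docid, the first-level-heading rule for titles, and all three function names are assumptions made for illustration; the documentation only states that content is hashed and that the docid is a 6-character hash.

```typescript
import { createHash } from "node:crypto";

// Assumption: the title is the first level-1 markdown heading, else a fallback
// such as the file name.
function parseTitle(markdown: string, fallback: string): string {
  const heading = /^#\s+(.+)$/m.exec(markdown);
  return heading ? heading[1].trim() : fallback;
}

// Assumption: SHA-256 over the UTF-8 content, hex encoded.
function contentHash(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// Assumption: the docid is the first six hex characters of the content hash.
function docId(hash: string): string {
  return hash.slice(0, 6);
}
```

Deriving the docid from the content hash means an unchanged file keeps the same identifier across re-indexing runs.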
Embedding Flow
Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                │                           "title | text"        embedBatch()
                │
                └─► Chunks stored with:
                    - hash: document hash
                    - seq: chunk sequence (0, 1, 2...)
                    - pos: character position in original
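Overlapping chunking with `seq` and `pos` bookkeeping can be sketched as below. This is a simplified illustration: it treats whitespace-separated words as tokens rather than using the embedding model's tokenizer, and it does not reproduce the smart boundary detection. The names `chunkDocument` and `formatForEmbedding` are hypothetical.

```typescript
interface Chunk {
  seq: number;  // chunk sequence: 0, 1, 2...
  pos: number;  // character position of the chunk in the original document
  text: string;
}

// Assumption: a "token" is a whitespace-separated word.
function chunkDocument(text: string, chunkSize = 900, overlapRatio = 0.15): Chunk[] {
  const tokens: Array<{ start: number; end: number }> = [];
  const wordPattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    tokens.push({ start: match.index, end: match.index + match[0].length });
  }

  // Each chunk starts (1 - overlap) * chunkSize tokens after the previous one.
  const step = Math.max(1, Math.round(chunkSize * (1 - overlapRatio)));
  const chunks: Chunk[] = [];
  for (let i = 0, seq = 0; i < tokens.length; i += step, seq++) {
    const last = tokens[Math.min(i + chunkSize, tokens.length) - 1];
    chunks.push({ seq, pos: tokens[i].start, text: text.slice(tokens[i].start, last.end) });
    if (i + chunkSize >= tokens.length) break; // the final chunk reached the end
  }
  return chunks;
}

// Formats a chunk as "title | text" before it is handed to the embedder.
function formatForEmbedding(title: string, chunk: Chunk): string {
  return `${title} | ${chunk.text}`;
}
```

With the defaults, consecutive chunks start 765 tokens apart, so roughly the last 135 tokens of one chunk reappear at the start of the next.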
Search Modes
QMD provides three search modes:
| Mode | Description | Use Case |
|---|---|---|
| search | BM25 full-text search only | Fast keyword search, exact term matching |
| vsearch | Vector semantic search only | Conceptual similarity, synonyms |
| query | Hybrid: FTS + Vector + Query Expansion + Re-ranking | Best quality, recommended for most searches |
Context System
QMD supports hierarchical context annotations that help LLMs understand document structure:
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"
Contexts are inherited hierarchically and included in search results, making them especially useful for agentic workflows.
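One way to picture hierarchical inheritance is a lookup that walks every ancestor prefix of a document path and collects the contexts it finds. The sketch below is an assumption about the behaviour, not QMD's implementation: `contextsFor` is a hypothetical helper, and the `qmd://docs` entry in the usage example is invented for illustration.

```typescript
// Collects contexts from the outermost ancestor down to the path itself.
function contextsFor(path: string, contexts: Map<string, string>): string[] {
  const scheme = "qmd://";
  if (!path.startsWith(scheme)) return [];
  const parts = path.slice(scheme.length).split("/").filter(Boolean);
  const found: string[] = [];
  for (let depth = 1; depth <= parts.length; depth++) {
    const prefix = scheme + parts.slice(0, depth).join("/");
    const context = contexts.get(prefix);
    if (context !== undefined) found.push(context);
  }
  return found;
}

// Usage: a file under qmd://docs/api picks up both the parent and the API context.
const registered = new Map<string, string>([
  ["qmd://notes", "Personal notes and ideas"],
  ["qmd://docs", "Project documentation"], // hypothetical parent context
  ["qmd://docs/api", "API documentation"],
]);
const inherited = contextsFor("qmd://docs/api/auth.md", registered);
```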
Score Interpretation
| Score | Meaning |
|---|---|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |
All scores are normalized to the [0, 1] range for consistency across search backends.
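The bands in the table translate directly into a lookup. In this sketch `relevanceLabel` is a hypothetical helper, and because the table's ranges share their endpoints, it assumes a boundary value such as 0.8 belongs to the higher band.

```typescript
// Maps a normalized score to the label from the table above.
function relevanceLabel(score: number): string {
  if (score >= 0.8) return "Highly relevant";
  if (score >= 0.5) return "Moderately relevant";
  if (score >= 0.2) return "Somewhat relevant";
  return "Low relevance";
}
```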