QMD is an on-device hybrid search engine combining BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         QMD Hybrid Search Pipeline                          │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘

                        ┌──────────────┴──────────────┐
                        ▼                             ▼
               ┌────────────────┐            ┌────────────────┐
               │ Query Expansion│            │  Original Query│
               │  (fine-tuned)  │            │   (×2 weight)  │
               └───────┬────────┘            └───────┬────────┘
                       │                             │
                       │ 2 alternative queries       │
                       └──────────────┬──────────────┘

              ┌───────────────────────┼───────────────────────┐
              ▼                       ▼                       ▼
     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
     │ Original Query  │     │ Expanded Query 1│     │ Expanded Query 2│
     └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
              │                       │                       │
      ┌───────┴───────┐       ┌───────┴───────┐       ┌───────┴───────┐
      ▼               ▼       ▼               ▼       ▼               ▼
  ┌───────┐       ┌───────┐ ┌───────┐     ┌───────┐ ┌───────┐     ┌───────┐
  │ BM25  │       │Vector │ │ BM25  │     │Vector │ │ BM25  │     │Vector │
  │(FTS5) │       │Search │ │(FTS5) │     │Search │ │(FTS5) │     │Search │
  └───┬───┘       └───┬───┘ └───┬───┘     └───┬───┘ └───┬───┘     └───┬───┘
      │               │         │             │         │             │
      └───────┬───────┘         └──────┬──────┘         └──────┬──────┘
              │                        │                       │
              └────────────────────────┼───────────────────────┘


                          ┌───────────────────────┐
                          │   RRF Fusion + Bonus  │
                          │  Original query: ×2   │
                          │  Top-rank bonus: +0.05│
                          │     Top 30 Kept       │
                          └───────────┬───────────┘


                          ┌───────────────────────┐
                          │    LLM Re-ranking     │
                          │  (qwen3-reranker)     │
                          │  Yes/No + logprobs    │
                          └───────────┬───────────┘


                          ┌───────────────────────┐
                          │  Position-Aware Blend │
                          │  Top 1-3:  75% RRF    │
                          │  Top 4-10: 60% RRF    │
                          │  Top 11+:  40% RRF    │
                          └───────────────────────┘
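The fusion and blending stages in the diagram above can be sketched as pure functions. This is a minimal sketch, not QMD's actual implementation: the ×2 original-query weight, +0.05 top-rank bonus, and blend ratios come from the diagram, while the function names and the RRF constant `k = 60` (the conventional default) are assumptions.

```typescript
type Ranked = { docid: string; rank: number }; // rank is 1-based

const K = 60; // conventional RRF constant; actual value is an assumption

// Reciprocal Rank Fusion: sum weighted 1/(k + rank) over all result lists.
// Lists produced by the original query carry double weight, and a small
// bonus is added whenever a document tops an individual list.
function rrfFuse(
  lists: { results: Ranked[]; fromOriginal: boolean }[]
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { results, fromOriginal } of lists) {
    const weight = fromOriginal ? 2 : 1;
    for (const { docid, rank } of results) {
      let s = (scores.get(docid) ?? 0) + weight / (K + rank);
      if (rank === 1) s += 0.05; // top-rank bonus
      scores.set(docid, s);
    }
  }
  return scores;
}

// Position-aware blend of RRF and reranker scores: highly ranked
// candidates trust RRF more, the tail trusts the reranker more.
function blend(position: number, rrf: number, rerank: number): number {
  const w = position <= 3 ? 0.75 : position <= 10 ? 0.6 : 0.4;
  return w * rrf + (1 - w) * rerank;
}
```

The position-aware weights mean the reranker can reorder the tail aggressively without disturbing candidates that both retrieval backends already agreed on.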

Core Components

Storage Layer

QMD uses SQLite as its storage backend with two key extensions:
  • FTS5 (full-text search) for BM25 keyword matching
  • sqlite-vec for vector similarity search
Index stored in: ~/.cache/qmd/index.sqlite

Search Backends

Backend      Raw Score          Conversion           Range
FTS (BM25)   SQLite FTS5 BM25   Math.abs(score)      0 to ~25+
Vector       Cosine distance    1 / (1 + distance)   0.0 to 1.0
Reranker     LLM 0-10 rating    score / 10           0.0 to 1.0
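The three conversions in the table are one-liners. Note that SQLite's `bm25()` function returns negative values (more relevant = more negative), which is why the FTS conversion takes the absolute value:

```typescript
// Score conversions from the table above.
const ftsScore = (bm25: number) => Math.abs(bm25);                // 0 to ~25+
const vectorScore = (cosineDist: number) => 1 / (1 + cosineDist); // 0.0 to 1.0
const rerankScore = (rating: number) => rating / 10;              // 0.0 to 1.0
```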

LLM Models

QMD uses three local GGUF models (auto-downloaded on first use):
Model                            Purpose                        Size     URI
embeddinggemma-300M-Q8_0         Vector embeddings              ~300MB   hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf
qwen3-reranker-0.6b-q8_0         Re-ranking                     ~640MB   hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf
qmd-query-expansion-1.7B-q4_k_m  Query expansion (fine-tuned)   ~1.1GB   hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf
Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.

Data Flow

Indexing Flow

Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
                                                                        │
                                                                        ▼
                                                                 Generate docid
                                                                  (6-char hash)
                                                                        │
                                                                        ▼
                                                                Store in SQLite
                                                                        │
                                                                        ▼
                                                                   FTS5 Index
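Docid derivation can be sketched as follows. Only the 6-character length comes from the flow above; the choice of SHA-256 and the function names are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hash the document content, then derive a short docid from the digest.
// SHA-256 is an assumption — QMD's actual hash function is not specified
// here; only the 6-character docid length is documented.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

function docid(content: string): string {
  return hashContent(content).slice(0, 6);
}
```

Because the docid is derived from the content hash, unchanged files can be detected and skipped on re-index.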

Embedding Flow

Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                │                           "title | text"        embedBatch()

                └─► Chunks stored with:
                    - hash: document hash
                    - seq: chunk sequence (0, 1, 2...)
                    - pos: character position in original
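The chunking parameters above (~900 tokens, 15% overlap, `seq`/`pos` bookkeeping) can be sketched like this. The sketch approximates a token as a whitespace-delimited word and omits the smart boundary detection; the real implementation uses the model tokenizer.

```typescript
type Chunk = { seq: number; pos: number; text: string };

// Split a document into overlapping chunks of ~`size` word-tokens,
// advancing by size * (1 - overlapRatio) tokens each step. Each chunk
// records its sequence number and character position in the original.
function chunkDocument(doc: string, size = 900, overlapRatio = 0.15): Chunk[] {
  // Collect word tokens together with their character offsets.
  const tokens: { word: string; pos: number }[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(doc)) !== null) tokens.push({ word: m[0], pos: m.index });

  const step = Math.max(1, Math.floor(size * (1 - overlapRatio)));
  const chunks: Chunk[] = [];
  for (let start = 0, seq = 0; start < tokens.length; start += step, seq++) {
    const slice = tokens.slice(start, start + size);
    chunks.push({ seq, pos: slice[0].pos, text: slice.map(t => t.word).join(" ") });
    if (start + size >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With the default parameters, consecutive chunks share roughly 135 tokens of overlap, so a passage split by a chunk boundary still appears whole in at least one chunk.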

Search Modes

QMD provides three search modes:
Mode      Description                                            Use Case
search    BM25 full-text search only                             Fast keyword search, exact term matching
vsearch   Vector semantic search only                            Conceptual similarity, synonyms
query     Hybrid: FTS + Vector + Query Expansion + Re-ranking    Best quality, recommended for most searches

Context System

QMD supports hierarchical context annotations that help LLMs understand document structure:
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"
Contexts are inherited hierarchically and included in search results, making them especially useful for agentic workflows.
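The inheritance rule can be sketched as a prefix lookup: a document inherits the annotation of every ancestor path that has one. The `Map` storage and function name below are illustrative, not QMD's internal representation.

```typescript
// Context annotations keyed by qmd:// path (examples from above).
const contexts = new Map<string, string>([
  ["qmd://notes", "Personal notes and ideas"],
  ["qmd://docs/api", "API documentation"],
]);

// A path inherits every annotation attached to itself or an ancestor.
function contextsFor(path: string): string[] {
  const out: string[] = [];
  for (const [prefix, note] of contexts) {
    if (path === prefix || path.startsWith(prefix + "/")) out.push(note);
  }
  return out;
}
```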

Score Interpretation

Score       Meaning
0.8 - 1.0   Highly relevant
0.5 - 0.8   Moderately relevant
0.2 - 0.5   Somewhat relevant
0.0 - 0.2   Low relevance
All scores are normalized to [0, 1] range for consistency across different search backends.
