QMD is an on-device hybrid search engine combining BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         QMD Hybrid Search Pipeline                          │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘

                        ┌──────────────┴──────────────┐
                        ▼                             ▼
               ┌────────────────┐            ┌────────────────┐
               │ Query Expansion│            │  Original Query│
               │  (fine-tuned)  │            │   (×2 weight)  │
               └───────┬────────┘            └───────┬────────┘
                       │                             │
                       │ 2 alternative queries       │
                       └──────────────┬──────────────┘

              ┌───────────────────────┼───────────────────────┐
              ▼                       ▼                       ▼
     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
     │ Original Query  │     │ Expanded Query 1│     │ Expanded Query 2│
     └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
              │                       │                       │
      ┌───────┴───────┐       ┌───────┴───────┐       ┌───────┴───────┐
      ▼               ▼       ▼               ▼       ▼               ▼
  ┌───────┐       ┌───────┐ ┌───────┐     ┌───────┐ ┌───────┐     ┌───────┐
  │ BM25  │       │Vector │ │ BM25  │     │Vector │ │ BM25  │     │Vector │
  │(FTS5) │       │Search │ │(FTS5) │     │Search │ │(FTS5) │     │Search │
  └───┬───┘       └───┬───┘ └───┬───┘     └───┬───┘ └───┬───┘     └───┬───┘
      │               │         │             │         │             │
      └───────┬───────┘         └──────┬──────┘         └──────┬──────┘
              │                        │                       │
              └────────────────────────┼───────────────────────┘


                          ┌───────────────────────┐
                          │   RRF Fusion + Bonus  │
                          │  Original query: ×2   │
                          │  Top-rank bonus: +0.05│
                          │     Top 30 Kept       │
                          └───────────┬───────────┘


                          ┌───────────────────────┐
                          │    LLM Re-ranking     │
                          │  (qwen3-reranker)     │
                          │  Yes/No + logprobs    │
                          └───────────┬───────────┘


                          ┌───────────────────────┐
                          │  Position-Aware Blend │
                          │  Top 1-3:  75% RRF    │
                          │  Top 4-10: 60% RRF    │
                          │  Top 11+:  40% RRF    │
                          └───────────────────────┘
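The fusion and blending stages in the diagram above can be sketched as pure functions. This is a minimal sketch, not QMD's actual implementation: the ×2 original-query weight, +0.05 top-rank bonus, and blend ratios come from the diagram, while the function names and the RRF constant `k = 60` (the conventional default) are assumptions.

```typescript
type Ranked = { docid: string; rank: number }; // rank is 1-based

const K = 60; // conventional RRF constant; actual value is an assumption

// Reciprocal Rank Fusion: sum weighted 1/(k + rank) over all result lists.
// Lists produced by the original query carry double weight, and a small
// bonus is added whenever a document tops an individual list.
function rrfFuse(
  lists: { results: Ranked[]; fromOriginal: boolean }[]
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { results, fromOriginal } of lists) {
    const weight = fromOriginal ? 2 : 1;
    for (const { docid, rank } of results) {
      let s = (scores.get(docid) ?? 0) + weight / (K + rank);
      if (rank === 1) s += 0.05; // top-rank bonus
      scores.set(docid, s);
    }
  }
  return scores;
}

// Position-aware blend of RRF and reranker scores: highly ranked
// candidates trust RRF more, the tail trusts the reranker more.
function blend(position: number, rrf: number, rerank: number): number {
  const w = position <= 3 ? 0.75 : position <= 10 ? 0.6 : 0.4;
  return w * rrf + (1 - w) * rerank;
}
```

The position-aware weights mean the reranker can reorder the tail aggressively without disturbing candidates that both retrieval backends already agreed on.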

Core Components

Storage Layer

QMD uses SQLite as its storage backend with two key extensions:
  • FTS5 (full-text search) for BM25 keyword matching
  • sqlite-vec for vector similarity search
Index stored in: ~/.cache/qmd/index.sqlite

Search Backends

Backend      Raw Score          Conversion           Range
FTS (BM25)   SQLite FTS5 BM25   Math.abs(score)      0 to ~25+
Vector       Cosine distance    1 / (1 + distance)   0.0 to 1.0
Reranker     LLM 0-10 rating    score / 10           0.0 to 1.0
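The three conversions in the table are one-liners. Note that SQLite's `bm25()` function returns negative values (more relevant = more negative), which is why the FTS conversion takes the absolute value:

```typescript
// Score conversions from the table above.
const ftsScore = (bm25: number) => Math.abs(bm25);                // 0 to ~25+
const vectorScore = (cosineDist: number) => 1 / (1 + cosineDist); // 0.0 to 1.0
const rerankScore = (rating: number) => rating / 10;              // 0.0 to 1.0
```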

LLM Models

QMD uses three local GGUF models (auto-downloaded on first use):
Model                            Purpose                        Size     URI
embeddinggemma-300M-Q8_0         Vector embeddings              ~300MB   hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf
qwen3-reranker-0.6b-q8_0         Re-ranking                     ~640MB   hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf
qmd-query-expansion-1.7B-q4_k_m  Query expansion (fine-tuned)   ~1.1GB   hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf
Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.

Data Flow

Indexing Flow

Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
                                                                        │
                                                                        ▼
                                                                 Generate docid
                                                                  (6-char hash)
                                                                        │
                                                                        ▼
                                                                Store in SQLite
                                                                        │
                                                                        ▼
                                                                   FTS5 Index
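Docid derivation can be sketched as follows. Only the 6-character length comes from the flow above; the choice of SHA-256 and the function names are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hash the document content, then derive a short docid from the digest.
// SHA-256 is an assumption — QMD's actual hash function is not specified
// here; only the 6-character docid length is documented.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

function docid(content: string): string {
  return hashContent(content).slice(0, 6);
}
```

Because the docid is derived from the content hash, unchanged files can be detected and skipped on re-index.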

Embedding Flow

Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                │                           "title | text"        embedBatch()

                └─► Chunks stored with:
                    - hash: document hash
                    - seq: chunk sequence (0, 1, 2...)
                    - pos: character position in original
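The chunking parameters above (~900 tokens, 15% overlap, `seq`/`pos` bookkeeping) can be sketched like this. The sketch approximates a token as a whitespace-delimited word and omits the smart boundary detection; the real implementation uses the model tokenizer.

```typescript
type Chunk = { seq: number; pos: number; text: string };

// Split a document into overlapping chunks of ~`size` word-tokens,
// advancing by size * (1 - overlapRatio) tokens each step. Each chunk
// records its sequence number and character position in the original.
function chunkDocument(doc: string, size = 900, overlapRatio = 0.15): Chunk[] {
  // Collect word tokens together with their character offsets.
  const tokens: { word: string; pos: number }[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(doc)) !== null) tokens.push({ word: m[0], pos: m.index });

  const step = Math.max(1, Math.floor(size * (1 - overlapRatio)));
  const chunks: Chunk[] = [];
  for (let start = 0, seq = 0; start < tokens.length; start += step, seq++) {
    const slice = tokens.slice(start, start + size);
    chunks.push({ seq, pos: slice[0].pos, text: slice.map(t => t.word).join(" ") });
    if (start + size >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With the default parameters, consecutive chunks share roughly 135 tokens of overlap, so a passage split by a chunk boundary still appears whole in at least one chunk.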

Search Modes

QMD provides three search modes:
Mode      Description                                            Use Case
search    BM25 full-text search only                             Fast keyword search, exact term matching
vsearch   Vector semantic search only                            Conceptual similarity, synonyms
query     Hybrid: FTS + Vector + Query Expansion + Re-ranking    Best quality, recommended for most searches

Context System

QMD supports hierarchical context annotations that help LLMs understand document structure:
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"
Contexts are inherited hierarchically and included in search results, making them especially useful for agentic workflows.
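The inheritance rule can be sketched as a prefix lookup: a document inherits the annotation of every ancestor path that has one. The `Map` storage and function name below are illustrative, not QMD's internal representation.

```typescript
// Context annotations keyed by qmd:// path (examples from above).
const contexts = new Map<string, string>([
  ["qmd://notes", "Personal notes and ideas"],
  ["qmd://docs/api", "API documentation"],
]);

// A path inherits every annotation attached to itself or an ancestor.
function contextsFor(path: string): string[] {
  const out: string[] = [];
  for (const [prefix, note] of contexts) {
    if (path === prefix || path.startsWith(prefix + "/")) out.push(note);
  }
  return out;
}
```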

Score Interpretation

Score       Meaning
0.8 - 1.0   Highly relevant
0.5 - 0.8   Moderately relevant
0.2 - 0.5   Somewhat relevant
0.0 - 0.2   Low relevance
All scores are normalized to [0, 1] range for consistency across different search backends.
