Query Expansion
Fine-Tuned Model
QMD uses a fine-tuned 1.7B-parameter model for query expansion:
- lex: Keyword variant (routes to BM25/FTS5)
- vec: Semantic variant (routes to vector search)
- hyde: Hypothetical document (routes to vector search)
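The three variant types and their routing can be sketched as a small data structure. This is a minimal illustration, not QMD's implementation: the `type: text` line format, the `ExpandedQuery` class, the `parse_expansion` and `route` helpers, and the backend labels `fts` and `vector` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical backend labels; each expansion type maps to one backend,
# following the routing in the list above.
ROUTES = {"lex": "fts", "vec": "vector", "hyde": "vector"}


@dataclass(frozen=True)
class ExpandedQuery:
    type: str  # one of "lex", "vec", "hyde"
    text: str


def parse_expansion(raw: str) -> list:
    """Parse 'type: text' lines (assumed format), skipping unknown types and blanks."""
    queries = []
    for line in raw.splitlines():
        prefix, sep, text = line.partition(":")
        kind = prefix.strip().lower()
        if sep and kind in ROUTES and text.strip():
            queries.append(ExpandedQuery(kind, text.strip()))
    return queries


def route(queries) -> dict:
    """Group query texts by the backend their type routes to."""
    routed = {"fts": [], "vector": []}
    for query in queries:
        routed[ROUTES[query.type]].append(query.text)
    return routed


raw_output = (
    "lex: rrf fusion constant\n"
    "vec: how are ranked lists merged\n"
    "hyde: RRF merges ranked lists by summing reciprocal ranks.\n"
)
print(route(parse_expansion(raw_output)))
```

Keeping the type on each variant means the search stage needs no heuristics to decide which backend a query belongs to.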
Expansion Process
Strong Signal Detection
QMD runs a BM25 probe first and skips the expensive query expansion step when that probe finds a strong exact match.
Search Execution
Type-Routed Search
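Queries are routed to the appropriate backend based on their type: lex queries go to full-text search (BM25/FTS5), while vec and hyde queries go to vector search.
Batch Embedding Optimization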
Vector queries (vec and hyde) are embedded together as a batch rather than one at a time, which is more efficient.
Reciprocal Rank Fusion (RRF)
Algorithm
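RRF combines multiple ranked lists into a single ranking. In the standard formulation, a document's fused score is the sum, over every list that contains it, of 1/(k + rank), where rank is the document's 1-based position in that list.
Fusion Configuration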
Why k=60?
The constant k=60 provides a good balance:
- Lower k → top ranks dominate (more weight on position)
- Higher k → flatter distribution (more weight on presence)
- k=60 is the empirically validated default from the RRF literature
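The effect of k can be checked numerically. The sketch below implements plain, unweighted RRF with 1-based ranks and compares how much more a rank-1 hit contributes than a rank-10 hit for several values of k. The function names `rrf_fuse` and `top_to_tenth_ratio` are hypothetical, and any per-list weighting that QMD applies is deliberately left out.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Standard RRF: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_to_tenth_ratio(k):
    """Contribution of a rank-1 hit divided by the contribution of a rank-10 hit."""
    return (1.0 / (k + 1)) / (1.0 / (k + 10))


lists = [["a", "b", "c"], ["b", "a", "d"], ["b", "c"]]
print([doc_id for doc_id, _ in rrf_fuse(lists)])  # ['b', 'a', 'c', 'd']

for k in (1, 60, 1000):
    print(k, round(top_to_tenth_ratio(k), 3))
```

With k=1 a rank-1 hit is worth about 5.5 times a rank-10 hit, with k=60 about 1.15 times, and with k=1000 the two are nearly equal, which matches the position-versus-presence trade-off in the list above.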
Reranking
Chunk Selection
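Reranking uses document chunks (not full bodies) to avoid the O(tokens) performance trap: cross-encoder cost grows with input length, so only the keyword-best chunk of each document is scored.
Reranker Model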
QMD uses Qwen3-Reranker (cross-encoder architecture).
Context Size Optimization
The reranker uses an optimized context window.
Position-Aware Blending
Motivation
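Relying on reranker scores alone can demote high-confidence retrieval results. Position-aware blending preserves exact matches while trusting the reranker for semantic matches.
Algorithm
Each document's final score is a weighted mix of its retrieval (RRF) score and its reranker score, with the weights chosen by the document's RRF rank.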
Weight Breakdown
| RRF Rank | RRF Weight | Reranker Weight | Rationale |
|---|---|---|---|
| 1-3 | 75% | 25% | Preserve exact matches |
| 4-10 | 60% | 40% | Balanced trust |
| 11+ | 40% | 60% | Trust reranker for semantic matches |
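The weight bands in the table above can be sketched as a small function. This is an illustration under stated assumptions: both scores are taken to be normalized to the range 0 to 1, the names `rrf_weight` and `blend` are hypothetical, and how QMD derives the retrieval-side score from the RRF rank is not specified here.

```python
def rrf_weight(rrf_rank: int) -> float:
    """Retrieval-side weight for a 1-based RRF rank, per the table above."""
    if rrf_rank < 1:
        raise ValueError("rrf_rank is 1-based")
    if rrf_rank <= 3:
        return 0.75  # preserve exact matches
    if rrf_rank <= 10:
        return 0.60  # balanced trust
    return 0.40      # trust the reranker for semantic matches


def blend(rrf_rank: int, retrieval_score: float, rerank_score: float) -> float:
    """Weighted mix of retrieval and reranker scores (both assumed to lie in [0, 1])."""
    weight = rrf_weight(rrf_rank)
    return weight * retrieval_score + (1.0 - weight) * rerank_score


# Same pair of scores at two different RRF ranks (values are made up).
print(round(blend(2, 0.9, 0.2), 3))   # 0.725: retrieval dominates
print(round(blend(15, 0.9, 0.2), 3))  # 0.48: the reranker dominates
```

The same low reranker score costs a rank-2 document far less than a rank-15 document, which is the behavior the rationale column describes.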
Example
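For a document at RRF rank 2 (high retrieval confidence), the 75%/25% split applies: the retrieval score carries three quarters of the final score, so a low reranker score has only a limited effect on the document's final position.
Pipeline Summary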
- BM25 Probe → strong signal detection (skip expansion)
- Query Expansion → typed variants (lex/vec/hyde)
- Type-Routed Search → FTS for lex, vector for vec/hyde
- Batch Embedding → parallel embedding for efficiency
- RRF Fusion → combine results with weighted ranks
- Chunk Selection → keyword-best chunk per document
- Reranking → LLM scores on chunks (not full bodies)
- Position-Aware Blending → trust retrieval for top ranks, reranker for semantic matches
- Deduplication → one result per file
- Score Filtering → apply minScore threshold
- Top-K Selection → slice to requested limit
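The last three steps of the list above (deduplication, score filtering, top-K selection) can be sketched as follows. The `Hit` class and the `finalize` function are hypothetical names, and keeping the highest-scoring hit per file is an assumption; the pipeline summary only states that one result per file survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    file: str
    score: float


def finalize(hits, min_score=0.0, limit=10):
    """Keep the best hit per file, drop hits below min_score, return the top `limit`."""
    best = {}
    for hit in hits:
        current = best.get(hit.file)
        # Assumption: when a file appears more than once, the higher score wins.
        if current is None or hit.score > current.score:
            best[hit.file] = hit
    kept = [hit for hit in best.values() if hit.score >= min_score]
    kept.sort(key=lambda hit: hit.score, reverse=True)
    return kept[:limit]


hits = [Hit("a.md", 0.9), Hit("a.md", 0.5), Hit("b.md", 0.2), Hit("c.md", 0.7)]
for hit in finalize(hits, min_score=0.3, limit=2):
    print(hit.file, hit.score)
```

Filtering before slicing means the limit is applied only to results that already passed the threshold, so a low-scoring hit never takes up one of the requested slots.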