Search Modes

QMD provides three search modes, each with different tradeoffs between speed, quality, and resource usage:

Mode	Speed	Quality	Use Case
search	Fast (~10-50ms)	Good	Keyword matching, exact terms
vsearch	Medium (~50-200ms)	Better	Semantic similarity, questions
query	Slow (~0.5-2s)	Best	High-quality results, LLM integration

search - BM25 Keyword Search

Fast full-text search using SQLite FTS5 with BM25 ranking. No LLM or embedding lookup required.

qmd search "authentication flow"

How It Works

Parses query into FTS5 syntax
Searches documents_fts table (full-text index)
Ranks results using BM25 algorithm
Returns top N results (default: 5)
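Here is a minimal sketch of these steps using Python's built-in sqlite3 module. It assumes the bundled SQLite was compiled with FTS5, which is true of most modern builds. The table name documents_fts comes from the steps above. The path/body columns and the sample rows are hypothetical stand-ins, not QMD's actual schema. FTS5's bm25() returns more negative values for better matches, so sorting in ascending order puts the best hit first.

```python
import sqlite3

# In-memory stand-in for the documents_fts index. Column names and rows are
# hypothetical; "path" is UNINDEXED so only the body text is searched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE documents_fts USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO documents_fts (path, body) VALUES (?, ?)",
    [
        ("auth.md", "The authentication flow issues a session token after login."),
        ("oauth.md", "OAuth authentication flow with authorization codes."),
        ("cache.md", "Cache invalidation strategies for the API layer."),
        ("deploy.md", "Deployment checklist for staging and production."),
        ("metrics.md", "Latency metrics and dashboards for the search service."),
    ],
)


def bm25_search(match_expr: str, limit: int = 5) -> list[tuple[str, float]]:
    """Run an FTS5 MATCH query and rank hits with FTS5's built-in bm25().

    bm25() returns more negative values for better matches, so sort ascending.
    """
    rows = conn.execute(
        "SELECT path, bm25(documents_fts) AS score "
        "FROM documents_fts WHERE documents_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (match_expr, limit),
    )
    return rows.fetchall()


# Prefix match on "auth" AND the term "flow", written in FTS5 syntax.
for path, score in bm25_search("auth* AND flow"):
    print(f"{score:8.3f}  {path}")
```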

Performance

Speed: ~10-50ms
No GPU required
No embeddings required

When to Use

Keyword matching

When you know specific terms that appear in documents:

qmd search "OAuth 2.0"
qmd search "CAP theorem"

Exact phrase matching

Use quotes for exact phrases:

qmd search '"machine learning"'
qmd search '"error handling best practices"'

Exclusions

Exclude terms with -:

qmd search "auth -oauth -saml"
qmd search '"machine learning" -"deep learning"'

Fast preliminary search

When you need instant results and semantic understanding isn’t critical.

Lex Query Syntax

The search command uses lex (lexical) query syntax:

Syntax	Meaning	Example
`word`	Prefix match	`perf` matches “performance”
`"phrase"`	Exact phrase	`"rate limiter"`
`-word`	Exclude term	`-sports`
`-"phrase"`	Exclude phrase	`-"test data"`

qmd search "auth"  # Matches: auth, authentication, authorize
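The sketch below shows one way the parsing step could translate lex syntax into an FTS5 MATCH expression. The function name lex_to_fts5 is hypothetical. Two of its choices are assumptions, not documented QMD behavior. First, bare words become prefix queries ("word"*) while excluded words stay exact. Second, exclusions use FTS5's binary NOT operator, which is why a query made only of exclusions is rejected.

```python
import re

# Either a quoted phrase or a bare word, each optionally preceded by "-".
TOKEN = re.compile(r'(-?)"([^"]*)"|(-?)([^\s"]+)')


def lex_to_fts5(query: str) -> str:
    """Translate lex syntax (word, "phrase", -word, -"phrase") into FTS5 syntax (sketch)."""
    include: list[str] = []
    exclude: list[str] = []
    for m in TOKEN.finditer(query):
        if m.group(2) is not None:
            # Quoted phrase: exact phrase match, no prefix expansion.
            negated, phrase = m.group(1) == "-", m.group(2).strip()
            if not phrase:
                continue
            term = f'"{phrase}"'
        else:
            negated, word = m.group(3) == "-", m.group(4)
            # Quote bare words so punctuation (e.g. "2.0") is not parsed as FTS5 syntax.
            # Assumption: positive words get a prefix "*", excluded words stay exact.
            term = f'"{word}"' if negated else f'"{word}"*'
        (exclude if negated else include).append(term)

    if not include:
        raise ValueError("lex query needs at least one non-excluded term")
    expr = " AND ".join(include)
    if exclude:
        # FTS5's NOT is binary: (positive terms) NOT x NOT y ...
        expr = f"({expr})" + "".join(f" NOT {t}" for t in exclude)
    return expr


print(lex_to_fts5("auth -oauth -saml"))
print(lex_to_fts5('"machine learning" -"deep learning"'))
```

The returned string is suitable as the bound parameter of WHERE documents_fts MATCH ?. Quoting every bare word matters because an unquoted token such as 2.0 is not a valid FTS5 bareword and would raise a syntax error.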

Limitations

No semantic understanding (“auth” won’t match “login”)
Requires knowing exact terminology
Prefix matching can return false positives

vsearch - Vector Semantic Search

Semantic similarity search using embeddings and cosine distance. Better at understanding intent than keyword search.

qmd vsearch "how does authentication work"

How It Works

Embeds query using embeddinggemma-300M model
Searches vectors_vec table (vector index)
Computes cosine distance to all document chunks
Returns top N results (default: 5)
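The ranking step can be illustrated with a brute-force scan in plain Python. This only sketches cosine-distance ranking. QMD reads vectors from the vectors_vec index rather than from a dict. Here the embedding model is replaced by hand-written 3-dimensional toy vectors, and the chunk ids are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for the same direction, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def vector_search(query_vec, chunk_vecs, n=5):
    """Score every chunk against the query and keep the n closest (smallest distance)."""
    scored = sorted(
        (cosine_distance(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    )
    return [(chunk_id, dist) for dist, chunk_id in scored[:n]]


# Toy 3-d vectors; real embeddinggemma vectors have hundreds of dimensions.
chunks = {
    "login.md#0": [0.9, 0.1, 0.0],
    "sso.md#0": [0.8, 0.3, 0.1],
    "billing.md#2": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "user login"
print(vector_search(query, chunks, n=2))
```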

Performance

Speed: ~50-200ms (depends on index size and GPU)
Requires: Embeddings (qmd embed)
GPU: Accelerates embedding generation

When to Use

Natural language questions

When searching with questions or descriptions:

qmd vsearch "how to handle rate limiting"
qmd vsearch "what is the difference between consistency and availability"

Semantic similarity

When you don’t know exact terms but understand the concept:

qmd vsearch "authentication"  # Finds: login, OAuth, SSO, credentials
qmd vsearch "distributed systems"  # Finds: CAP theorem, consensus, replication

Paraphrase matching

Embeddings can match concepts across different phrasings:

qmd vsearch "user login"  # Matches: authentication, sign-in, credentials

Fast semantic search

When you need semantic understanding but don’t need reranking or query expansion.

Vec Query Format

Vector queries are plain natural language — no special syntax:

qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "what is the tradeoff between consistency and availability"

Vector queries do not support lex syntax like -term or "exact phrase". Use search for exclusions.

Limitations

Requires running qmd embed first
No query expansion (single embedding only)
No LLM reranking (may miss nuance)
Ranking relies on embedding similarity alone, so the most relevant chunk is not always ranked first

query - Hybrid Search with Reranking

Highest quality search combining BM25, vector search, query expansion, and LLM reranking. Recommended for most use cases.

qmd query "user authentication process"

How It Works

Query ──► LLM Expansion ──► [Original, Variant 1]
              │
    ┌─────────┴─────────┐
    ▼                   ▼
 BM25 Search      Vector Search
    │                   │
    └─────────┬─────────┘
              ▼
    RRF Fusion (k=60)
    Original query ×2 weight
    Top-rank bonus: +0.05
              │
              ▼
    Top 30 candidates
              │
              ▼
    LLM Re-ranking
    (qwen3-reranker)
              │
              ▼
  Position-Aware Blend
  Rank 1-3:  75% RRF / 25% reranker
  Rank 4-10: 60% RRF / 40% reranker
  Rank 11+:  40% RRF / 60% reranker
              │
              ▼
    Final Results

Pipeline Stages

Query Expansion

An LLM (the qmd-query-expansion-1.7B model) generates alternative phrasings of the query. The original query gets 2× weight in fusion to preserve exact matches.

Parallel Retrieval

Each query variation searches:

FTS (BM25): Keyword matching
Vector: Semantic similarity

RRF Fusion

Combines all results using Reciprocal Rank Fusion:

score = Σ(1 / (k + rank + 1))  where k=60, rank is the document's 0-based position in each result list, and the sum runs over all lists

Top-ranked documents get a bonus:

Rank #1: +0.05
Rank #2-3: +0.02
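The fusion step can be sketched as weighted RRF. The sketch makes three assumptions, and QMD's actual implementation may differ on any of them. Rank is 0-based, as the + 1 in the formula suggests. The 2× original-query weight multiplies each contribution. The bonus depends on a document's best rank in any single result list. The four input lists mirror the diagram: BM25 and vector results for the original query and for one expanded variant. Document ids are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, weights, k=60):
    """Weighted Reciprocal Rank Fusion with a top-rank bonus (sketch).

    ranked_lists: lists of doc ids, best first; weights: one weight per list.
    """
    scores: dict[str, float] = defaultdict(float)
    best_rank: dict[str, int] = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs):  # rank is 0-based
            scores[doc_id] += weight / (k + rank + 1)
            best_rank[doc_id] = min(rank, best_rank.get(doc_id, rank))
    # Assumed bonus rule: based on each document's best rank in any single list.
    for doc_id, rank in best_rank.items():
        if rank == 0:
            scores[doc_id] += 0.05
        elif rank <= 2:
            scores[doc_id] += 0.02
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked_lists = [
    ["a", "b", "c"],  # original query, BM25
    ["b", "a", "d"],  # original query, vector
    ["a", "d"],       # variant 1, BM25
    ["d", "c"],       # variant 1, vector
]
weights = [2.0, 2.0, 1.0, 1.0]  # original query counts double

fused = rrf_fuse(ranked_lists, weights)
candidates = [doc_id for doc_id, _ in fused[:30]]  # top 30 go on to reranking
print(fused)
```

With these hypothetical lists, d appears only at the bottom of the original vector results. Its #1 spot in the variant's vector results earns the +0.05 bonus, which lifts it to within a hair of b.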

Candidate Selection

Top 30 candidates proceed to reranking (balances quality vs latency).

LLM Reranking

qwen3-reranker-0.6b scores each document for relevance (0.0-1.0).

Position-Aware Blending

Blends retrieval and reranker scores based on position:

Rank	Retrieval Weight	Reranker Weight
1-3	75%	25%
4-10	60%	40%
11+	40%	60%

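Weighting retrieval more heavily at the top ranks keeps the reranker from demoting strong exact matches. Lower-ranked candidates rely more on the reranker's judgment.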
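The blending step, including the 30-candidate cut, can be sketched as follows. The weights come from the table above. The retrieval-side score is an assumption: here it is 1 / rank in the fused list, so both terms fall between 0 and 1. The reranker scores are made up for illustration, not produced by qwen3-reranker-0.6b.

```python
RERANK_CANDIDATES = 30  # top 30 fused candidates are reranked


def retrieval_weight(rank: int) -> float:
    """Share of the final score kept from retrieval, by 1-based fused rank."""
    if rank <= 3:
        return 0.75
    if rank <= 10:
        return 0.60
    return 0.40


def position_aware_blend(fused_ids, rerank_scores):
    """fused_ids: doc ids in RRF order; rerank_scores: doc id -> reranker score in [0, 1]."""
    blended = []
    for rank, doc_id in enumerate(fused_ids[:RERANK_CANDIDATES], start=1):
        w = retrieval_weight(rank)
        # Assumption: the retrieval side is expressed as 1/rank to share the reranker's 0..1 scale.
        retrieval_score = 1.0 / rank
        blended.append((doc_id, w * retrieval_score + (1 - w) * rerank_scores[doc_id]))
    return sorted(blended, key=lambda item: item[1], reverse=True)


fused_ids = ["exact", "b", "c", "d"]  # hypothetical RRF order
rerank = {"exact": 0.30, "b": 0.90, "c": 0.50, "d": 0.95}  # made-up reranker scores
print(position_aware_blend(fused_ids, rerank))
```

The reranker scores exact lowest of the four, yet the 75% retrieval weight at rank 1 keeps it in first place. Meanwhile d, at rank 4, overtakes c because it has a higher reranker score and the reranker carries a larger share at ranks 4 to 10.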

Performance

Speed: ~500ms-2s (depends on GPU and query complexity)
Requires: Embeddings (qmd embed); a GPU is recommended for best performance
Models: 3 GGUF models (~2GB total)

When to Use

Best-quality results

When you need the most relevant results and can afford the latency:

qmd query "how does the authentication system work"

LLM integration

When feeding results to an LLM (via MCP or CLI):

qmd query "API design patterns" --json

Complex questions

When queries need understanding and expansion:

qmd query "what are the tradeoffs between consistency and availability"

Recall-critical searches

When missing relevant documents is worse than higher latency.

Score Interpretation

Score	Meaning
0.8 - 1.0	Highly relevant
0.5 - 0.8	Moderately relevant
0.2 - 0.5	Somewhat relevant
0.0 - 0.2	Low relevance

Use --min-score to filter results:

qmd query "error handling" --all --min-score 0.4

Choosing the Right Mode

search

Use when:

You know exact keywords
Speed is critical (<50ms)
No embeddings available

Example: qmd search "OAuth 2.0"

vsearch

Use when:

Natural language queries
Semantic understanding needed
Fast results (<200ms)

Example: qmd vsearch "how to authenticate"

query

Use when:

Best quality required
LLM integration
Complex questions

Example: qmd query "authentication best practices"

Common Options

All search modes support these options:

-n <num>           # Number of results (default: 5, or 20 for --files/--json)
-c, --collection   # Filter to specific collection
--all              # Return all matches (use with --min-score)
--min-score <num>  # Minimum score threshold (default: 0)
--full             # Show full document content
--line-numbers     # Add line numbers to output
--json             # JSON output
--files            # File list format (docid,score,filepath,context)

qmd search "API" -c docs
qmd vsearch "authentication" --collection notes
qmd query "error handling" -c docs

Advanced: Structured Queries

For maximum control, use Query Syntax to specify multiple query types:

qmd query $'lex: rate limiter algorithm\nvec: how does rate limiting work\nhyde: The API uses a token bucket algorithm...'

See Query Syntax for details.

Related

Query Syntax - Advanced query document format
Embeddings - How vector search works
MCP Server - Using search modes via MCP
CLI Reference - Search command documentation
