QMD provides three search modes, each with different tradeoffs between speed, quality, and resource usage:
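| Mode    | Speed           | Quality | Use Case                              |
|---------|-----------------|---------|---------------------------------------|
| search  | Fast (~10ms)    | Good    | Keyword matching, exact terms         |
| vsearch | Medium (~100ms) | Better  | Semantic similarity, questions        |
| query   | Slow (~1-2s)    | Best    | High-quality results, LLM integration |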

search - Full-Text Search

Fast full-text search using SQLite FTS5 with BM25 ranking. No LLM or embedding lookup required.
qmd search "authentication flow"

How It Works

  1. Parses query into FTS5 syntax
  2. Searches documents_fts table (full-text index)
  3. Ranks results using BM25 algorithm
  4. Returns top N results (default: 5)
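
The steps above can be sketched with Python's built-in sqlite3 module, assuming the local SQLite build includes FTS5. The table name documents_fts comes from the text; the two columns (path, body) and the helper names are assumptions made for illustration, not QMD's actual schema.

```python
import sqlite3

def build_index(docs):
    """Create an in-memory FTS5 index from (path, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # Table name follows the docs; the column layout is hypothetical.
    conn.execute("CREATE VIRTUAL TABLE documents_fts USING fts5(path, body)")
    conn.executemany(
        "INSERT INTO documents_fts (path, body) VALUES (?, ?)", docs
    )
    return conn

def search(conn, fts_query, limit=5):
    """Return (path, score) rows, best match first."""
    # FTS5's bm25() yields lower values for better matches,
    # so ascending order puts the best match first.
    rows = conn.execute(
        "SELECT path, bm25(documents_fts) AS score "
        "FROM documents_fts WHERE documents_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (fts_query, limit),
    )
    return rows.fetchall()

conn = build_index([
    ("auth.md", "The authentication flow issues OAuth tokens"),
    ("db.md", "Replication and consensus in distributed databases"),
])
results = search(conn, "auth*")  # prefix query in raw FTS5 syntax
```

The limit parameter mirrors the default of 5 results mentioned above.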

Performance

  • Speed: ~10-50ms
  • No GPU required
  • No embeddings required

When to Use

When you know specific terms that appear in documents:
qmd search "OAuth 2.0"
qmd search "CAP theorem"
Use quotes for exact phrases:
qmd search '"machine learning"'
qmd search '"error handling best practices"'
Exclude terms with -:
qmd search "auth -oauth -saml"
qmd search '"machine learning" -"deep learning"'

Lex Query Syntax

The search command uses lex (lexical) query syntax:
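| Syntax    | Meaning        | Example                    |
|-----------|----------------|----------------------------|
| word      | Prefix match   | perf matches “performance” |
| "phrase"  | Exact phrase   | "rate limiter"             |
| -word     | Exclude term   | -sports                    |
| -"phrase" | Exclude phrase | -"test data"               |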
qmd search "auth"  # Matches: auth, authentication, authorize
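
As a rough illustration of how lex syntax could be turned into an FTS5 expression, the sketch below tokenizes a query into words, phrases, and exclusions. The names LexTerm, parse_lex, and to_fts5 are hypothetical, and the translation rules (in particular, prefix-matching excluded words) are assumptions rather than QMD's actual parser.

```python
import re
from dataclasses import dataclass

# Optional leading "-", then either a quoted phrase or a bare word.
TOKEN_RE = re.compile(r'(-?)(?:"([^"]+)"|(\S+))')

@dataclass(frozen=True)
class LexTerm:
    text: str
    phrase: bool   # True for "quoted phrases"
    negated: bool  # True for -word or -"phrase"

def parse_lex(query):
    """Split a lex query into terms."""
    terms = []
    for neg, phrase, word in TOKEN_RE.findall(query):
        if phrase:
            terms.append(LexTerm(phrase, True, neg == "-"))
        elif word:
            terms.append(LexTerm(word, False, neg == "-"))
    return terms

def to_fts5(terms):
    """Render terms as an FTS5 MATCH expression (assumed mapping)."""
    def render(term):
        quoted = '"' + term.text.replace('"', '""') + '"'
        # Bare words become prefix queries; phrases match exactly.
        return quoted if term.phrase else quoted + "*"

    include = [render(t) for t in terms if not t.negated]
    exclude = [render(t) for t in terms if t.negated]
    if not include:
        raise ValueError("query needs at least one non-excluded term")
    expr = " AND ".join(include)
    if exclude:
        expr = f"({expr}) NOT ({' OR '.join(exclude)})"
    return expr

fts_query = to_fts5(parse_lex("auth -oauth -saml"))
```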

Limitations

  • No semantic understanding (“auth” won’t match “login”)
  • Requires knowing exact terminology
  • Prefix matching can return false positives

vsearch - Vector Search

Semantic similarity search using embeddings and cosine distance. Better at understanding intent than keyword search.
qmd vsearch "how does authentication work"

How It Works

  1. Embeds query using embeddinggemma-300M model
  2. Searches vectors_vec table (vector index)
  3. Computes cosine distance to all document chunks
  4. Returns top N results (default: 5)
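
Steps 3 and 4 amount to a nearest-neighbor lookup by cosine distance. A minimal sketch, assuming the embeddings are already computed and held in memory as plain lists; the toy three-dimensional vectors and chunk IDs are made up, and a real index would not scan every chunk in Python.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_n(query_vec, chunks, n=5):
    """Return the n (chunk_id, distance) pairs closest to query_vec."""
    scored = [
        (chunk_id, cosine_distance(query_vec, vec))
        for chunk_id, vec in chunks.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n]

# Hypothetical chunk embeddings keyed by chunk ID.
chunks = {
    "auth.md#0": [0.9, 0.1, 0.0],
    "db.md#0": [0.0, 0.2, 0.9],
    "login.md#0": [0.8, 0.3, 0.1],
}
hits = top_n([1.0, 0.0, 0.0], chunks, n=2)
```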

Performance

  • Speed: ~50-200ms (depends on index size and GPU)
  • Requires: Embeddings (qmd embed)
  • GPU: Accelerates embedding generation

When to Use

When searching with questions or descriptions:
qmd vsearch "how to handle rate limiting"
qmd vsearch "what is the difference between consistency and availability"
When you don’t know exact terms but understand the concept:
qmd vsearch "authentication"  # Finds: login, OAuth, SSO, credentials
qmd vsearch "distributed systems"  # Finds: CAP theorem, consensus, replication
Embeddings can match concepts across different phrasings:
qmd vsearch "user login"  # Matches: authentication, sign-in, credentials

Vec Query Format

Vector queries are plain natural language — no special syntax:
qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "what is the tradeoff between consistency and availability"
Vector queries do not support lex syntax like -term or "exact phrase". Use search for exclusions.

Limitations

  • Requires running qmd embed first
  • No query expansion (single embedding only)
  • No LLM reranking (may miss nuance)
  • Results not optimized for best-first ordering

query - Hybrid Search with Reranking

Highest quality search combining BM25, vector search, query expansion, and LLM reranking. Recommended for most use cases.
qmd query "user authentication process"

How It Works

Query ──► LLM Expansion ──► [Original, Variant 1]
              │
    ┌─────────┴─────────┐
    ▼                   ▼
 BM25 Search      Vector Search
    │                   │
    └─────────┬─────────┘
              ▼
    RRF Fusion (k=60)
    Original query ×2 weight
    Top-rank bonus: +0.05
              │
              ▼
    Top 30 candidates
              │
              ▼
    LLM Re-ranking
    (qwen3-reranker)
              │
              ▼
  Position-Aware Blend
  Rank 1-3:  75% RRF / 25% reranker
  Rank 4-10: 60% RRF / 40% reranker
  Rank 11+:  40% RRF / 60% reranker
              │
              ▼
    Final Results

Pipeline Stages

1. Query Expansion

An LLM generates alternative query phrasings using the qmd-query-expansion-1.7B model. The original query gets 2× weight in fusion to preserve exact matches.

2. Parallel Retrieval

Each query variation searches:
  • FTS (BM25): Keyword matching
  • Vector: Semantic similarity
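
One way to picture this fan-out is a thread pool that runs every retriever for every query variation. The retrieve_all helper and the stub retrievers below are hypothetical stand-ins, not QMD's implementation, and whether QMD uses threads at all is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_all(queries, retrievers):
    """Run each retriever on each query; return (query, name, results)."""
    jobs = [(q, name, fn) for q in queries for name, fn in retrievers.items()]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, q) for q, _, fn in jobs]
        # Collect in submission order so output is deterministic.
        return [
            (q, name, future.result())
            for (q, name, _), future in zip(jobs, futures)
        ]

# Stub retrievers that stand in for the BM25 and vector searches.
retrievers = {
    "bm25": lambda q: [f"{q}-keyword-hit"],
    "vector": lambda q: [f"{q}-semantic-hit"],
}
runs = retrieve_all(["original", "variant"], retrievers)
```

Each (query, retriever) pair yields one ranked list, so two query variations produce four lists for the fusion stage.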

3. RRF Fusion

Combines all results using Reciprocal Rank Fusion:
score = Σ(1 / (k + rank + 1))  where k=60
Top-ranked documents get a bonus:
  • Rank #1: +0.05
  • Rank #2-3: +0.02
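
The formula and bonuses above can be sketched as follows. Two assumptions are built in: rank is the 0-based position within each result list (so the first hit contributes 1/61), and the bonus is decided by a document's best rank across all lists. The function name rrf_fuse and the weights parameter are illustrative.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, weights=None, k=60):
    """Fuse ranked lists of doc IDs with weighted Reciprocal Rank Fusion."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    best_rank = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs):  # rank is 0-based
            scores[doc] += weight / (k + rank + 1)
            best_rank[doc] = min(rank, best_rank.get(doc, rank))
    # Top-rank bonus: +0.05 for rank #1, +0.02 for ranks #2-3.
    for doc, rank in best_rank.items():
        if rank == 0:
            scores[doc] += 0.05
        elif rank <= 2:
            scores[doc] += 0.02
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# The first list comes from the original query, so it gets 2x weight.
fused = rrf_fuse(
    [["a", "b", "c"], ["b", "a", "d"]],
    weights=[2.0, 1.0],
)
```

Here "a" edges out "b" because its first-place finish is in the double-weighted list.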

4. Candidate Selection

Top 30 candidates proceed to reranking (balances quality vs latency).

5. LLM Reranking

qwen3-reranker-0.6b scores each document for relevance (0.0-1.0).

6. Position-Aware Blending

Blends retrieval and reranker scores based on position:
| Rank | Retrieval Weight | Reranker Weight |
|------|------------------|-----------------|
| 1-3  | 75%              | 25%             |
| 4-10 | 60%              | 40%             |
| 11+  | 40%              | 60%             |
This prevents the reranker from demoting strong exact matches.
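
A small sketch of the blend, assuming rank is the 1-based position after RRF fusion and that both scores are already on a comparable 0.0-1.0 scale. The function names are hypothetical.

```python
def blend_weights(rank):
    """Return (retrieval_weight, reranker_weight) for a 1-based rank."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    if rank <= 3:
        return 0.75, 0.25
    if rank <= 10:
        return 0.60, 0.40
    return 0.40, 0.60

def blend(rank, retrieval_score, reranker_score):
    """Combine the two scores using the position-dependent weights."""
    retrieval_w, reranker_w = blend_weights(rank)
    return retrieval_w * retrieval_score + reranker_w * reranker_score

# A rank-1 exact match keeps most of its score even if the reranker
# rates it poorly, while a rank-12 result leans on the reranker.
top_hit = blend(1, retrieval_score=1.0, reranker_score=0.2)
deep_hit = blend(12, retrieval_score=0.3, reranker_score=0.9)
```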

Performance

  • Speed: ~500ms-2s (depends on GPU and query complexity)
  • Requires: Embeddings (qmd embed); a GPU is recommended for best performance
  • Models: 3 GGUF models (~2GB total)

When to Use

When you need the most relevant results and can afford the latency:
qmd query "how does the authentication system work"
When feeding results to an LLM (via MCP or CLI):
qmd query "API design patterns" --json
When queries need understanding and expansion:
qmd query "what are the tradeoffs between consistency and availability"
When missing relevant documents is worse than higher latency.

Score Interpretation

| Score     | Meaning             |
|-----------|---------------------|
| 0.8 - 1.0 | Highly relevant     |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant   |
| 0.0 - 0.2 | Low relevance       |
Use --min-score to filter results:
qmd query "error handling" --all --min-score 0.4
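
When post-processing results in a script, the score bands can be applied as below. The table's ranges share their endpoints, so this sketch assumes each lower bound is inclusive; the (docid, score) tuples are made-up sample data, not real qmd output.

```python
def relevance_label(score):
    """Map a 0.0-1.0 score to the bands in the table above."""
    if score >= 0.8:
        return "Highly relevant"
    if score >= 0.5:
        return "Moderately relevant"
    if score >= 0.2:
        return "Somewhat relevant"
    return "Low relevance"

def filter_min_score(results, min_score=0.0):
    """Keep (docid, score) pairs at or above min_score."""
    return [(doc, score) for doc, score in results if score >= min_score]

# Hypothetical results, already sorted best-first.
results = [("a1b2", 0.91), ("c3d4", 0.55), ("e5f6", 0.31), ("g7h8", 0.05)]
kept = filter_min_score(results, min_score=0.4)
labels = [relevance_label(score) for _, score in kept]
```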

Choosing the Right Mode

search

Use when:
  • You know exact keywords
  • Speed is critical (<50ms)
  • No embeddings available
Example: qmd search "OAuth 2.0"

vsearch

Use when:
  • Natural language queries
  • Semantic understanding needed
  • Fast results (<200ms)
Example: qmd vsearch "how to authenticate"

query

Use when:
  • Best quality required
  • LLM integration
  • Complex questions
Example: qmd query "authentication best practices"

Common Options

All search modes support these options:
-n <num>           # Number of results (default: 5, or 20 for --files/--json)
-c, --collection   # Filter to specific collection
--all              # Return all matches (use with --min-score)
--min-score <num>  # Minimum score threshold (default: 0)
--full             # Show full document content
--line-numbers     # Add line numbers to output
--json             # JSON output
--files            # File list format (docid,score,filepath,context)
qmd search "API" -c docs
qmd vsearch "authentication" --collection notes
qmd query "error handling" -c docs
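
For scripted use, the options can be assembled into an argument list before being handed to something like subprocess.run. The build_qmd_command helper is hypothetical; it uses only the flags listed above and does not check which combinations qmd accepts.

```python
import shlex

def build_qmd_command(mode, query, n=None, collection=None,
                      min_score=None, all_matches=False, json_output=False):
    """Build an argv list for qmd from the common options."""
    if mode not in ("search", "vsearch", "query"):
        raise ValueError(f"unknown mode: {mode}")
    args = ["qmd", mode, query]
    if n is not None:
        args += ["-n", str(n)]
    if collection:
        args += ["-c", collection]
    if all_matches:
        args.append("--all")
    if min_score is not None:
        args += ["--min-score", str(min_score)]
    if json_output:
        args.append("--json")
    return args

cmd = build_qmd_command(
    "query", "error handling",
    collection="docs", all_matches=True, min_score=0.4,
)
print(shlex.join(cmd))
# prints: qmd query 'error handling' -c docs --all --min-score 0.4
```

Passing the list form to a subprocess avoids shell quoting problems with multi-word queries.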

Advanced: Structured Queries

For maximum control, use Query Syntax to specify multiple query types:
qmd query $'lex: rate limiter algorithm\nvec: how does rate limiting work\nhyde: The API uses a token bucket algorithm...'
See Query Syntax for details.
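
A structured query like the one above is a newline-separated list of prefixed lines, which is easy to build programmatically. This sketch assumes only the three prefixes shown in the example (lex, vec, hyde); the helper name is hypothetical.

```python
def build_structured_query(lex=None, vec=None, hyde=None):
    """Join the provided parts into a 'prefix: text' line per part."""
    parts = []
    for prefix, text in (("lex", lex), ("vec", vec), ("hyde", hyde)):
        if text:
            parts.append(f"{prefix}: {text}")
    if not parts:
        raise ValueError("provide at least one of lex, vec, or hyde")
    return "\n".join(parts)

structured = build_structured_query(
    lex="rate limiter algorithm",
    vec="how does rate limiting work",
)
# Pass the result as a single argument, e.g. ["qmd", "query", structured].
```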
