## Usage
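A sketch of the invocation, assuming only the `--force` option documented below:

```sh
qmd embed [--force]
```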
## Options

- `--force`: Force re-embedding of all documents (clears existing vectors)
## How It Works

1. **Find documents** — identifies unique content hashes needing embeddings
2. **Chunk documents** — splits large documents into 900-token chunks with 15% overlap
3. **Generate embeddings** — processes chunks in batches using the embedding model
4. **Store vectors** — saves embeddings to a sqlite-vec table for fast similarity search
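The pipeline above can be sketched roughly as follows. All names here (`embed_pending`, `embed_batch`, the dict-based `store`) are illustrative stand-ins, not qmd's actual internals:

```python
def embed_pending(documents, store, embed_batch, max_tokens=900, batch_size=32):
    """Sketch of the embed pipeline: `documents` maps content hash -> text,
    `store` maps content hash -> list of vectors, and `embed_batch` stands in
    for the real embedding model (all hypothetical names, not qmd internals)."""
    # 1. Find documents: unique content hashes not yet embedded
    pending = {h: text for h, text in documents.items() if h not in store}
    # 2. Chunk: whitespace "tokens" stand in for the real tokenizer,
    #    and the 15% overlap is omitted for brevity
    chunks = []
    for h, text in pending.items():
        words = text.split()
        for i in range(0, max(len(words), 1), max_tokens):
            chunks.append((h, " ".join(words[i:i + max_tokens])))
    # 3. Generate embeddings batch by batch
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        vectors = embed_batch([text for _, text in batch])
        # 4. Store each vector under its document's content hash
        for (h, _), vec in zip(batch, vectors):
            store.setdefault(h, []).append(vec)
    return len(chunks)
```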
## Chunking Strategy

- **Max tokens**: 900 per chunk
- **Overlap**: 15% (135 tokens)
- **Boundaries**: prefers Markdown headings as split points
- **Multi-chunk docs**: large documents are split into multiple chunks
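The windowing arithmetic implied by these numbers can be sketched as below. This is a minimal illustration only: it slides a fixed window over a token list and ignores the heading-boundary preference.

```python
def chunk_tokens(tokens, max_tokens=900, overlap=0.15):
    # Stride = chunk size minus the overlap carried into the next chunk.
    # With the defaults: 900 - int(900 * 0.15) = 900 - 135 = 765 tokens per step.
    step = max_tokens - int(max_tokens * overlap)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
    return chunks
```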
## Embedding Model

Default: `embeddinggemma` (`google/gemma-2-2b-it-qn_k_m` from Hugging Face)

- **Dimensions**: 2048
- **Format**: GGUF quantized (`qn_k_m`)
- **Storage**: `~/.cache/qmd/models/`
## Examples
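Assuming only the `--force` option documented on this page, typical invocations would look like:

```sh
# Embed any documents that do not yet have vectors
qmd embed

# Clear existing vectors and re-embed everything
qmd embed --force
```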
## Output

Shows progress with:

- Total documents and chunks
- Progress bar with percentage
- Throughput (MB/s)
- ETA
- Error count (if any)
## Performance

Typical throughput:

- GPU (CUDA/Metal/Vulkan): 500 KB–2 MB/s
- CPU only: 50–200 KB/s
## When to Run

Run `qmd embed` after:

- **Adding collections** — `qmd collection add`
- **Updating content** — `qmd update`
- **First install** — before using vector/hybrid search
## Force Re-embedding

Use `--force` when:

- Switching embedding models
- Fixing corrupted embeddings
- Changing chunking parameters (requires a code rebuild)
## Deduplication

Embeddings are stored by content hash:

- Identical content is embedded only once
- Saves time and storage
- Chunks are shared across documents
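The hash-keyed lookup can be illustrated like this. Note that SHA-256 is an assumption for illustration; the hash function qmd actually uses is not specified on this page:

```python
import hashlib

def content_hash(text):
    # Assumption: sha256 shown for illustration; qmd's real hash may differ.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_embedding(docs, embedded_hashes):
    """Return (hash, text) pairs for content not yet embedded,
    embedding identical content only once."""
    seen = set(embedded_hashes)
    todo = []
    for text in docs:
        h = content_hash(text)
        if h not in seen:  # duplicate content shares one stored embedding
            seen.add(h)
            todo.append((h, text))
    return todo
```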
## Related Commands

- `qmd query` — Hybrid search (uses embeddings)
- `qmd vsearch` — Vector semantic search
- `qmd update` — Re-index collections
- `qmd status` — Check embedding coverage