Overview
Indexing is implemented across multiple crates:
- indexing: Core index abstraction and B-tree indexes
- search: Full-text and vector search indexes
- text_search: Text search specifics
- vector: Vector operations and types
Index types
Database indexes (B-tree)
Standard ordered indexes:
- Ordered by index key(s)
- Support range queries
- Efficient point lookups
- Maintained automatically
Text search indexes
Full-text search powered by Tantivy:
- Tokenization and stemming
- BM25 scoring
- Fuzzy matching
- Phrase queries
- Field boosting
Vector indexes
Similarity search using Qdrant:
- Cosine similarity
- Euclidean distance
- Dot product
Core indexing crate
Index registry
Path: crates/indexing/
Manages index metadata:
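As a rough illustration of the kind of metadata a registry tracks, here is a minimal in-memory sketch. The names `IndexRegistry`, `IndexMetadata`, and `IndexState` are hypothetical and chosen for this example; they are not taken from the crate.

```rust
use std::collections::HashMap;

/// Lifecycle of an index (hypothetical states for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum IndexState {
    /// Still being populated; not yet usable by queries.
    Backfilling,
    /// Fully built and queryable.
    Enabled,
}

/// Metadata kept per index.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexMetadata {
    pub table: String,
    pub name: String,
    /// Indexed fields, in key order.
    pub fields: Vec<String>,
    pub state: IndexState,
}

/// In-memory registry keyed by (table, index name).
#[derive(Default)]
pub struct IndexRegistry {
    indexes: HashMap<(String, String), IndexMetadata>,
}

impl IndexRegistry {
    /// Registers a new index in the `Backfilling` state.
    /// Returns `false` if the table already has an index with that name.
    pub fn register(&mut self, table: &str, name: &str, fields: &[&str]) -> bool {
        let key = (table.to_string(), name.to_string());
        if self.indexes.contains_key(&key) {
            return false;
        }
        let meta = IndexMetadata {
            table: table.to_string(),
            name: name.to_string(),
            fields: fields.iter().map(|f| f.to_string()).collect(),
            state: IndexState::Backfilling,
        };
        self.indexes.insert(key, meta);
        true
    }

    /// Marks an index as queryable. Returns `false` if it is unknown.
    pub fn enable(&mut self, table: &str, name: &str) -> bool {
        match self.indexes.get_mut(&(table.to_string(), name.to_string())) {
            Some(meta) => {
                meta.state = IndexState::Enabled;
                true
            }
            None => false,
        }
    }

    /// Lists the queryable indexes on a table, sorted by name.
    pub fn enabled_on(&self, table: &str) -> Vec<&IndexMetadata> {
        let mut out: Vec<&IndexMetadata> = self
            .indexes
            .values()
            .filter(|m| m.table == table && m.state == IndexState::Enabled)
            .collect();
        out.sort_by(|a, b| a.name.cmp(&b.name));
        out
    }
}
```

Keeping the state next to the field list lets a planner ignore indexes that are still backfilling.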
Index structure
B-tree implementation:
Range queries
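To make the idea of an ordered index with range scans concrete, the sketch below uses the standard library's `BTreeSet` as a stand-in for the crate's B-tree. `OrderedIndex` and its `(key, doc_id)` entry layout are assumptions made for illustration only.

```rust
use std::collections::BTreeSet;

/// Ordered index over `(key, doc_id)` pairs. The document id breaks ties
/// between equal keys, so every entry is unique.
pub struct OrderedIndex {
    entries: BTreeSet<(i64, u64)>,
}

impl OrderedIndex {
    pub fn new() -> Self {
        Self { entries: BTreeSet::new() }
    }

    pub fn insert(&mut self, key: i64, doc_id: u64) {
        self.entries.insert((key, doc_id));
    }

    /// Removes one entry; returns whether it was present.
    pub fn remove(&mut self, key: i64, doc_id: u64) -> bool {
        self.entries.remove(&(key, doc_id))
    }

    /// Point lookup: all document ids whose key equals `key`.
    pub fn get(&self, key: i64) -> Vec<u64> {
        self.entries
            .range((key, u64::MIN)..=(key, u64::MAX))
            .map(|&(_, id)| id)
            .collect()
    }

    /// Range scan over `lo <= key < hi`, returned in key order.
    pub fn range(&self, lo: i64, hi: i64) -> Vec<(i64, u64)> {
        if lo >= hi {
            // `BTreeSet::range` panics on inverted bounds, so bail out early.
            return Vec::new();
        }
        self.entries
            .range((lo, u64::MIN)..(hi, u64::MIN))
            .copied()
            .collect()
    }
}
```

The scan seeks to the first matching entry and then walks forward, which is where the O(log n + k) cost of a range query comes from.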
Efficient range scans:
Search crate architecture
Overview
Path: crates/search/
Integrates multiple search engines:
- Tantivy for text search
- Qdrant segment library for vector search
- Unified search interface
Text search implementation
Index building
Query execution
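The two steps, index building and query execution, can be illustrated with a toy inverted index scored with a common BM25 variant. This sketch does not use Tantivy's API; `InvertedIndex`, `tokenize`, and the `k1`/`b` parameters are assumptions for illustration, and the tokenizer does no stemming.

```rust
use std::collections::HashMap;

/// Toy inverted index: term -> (doc id -> term frequency).
#[derive(Default)]
pub struct InvertedIndex {
    postings: HashMap<String, HashMap<u32, u32>>,
    doc_len: HashMap<u32, u32>,
}

/// Lowercases and splits on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

impl InvertedIndex {
    /// Index building: record term frequencies and the document length.
    pub fn add(&mut self, doc_id: u32, text: &str) {
        let tokens = tokenize(text);
        self.doc_len.insert(doc_id, tokens.len() as u32);
        for t in tokens {
            *self.postings.entry(t).or_default().entry(doc_id).or_insert(0) += 1;
        }
    }

    /// Query execution: BM25-scored matches, best first.
    pub fn search(&self, query: &str, k1: f64, b: f64) -> Vec<(u32, f64)> {
        let n = self.doc_len.len() as f64;
        if n == 0.0 {
            return Vec::new();
        }
        let avg_len = self.doc_len.values().map(|&l| l as f64).sum::<f64>() / n;
        let mut scores: HashMap<u32, f64> = HashMap::new();
        for term in tokenize(query) {
            let Some(docs) = self.postings.get(&term) else { continue };
            let df = docs.len() as f64;
            // Rarer terms get a higher inverse document frequency.
            let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
            for (&doc, &tf) in docs {
                let tf = tf as f64;
                let len = self.doc_len[&doc] as f64;
                // Length normalisation: long documents are penalised via `b`.
                let norm = tf + k1 * (1.0 - b + b * len / avg_len);
                *scores.entry(doc).or_insert(0.0) += idf * tf * (k1 + 1.0) / norm;
            }
        }
        let mut ranked: Vec<(u32, f64)> = scores.into_iter().collect();
        ranked.sort_by(|x, y| y.1.total_cmp(&x.1).then(x.0.cmp(&y.0)));
        ranked
    }
}
```

Typical starting values are `k1 = 1.2` and `b = 0.75`; raising `b` strengthens the penalty on long documents.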
Vector search implementation
Index structure
Vector operations
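The three distance metrics named earlier (cosine similarity, Euclidean distance, dot product) can be written out directly. The functions below are a brute-force sketch with hypothetical names; they are not the Qdrant segment API and they do no approximate search.

```rust
/// Dot product of two equal-length vectors.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity; `None` if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    let norm = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norm == 0.0 {
        None
    } else {
        Some(dot(a, b) / norm)
    }
}

/// Exhaustive top-k by cosine similarity, most similar first.
pub fn top_k_cosine(query: &[f32], vectors: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .filter_map(|(id, v)| cosine(query, v).map(|s| (*id, s)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(k);
    scored
}
```

An exhaustive scan like `top_k_cosine` costs O(n * dimensions) per query, which is the cost an approximate index such as HNSW is meant to avoid.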
Index maintenance
Automatic updates
Indexes are updated automatically:
- On write: Document insert/update/delete triggers an index update
- Transactional: Index updates are part of the transaction
- Consistent: Indexes always reflect the committed state
- Asynchronous: Search indexes update in the background
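The properties listed above can be sketched as a write path that changes the document and its ordered index entry in one step and only queues the search-index work. `Table`, `Doc`, and `search_queue` are hypothetical names, and a single `&mut self` call stands in for a real transaction.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

#[derive(Debug, Clone, PartialEq)]
pub struct Doc {
    pub name: String,
    pub body: String,
}

#[derive(Default)]
pub struct Table {
    docs: BTreeMap<u64, Doc>,
    /// Ordered index on `name`, kept in step with `docs`.
    by_name: BTreeSet<(String, u64)>,
    /// Ids whose search-index entry is stale; a background worker drains this.
    pub search_queue: VecDeque<u64>,
}

impl Table {
    /// Insert or update: the stale index entry is removed and the new one
    /// added in the same step as the document write.
    pub fn put(&mut self, id: u64, doc: Doc) {
        if let Some(old) = self.docs.insert(id, doc.clone()) {
            self.by_name.remove(&(old.name, id));
        }
        self.by_name.insert((doc.name, id));
        self.search_queue.push_back(id);
    }

    /// Delete: drops the document and its index entry together.
    pub fn delete(&mut self, id: u64) -> bool {
        match self.docs.remove(&id) {
            Some(old) => {
                self.by_name.remove(&(old.name, id));
                self.search_queue.push_back(id);
                true
            }
            None => false,
        }
    }

    /// Looks up document ids through the ordered index.
    pub fn ids_by_name(&self, name: &str) -> Vec<u64> {
        self.by_name
            .range((name.to_string(), u64::MIN)..=(name.to_string(), u64::MAX))
            .map(|(_, id)| *id)
            .collect()
    }
}
```

Because the ordered index changes inside the write itself, a reader never sees a document without its index entry, while the search index may briefly lag behind the queue.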
Backfilling
When a new index is created, existing data is backfilled into it:
- In the background without blocking
- With progress tracking
- Resumable on failure
- Index becomes queryable when complete
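A resumable backfill with progress tracking can be sketched as a batch loop driven by a cursor. `BackfillState` and `backfill_step` are hypothetical names, and an in-memory `BTreeMap` of `doc id -> key` stands in for the table being indexed.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

/// Progress of a backfill; `cursor` is the last document id already indexed.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct BackfillState {
    pub cursor: Option<u64>,
    pub indexed: usize,
    pub done: bool,
}

/// Indexes up to `batch` documents after the cursor, then advances the cursor.
/// Re-running a step after a crash is safe because inserts are idempotent.
pub fn backfill_step(
    docs: &BTreeMap<u64, i64>,
    index: &mut BTreeSet<(i64, u64)>,
    state: &mut BackfillState,
    batch: usize,
) {
    let batch = batch.max(1); // a zero-sized batch would never finish
    let start = match state.cursor {
        Some(c) => Bound::Excluded(c),
        None => Bound::Unbounded,
    };
    let range: (Bound<u64>, Bound<u64>) = (start, Bound::Unbounded);
    let mut seen = 0;
    for (&id, &key) in docs.range(range).take(batch) {
        index.insert((key, id));
        state.cursor = Some(id);
        state.indexed += 1;
        seen += 1;
    }
    // A short batch means the scan reached the end of the table.
    if seen < batch {
        state.done = true;
    }
}
```

If `BackfillState` is persisted after each step, a restarted worker can continue from the saved cursor, and the index would be marked queryable once `done` is set.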
Index workers
Background workers maintain indexes:
Query optimization
Index selection
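One simple selection heuristic, shown here only as an illustration, is to prefer the index whose leading fields are all constrained by equality filters. `IndexDef` and `choose_index` are hypothetical; a real planner would also weigh selectivity and sort order.

```rust
/// An index candidate: its name and indexed fields in key order.
pub struct IndexDef {
    pub name: &'static str,
    pub fields: Vec<&'static str>,
}

/// Number of leading index fields constrained by an equality filter.
fn usable_prefix(index: &IndexDef, eq_fields: &[&str]) -> usize {
    index
        .fields
        .iter()
        .take_while(|f| eq_fields.contains(*f))
        .count()
}

/// Picks the index with the longest usable prefix.
/// `None` means no index applies and the query falls back to a table scan.
pub fn choose_index<'a>(indexes: &'a [IndexDef], eq_fields: &[&str]) -> Option<&'a IndexDef> {
    indexes
        .iter()
        .map(|ix| (usable_prefix(ix, eq_fields), ix))
        .filter(|(n, _)| *n > 0)
        .max_by_key(|(n, _)| *n)
        .map(|(_, ix)| ix)
}
```

Only a leading prefix counts: a filter on `status` alone cannot use an index on `(author, status)`, because entries are ordered by `author` first.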
Query planner chooses best index:
Covering indexes
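The covering check itself is a subset test: if every field the query reads is stored in the index, the document does not need to be fetched. `ReadPlan` and `plan_read` below are hypothetical names used for this sketch.

```rust
/// How a query's results are materialised.
#[derive(Debug, PartialEq)]
pub enum ReadPlan {
    /// Every needed field is in the index entry; no document fetch.
    IndexOnly,
    /// The index finds the rows, then each document is loaded.
    IndexThenFetch,
}

/// Decides whether `index_fields` covers all of `needed_fields`.
pub fn plan_read(index_fields: &[&str], needed_fields: &[&str]) -> ReadPlan {
    if needed_fields.iter().all(|f| index_fields.contains(f)) {
        ReadPlan::IndexOnly
    } else {
        ReadPlan::IndexThenFetch
    }
}
```

For example, an index on `(author, status)` covers a query that reads only `status`, but not one that also reads `title`.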
When index contains all needed fields:
Query pushdown
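Pushdown can be pictured as splitting a query's filters into those the index can evaluate itself and a residual set applied to the fetched documents. The `Filter` enum and `push_down` function are assumptions made for this sketch, with a single indexed field for simplicity.

```rust
/// A filter on one field (hypothetical, integer-valued for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum Filter {
    Eq(&'static str, i64),
    Gt(&'static str, i64),
    Lt(&'static str, i64),
}

impl Filter {
    /// The field this filter constrains.
    fn field(&self) -> &'static str {
        match self {
            Filter::Eq(f, _) | Filter::Gt(f, _) | Filter::Lt(f, _) => *f,
        }
    }
}

/// Splits filters into `(pushed, residual)`: `pushed` become bounds on the
/// index scan, `residual` are checked against each document afterwards.
pub fn push_down(filters: &[Filter], indexed_field: &str) -> (Vec<Filter>, Vec<Filter>) {
    filters
        .iter()
        .cloned()
        .partition(|f| f.field() == indexed_field)
}
```

The more filters land in the pushed set, the fewer entries the scan visits before post-filtering.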
Filters are pushed to index layer:
Performance characteristics
B-tree indexes
- Lookup: O(log n)
- Range scan: O(log n + k) where k is result size
- Insert/update: O(log n)
- Space: O(n * key_size)
Text search
- Indexing: O(n * avg_document_length)
- Query: Sub-linear with inverted index
- Space: ~2-3x document size
- Relevance: BM25 scoring
Vector search
- Indexing: O(n log n) with HNSW
- Query: O(log n) approximate
- Space: O(n * dimensions)
- Accuracy: Configurable precision/recall tradeoff
Index storage
Persistence
Indexes are stored differently:
- B-tree indexes: In main database alongside documents
- Text indexes: Separate Tantivy directory
- Vector indexes: Qdrant segment files
Storage layout
Compaction
Search indexes are periodically compacted:
- Merge segments in Tantivy
- Optimize HNSW graph in vector indexes
- Remove deleted documents
- Reclaim space
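The segment-merge part of compaction can be sketched generically: several immutable segments are combined into one, and entries for deleted documents are dropped along the way. `Segment` and `compact` are hypothetical; this is not Tantivy's merge policy and it does not cover HNSW graph optimization.

```rust
use std::collections::BTreeSet;

/// An immutable segment holding sorted document ids.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub doc_ids: Vec<u64>,
}

/// Merges segments into one and drops ids marked as deleted.
/// Duplicate ids across segments collapse into a single entry.
pub fn compact(segments: &[Segment], deleted: &BTreeSet<u64>) -> Segment {
    let merged: BTreeSet<u64> = segments
        .iter()
        .flat_map(|s| s.doc_ids.iter().copied())
        .filter(|id| !deleted.contains(id))
        .collect();
    Segment {
        doc_ids: merged.into_iter().collect(),
    }
}
```

After the merge, the old segments can be discarded, which is how the space held by deleted documents is reclaimed.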
Monitoring and debugging
Index statistics
Per-index metrics:
Query explain
Explain query execution:
Slow query logging
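A slow-query check could look roughly like the following: given a report of how a query ran, emit a log line when it used no index or exceeded a time threshold. `QueryReport`, `Access`, and `slow_query_line` are hypothetical names, and the log format is made up for this sketch.

```rust
use std::time::Duration;

/// How the query reached its rows.
#[derive(Debug, Clone, PartialEq)]
pub enum Access {
    IndexRange { index: String },
    FullScan,
}

/// Summary of one executed query.
pub struct QueryReport {
    pub table: String,
    pub access: Access,
    pub scanned: usize,
    pub returned: usize,
    pub elapsed: Duration,
}

/// Returns a log line for queries that used no index or ran longer than
/// `threshold`; returns `None` for queries that need no attention.
pub fn slow_query_line(r: &QueryReport, threshold: Duration) -> Option<String> {
    let reason = match (&r.access, r.elapsed > threshold) {
        (Access::FullScan, _) => "no index used",
        (_, true) => "exceeded time threshold",
        _ => return None,
    };
    Some(format!(
        "slow query on {}: {} (scanned {}, returned {}, {} ms)",
        r.table,
        reason,
        r.scanned,
        r.returned,
        r.elapsed.as_millis()
    ))
}
```

A large gap between `scanned` and `returned` in such a line usually points to a missing or poorly matched index.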
Queries not using indexes are logged:
Best practices
Index design
- Index common queries: Create indexes for frequent access patterns
- Compound indexes: Use multi-field indexes for complex queries
- Covering indexes: Include all fields needed by query
- Avoid over-indexing: Each index has storage and maintenance cost
Search index tuning
Text search optimization:
- Choose an appropriate tokenizer
- Configure stemming for the language
- Tune BM25 parameters for the domain
- Use filters to narrow results
Vector search optimization:
- Choose the right distance metric
- Tune vector dimensions
- Balance accuracy vs performance
- Use metadata filtering
Query patterns
Efficient queries:
Testing
Index correctness tests
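A common way to test index correctness is to compare every indexed answer with a brute-force scan over the same data. The helper below sketches that idea for range queries on an ordered index; the function names and the `(doc id, key)` data layout are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Reference answer: scan every document and filter on `lo <= key < hi`.
fn brute_force(docs: &[(u64, i64)], lo: i64, hi: i64) -> Vec<u64> {
    let mut ids: Vec<u64> = docs
        .iter()
        .filter(|(_, k)| *k >= lo && *k < hi)
        .map(|(id, _)| *id)
        .collect();
    ids.sort();
    ids
}

/// Same query answered through an ordered index of `(key, doc_id)` entries.
fn via_index(index: &BTreeSet<(i64, u64)>, lo: i64, hi: i64) -> Vec<u64> {
    if lo >= hi {
        return Vec::new();
    }
    let mut ids: Vec<u64> = index
        .range((lo, u64::MIN)..(hi, u64::MIN))
        .map(|&(_, id)| id)
        .collect();
    ids.sort();
    ids
}

/// Builds the index from `docs` and checks it against brute force for every
/// `(lo, hi)` pair in a small grid, including empty and inverted ranges.
pub fn index_matches_brute_force(docs: &[(u64, i64)]) -> bool {
    let index: BTreeSet<(i64, u64)> = docs.iter().map(|&(id, k)| (k, id)).collect();
    (-5..15).all(|lo| {
        (-5..15).all(|hi| via_index(&index, lo, hi) == brute_force(docs, lo, hi))
    })
}
```

The same comparison works with randomly generated documents, which tends to surface boundary bugs such as off-by-one range bounds.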
Performance benchmarks
Next steps
- Database engine component - Query execution
- Data persistence layer - Storage backend
- Rust backend architecture - Overall system