Overview
Chroma provides multiple configuration options to optimize performance for your specific workload. This guide covers index configuration, query optimization, batch operations, and best practices for high-performance deployments.
Index Configuration
Chroma supports multiple index types, each with configurable parameters. Proper index configuration is critical for query performance.
HNSW Vector Index
Hierarchical Navigable Small World (HNSW) is the default vector index algorithm, providing excellent query performance with configurable accuracy/speed tradeoffs.
HNSW Configuration
HNSW Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| ef_construction | int | 100 | Controls index build quality. Higher = better accuracy but slower build |
| max_neighbors | int | 16 | Max connections per layer (M). Higher = better accuracy but more memory |
| ef_search | int | 10 | Search beam width. Higher = better recall but slower queries |
| num_threads | int | 1 | Number of threads for index operations |
| batch_size | int | 100 | Batch size for index construction |
| sync_threshold | int | 1000 | Number of elements before syncing index to disk |
| resize_factor | float | 1.2 | Factor by which to grow index capacity |
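The memory cost of raising max_neighbors can be estimated before building an index. The sketch below is a back-of-the-envelope calculation, not Chroma's own accounting. It assumes float32 vectors (4 bytes per dimension) and about 10 bytes of graph links per neighbor slot per element, which is the upper end of the rule of thumb in the hnswlib documentation. The function name and its parameters are hypothetical.

```python
def estimate_hnsw_memory_bytes(num_vectors, dim, max_neighbors=16,
                               link_bytes_per_neighbor=10):
    """Rough estimate of HNSW memory use (an assumption, not a measurement)."""
    vector_bytes = dim * 4  # float32 storage per vector
    link_bytes = max_neighbors * link_bytes_per_neighbor  # graph links per element
    return num_vectors * (vector_bytes + link_bytes)


if __name__ == "__main__":
    # Compare a few max_neighbors settings for 1M vectors of 768 dimensions.
    for m in (16, 32, 64):
        total = estimate_hnsw_memory_bytes(1_000_000, 768, max_neighbors=m)
        print(f"max_neighbors={m}: ~{total / 1024**3:.2f} GiB")
```

For high-dimensional embeddings the vectors themselves dominate, so doubling max_neighbors adds comparatively little memory. For low-dimensional embeddings the link overhead matters more.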
HNSW Tuning Guidelines
For high accuracy (recall > 0.95):
SPANN Vector Index
SPace Partition and Nearest Neighbor search (SPANN) is designed for billion-scale vector search with disk-based storage.
SPANN Configuration
SPANN Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| search_nprobe | int | 10 | Number of clusters to probe during search |
| write_nprobe | int | 5 | Number of clusters to probe during writes |
| ef_construction | int | 200 | Construction parameter for HNSW sub-indices |
| ef_search | int | 100 | Search parameter for HNSW sub-indices |
| max_neighbors | int | 16 | Max neighbors in HNSW sub-indices |
| split_threshold | int | 10000 | Cluster size triggering a split |
| merge_threshold | int | 1000 | Cluster size triggering a merge |
| reassign_neighbor_count | int | 100 | Neighbors to consider for reassignment |
When to Use SPANN
Use SPANN when:
- Dataset size > 10M vectors
- Memory is limited relative to dataset size
- Disk I/O bandwidth is sufficient
- Slightly lower recall is an acceptable trade-off for massive scale
Distance Metrics (Space)
Choose the appropriate distance metric for your embeddings:
- Cosine - Best for normalized embeddings (most embedding models)
- L2 - Best for absolute distance measurements
- Inner Product - Best when embeddings have meaningful magnitudes
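To make the difference between the three metrics concrete, here is a minimal pure-Python sketch. It uses the common definitions (squared L2, one minus the dot product, one minus cosine similarity). Whether Chroma's internals match these formulas exactly is an assumption to verify against its documentation, and the function names are illustrative only.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sensitive to vector magnitude."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """One minus the dot product: larger magnitudes look 'closer'."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity: depends only on direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


if __name__ == "__main__":
    a, b = [1.0, 0.0], [10.0, 0.0]  # same direction, different magnitude
    print("cosine:", cosine_distance(a, b))
    print("squared L2:", squared_l2(a, b))
```

For unit-length vectors, squared L2 equals twice the cosine distance, so both metrics rank neighbors identically. This is one reason cosine is a safe default for normalized embeddings.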
Full-Text Search Index
Optimize full-text search for document queries. The full-text index covers the #document field and supports efficient text search with BM25 ranking.
Metadata Inverted Indexes
Chroma automatically creates inverted indexes for metadata fields. These indexes are used to evaluate where clauses.
Query Optimization
Limiting Results
Always limit results to what you need:
Selective Field Inclusion
Only include fields you need:
Efficient Filtering
Structure where clauses for optimal performance:
Read Level Control
Control the consistency vs. performance tradeoff:
Use INDEX_ONLY when:
- Query latency is critical
- Eventual consistency is acceptable
- Workload is read-heavy with infrequent writes
Batch Operations
Batch Inserts
Always batch inserts for better performance:
Optimal Batch Size
Balance throughput vs. memory:
Batch Queries
Query multiple vectors in a single call:
Memory Management
Memory Limits
Set memory limits to prevent OOM:
LRU Cache Configuration
Enable LRU caching for segment data:
Resource Limits
Increase file descriptor limits for high concurrency:
Connection Pooling
Configure HTTP connection pooling for remote clients:
Parallel Query Execution
Leverage multiple threads for queries:
Server-Side Configuration
Thread Pool Size
Increase for high concurrency:
gRPC Timeouts
Adjust for your workload:
Performance Best Practices
Ingestion
- Batch inserts - Use batches of 500-5000 depending on embedding size
- Pre-compute embeddings - Generate embeddings before inserting
- Use multiple threads - Parallelize embedding generation
- Tune HNSW construction - Lower ef_construction for faster builds
- Increase sync_threshold - Reduce disk writes during bulk inserts
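The batching advice above can be sketched as a small helper that splits any sequence of records into fixed-size chunks and hands each chunk to an insert function. This is a minimal sketch: `add_batch` is a hypothetical callable standing in for whatever insert call your client exposes, and the default batch size of 1000 is only a starting point within the 500-5000 range suggested above.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists containing at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(records, add_batch, batch_size=1000):
    """Call add_batch once per chunk and return the number of calls made.

    add_batch is a hypothetical callable; wrap your real insert call in it.
    """
    calls = 0
    for batch in batched(records, batch_size):
        add_batch(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    received = []
    calls = insert_in_batches(range(2500), received.append, batch_size=1000)
    print(calls, [len(b) for b in received])
```

With the Python client, `add_batch` would typically unpack each chunk into the ids, embeddings, and metadata lists passed to a single `collection.add(...)` call. Larger embeddings generally call for smaller batches so that each request stays within memory and payload limits.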
Querying
- Limit results - Only request what you need
- Use appropriate ef_search - Balance recall vs. speed
- Enable INDEX_ONLY - For read-heavy workloads
- Batch queries - Query multiple vectors at once
- Filter efficiently - Use simple where clauses when possible
- Cache results - Cache frequent queries at application level
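Application-level caching, the last item above, can be as simple as a small LRU map in front of the query call. The sketch below is one possible shape, not part of Chroma: `QueryCache` and `run_query` are hypothetical names, and `run_query` is assumed to take an embedding and a result count and return the results.

```python
from collections import OrderedDict


class QueryCache:
    """Small LRU cache keyed by (embedding, n_results)."""

    def __init__(self, run_query, max_entries=1024):
        self._run_query = run_query  # hypothetical: (embedding, n_results) -> results
        self._max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, embedding, n_results=10):
        key = (tuple(embedding), n_results)  # lists are not hashable, tuples are
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = self._run_query(embedding, n_results)
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return result

    def clear(self):
        """Drop all cached results, for example after writes to the collection."""
        self._entries.clear()
```

Cached results go stale when the underlying collection changes, so this pattern suits read-heavy workloads. Call `clear()` after writes, or add a time-to-live if some staleness is acceptable.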
Memory
- Set memory limits - Prevent OOM with chroma_memory_limit_bytes
- Enable LRU cache - For datasets larger than RAM
- Monitor memory usage - Track with observability tools
- Use persistent storage - Don’t rely on in-memory storage for large datasets
Scaling
- Horizontal scaling - Use distributed Chroma for massive scale
- Read replicas - Separate read and write workloads
- Partition collections - Split large collections by tenant or category
- Monitor query latency - Track p50, p95, p99 percentiles
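The p50, p95, and p99 percentiles mentioned above can be computed with the Python standard library alone. The following is a minimal sketch: `operation` is a hypothetical zero-argument callable that wraps whatever query you want to time, and the helper names are illustrative.

```python
import statistics
import time


def measure_latencies(operation, runs=100):
    """Time repeated calls to operation and return latencies in milliseconds."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def latency_percentiles(latencies_ms):
    """Return p50, p95, and p99 from a list of at least two latency samples."""
    # quantiles(n=100) returns 99 cut points; index k - 1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    samples = measure_latencies(lambda: sum(range(10_000)), runs=200)
    print(latency_percentiles(samples))
```

Tail percentiles are only meaningful with enough samples. A few hundred runs is a reasonable minimum for p99, and it is worth discarding the first few calls so cold caches do not skew the result.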
Benchmarking
Measure your specific workload:
Troubleshooting Performance
Slow Queries
- Check ef_search - may be too high
- Verify index is built - check collection count
- Review where clause complexity
- Monitor memory pressure - may be swapping
- Check network latency - for remote clients
Slow Ingestion
- Increase batch size
- Lower ef_construction
- Increase num_threads
- Increase sync_threshold
- Pre-generate embeddings
High Memory Usage
- Set chroma_memory_limit_bytes
- Enable LRU cache policy
- Reduce batch sizes
- Use persistent storage
- Check for memory leaks in custom embedding functions
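One way to check a custom embedding function for leaks is to call it repeatedly under `tracemalloc` and see whether traced memory keeps growing. The sketch below assumes the function can be wrapped in a zero-argument callable. `memory_growth_bytes` is a hypothetical helper, and it only sees allocations made through Python's allocator, not memory held by native libraries.

```python
import gc
import tracemalloc


def memory_growth_bytes(func, calls=50):
    """Return the change in traced Python memory after calling func repeatedly."""
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time allocations are not counted
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(calls):
            func()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    retained = []

    def leaky_embed():
        # Keeps a reference to every result, so memory grows on each call.
        retained.append([0.0] * 1000)

    print("growth:", memory_growth_bytes(leaky_embed), "bytes")
```

Growth that scales with the number of calls suggests the function is retaining references, for example in a module-level list or an unbounded cache. A flat result does not rule out leaks in native code.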
Next Steps
- Configure Observability to monitor performance
- Review Configuration options
- Learn about Migrations