Chroma uses Sentence Transformers as its default embedding function, providing high-quality embeddings out of the box with no additional configuration.

Default model

Chroma uses the all-MiniLM-L6-v2 model by default:
  • Dimensions: 384
  • Speed: Very fast
  • Quality: Excellent for most use cases
  • Size: ~80MB
  • Language: English (trained on English data)
import chromadb

# Uses default Sentence Transformer automatically
client = chromadb.Client()
collection = client.create_collection(name="my_collection")

# No need to specify embedding function
collection.add(
    documents=["This is a document", "This is another document"],
    ids=["doc1", "doc2"]
)

How it works

When you create a collection without specifying an embedding function:
  1. Model download: On first use, Chroma downloads the all-MiniLM-L6-v2 model (~80MB) from Hugging Face.
  2. Local inference: The model runs locally on your machine using ONNX Runtime.
  3. Automatic embedding: When you add documents, Chroma automatically generates embeddings with the model.
  4. Caching: The model is cached locally, so subsequent uses skip the download.

Explicit usage

from chromadb.utils.embedding_functions import DefaultEmbeddingFunction

# Explicitly use the default function
embedding_function = DefaultEmbeddingFunction()

collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_function
)

ONNX Runtime

Chroma uses ONNX Runtime for fast, efficient inference:
import chromadb
from chromadb.utils.embedding_functions import ONNXMiniLM_L6_V2

# This is what Chroma uses internally
embedding_function = ONNXMiniLM_L6_V2()

collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_function
)

Benefits of ONNX

  • Fast inference: Optimized for CPU inference without requiring heavy ML frameworks
  • Small footprint: Minimal dependencies compared to full PyTorch or TensorFlow
  • Cross-platform: Works consistently across Windows, macOS, and Linux
  • No GPU required: Runs efficiently on CPU alone

Model characteristics

Performance

# Embedding speed (approximate on CPU)
# Single document: ~10ms
# Batch of 100: ~200ms
# Batch of 1000: ~1.5s

import time
import chromadb

client = chromadb.Client()
collection = client.create_collection("benchmark")

# Benchmark
start = time.time()
collection.add(
    documents=[f"Document {i}" for i in range(100)],
    ids=[f"id{i}" for i in range(100)]
)
end = time.time()
print(f"Time for 100 documents: {end - start:.2f}s")

Quality metrics

The all-MiniLM-L6-v2 model achieves:
  • Semantic Textual Similarity: 82.41% correlation
  • Semantic Search: 58.04 mean average precision
  • Paraphrase Mining: 70.93 F1 score

Customization

While Sentence Transformers is the default, you can easily switch to other embedding functions:
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

embedding_function = OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_function
)

Model caching

Cache location

import os

# Default cache location
default_cache = os.path.expanduser("~/.cache/chroma/onnx_models")

# Custom cache location
os.environ["CHROMA_CACHE_DIR"] = "/custom/cache/path"

Pre-downloading

import chromadb
from chromadb.utils.embedding_functions import ONNXMiniLM_L6_V2

# This will download the model if not cached
embedding_function = ONNXMiniLM_L6_V2()

# Now subsequent uses will be instant
client = chromadb.Client()
collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_function
)

When to use

The default embedding function is a good fit when:
  • Prototyping: Get started quickly with no API keys
  • English content: The model is trained on English data
  • Privacy: All processing happens locally
  • Cost-sensitive: No API costs
  • Offline use: Works without internet (after the initial download)

Consider switching when:
  • You need multilingual support → Use Cohere or multilingual Sentence Transformers
  • You need the highest quality → Use OpenAI text-embedding-3-large
  • You're processing very large volumes → Consider hosted embedding APIs
  • You need domain-specific embeddings → Fine-tune or use specialized models

Troubleshooting

Slow first run

The first embedding operation downloads the model (~80MB). Subsequent operations are fast. Consider pre-downloading in your deployment process. Note that the default function uses Chroma's own ONNX copy of the model, so pre-download through Chroma rather than through sentence-transformers:

# Pre-download the ONNX model into Chroma's cache
from chromadb.utils.embedding_functions import ONNXMiniLM_L6_V2

ef = ONNXMiniLM_L6_V2()
ef(["warm-up"])  # the first call triggers the download if needed

Memory errors on large inserts

Process documents in smaller batches:

batch_size = 50  # Reduce if needed
for i in range(0, len(documents), batch_size):
    collection.add(
        documents=documents[i:i+batch_size],
        ids=ids[i:i+batch_size]
    )

Summary

The default Sentence Transformer model provides excellent performance for most use cases. You only need to switch to a different embedding function if you have specific requirements like multilingual support or domain-specific embeddings.
