Chroma supports Hugging Face's ecosystem of embedding models, giving you access to thousands of open-source options for your specific use case.

Installation

Python
pip install chromadb sentence-transformers
The sentence-transformers library is required for running Hugging Face models locally with Chroma. If you only use the Hugging Face Inference API, it is not needed.

Using Sentence Transformers

Basic usage

import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

# Create embedding function with a specific model
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create collection
client = chromadb.Client()
collection = client.create_collection(
    name="huggingface_embeddings",
    embedding_function=embedding_function
)

# Add documents
collection.add(
    documents=["This is a document", "This is another document"],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(
    query_texts=["document query"],
    n_results=2
)

Popular models

# all-MiniLM-L6-v2 (default in Chroma)
# Dimensions: 384
# Fast and efficient for most use cases
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# all-mpnet-base-v2
# Dimensions: 768
# Better quality, slower
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)

Using the Hugging Face API

With the Hugging Face Inference API

from chromadb.utils.embedding_functions import HuggingFaceEmbeddingFunction

# Use Hugging Face Inference API
embedding_function = HuggingFaceEmbeddingFunction(
    api_key="your-hf-api-token",
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

collection = client.create_collection(
    name="hf_api_collection",
    embedding_function=embedding_function
)

With a custom inference endpoint

embedding_function = HuggingFaceEmbeddingFunction(
    api_key="your-api-token",
    model_name="your-model",
    api_url="https://your-endpoint.aws.endpoints.huggingface.cloud"
)

Device configuration

# Run on GPU if available
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cuda"  # or "cpu"
)

# Use specific GPU
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cuda:0"  # Use first GPU
)

Model selection guide

Fast & efficient

all-MiniLM-L6-v2 (384 dims)
Best for: High-throughput applications, real-time search

High quality

all-mpnet-base-v2 (768 dims)
Best for: When accuracy is more important than speed

Multilingual

paraphrase-multilingual-MiniLM-L12-v2 (384 dims)
Best for: Multi-language content

Question answering

multi-qa-mpnet-base-dot-v1 (768 dims)
Best for: Question-answering systems
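
The guide above can be captured in a small lookup helper. This is an illustrative sketch, not part of Chroma's API; the MODEL_GUIDE mapping and pick_model function are hypothetical names:

```python
# Hypothetical helper: map a use case from the guide above to a model name.
MODEL_GUIDE = {
    "fast": "all-MiniLM-L6-v2",                               # 384 dims
    "quality": "all-mpnet-base-v2",                           # 768 dims
    "multilingual": "paraphrase-multilingual-MiniLM-L12-v2",  # 384 dims
    "question-answering": "multi-qa-mpnet-base-dot-v1",       # 768 dims
}

def pick_model(use_case: str) -> str:
    """Return a model name for a use case, defaulting to the fast model."""
    return MODEL_GUIDE.get(use_case, MODEL_GUIDE["fast"])

print(pick_model("multilingual"))  # paraphrase-multilingual-MiniLM-L12-v2
```

The chosen name can then be passed directly as model_name to SentenceTransformerEmbeddingFunction.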

Configuration options

  • model_name (string, required): Name of the Sentence Transformer model from Hugging Face
  • device (string, default "cpu"): Device to run the model on: cpu, cuda, or cuda:N
  • normalize_embeddings (bool, default false): Whether to normalize embeddings to unit length
  • api_key (string): Hugging Face API token (used by HuggingFaceEmbeddingFunction)
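
To illustrate what normalize_embeddings=True does: each vector is scaled to unit (L2) length, so cosine similarity reduces to a plain dot product. A small numpy sketch with made-up vectors (not real model output):

```python
import numpy as np

# Toy vectors standing in for raw model embeddings (not real model output).
raw = np.array([[3.0, 4.0], [1.0, 0.0]])

# normalize_embeddings=True scales each vector to unit (L2) length.
normalized = raw / np.linalg.norm(raw, axis=1, keepdims=True)

print(np.linalg.norm(normalized, axis=1))  # each norm is now 1.0
# With unit-length vectors, cosine similarity is just the dot product:
print(normalized[0] @ normalized[1])       # 0.6
```

Normalized embeddings pair naturally with cosine or dot-product distance in the collection's index settings.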

Custom models

Load a fine-tuned model

# Use your own fine-tuned model from Hugging Face
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="your-username/your-model-name"
)

Load a local model

# Use a locally saved model
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="/path/to/local/model"
)

Performance optimization

Batch processing

# Add documents in batches for better performance
documents = [f"Document {i}" for i in range(1000)]
ids = [f"id{i}" for i in range(1000)]

# Process in batches of 100
batch_size = 100
for i in range(0, len(documents), batch_size):
    batch_docs = documents[i:i+batch_size]
    batch_ids = ids[i:i+batch_size]
    collection.add(
        documents=batch_docs,
        ids=batch_ids
    )

Model caching

import os

# Set cache directory for models
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/path/to/cache"

# First use downloads the model to cache
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

Finding models

Browse thousands of models on Hugging Face:
  1. Visit the Hugging Face Models hub (huggingface.co/models)
  2. Filter by:
    • Task: “Sentence Similarity” or “Feature Extraction”
    • Library: “sentence-transformers”
  3. Check model card for:
    • Embedding dimensions
    • Languages supported
    • Performance metrics
    • License

Best practices

  • Smaller models (384 dims): Faster, use less storage
  • Larger models (768+ dims): Better quality, slower
  • Consider your performance requirements vs. accuracy needs
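
As a rough sense of the storage side of this trade-off, here is a back-of-the-envelope calculation for one million documents stored as raw float32 vectors (index overhead not included; the helper name is illustrative):

```python
def embedding_storage_gb(num_docs: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw storage for float32 embeddings, in gigabytes (1 GB = 1024**3 bytes)."""
    return num_docs * dims * bytes_per_float / 1024**3

# Doubling the dimensions doubles the raw embedding storage.
print(round(embedding_storage_gb(1_000_000, 384), 2))  # ~1.43 GB (all-MiniLM-L6-v2)
print(round(embedding_storage_gb(1_000_000, 768), 2))  # ~2.86 GB (all-mpnet-base-v2)
```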
If you’re processing thousands of documents, using a GPU can significantly speed up embedding generation.
Models are cached after first download. Set SENTENCE_TRANSFORMERS_HOME to control cache location.
Different models excel at different tasks. Test multiple models with your specific use case.

Resources

Free and open-source: All Sentence Transformer models can be run locally without API costs, making them perfect for production deployments.
