Chroma supports Hugging Face's ecosystem of embedding models, giving you access to thousands of open-source options for your specific use case.

Installation

Python
pip install chromadb sentence-transformers
The sentence-transformers library is required for running Hugging Face models locally with Chroma. If you only use the Hugging Face Inference API, it is not needed.

Using Sentence Transformers

Basic usage

import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

# Create embedding function with a specific model
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create collection
client = chromadb.Client()
collection = client.create_collection(
    name="huggingface_embeddings",
    embedding_function=embedding_function
)

# Add documents
collection.add(
    documents=["This is a document", "This is another document"],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(
    query_texts=["document query"],
    n_results=2
)

Popular models

# all-MiniLM-L6-v2 (default in Chroma)
# Dimensions: 384
# Fast and efficient for most use cases
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# all-mpnet-base-v2
# Dimensions: 768
# Better quality, slower
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)

Using the Hugging Face API

With the Hugging Face Inference API

from chromadb.utils.embedding_functions import HuggingFaceEmbeddingFunction

# Use Hugging Face Inference API
embedding_function = HuggingFaceEmbeddingFunction(
    api_key="your-hf-api-token",
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

collection = client.create_collection(
    name="hf_api_collection",
    embedding_function=embedding_function
)

With a custom inference endpoint

embedding_function = HuggingFaceEmbeddingFunction(
    api_key="your-api-token",
    model_name="your-model",
    api_url="https://your-endpoint.aws.endpoints.huggingface.cloud"
)

Device configuration

# Run on GPU if available
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cuda"  # or "cpu"
)

# Use specific GPU
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cuda:0"  # Use first GPU
)

Model selection guide

Fast & efficient

all-MiniLM-L6-v2 (384 dims)
Best for: High-throughput applications, real-time search

High quality

all-mpnet-base-v2 (768 dims)
Best for: When accuracy is more important than speed

Multilingual

paraphrase-multilingual-MiniLM-L12-v2 (384 dims)
Best for: Multi-language content

Question answering

multi-qa-mpnet-base-dot-v1 (768 dims)
Best for: Question-answering systems
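
The guide above can be captured in a small lookup helper. This is an illustrative sketch, not part of Chroma's API; the MODEL_GUIDE mapping and pick_model function are hypothetical names:

```python
# Hypothetical helper: map a use case from the guide above to a model name.
MODEL_GUIDE = {
    "fast": "all-MiniLM-L6-v2",                               # 384 dims
    "quality": "all-mpnet-base-v2",                           # 768 dims
    "multilingual": "paraphrase-multilingual-MiniLM-L12-v2",  # 384 dims
    "question-answering": "multi-qa-mpnet-base-dot-v1",       # 768 dims
}

def pick_model(use_case: str) -> str:
    """Return a model name for a use case, defaulting to the fast model."""
    return MODEL_GUIDE.get(use_case, MODEL_GUIDE["fast"])

print(pick_model("multilingual"))  # paraphrase-multilingual-MiniLM-L12-v2
```

The chosen name can then be passed directly as model_name to SentenceTransformerEmbeddingFunction.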

Configuration options

  • model_name (string, required): Name of the Sentence Transformer model from Hugging Face
  • device (string, default "cpu"): Device to run the model on: cpu, cuda, or cuda:N
  • normalize_embeddings (bool, default false): Whether to normalize embeddings to unit length
  • api_key (string): Hugging Face API token (used by HuggingFaceEmbeddingFunction)
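
To illustrate what normalize_embeddings=True does: each vector is scaled to unit (L2) length, so cosine similarity reduces to a plain dot product. A small numpy sketch with made-up vectors (not real model output):

```python
import numpy as np

# Toy vectors standing in for raw model embeddings (not real model output).
raw = np.array([[3.0, 4.0], [1.0, 0.0]])

# normalize_embeddings=True scales each vector to unit (L2) length.
normalized = raw / np.linalg.norm(raw, axis=1, keepdims=True)

print(np.linalg.norm(normalized, axis=1))  # each norm is now 1.0
# With unit-length vectors, cosine similarity is just the dot product:
print(normalized[0] @ normalized[1])       # 0.6
```

Normalized embeddings pair naturally with cosine or dot-product distance in the collection's index settings.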

Custom models

Load a fine-tuned model

# Use your own fine-tuned model from Hugging Face
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="your-username/your-model-name"
)

Load a local model

# Use a locally saved model
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="/path/to/local/model"
)

Performance optimization

Batch processing

# Add documents in batches for better performance
documents = [f"Document {i}" for i in range(1000)]
ids = [f"id{i}" for i in range(1000)]

# Process in batches of 100
batch_size = 100
for i in range(0, len(documents), batch_size):
    batch_docs = documents[i:i+batch_size]
    batch_ids = ids[i:i+batch_size]
    collection.add(
        documents=batch_docs,
        ids=batch_ids
    )

Model caching

import os

# Set cache directory for models
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/path/to/cache"

# First use downloads the model to cache
embedding_function = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

Finding models

Browse thousands of models on Hugging Face:
  1. Visit the Hugging Face Models hub (huggingface.co/models)
  2. Filter by:
    • Task: “Sentence Similarity” or “Feature Extraction”
    • Library: “sentence-transformers”
  3. Check model card for:
    • Embedding dimensions
    • Languages supported
    • Performance metrics
    • License

Best practices

  • Smaller models (384 dims): Faster, use less storage
  • Larger models (768+ dims): Better quality, slower
  • Consider your performance requirements vs. accuracy needs
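
As a rough sense of the storage side of this trade-off, here is a back-of-the-envelope calculation for one million documents stored as raw float32 vectors (index overhead not included; the helper name is illustrative):

```python
def embedding_storage_gb(num_docs: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw storage for float32 embeddings, in gigabytes (1 GB = 1024**3 bytes)."""
    return num_docs * dims * bytes_per_float / 1024**3

# Doubling the dimensions doubles the raw embedding storage.
print(round(embedding_storage_gb(1_000_000, 384), 2))  # ~1.43 GB (all-MiniLM-L6-v2)
print(round(embedding_storage_gb(1_000_000, 768), 2))  # ~2.86 GB (all-mpnet-base-v2)
```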
If you’re processing thousands of documents, using a GPU can significantly speed up embedding generation.
Models are cached after first download. Set SENTENCE_TRANSFORMERS_HOME to control cache location.
Different models excel at different tasks. Test multiple models with your specific use case.

Resources

Free and open-source: All Sentence Transformer models can be run locally without API costs, making them perfect for production deployments.
