Chroma supports OpenAI’s embedding models out of the box, allowing you to use state-of-the-art embeddings for your vector search applications.

Installation

pip install chromadb openai

Python usage

Basic setup

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
import os

# Set your API key (prefer exporting OPENAI_API_KEY in your shell
# rather than hard-coding it in source)
os.environ["OPENAI_API_KEY"] = "your-api-key"

# Create embedding function
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small"
)

# Create client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="openai_embeddings",
    embedding_function=embedding_function
)

# Add documents (embeddings generated automatically)
collection.add(
    documents=["This is a document", "This is another document"],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(
    query_texts=["document query"],
    n_results=2
)

Available models

Chroma supports all OpenAI embedding models:
# text-embedding-3-small (recommended)
# Dimensions: 1536 (default), configurable down to 512
# Cost-effective for most use cases
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small"
)

# text-embedding-3-large
# Dimensions: 3072 (default), configurable
# Highest quality embeddings
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-large"
)

Custom dimensions

# Use fewer dimensions for faster search and lower storage
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
    dimensions=512  # Reduce from default 1536
)

API configuration

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

embedding_function = OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small",
    api_base="https://api.openai.com/v1",  # Custom endpoint
    api_type="openai",  # or "azure" for Azure OpenAI
    api_version="2023-05-15",  # For Azure
    deployment_id="your-deployment"  # For Azure
)

JavaScript usage

import { ChromaClient, OpenAIEmbeddingFunction } from "chromadb";

const embedder = new OpenAIEmbeddingFunction({
  openai_api_key: "your-api-key",
  openai_model: "text-embedding-3-small"
});

const client = new ChromaClient();
const collection = await client.createCollection({
  name: "openai_collection",
  embeddingFunction: embedder
});

// Add documents
await collection.add({
  ids: ["id1", "id2"],
  documents: ["Document 1", "Document 2"]
});

// Query
const results = await collection.query({
  queryTexts: ["query text"],
  nResults: 2
});

Configuration

api_key (string, required)
  Your OpenAI API key.
model_name (string, default: "text-embedding-3-small")
  The OpenAI embedding model to use.
dimensions (int, optional)
  Number of output dimensions (text-embedding-3 models only).
api_base (string, optional)
  Custom API endpoint URL.
organization_id (string, optional)
  OpenAI organization ID.

Cost optimization

Reduce costs: Use text-embedding-3-small with reduced dimensions (512 or 768) for most applications. The small model costs a fraction of text-embedding-3-large per token, and fewer dimensions cut storage and search time.
# Cost-effective configuration
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small",
    dimensions=512  # ~67% less vector storage than the default 1536
)

Azure OpenAI

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Azure OpenAI configuration
embedding_function = OpenAIEmbeddingFunction(
    api_key="your-azure-key",
    api_base="https://your-resource.openai.azure.com/",
    api_type="azure",
    api_version="2023-05-15",
    deployment_id="your-deployment-name",
    model_name="text-embedding-3-small"
)

Error handling

import time

try:
    collection.add(
        documents=["document"],
        ids=["id1"]
    )
except Exception as e:
    message = str(e).lower()
    if "rate_limit" in message:
        # Rate limited: back off briefly, then retry
        time.sleep(1)
        collection.add(documents=["document"], ids=["id1"])
    elif "invalid_api_key" in message:
        # Authentication error: check your OPENAI_API_KEY
        print("Invalid API key")
    else:
        raise

Best practices

Choose the right model

Use text-embedding-3-small for most use cases. Only use text-embedding-3-large if you need the highest quality.

Batch requests

Process documents in batches to avoid rate limits and improve throughput.
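A minimal batching sketch (the helper name and chunk size of 100 are illustrative; Chroma and OpenAI each impose their own per-request limits, so pick a size that stays within both):

```python
def batched(items, batch_size=100):
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

docs = [f"document {i}" for i in range(250)]
ids = [f"id{i}" for i in range(250)]

# Add documents in several small calls instead of one large one
for doc_batch, id_batch in zip(batched(docs), batched(ids)):
    # collection.add(documents=doc_batch, ids=id_batch)
    pass
```

Each iteration embeds at most 100 documents, which keeps individual requests small and makes rate-limit retries cheaper.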

Cache embeddings

Store embeddings to avoid re-computing them for the same content.
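One simple approach is an in-memory cache keyed by a hash of the text (a sketch: in production you would likely persist the cache, and `fake_embed` below is a stand-in for a real embedding function):

```python
import hashlib

_cache = {}

def cached_embed(text, embed_fn):
    """Return a cached embedding if this exact text was embedded before."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn([text])[0]
    return _cache[key]

# Demonstration with a stand-in embedding function
calls = []
def fake_embed(texts):
    calls.append(texts)
    return [[0.0] * 4 for _ in texts]

cached_embed("hello", fake_embed)
cached_embed("hello", fake_embed)  # served from cache, no second API call
```

In real use you would pass `embedding_function` (or a thin wrapper around it) in place of `fake_embed`.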

Monitor costs

Track your API usage and consider using reduced dimensions to lower costs.
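A rough cost estimator can help before a large ingest (a sketch: the ~4 characters-per-token ratio is a common approximation, and the per-token price below is text-embedding-3-small's launch price; check OpenAI's pricing page for current numbers):

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, text-embedding-3-small launch price

def estimate_embedding_cost(documents):
    """Roughly estimate embedding cost, assuming ~4 characters per token."""
    total_chars = sum(len(doc) for doc in documents)
    approx_tokens = total_chars / 4
    return approx_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

docs = ["This is a document"] * 10_000
print(f"~${estimate_embedding_cost(docs):.4f}")  # ~$0.0009
```

For an exact token count, tokenize with OpenAI's tokenizer instead of the character heuristic.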

Resources

Keep your OpenAI API key secure. Never commit it to version control. Use environment variables or secure key management.
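One way to follow this advice in Python (a minimal sketch; the helper name is illustrative):

```python
import os

def load_openai_key():
    """Read the OpenAI API key from the environment, never from source code."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell "
            "or use a secrets manager"
        )
    return key
```

Failing loudly when the key is missing surfaces misconfiguration at startup rather than on the first API call.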
