Chroma supports OpenAI’s embedding models out of the box, allowing you to use state-of-the-art embeddings for your vector search applications.
## Installation

```bash
pip install chromadb openai
```
## Python usage

### Basic setup

```python
import os

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Set your API key
os.environ["OPENAI_API_KEY"] = "your-api-key"

# Create the embedding function
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small"
)

# Create a client and a collection that uses the embedding function
client = chromadb.Client()
collection = client.create_collection(
    name="openai_embeddings",
    embedding_function=embedding_function
)

# Add documents (embeddings are generated automatically)
collection.add(
    documents=["This is a document", "This is another document"],
    ids=["doc1", "doc2"]
)

# Query the collection
results = collection.query(
    query_texts=["document query"],
    n_results=2
)
```
## Available models

Chroma supports all OpenAI embedding models:

### Latest models

```python
# text-embedding-3-small (recommended)
# Dimensions: 1536 (default), configurable down to 512
# Cost-effective for most use cases
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small"
)

# text-embedding-3-large
# Dimensions: 3072 (default), configurable
# Highest-quality embeddings
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-large"
)
```

### Legacy models

```python
# text-embedding-ada-002
# Dimensions: 1536 (fixed)
# Previous-generation model
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-ada-002"
)
```
### Custom dimensions

```python
# Use fewer dimensions for faster search and lower storage
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
    dimensions=512  # Reduce from the default 1536
)
```
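Per OpenAI's documentation, the `dimensions` parameter shortens a text-embedding-3 embedding by dropping trailing components and renormalizing to unit length. A sketch of the equivalent client-side operation (the helper name is illustrative, not part of any API):

```python
import math

def shorten_embedding(embedding, dimensions):
    """Truncate an embedding to `dimensions` components and L2-renormalize.

    Approximates what the OpenAI `dimensions` parameter does server-side
    for text-embedding-3 models.
    """
    truncated = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

# Toy example: a 4-dim vector shortened to 2 dims
short = shorten_embedding([3.0, 4.0, 1.0, 2.0], 2)
print(short)  # [0.6, 0.8] — unit length
```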
### API configuration

```python
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

embedding_function = OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small",
    api_base="https://api.openai.com/v1",  # Custom endpoint
    api_type="openai",                     # or "azure" for Azure OpenAI
    api_version="2023-05-15",              # Azure only
    deployment_id="your-deployment"        # Azure only
)
```
JavaScript usage
import { ChromaClient } from "chromadb" ;
import { OpenAIEmbeddingFunction } from "chromadb" ;
const embedder = new OpenAIEmbeddingFunction ({
openai_api_key: "your-api-key" ,
openai_model: "text-embedding-3-small"
});
const client = new ChromaClient ();
const collection = await client . createCollection ({
name: "openai_collection" ,
embeddingFunction: embedder
});
// Add documents
await collection . add ({
ids: [ "id1" , "id2" ],
documents: [ "Document 1" , "Document 2" ]
});
// Query
const results = await collection . query ({
queryTexts: [ "query text" ],
nResults: 2
});
## Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_name` | string | `"text-embedding-3-small"` | OpenAI embedding model to use |
| `dimensions` | integer | model default | Number of dimensions (text-embedding-3 models only) |
## Cost optimization

**Reduce costs:** use text-embedding-3-small with reduced dimensions (512 or 768) for most applications. This provides excellent quality at a fraction of the cost.

```python
# Cost-effective configuration
embedding_function = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small",
    dimensions=512  # ~67% reduction in storage and search time
)
```
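The storage saving is easy to estimate: at 4 bytes per float32 component, going from 1536 to 512 dimensions cuts raw vector storage by two thirds. A quick back-of-the-envelope check (index overhead not included):

```python
def vector_storage_bytes(n_vectors, dimensions, bytes_per_component=4):
    """Raw storage for n float32 vectors (index overhead not included)."""
    return n_vectors * dimensions * bytes_per_component

full = vector_storage_bytes(1_000_000, 1536)    # default for 3-small
reduced = vector_storage_bytes(1_000_000, 512)  # reduced dimensions
savings = 1 - reduced / full

print(full // 2**20, "MiB vs", reduced // 2**20, "MiB")
print(f"{savings:.0%} saved")  # 67% saved
```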
## Azure OpenAI

```python
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Azure OpenAI configuration
embedding_function = OpenAIEmbeddingFunction(
    api_key="your-azure-key",
    api_base="https://your-resource.openai.azure.com/",
    api_type="azure",
    api_version="2023-05-15",
    deployment_id="your-deployment-name",
    model_name="text-embedding-3-small"
)
```
## Error handling

```python
import time

try:
    collection.add(
        documents=["document"],
        ids=["id1"]
    )
except Exception as e:
    if "rate_limit" in str(e).lower():
        # Rate limited: back off, then retry the request
        time.sleep(1)
    elif "invalid_api_key" in str(e).lower():
        # Authentication error: check your credentials
        print("Invalid API key")
```
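For transient rate-limit errors, a retry loop with exponential backoff is more robust than a single sleep. A minimal sketch (the helper and the error-string matching are illustrative, not part of Chroma's or OpenAI's API):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying rate-limit errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if "rate_limit" not in str(e).lower() or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage with a collection (illustrative):
# with_retries(lambda: collection.add(documents=["document"], ids=["id1"]))
```

Injecting `sleep` keeps the helper testable without real delays.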
## Best practices

- **Choose the right model.** Use text-embedding-3-small for most use cases; reach for text-embedding-3-large only when you need the highest quality.
- **Batch requests.** Process documents in batches to avoid rate limits and improve throughput.
- **Cache embeddings.** Store embeddings to avoid re-computing them for the same content.
- **Monitor costs.** Track your API usage and consider reduced dimensions to lower costs.
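The batching and caching advice can be sketched together. This is a minimal illustration, not Chroma's API: `embed_batch` is a stand-in for a real OpenAI call, and hashing the text gives a stable cache key:

```python
import hashlib

def batched(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

class CachingEmbedder:
    """Cache embeddings by content hash so repeated text is embedded once."""

    def __init__(self, embed_batch, batch_size=100):
        self.embed_batch = embed_batch  # e.g. a call to the OpenAI API
        self.batch_size = batch_size
        self.cache = {}

    def __call__(self, documents):
        # Embed only documents we have not seen before, in batches
        missing = [d for d in set(documents) if self._key(d) not in self.cache]
        for batch in batched(missing, self.batch_size):
            for doc, emb in zip(batch, self.embed_batch(batch)):
                self.cache[self._key(doc)] = emb
        return [self.cache[self._key(d)] for d in documents]

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Stand-in embed function: length-1 "embedding" of each document
fake_embed = lambda batch: [[float(len(d))] for d in batch]
embedder = CachingEmbedder(fake_embed, batch_size=2)
embs = embedder(["a", "bb", "a"])  # "a" is embedded once, then reused
```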
> **Security:** keep your OpenAI API key secure. Never commit it to version control; use environment variables or secure key management.