Chroma integrates with Cohere's embedding models, which provide semantic search over text and, in the multilingual variants, support for more than 100 languages.
Installation
pip install chromadb cohere
Python usage
Basic setup
import chromadb
from chromadb.utils.embedding_functions import CohereEmbeddingFunction
import os

# Set your API key
os.environ["COHERE_API_KEY"] = "your-api-key"

# Create embedding function
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0"
)

# Create client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="cohere_embeddings",
    embedding_function=embedding_function
)

# Add documents
collection.add(
    documents=["This is a document", "Another document"],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=2
)
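The dictionary returned by `collection.query` holds one inner list per query text, so the hits for the first query sit at index 0 of each field. A minimal sketch of unpacking it, using a hand-written dictionary in place of a live response (the `top_hits` helper and the sample distances are illustrative assumptions, not part of Chroma):

```python
from typing import Any


def top_hits(results: dict[str, Any], query_index: int = 0) -> list[tuple[str, str, float]]:
    """Pair each hit's id, document text, and distance for one query."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return list(zip(ids, documents, distances))


# Hand-written stand-in with the same shape as a query() result; the values are made up
sample_results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["This is a document", "Another document"]],
    "distances": [[0.42, 0.57]],
}

for doc_id, text, distance in top_hits(sample_results):
    print(f"{doc_id}: {text} (distance={distance:.2f})")
```

With a real collection, pass the `results` object from the query directly to the helper.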
Available models
English models
# embed-english-v3.0 (recommended)
# Dimensions: 1024
# Best for English content
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0"
)

# embed-english-light-v3.0
# Dimensions: 384
# Faster and more efficient
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-light-v3.0"
)
Multilingual models
# embed-multilingual-v3.0 (recommended)
# Dimensions: 1024
# Supports 100+ languages
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-multilingual-v3.0"
)

# embed-multilingual-light-v3.0
# Dimensions: 384
# Faster multilingual embeddings
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-multilingual-light-v3.0"
)
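The four model names follow one naming pattern, so the choice can be reduced to two questions: multilingual or not, and light or not. A minimal sketch, assuming only the names and dimensions listed in the comments above (`MODEL_DIMENSIONS` and `pick_model` are hypothetical helpers, not part of Chroma or Cohere):

```python
# Dimensions as listed for each model in this guide
MODEL_DIMENSIONS = {
    "embed-english-v3.0": 1024,
    "embed-english-light-v3.0": 384,
    "embed-multilingual-v3.0": 1024,
    "embed-multilingual-light-v3.0": 384,
}


def pick_model(multilingual: bool, light: bool = False) -> str:
    """Build the model name from the two choices and check it is a known model."""
    family = "multilingual" if multilingual else "english"
    suffix = "-light" if light else ""
    name = f"embed-{family}{suffix}-v3.0"
    if name not in MODEL_DIMENSIONS:
        raise ValueError(f"Unknown model: {name}")
    return name


model_name = pick_model(multilingual=True, light=True)
print(model_name, MODEL_DIMENSIONS[model_name])
```

Because the light and full-size models produce vectors of different lengths, it is safest to treat a switch between them as a reason to create a new collection and re-index.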
Cohere models support different input types for optimized embeddings:
# For search documents
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0",
    input_type="search_document"  # Optimize for document indexing
)
collection = client.create_collection(
    name="documents",
    embedding_function=embedding_function
)

# Add documents with optimized embeddings
collection.add(
    documents=["Document content"],
    ids=["doc1"]
)

# For search queries, use search_query input type
query_embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0",
    input_type="search_query"  # Optimize for queries
)

# Embed the query with the query-side function, then search by embedding
query_embeddings = query_embedding_function(["search query"])
results = collection.query(
    query_embeddings=query_embeddings,
    n_results=1
)
Multilingual search
# Create multilingual collection
embedding_function = CohereEmbeddingFunction(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-multilingual-v3.0"
)
collection = client.create_collection(
    name="multilingual_docs",
    embedding_function=embedding_function
)

# Add documents in different languages
collection.add(
    documents=[
        "Hello world",       # English
        "Hola mundo",        # Spanish
        "Bonjour le monde",  # French
        "こんにちは世界"       # Japanese
    ],
    ids=["en", "es", "fr", "ja"]
)

# Query in any language
results = collection.query(
    query_texts=["greetings"],
    n_results=4
)
JavaScript usage
import { ChromaClient, CohereEmbeddingFunction } from "chromadb";

const embedder = new CohereEmbeddingFunction({
  cohere_api_key: "your-api-key",
  model: "embed-english-v3.0"
});

const client = new ChromaClient();
const collection = await client.createCollection({
  name: "cohere_collection",
  embeddingFunction: embedder
});

// Add and query documents
await collection.add({
  ids: ["id1", "id2"],
  documents: ["Document 1", "Document 2"]
});

const results = await collection.query({
  queryTexts: ["search query"],
  nResults: 2
});
Configuration
model_name
string
default: "embed-english-v3.0"
Cohere embedding model to use
Input type (input_type): search_document, search_query, classification, or clustering
search_document: Use when indexing documents into your search system. Optimizes embeddings for being searched against.
search_query: Use when embedding search queries. Optimizes embeddings for finding relevant documents.
classification: Use when using embeddings for classification tasks.
clustering: Use when performing clustering analysis on embeddings.
Truncation strategy: NONE, START, or END
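Since the input type and truncation strategy each accept only a few fixed strings, a typo is easiest to catch before any API call is made. A minimal sketch of such a check, assuming the values listed above (`check_cohere_options` is a hypothetical helper, not part of Chroma or the Cohere SDK):

```python
# Allowed values as listed in the configuration section of this guide
VALID_INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}
VALID_TRUNCATION = {"NONE", "START", "END"}


def check_cohere_options(input_type: str, truncation: str) -> None:
    """Raise ValueError if either option is not one of the documented values."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {sorted(VALID_INPUT_TYPES)}, got {input_type!r}"
        )
    if truncation not in VALID_TRUNCATION:
        raise ValueError(
            f"truncation must be one of {sorted(VALID_TRUNCATION)}, got {truncation!r}"
        )


# Passes silently for documented values
check_cohere_options("search_document", "END")
```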
Supported languages
Cohere's multilingual models support over 100 languages, including:
European: English, Spanish, French, German, Italian, Portuguese, Russian, Polish, Dutch, Czech, etc.
Asian: Chinese, Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, etc.
Middle Eastern: Arabic, Hebrew, Turkish, Farsi, etc.
Other: See Cohere's language support documentation for the full list
Best practices
Use the right model: Use multilingual models only if you need multi-language support. English models are faster for English-only content.
Specify input types: Always specify input_type for better embedding quality. Use search_document for indexing and search_query for queries.
Batch efficiently: Cohere supports up to 96 texts per request. Batch your requests for better performance.
Monitor usage: Track your API usage through the Cohere dashboard.
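The 96-texts-per-request limit can be respected by slicing a large document list before calling `collection.add`. A minimal sketch, assuming you want to cap each call yourself rather than rely on the client library to split requests (`batched` and `add_in_batches` are hypothetical helpers; `collection` is any object with Chroma's `add(documents=..., ids=...)` method):

```python
from typing import Iterator, Sequence

# Per-request limit stated in this guide
COHERE_MAX_TEXTS_PER_REQUEST = 96


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_in_batches(
    collection,
    documents: Sequence[str],
    ids: Sequence[str],
    batch_size: int = COHERE_MAX_TEXTS_PER_REQUEST,
) -> int:
    """Add documents in slices of `batch_size`; return the number of add() calls made."""
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")
    calls = 0
    for doc_batch, id_batch in zip(batched(documents, batch_size), batched(ids, batch_size)):
        collection.add(documents=list(doc_batch), ids=list(id_batch))
        calls += 1
    return calls
```

For example, `add_in_batches(collection, docs, doc_ids)` with 200 documents results in three `add` calls of 96, 96, and 8 documents.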
Error handling
import time
# Exception names from the Cohere Python SDK v5; other SDK versions may name them differently
from cohere.errors import TooManyRequestsError, UnauthorizedError

try:
    collection.add(
        documents=["document"],
        ids=["id1"]
    )
except TooManyRequestsError:
    # Handle rate limit (HTTP 429)
    time.sleep(60)  # Wait before retrying
except UnauthorizedError:
    # HTTP 401
    print("Invalid API key")
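Sleeping once handles a single rate-limit response, but the call still has to be repeated afterwards. A minimal sketch of a retry loop with exponential backoff, assuming you pass in whichever rate-limit exception class your installed Cohere SDK raises (`with_retries` and its parameters are hypothetical, not part of Chroma or Cohere):

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on the given exceptions with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... base_delay
    raise RuntimeError("unreachable")
```

A call such as `with_retries(lambda: collection.add(documents=docs, ids=ids), retry_on=(RateLimitException,))` would then retry only on that exception, where `RateLimitException` stands for the class your SDK version provides. Authentication errors are deliberately left out of `retry_on`, since retrying with the same invalid key cannot succeed.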
Resources
Free tier: Cohere offers a free tier for experimentation, which is enough to get started with multilingual search.