The Collection API provides methods for working with documents and embeddings in a collection.

Document Operations

add()

Add records to the collection.
collection.add(
    ids=["id1", "id2", "id3"],
    documents=["doc1", "doc2", "doc3"],
    metadatas=[{"type": "article"}, {"type": "blog"}, {"type": "paper"}]
)
ids
str | List[str]
required
Record IDs to add. Must be unique within the collection.
embeddings
List[float] | List[List[float]]
Embeddings to add. If None, embeddings are computed from documents/images using the collection’s embedding function.
metadatas
Dict | List[Dict]
Optional metadata for each record.
documents
str | List[str]
Optional documents for each record.
images
ndarray | List[ndarray]
Optional images for each record.
uris
str | List[str]
Optional URIs for loading images.
raises
ValueError
  • If both embeddings and documents/images are missing
  • If lengths of provided fields do not match
  • If an ID already exists
Either provide embeddings OR provide documents/images to be embedded. You cannot provide both.
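The error conditions listed above can also be caught on the client before add() is called. The following is a minimal sketch, assuming list inputs and ignoring images and URIs; check_add_inputs is a hypothetical helper, not part of Chroma, and it cannot detect IDs that already exist in the collection.

```python
from typing import Any, Dict, List, Optional


def check_add_inputs(
    ids: List[str],
    documents: Optional[List[str]] = None,
    embeddings: Optional[List[List[float]]] = None,
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> None:
    """Hypothetical pre-flight check mirroring the documented add() errors."""
    # Something must be available to store or embed.
    if documents is None and embeddings is None:
        raise ValueError("provide either embeddings or documents")
    # IDs must be unique within the batch; Chroma enforces uniqueness
    # against records already stored in the collection.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate IDs in batch")
    # Every provided field needs exactly one entry per ID.
    fields = (
        ("documents", documents),
        ("embeddings", embeddings),
        ("metadatas", metadatas),
    )
    for name, values in fields:
        if values is not None and len(values) != len(ids):
            raise ValueError(
                f"{name} has {len(values)} entries, expected {len(ids)}"
            )
```

Calling the helper immediately before collection.add() surfaces mismatched batches with a message that names the offending field.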

get()

Retrieve records from the collection.
results = collection.get(
    ids=["id1", "id2"],
    include=["documents", "metadatas"]
)
ids
List[str]
If provided, only return records with these IDs.
where
Where
A Where filter used to filter based on metadata values.
limit
int
Maximum number of results to return.
offset
int
Number of results to skip before returning.
where_document
WhereDocument
A WhereDocument filter used to filter based on document content.
include
List[str]
default: ["metadatas", "documents"]
Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris".
return
GetResult
Retrieved records and requested fields as a GetResult object with:
  • ids: List of IDs
  • documents: Optional list of documents
  • metadatas: Optional list of metadata
  • embeddings: Optional list of embeddings
  • uris: Optional list of URIs
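The limit and offset parameters can be combined to page through a large collection. The sketch below is one possible approach, assuming the result supports key access such as page["ids"]; iter_pages and page_size are names introduced here for illustration.

```python
from typing import Any, Dict, Iterator


def iter_pages(collection: Any, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield successive get() results until the collection is exhausted."""
    offset = 0
    while True:
        page = collection.get(
            limit=page_size,
            offset=offset,
            include=["documents", "metadatas"],
        )
        # An empty page means there is nothing left to fetch.
        if not page["ids"]:
            break
        yield page
        # A short page is the last one, so the extra request can be skipped.
        if len(page["ids"]) < page_size:
            break
        offset += page_size
```

Usage would look like `for page in iter_pages(collection, page_size=500): ...`, processing one batch of IDs, documents, and metadatas at a time.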

query()

Query for the K nearest neighbor records.
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    n_results=10
)
query_embeddings
List[float] | List[List[float]]
Raw embeddings to query for.
query_texts
str | List[str]
Documents to embed and query against using the collection’s embedding function.
query_images
ndarray | List[ndarray]
Images to embed and query against.
query_uris
str | List[str]
URIs to be loaded and embedded.
ids
List[str]
Optional subset of IDs to search within.
n_results
int
default: 10
Number of neighbors to return per query.
where
Where
Metadata filter.
where_document
WhereDocument
Document content filter.
include
List[str]
default: ["metadatas", "documents", "distances"]
Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris", "distances".
return
QueryResult
Nearest neighbor results in batched format:
  • ids: List of lists of IDs (one list per query)
  • documents: Optional list of lists of documents
  • metadatas: Optional list of lists of metadata
  • embeddings: Optional list of lists of embeddings
  • distances: Optional list of lists of distances
raises
ValueError
  • If no query input is provided
  • If multiple query input types are provided
This is a batch query API. Multiple queries can be performed at once by providing lists of query inputs.
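Because every field in the result is a list of lists, it is often convenient to regroup the output into one list of hits per query. The following sketch assumes the result supports key access and that fields not requested via include are absent or None; rows_per_query is a hypothetical helper, and the sample dictionary is hand-written rather than real Chroma output.

```python
from typing import Any, Dict, List

# Maps batched field names to the singular key used for each hit.
FIELD_NAMES = {
    "documents": "document",
    "metadatas": "metadata",
    "distances": "distance",
}


def rows_per_query(results: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Regroup batched query() output into one list of hit dicts per query."""
    # Only "ids" is always present; the rest depends on `include`.
    present = [key for key in FIELD_NAMES if results.get(key) is not None]
    grouped = []
    for q, ids in enumerate(results["ids"]):
        hits = []
        for i, record_id in enumerate(ids):
            hit = {"id": record_id}
            for key in present:
                hit[FIELD_NAMES[key]] = results[key][q][i]
            hits.append(hit)
        grouped.append(hits)
    return grouped


# Hand-written stand-in for a two-query result (illustrative values only).
sample = {
    "ids": [["id1", "id2"], ["id3"]],
    "documents": [["doc1", "doc2"], ["doc3"]],
    "metadatas": None,
    "distances": [[0.1, 0.4], [0.2]],
}
for hits in rows_per_query(sample):
    print(hits)
```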

search()

Perform hybrid search on the collection using the new Search API.
from chromadb.execution.expression import Search, K, Knn

results = collection.search(
    Search()
        .where(K("category") == "science")
        .rank(Knn(query=[0.1, 0.2, 0.3]))
        .limit(10)
        .select(K.DOCUMENT, K.SCORE)
)
searches
Search | List[Search]
required
A single Search object or list of Search objects, each containing:
  • where: Where expression for filtering
  • rank: Ranking expression for hybrid search (defaults to Val(0.0))
  • limit: Limit configuration for pagination
  • select: Select configuration for keys to return
read_level
ReadLevel
default: ReadLevel.INDEX_AND_WAL
Controls whether to read from the write-ahead log (WAL):
  • ReadLevel.INDEX_AND_WAL: Read from both index and WAL (default, all writes visible)
  • ReadLevel.INDEX_ONLY: Read only from index, skipping WAL (faster, may miss recent writes)
return
SearchResult
Column-major format response with:
  • ids: List of result IDs for each search payload
  • documents: Optional documents for each payload
  • embeddings: Optional embeddings for each payload
  • metadatas: Optional metadata for each payload
  • scores: Optional scores for each payload
  • select: List of selected keys for each payload
Use .rows() method to convert to row-major format.
This is an experimental API for distributed and hosted Chroma. See Search API for more details.

update()

Update existing records by ID.
collection.update(
    ids=["id1", "id2"],
    documents=["updated doc1", "updated doc2"]
)
ids
str | List[str]
required
Record IDs to update.
embeddings
List[float] | List[List[float]]
Updated embeddings. If None, embeddings are computed from documents/images.
metadatas
Dict | List[Dict]
Updated metadata.
documents
str | List[str]
Updated documents.
images
ndarray | List[ndarray]
Updated images.
uris
str | List[str]
Updated URIs for loading images.

upsert()

Create or update records by ID.
collection.upsert(
    ids=["id1", "id2", "id3"],
    documents=["doc1", "doc2", "doc3"],
    metadatas=[{"new": True}, {"new": True}, {"new": False}]
)
ids
str | List[str]
required
Record IDs to upsert.
embeddings
List[float] | List[List[float]]
Embeddings to add or update. If None, embeddings are computed.
metadatas
Dict | List[Dict]
Metadata to add or update.
documents
str | List[str]
Documents to add or update.
images
ndarray | List[ndarray]
Images to add or update.
uris
str | List[str]
URIs for loading images.

delete()

Delete records by ID or filters.
collection.delete(ids=["id1", "id2"])
ids
List[str]
Record IDs to delete.
where
Where
Metadata filter.
where_document
WhereDocument
Document content filter.
raises
ValueError
If no IDs or filters are provided.
All documents that match the filters will be permanently deleted.
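Since filter-based deletion is permanent, it can help to preview how many records a filter matches before deleting them. This is a minimal sketch, assuming that get() accepts include=[] to return IDs only; delete_where and dry_run are names introduced here, not part of Chroma.

```python
from typing import Any, Dict


def delete_where(collection: Any, where: Dict[str, Any], dry_run: bool = True) -> int:
    """Return how many records match `where`; delete them unless dry_run is set."""
    # Request IDs only (assumption: an empty include list is accepted),
    # which keeps the preview lightweight.
    matches = collection.get(where=where, include=[])
    count = len(matches["ids"])
    if count and not dry_run:
        collection.delete(where=where)
    return count
```

A first call with the default dry_run=True reports the match count; a second call with dry_run=False performs the deletion.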

Collection Utilities

count()

Return the number of records in the collection.
num_records = collection.count()
print(f"Collection has {num_records} records")
return
int
The number of records in the collection.

peek()

Return the first N records from the collection.
results = collection.peek(limit=5)
limit
int
default: 10
Maximum number of records to return.
return
GetResult
Retrieved records with IDs, documents, metadatas, and embeddings.

get_indexing_status()

Get the indexing status of the collection.
status = collection.get_indexing_status()
print(f"Indexed: {status.num_indexed_ops}/{status.total_ops}")
print(f"Progress: {status.op_indexing_progress:.2%}")
return
IndexingStatus
An object containing:
  • num_indexed_ops: Number of user operations that have been indexed
  • num_unindexed_ops: Number of user operations pending indexing
  • total_ops: Total number of user operations in collection
  • op_indexing_progress: Proportion of operations indexed (0.0 to 1.0)
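The status fields above can be polled to wait for pending writes to be indexed. The sketch below is one way to do that, using only the num_unindexed_ops field described above; wait_until_indexed, timeout, and poll_interval are hypothetical names chosen for this example.

```python
import time
from typing import Any


def wait_until_indexed(
    collection: Any, timeout: float = 30.0, poll_interval: float = 0.5
) -> bool:
    """Poll get_indexing_status() until nothing is pending or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = collection.get_indexing_status()
        # Done as soon as no user operations are waiting to be indexed.
        if status.num_unindexed_ops == 0:
            return True
        # Give up once the deadline has passed.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The function returns True once indexing has caught up and False if the timeout is reached first, so callers can decide whether to proceed or retry.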

modify()

Update collection name, metadata, or configuration.
collection.modify(name="new_collection_name")
name
str
New collection name.
metadata
CollectionMetadata
New metadata for the collection.
configuration
UpdateCollectionConfiguration
New configuration for the collection.

fork()

Fork the current collection under a new name with identical data.
This is an experimental API that only works for Hosted Chroma.
new_collection = collection.fork(new_name="my_collection_backup")
Parameters:
new_name
str
required
The name of the new collection.
Returns:
return
Collection
A new collection with the specified name containing identical data to the current collection.

attach_function()

Attach a function to this collection for automated processing.
from chromadb.api.functions import Function

attached_fn, created = collection.attach_function(
    function=Function.STATISTICS_FUNCTION,
    name="mycoll_stats_fn",
    output_collection="mycoll_stats"
)

if created:
    print("New function attached")
else:
    print("Function already existed")
Parameters:
function
Function
required
A Function enum value (e.g., STATISTICS_FUNCTION, RECORD_COUNTER_FUNCTION)
name
str
required
Unique name for this attached function
output_collection
str
required
Name of the collection where function output will be stored
params
Dict[str, Any]
Optional dictionary with function-specific parameters
Returns:
return
Tuple[AttachedFunction, bool]
Tuple of (AttachedFunction, created) where created is True if newly created, False if already existed

get_attached_function()

Get an attached function by name for this collection.
attached_fn = collection.get_attached_function(name="mycoll_stats_fn")
Parameters:
name
str
required
Name of the attached function
Returns:
return
AttachedFunction
The attached function object
Raises:
  • NotFoundError: If the attached function doesn’t exist

detach_function()

Detach a function from this collection.
# Detach function but keep output collection
success = collection.detach_function(name="mycoll_stats_fn")

# Detach function and delete output collection
success = collection.detach_function(
    name="mycoll_stats_fn",
    delete_output_collection=True
)
Parameters:
name
str
required
The name of the attached function
delete_output_collection
bool
default: False
Whether to also delete the output collection
Returns:
return
bool
True if successful

Collection Properties

name

Get the collection name.
print(collection.name)

id

Get the collection UUID.
print(collection.id)

metadata

Get the collection metadata.
print(collection.metadata)
