Collection API

The Collection API provides methods for working with documents and embeddings in a collection.

Document Operations

add()

Add records to the collection.

collection.add(
    ids=["id1", "id2", "id3"],
    documents=["doc1", "doc2", "doc3"],
    metadatas=[{"type": "article"}, {"type": "blog"}, {"type": "paper"}]
)

ids

str | List[str]

required

Record IDs to add. Must be unique within the collection.

embeddings

List[float] | List[List[float]]

Embeddings to add. If None, embeddings are computed from documents/images using the collection’s embedding function.

metadatas

Dict | List[Dict]

Optional metadata for each record.

documents

str | List[str]

Optional documents for each record.

images

ndarray | List[ndarray]

Optional images for each record.

uris

str | List[str]

Optional URIs for loading images.

raises

ValueError

If both embeddings and documents/images are missing
If lengths of provided fields do not match
If an ID already exists

Either provide embeddings OR provide documents/images to be embedded. You cannot provide both.
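The error conditions listed under raises can be checked on the client before a batch is sent. The sketch below is a hypothetical pre-flight helper: `check_add_inputs` and its `existing_ids` parameter are illustrative names, not part of Chroma. It covers only documents, embeddings, and metadatas (images and URIs are left out for brevity), and the library's own validation may differ.

```python
from typing import List, Optional, Sequence


def check_add_inputs(
    ids: Sequence[str],
    documents: Optional[Sequence[str]] = None,
    embeddings: Optional[Sequence[Sequence[float]]] = None,
    metadatas: Optional[Sequence[dict]] = None,
    existing_ids: Sequence[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the batch looks valid."""
    problems: List[str] = []

    # At least one of embeddings or documents must be present.
    if documents is None and embeddings is None:
        problems.append("provide embeddings or documents")

    # Every provided field must have one entry per ID.
    fields = (("documents", documents), ("embeddings", embeddings), ("metadatas", metadatas))
    for name, field in fields:
        if field is not None and len(field) != len(ids):
            problems.append(f"{name} has {len(field)} entries, expected {len(ids)}")

    # IDs must be unique within the batch and must not already be stored.
    if len(set(ids)) != len(ids):
        problems.append("duplicate IDs in batch")
    clashes = sorted(set(ids) & set(existing_ids))
    if clashes:
        problems.append(f"IDs already exist: {clashes}")

    return problems
```

A caller could run this first and call collection.add() only when the returned list is empty.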

get()

Retrieve records from the collection.

results = collection.get(
    ids=["id1", "id2"],
    include=["documents", "metadatas"]
)

ids

List[str]

If provided, only return records with these IDs.

where

Where

A Where filter used to filter based on metadata values.

limit

int

Maximum number of results to return.

offset

int

Number of results to skip before returning.

where_document

WhereDocument

A WhereDocument filter used to filter based on document content.

include

List[str]

default:["metadatas", "documents"]

Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris".

return

GetResult

Retrieved records and requested fields as a GetResult object with:

ids: List of IDs
documents: Optional list of documents
metadatas: Optional list of metadata
embeddings: Optional list of embeddings
uris: Optional list of URIs
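The limit and offset parameters can be combined to walk a large collection page by page. The following is a minimal sketch, assuming that the result supports dict-style access to "ids" and that records come back in a stable order across calls. `iter_pages` and `page_size` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Iterator


def iter_pages(collection: Any, page_size: int = 100, **get_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield successive get() results until the collection is exhausted."""
    offset = 0
    while True:
        page = collection.get(limit=page_size, offset=offset, **get_kwargs)
        if not page["ids"]:
            return  # nothing left to read
        yield page
        if len(page["ids"]) < page_size:
            return  # a short page means this was the last one
        offset += len(page["ids"])
```

Extra keyword arguments such as where or include are passed through to get() unchanged, so the same filter applies to every page.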

query()

Query for the K nearest neighbor records.

results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    n_results=10
)

query_embeddings

List[float] | List[List[float]]

Raw embeddings to query for.

query_texts

str | List[str]

Documents to embed and query against using the collection’s embedding function.

query_images

ndarray | List[ndarray]

Images to embed and query against.

query_uris

str | List[str]

URIs to be loaded and embedded.

ids

List[str]

Optional subset of IDs to search within.

n_results

int

default:"10"

Number of neighbors to return per query.

where

Where

Metadata filter.

where_document

WhereDocument

Document content filter.

include

List[str]

default:["metadatas", "documents", "distances"]

Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris", "distances".

return

QueryResult

Nearest neighbor results in batched format:

ids: List of lists of IDs (one list per query)
documents: Optional list of lists of documents
metadatas: Optional list of lists of metadata
embeddings: Optional list of lists of embeddings
distances: Optional list of lists of distances

raises

ValueError

If no query input is provided
If multiple query input types are provided

This is a batch query API. Multiple queries can be performed at once by providing lists of query inputs.
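Because results are batched, every field holds one inner list per query, even when only a single query is sent. The sketch below pairs each returned ID with its distance, query by query. It assumes dict-style access to the result, and `nearest_per_query` is a hypothetical helper name.

```python
from typing import Any, List, Mapping, Tuple


def nearest_per_query(result: Mapping[str, Any]) -> List[List[Tuple[str, float]]]:
    """For each query, pair every returned ID with its distance."""
    distances = result.get("distances")
    if distances is None:
        raise ValueError('"distances" was not included in the query result')
    return [
        list(zip(query_ids, query_distances))
        for query_ids, query_distances in zip(result["ids"], distances)
    ]
```

For a single query, the matches are the first (and only) element of the returned list.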

search()

Perform hybrid search on the collection using the new Search API.

from chromadb.execution.expression import Search, K, Knn

results = collection.search(
    Search()
        .where(K("category") == "science")
        .rank(Knn(query=[0.1, 0.2, 0.3]))
        .limit(10)
        .select(K.DOCUMENT, K.SCORE)
)

searches

Search | List[Search]

required

A single Search object or list of Search objects, each containing:

where: Where expression for filtering
rank: Ranking expression for hybrid search (defaults to Val(0.0))
limit: Limit configuration for pagination
select: Select configuration for keys to return

read_level

ReadLevel

default:"ReadLevel.INDEX_AND_WAL"

Controls whether to read from the write-ahead log (WAL):

ReadLevel.INDEX_AND_WAL: Read from both index and WAL (default, all writes visible)
ReadLevel.INDEX_ONLY: Read only from index, skipping WAL (faster, may miss recent writes)

return

SearchResult

Column-major format response with:

ids: List of result IDs for each search payload
documents: Optional documents for each payload
embeddings: Optional embeddings for each payload
metadatas: Optional metadata for each payload
scores: Optional scores for each payload
select: List of selected keys for each payload

Use the .rows() method to convert the response to row-major format.

This is an experimental API for distributed and hosted Chroma. See Search API for more details.
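To make the difference between column-major and row-major concrete, here is a generic transposition over plain Python data for a single payload. It illustrates the idea only and is not the library's .rows() implementation, whose exact output shape is not described here. `columns_to_rows` is a hypothetical name.

```python
from typing import Any, Dict, List, Mapping, Optional, Sequence


def columns_to_rows(columns: Mapping[str, Optional[Sequence[Any]]]) -> List[Dict[str, Any]]:
    """Transpose {"field": [v0, v1]} into [{"field": v0}, {"field": v1}]."""
    # Skip columns that were not selected (represented here as None).
    present = {name: values for name, values in columns.items() if values is not None}
    if not present:
        return []
    lengths = {len(values) for values in present.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    names = list(present)
    return [dict(zip(names, row)) for row in zip(*present.values())]
```

Column-major keeps each field in one contiguous list, while row-major groups all fields of one result together, which is usually easier to iterate over.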

update()

Update existing records by ID.

collection.update(
    ids=["id1", "id2"],
    documents=["updated doc1", "updated doc2"]
)

ids

str | List[str]

required

Record IDs to update.

embeddings

List[float] | List[List[float]]

Updated embeddings. If None, embeddings are computed from documents/images.

metadatas

Dict | List[Dict]

Updated metadata.

documents

str | List[str]

Updated documents.

images

ndarray | List[ndarray]

Updated images.

uris

str | List[str]

Updated URIs for loading images.

upsert()

Create or update records by ID.

collection.upsert(
    ids=["id1", "id2", "id3"],
    documents=["doc1", "doc2", "doc3"],
    metadatas=[{"new": True}, {"new": True}, {"new": False}]
)

ids

str | List[str]

required

Record IDs to upsert.

embeddings

List[float] | List[List[float]]

Embeddings to add or update. If None, embeddings are computed.

metadatas

Dict | List[Dict]

Metadata to add or update.

documents

str | List[str]

Documents to add or update.

images

ndarray | List[ndarray]

Images to add or update.

uris

str | List[str]

URIs for loading images.

delete()

Delete records by ID or filters.

collection.delete(ids=["id1", "id2"])

ids

List[str]

Record IDs to delete.

where

Where

Metadata filter.

where_document

WhereDocument

Document content filter.

raises

ValueError

If no IDs or filters are provided.

All documents that match the filters will be permanently deleted.
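Since deletion is permanent, it can help to look at what a filter matches before deleting. The sketch below resolves the filter to IDs with get() and refuses to proceed past a safety cap. `delete_with_preview` and `max_deletes` are hypothetical names, and the sketch assumes that passing include=[] to get() returns IDs only.

```python
from typing import Any, List, Optional


def delete_with_preview(
    collection: Any,
    where: Optional[dict] = None,
    where_document: Optional[dict] = None,
    max_deletes: int = 100,
) -> List[str]:
    """Delete matching records only if the match count stays within max_deletes."""
    if where is None and where_document is None:
        raise ValueError("provide at least one filter")
    # Resolve the filters to concrete IDs first (assumption: include=[] yields IDs only).
    matches = collection.get(where=where, where_document=where_document, include=[])["ids"]
    if len(matches) > max_deletes:
        raise RuntimeError(
            f"{len(matches)} records match; refusing to delete more than {max_deletes}"
        )
    if matches:
        collection.delete(ids=matches)
    return matches
```

The returned list records exactly which IDs were removed, which is useful for logging.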

Collection Utilities

count()

Return the number of records in the collection.

num_records = collection.count()
print(f"Collection has {num_records} records")

return

int

The number of records in the collection.

peek()

Return the first N records from the collection.

results = collection.peek(limit=5)

limit

int

default:"10"

Maximum number of records to return.

return

GetResult

Retrieved records with IDs, documents, metadatas, and embeddings.

get_indexing_status()

Get the indexing status of the collection.

status = collection.get_indexing_status()
print(f"Indexed: {status.num_indexed_ops}/{status.total_ops}")
print(f"Progress: {status.op_indexing_progress:.2%}")

return

IndexingStatus

An object containing:

num_indexed_ops: Number of user operations that have been indexed
num_unindexed_ops: Number of user operations pending indexing
total_ops: Total number of user operations in collection
op_indexing_progress: Proportion of operations indexed (0.0 to 1.0)
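A common use of these fields is to wait for pending writes to be indexed before running index-only reads. The following is a minimal polling sketch. `wait_until_indexed`, `timeout_s`, and `poll_s` are hypothetical names, and the sketch assumes a progress value of 1.0 means indexing has caught up.

```python
import time
from typing import Any


def wait_until_indexed(collection: Any, timeout_s: float = 60.0, poll_s: float = 1.0) -> bool:
    """Poll get_indexing_status() until progress reaches 1.0 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = collection.get_indexing_status()
        if status.op_indexing_progress >= 1.0:
            return True  # every operation has been indexed
        if time.monotonic() >= deadline:
            return False  # gave up; some operations are still pending
        time.sleep(poll_s)
```

The function returns False rather than raising on timeout, so the caller can decide whether to retry or to fall back to a read level that includes the WAL.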

modify()

Update collection name, metadata, or configuration.

collection.modify(name="new_collection_name")

name

str

New collection name.

metadata

CollectionMetadata

New metadata for the collection.

configuration

UpdateCollectionConfiguration

New configuration for the collection.

fork()

Fork the current collection under a new name with identical data.

This is an experimental API that only works for Hosted Chroma.

new_collection = collection.fork(new_name="my_collection_backup")

Parameters:

new_name

str

required

The name of the new collection.

Returns:

return

Collection

A new collection with the specified name containing identical data to the current collection.

attach_function()

Attach a function to this collection for automated processing.

from chromadb.api.functions import Function

attached_fn, created = collection.attach_function(
    function=Function.STATISTICS_FUNCTION,
    name="mycoll_stats_fn",
    output_collection="mycoll_stats"
)

if created:
    print("New function attached")
else:
    print("Function already existed")

Parameters:

function

Function

required

A Function enum value (e.g., STATISTICS_FUNCTION, RECORD_COUNTER_FUNCTION)

name

str

required

Unique name for this attached function

output_collection

str

required

Name of the collection where function output will be stored

params

Dict[str, Any]

Optional dictionary with function-specific parameters

Returns:

return

Tuple[AttachedFunction, bool]

Tuple of (AttachedFunction, created) where created is True if newly created, False if already existed

get_attached_function()

Get an attached function by name for this collection.

attached_fn = collection.get_attached_function(name="mycoll_stats_fn")

Parameters:

name

str

required

Name of the attached function

Returns:

return

AttachedFunction

The attached function object

Raises:

NotFoundError: If the attached function doesn’t exist

detach_function()

Detach a function from this collection.

# Detach function but keep output collection
success = collection.detach_function(name="mycoll_stats_fn")

# Detach function and delete output collection
success = collection.detach_function(
    name="mycoll_stats_fn",
    delete_output_collection=True
)

Parameters:

name

str

required

The name of the attached function

delete_output_collection

bool

default:"False"

Whether to also delete the output collection

Returns:

return

bool

True if successful

Collection Properties

name

Get the collection name.

print(collection.name)

id

Get the collection UUID.

print(collection.id)

metadata

Get the collection metadata.

print(collection.metadata)
