Document Operations
add()
Add records to the collection.
Record IDs to add. Must be unique within the collection.
Embeddings to add. If None, embeddings are computed from documents/images using the collection’s embedding function.
Optional metadata for each record.
Optional documents for each record.
Optional images for each record.
Optional URIs for loading images.
An error is raised in any of the following cases:
- If both embeddings and documents/images are missing
- If lengths of provided fields do not match
- If an ID already exists
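The three error conditions above can be pictured with a small pure-Python check. This is only a sketch under stated assumptions, not the library's own validation: `check_add_inputs` and `existing_ids` are hypothetical names, `ValueError` is an assumed exception type, and images and URIs are left out for brevity.

```python
from typing import Optional, Sequence, Set


def check_add_inputs(
    existing_ids: Set[str],
    ids: Sequence[str],
    embeddings: Optional[Sequence[Sequence[float]]] = None,
    documents: Optional[Sequence[str]] = None,
    metadatas: Optional[Sequence[dict]] = None,
) -> None:
    """Toy stand-in for the checks listed above (hypothetical helper)."""
    # Case 1: no embeddings supplied and nothing to compute them from.
    if embeddings is None and documents is None:
        raise ValueError("provide embeddings or documents")

    # Case 2: every provided field must have exactly one entry per ID.
    for name, field in (
        ("embeddings", embeddings),
        ("documents", documents),
        ("metadatas", metadatas),
    ):
        if field is not None and len(field) != len(ids):
            raise ValueError(
                f"{name} has {len(field)} entries, expected {len(ids)}"
            )

    # Case 3: IDs must be unique within the batch and not already stored.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate IDs in batch")
    clashes = existing_ids.intersection(ids)
    if clashes:
        raise ValueError(f"IDs already exist: {sorted(clashes)}")
```

Running a check like this before calling add() may help surface mismatched batch lengths early, for example when IDs and documents are built in separate loops.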
Either provide embeddings OR provide documents/images to be embedded. You cannot provide both.
get()
Retrieve records from the collection.
If provided, only return records with these IDs.
A Where filter used to filter based on metadata values.
Maximum number of results to return.
Number of results to skip before returning.
A WhereDocument filter used to filter based on document content.
Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris".
Retrieved records and requested fields as a GetResult object with:
- ids: List of IDs
- documents: Optional list of documents
- metadatas: Optional list of metadata
- embeddings: Optional list of embeddings
- uris: Optional list of URIs
query()
Query for the K nearest neighbor records.
Raw embeddings to query for.
Documents to embed and query against using the collection’s embedding function.
Images to embed and query against.
URIs to be loaded and embedded.
Optional subset of IDs to search within.
Number of neighbors to return per query.
Metadata filter.
Document content filter.
Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris", "distances".
Nearest neighbor results in batched format:
- ids: List of lists of IDs (one list per query)
- documents: Optional list of lists of documents
- metadatas: Optional list of lists of metadata
- embeddings: Optional list of lists of embeddings
- distances: Optional list of lists of distances
An error is raised in either of the following cases:
- If no query input is provided
- If multiple query input types are provided
This is a batch query API. Multiple queries can be performed at once by providing lists of query inputs.
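Because results are batched, every field is a list with one inner list per query. The sketch below pairs IDs with distances for each query. `neighbors_per_query` is a hypothetical helper, the result is hand-written with made-up values, and the code assumes "distances" was requested and that fields are reachable by key.

```python
from typing import Dict, List, Tuple


def neighbors_per_query(result: Dict[str, list]) -> List[List[Tuple[str, float]]]:
    """Pair each returned ID with its distance, keeping one list per query."""
    return [
        list(zip(ids, distances))
        for ids, distances in zip(result["ids"], result["distances"])
    ]


# Hand-written result for two queries with two neighbors each.
example = {
    "ids": [["a", "b"], ["c", "a"]],
    "distances": [[0.1, 0.4], [0.2, 0.9]],
}
pairs = neighbors_per_query(example)
```

The outer index selects the query and the inner index selects the neighbor, so `pairs[1][0]` is the first neighbor of the second query.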
search()
Perform hybrid search on the collection using the new Search API.
A single Search object or list of Search objects, each containing:
- where: Where expression for filtering
- rank: Ranking expression for hybrid search (defaults to Val(0.0))
- limit: Limit configuration for pagination
- select: Select configuration for keys to return
Controls whether to read from the write-ahead log (WAL):
- ReadLevel.INDEX_AND_WAL: Read from both index and WAL (default, all writes visible)
- ReadLevel.INDEX_ONLY: Read only from index, skipping WAL (faster, may miss recent writes)
Column-major format response with:
- ids: List of result IDs for each search payload
- documents: Optional documents for each payload
- embeddings: Optional embeddings for each payload
- metadatas: Optional metadata for each payload
- scores: Optional scores for each payload
- select: List of selected keys for each payload
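To make the column-major layout concrete, here is a toy conversion to row-major form. It is an illustration only: `to_rows` is a hypothetical function, the sample values are made up, and the row layout the library itself produces may differ (for instance in key names).

```python
from typing import Any, Dict, List


def to_rows(columns: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Turn per-field columns into one list of row dicts per search payload."""
    rows_per_payload = []
    for payload_index, payload_ids in enumerate(columns["ids"]):
        rows = []
        for row_index, record_id in enumerate(payload_ids):
            row = {"id": record_id}
            for key, column in columns.items():
                # Skip the ID column and any optional column that was not returned.
                if key == "ids" or column is None or column[payload_index] is None:
                    continue
                row[key] = column[payload_index][row_index]
            rows.append(row)
        rows_per_payload.append(rows)
    return rows_per_payload


column_major = {
    "ids": [["a", "b"], ["c"]],
    "documents": [["doc a", "doc b"], ["doc c"]],
    "scores": [[0.9, 0.5], [0.7]],
    "embeddings": None,  # optional column that was not selected
}
row_major = to_rows(column_major)
```

Each inner list of `row_major` corresponds to one search payload, and each dict in it gathers the values that belong to a single record.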
Use the .rows() method to convert to row-major format.
update()
Update existing records by ID.
Record IDs to update.
Updated embeddings. If None, embeddings are computed from documents/images.
Updated metadata.
Updated documents.
Updated images.
Updated URIs for loading images.
upsert()
Create or update records by ID.
Record IDs to upsert.
Embeddings to add or update. If None, embeddings are computed.
Metadata to add or update.
Documents to add or update.
Images to add or update.
URIs for loading images.
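The "create or update" behavior can be sketched with a plain dictionary keyed by record ID. This is a toy model, not the library's implementation: the `upsert` function below is hypothetical, handles documents only, and returns per-ID flags purely to make the two outcomes visible.

```python
from typing import Dict, List


def upsert(store: Dict[str, str], ids: List[str], documents: List[str]) -> List[bool]:
    """Write each document under its ID; return True for each newly created ID."""
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    created = []
    for record_id, document in zip(ids, documents):
        created.append(record_id not in store)
        store[record_id] = document  # insert a new record or overwrite the old one
    return created


store = {"a": "old text"}
flags = upsert(store, ["a", "b"], ["new text", "fresh record"])
```

Here "a" already existed and is overwritten, while "b" is created, which is the distinction between upsert() and an add() that rejects existing IDs.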
delete()
Delete records by ID or filters.
Record IDs to delete.
Metadata filter.
Document content filter.
An error is raised if no IDs or filters are provided.
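A rough picture of deleting by ID or by filter, using an in-memory dictionary of metadata. Everything here is an assumption made for illustration: the `delete` function is a toy, the filter is reduced to exact-match key/value pairs, `ValueError` is an assumed exception type, and requiring a record to satisfy both criteria when both are given is a choice of this sketch rather than documented behavior.

```python
from typing import Dict, List, Optional


def delete(records: Dict[str, dict],
           ids: Optional[List[str]] = None,
           where: Optional[dict] = None) -> List[str]:
    """Remove matching records from `records` and return the removed IDs."""
    if ids is None and where is None:
        raise ValueError("provide ids or a filter")
    removed = []
    for record_id, metadata in list(records.items()):
        if ids is not None and record_id not in ids:
            continue
        # Simplified filter: every key in `where` must equal the stored value.
        if where is not None and any(
            metadata.get(key) != value for key, value in where.items()
        ):
            continue
        del records[record_id]
        removed.append(record_id)
    return removed


records = {"a": {"lang": "en"}, "b": {"lang": "de"}, "c": {"lang": "en"}}
removed_by_id = delete(records, ids=["b"])
removed_by_filter = delete(records, where={"lang": "en"})
```

The guard at the top mirrors the rule that a call with neither IDs nor filters is rejected, which protects against wiping a collection by accident.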
Collection Utilities
count()
Return the number of records in the collection.
The number of records in the collection.
peek()
Return the first N records from the collection.
Maximum number of records to return.
Retrieved records with IDs, documents, metadatas, and embeddings.
get_indexing_status()
Get the indexing status of the collection.
An object containing:
- num_indexed_ops: Number of user operations that have been indexed
- num_unindexed_ops: Number of user operations pending indexing
- total_ops: Total number of user operations in collection
- op_indexing_progress: Proportion of operations indexed (0.0 to 1.0)
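The four fields above are related by simple arithmetic, which the sketch below spells out. It assumes that the total is the sum of indexed and pending operations and that an empty collection counts as fully indexed. `IndexingStatus` here is a hypothetical local dataclass, not the object the library returns.

```python
from dataclasses import dataclass


@dataclass
class IndexingStatus:
    """Hypothetical mirror of the fields listed above."""
    num_indexed_ops: int
    num_unindexed_ops: int

    @property
    def total_ops(self) -> int:
        # Assumption: total = indexed + still pending.
        return self.num_indexed_ops + self.num_unindexed_ops

    @property
    def op_indexing_progress(self) -> float:
        # Assumption: with no operations at all, report full progress.
        if self.total_ops == 0:
            return 1.0
        return self.num_indexed_ops / self.total_ops


status = IndexingStatus(num_indexed_ops=75, num_unindexed_ops=25)
```

A caller waiting for indexing to finish could poll the real status object and compare its progress value against 1.0.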
modify()
Update collection name, metadata, or configuration.
New collection name.
New metadata for the collection.
New configuration for the collection.
fork()
Fork the current collection under a new name with identical data.
The name of the new collection.
A new collection with the specified name containing identical data to the current collection.
attach_function()
Attach a function to this collection for automated processing.
A Function enum value (e.g., STATISTICS_FUNCTION, RECORD_COUNTER_FUNCTION)
Unique name for this attached function
Name of the collection where function output will be stored
Optional dictionary with function-specific parameters
Tuple of (AttachedFunction, created) where created is True if newly created, False if already existed
get_attached_function()
Get an attached function by name for this collection.
Name of the attached function
The attached function object
NotFoundError: If the attached function doesn’t exist
detach_function()
Detach a function from this collection.
The name of the attached function
Whether to also delete the output collection
True if successful
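The attach, get, and detach calls above follow a get-or-create pattern, which the toy registry below imitates. It is a sketch under stated assumptions, not the library's implementation: `FunctionRegistry`, its parameter names, and the local `NotFoundError` class are stand-ins, the option to delete the output collection is omitted, and raising on an unknown name in `detach_function` is a choice of this toy.

```python
from typing import Dict, Optional, Tuple


class NotFoundError(Exception):
    """Local stand-in for the error named above."""


class FunctionRegistry:
    """Toy model of attaching, fetching, and detaching named functions."""

    def __init__(self) -> None:
        self._attached: Dict[str, dict] = {}

    def attach_function(self, function: str, name: str, output_collection: str,
                        params: Optional[dict] = None) -> Tuple[dict, bool]:
        if name in self._attached:
            return self._attached[name], False  # already existed
        self._attached[name] = {
            "function": function,
            "name": name,
            "output_collection": output_collection,
            "params": params or {},
        }
        return self._attached[name], True  # newly created

    def get_attached_function(self, name: str) -> dict:
        if name not in self._attached:
            raise NotFoundError(name)
        return self._attached[name]

    def detach_function(self, name: str) -> bool:
        # Assumption of this toy: detaching an unknown name raises NotFoundError.
        self.get_attached_function(name)
        del self._attached[name]
        return True


registry = FunctionRegistry()
first, created_first = registry.attach_function(
    "STATISTICS_FUNCTION", "stats", "stats_output")
second, created_second = registry.attach_function(
    "STATISTICS_FUNCTION", "stats", "stats_output")
```

Attaching the same name twice returns the existing entry with the created flag set to False, so callers can run setup code repeatedly without creating duplicates.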