LLM exposes a complete Python API for working with embeddings. You can load an embedding model, compute vectors for individual strings or binary data, and persist those vectors in named collections backed by SQLite. Collections support bulk ingestion, deduplication via content hashing, and cosine-similarity search, all without running the CLI.
Loading a model
Use llm.get_embedding_model() with a model ID or alias.
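A minimal sketch of loading a model. The helper name load_embedding_model is hypothetical, and the default ID is only an illustrative choice; any embedding model ID or alias available in your installation should work the same way.

```python
def load_embedding_model(model_id: str = "text-embedding-3-small"):
    """Return the embedding model registered under a model ID or alias.

    The default ID is an illustrative assumption, not a requirement.
    """
    import llm  # deferred so this module can be imported without llm installed

    return llm.get_embedding_model(model_id)
```

Calling load_embedding_model() returns the model object that the rest of this page calls embedding_model.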
Embedding a string
Call .embed() on the model to compute a vector for a single string. This returns a Python list of floating-point numbers.
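As a small sketch, the hypothetical helper below embeds one string and also reports the vector length, which is the dimensionality of the model. It assumes only that the model object has the .embed() method described above.

```python
def embed_text(model, text):
    """Embed one string; return the vector and its number of dimensions."""
    vector = model.embed(text)
    # The result is a plain list of floats, so len() gives the dimensionality.
    return vector, len(vector)
```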
Binary input
Some models, such as CLIP, can embed raw binary data (e.g. images). Check supports_binary before passing bytes.
The embedding_model.supports_text property similarly indicates whether text input is supported.
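One way to apply both checks is sketched below. The helper embed_value and its error messages are hypothetical; it relies only on the supports_binary and supports_text properties and on .embed() accepting either a string or bytes.

```python
def embed_value(model, value):
    """Embed text or bytes, refusing inputs the model does not support."""
    if isinstance(value, bytes):
        # Binary input (for example image bytes) needs explicit model support.
        if not getattr(model, "supports_binary", False):
            raise ValueError("this model cannot embed binary data")
    elif not getattr(model, "supports_text", True):
        raise ValueError("this model cannot embed text")
    return model.embed(value)
```

For an image, the bytes would typically come from reading the file in binary mode, for example open(path, "rb").read().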
Embedding multiple strings at once
Many models are more efficient when processing a batch of inputs. Use .embed_multi(), which returns a generator of vectors.
To control how many inputs are sent in each batch, pass batch_size=N.
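A short sketch of batch embedding follows. The helper embed_texts is hypothetical; it assumes .embed_multi() takes an iterable of strings plus an optional batch_size keyword, as described above, and it turns the lazy generator into a list.

```python
def embed_texts(model, texts, batch_size=None):
    """Embed many strings and return their vectors as a list."""
    if batch_size is None:
        vectors = model.embed_multi(texts)  # use the model's own default
    else:
        vectors = model.embed_multi(texts, batch_size=batch_size)
    # embed_multi() is lazy, so list() is what actually runs the batches.
    return list(vectors)
```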
Working with collections
The llm.Collection class groups related embeddings in a SQLite database under a single name. Every entry in a collection has a unique string ID and was created with the same embedding model.
Creating a collection
If the named collection already exists in the database, you can omit model and model_id; the model ID is read from the collections table automatically. If you pass create=False and the collection does not exist, a Collection.DoesNotExist exception is raised.
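The two cases can be sketched with one hypothetical helper, open_collection. It assumes the constructor takes the collection name followed by a database object (assumed here to be a sqlite_utils.Database), with model_id and create as keyword arguments.

```python
def open_collection(name, db, model_id=None):
    """Open a collection, creating it only when a model_id is supplied.

    db is assumed to be a sqlite_utils.Database instance.
    Returns None if the collection is missing and no model_id was given.
    """
    import llm  # deferred so this module can be imported without llm installed

    if model_id is not None:
        # Creates the collection if needed, or reopens an existing one.
        return llm.Collection(name, db, model_id=model_id)
    try:
        # create=False never creates; it raises if the name is unknown.
        return llm.Collection(name, db, create=False)
    except llm.Collection.DoesNotExist:
        return None
```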
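Embedding a single item
Calling collection.embed("hound", "my happy hound") stores the embedding for the string "my happy hound" under the key "hound". If an entry with the same content hash already exists, the call is a no-op (deduplication is automatic).
Add store=True to also persist the original text in the content column.
To attach metadata to an entry, pass a dictionary as metadata=.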
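A brief sketch combining both options. The helper store_note and the "tags" metadata key are hypothetical; the call itself follows the embed(id, value, metadata=None, store=False) signature listed in the class reference.

```python
def store_note(collection, note_id, text, tags):
    """Embed one note, keeping its text and a small metadata dict."""
    collection.embed(
        note_id,
        text,
        metadata={"tags": tags},  # any dictionary; "tags" is an example key
        store=True,  # also keep the original text in the content column
    )
```

For instance, store_note(collection, "hound", "my happy hound", ["dogs"]) would store the vector, the text, and the tags together.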
Storing embeddings in bulk
collection.embed_multi() accepts an iterable of (id, text) tuples and is more efficient than calling embed() in a loop.
To attach metadata in bulk, use embed_multi_with_metadata() with (id, text, metadata) tuples.
Both methods accept a batch_size parameter (default 100). Reduce it if you run into memory issues with large collections.
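The sketch below shows both bulk methods. The helper names, the dictionary input shapes, and the "source" metadata key are all hypothetical; only the method names and keyword arguments come from the class reference.

```python
def store_documents(collection, documents, batch_size=100):
    """Embed a {id: text} mapping in bulk, keeping the original text."""
    entries = ((doc_id, text) for doc_id, text in documents.items())
    collection.embed_multi(entries, store=True, batch_size=batch_size)


def store_documents_with_metadata(collection, rows, batch_size=100):
    """Embed rows shaped like {"id": ..., "text": ..., "source": ...}."""
    entries = (
        (row["id"], row["text"], {"source": row["source"]}) for row in rows
    )
    collection.embed_multi_with_metadata(
        entries, store=True, batch_size=batch_size
    )
```

Passing generators rather than lists means the entries are produced only as each batch is consumed.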
Retrieving similar items
Once a collection is populated, use similar() to find the entries whose embedding vectors are closest (by cosine similarity) to a query string.
Each returned entry is an Entry dataclass with four fields:
| Field | Type | Description |
|---|---|---|
| id | str | Unique string ID of the item |
| score | float | Cosine similarity score (higher = more similar) |
| content | str or None | Original text, if stored with store=True |
| metadata | dict or None | Metadata dict, if provided at embed time |
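To make the score field concrete, here is a plain-Python sketch of cosine similarity. It is the standard textbook formula, not a copy of LLM's implementation, which may differ in detail.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score 1.0, perpendicular vectors score 0.0, and opposite vectors score -1.0, which is why a higher score means a closer match.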
To control how many results similar() returns, pass number= (default 10).
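A small sketch of a search that keeps only the ID and score of each result. The helper top_matches is hypothetical; it uses the similar(value, number=10) signature and the Entry fields described above.

```python
def top_matches(collection, query, number=3):
    """Return (id, score) pairs for the entries most similar to query."""
    results = collection.similar(query, number=number)
    return [(entry.id, entry.score) for entry in results]
```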
Searching by existing ID
similar_by_id() looks up the stored vector for an existing entry and returns the most similar neighbours, excluding the item itself.
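As a sketch, the hypothetical helper below returns just the IDs of an entry's neighbours, using the similar_by_id(id, number=10) signature from the class reference.

```python
def neighbours_of(collection, entry_id, number=5):
    """Return the IDs of the entries closest to an already-stored entry."""
    results = collection.similar_by_id(entry_id, number=number)
    return [entry.id for entry in results]
```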
Searching by raw vector
similar_by_vector() accepts a raw list of floats, which is useful when you already have a pre-computed vector.
Pass skip_id= to exclude a specific entry from the results.
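One case where a pre-computed vector is handy is searching with the average of several vectors. The sketch below is an illustration of that idea, not a feature of LLM; the helper similar_to_average is hypothetical and assumes the model and collection methods described on this page.

```python
def similar_to_average(collection, model, texts, number=10, skip_id=None):
    """Search with the element-wise mean of several texts' vectors."""
    vectors = list(model.embed_multi(texts))
    # zip(*vectors) walks the vectors one dimension at a time.
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return collection.similar_by_vector(mean, number=number, skip_id=skip_id)
```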
Collection class reference
A Collection instance exposes the following properties and methods:
| Member | Description |
|---|---|
| id | Integer primary key of the collection row |
| name | Unique string name of the collection |
| model_id | String ID of the embedding model used |
| model() | Returns the live EmbeddingModel instance |
| count() | Returns the number of items in the collection |
| embed(id, value, metadata=None, store=False) | Embed and store a single item |
| embed_multi(entries, store=False, batch_size=100) | Embed and store multiple (id, text) pairs |
| embed_multi_with_metadata(entries, store=False, batch_size=100) | Embed and store multiple (id, text, metadata) triples |
| similar(value, number=10) | Find nearest neighbours by query string |
| similar_by_id(id, number=10) | Find nearest neighbours by stored ID |
| similar_by_vector(vector, number=10, skip_id=None) | Find nearest neighbours by raw float vector |
| delete() | Delete the collection and all its embeddings |
Use the Collection.exists() class method to check whether a collection exists before working with it.
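A sketch that pairs the existence check with delete(). The helper delete_collection_if_present is hypothetical, and the argument order of Collection.exists() (database first, then collection name) is an assumption that should be checked against your installed version.

```python
def delete_collection_if_present(db, name):
    """Delete a collection if it exists; report whether anything was deleted.

    db is assumed to be a sqlite_utils.Database instance.
    """
    import llm  # deferred so this module can be imported without llm installed

    # Assumed signature: Collection.exists(db, name).
    if not llm.Collection.exists(db, name):
        return False
    # create=False guarantees this never creates a new collection.
    llm.Collection(name, db, create=False).delete()
    return True
```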
SQL schema
The embeddings database contains two tables, collections and embeddings:
- collections.model stores the full model ID (e.g. text-embedding-3-small).
- embeddings.embedding is a binary blob of little-endian 32-bit floats. See Storage Format for encode/decode details.
- embeddings.content_hash is an MD5 digest of the original content, used to skip re-embedding identical inputs.
- embeddings.metadata is a JSON string, or NULL if no metadata was provided.
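The column formats above can be reproduced with the standard library alone. The functions below are an illustrative sketch, not LLM's own helpers: they pack floats as little-endian 32-bit values and hash content with MD5, assuming text is encoded as UTF-8 before hashing.

```python
import hashlib
import struct


def encode(values):
    """Pack a list of floats into little-endian 32-bit floats."""
    return struct.pack("<" + "f" * len(values), *values)


def decode(blob):
    """Unpack a blob of little-endian 32-bit floats into a list."""
    return list(struct.unpack("<" + "f" * (len(blob) // 4), blob))


def content_hash(content):
    """MD5 digest of the content (text is assumed to be UTF-8 encoded)."""
    if isinstance(content, str):
        content = content.encode("utf8")
    return hashlib.md5(content).digest()
```

Each float takes four bytes, so a stored blob is four times as long as the vector has dimensions. Values are rounded to 32-bit precision when packed, so a decoded vector may differ slightly from the original Python floats.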