LLM stores embedding vectors in a space-efficient binary format inside a SQLite database. Rather than saving a verbose JSON array of floating-point numbers, each vector is serialised as a sequence of little-endian 32-bit floats packed back-to-back, 4 bytes per dimension. This blob is written directly into a BLOB column in the embeddings table. Understanding this format is useful if you want to read LLM’s database directly, integrate with other tools, or implement your own embedding storage.
Binary encoding
The default output of llm embed to the terminal is a human-readable JSON array.
When embeddings are stored in the database, or when you request --format blob, --format hex, or --format base64, LLM uses a compact binary representation: a little-endian sequence of 32-bit (single-precision) floats, one per embedding dimension.
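To make the byte layout concrete, here is a minimal sketch that packs three floats as little-endian 32-bit values and renders the same bytes as hex and base64 text. The sample values are arbitrary, and it is an assumption that the CLI's hex and base64 formats are plain text encodings of this same blob.

```python
import base64
import struct

values = [1.0, 2.0, 0.5]  # arbitrary sample vector

# "<" = little-endian, "f" = 32-bit float, so 4 bytes per value.
blob = struct.pack("<" + "f" * len(values), *values)

# Text renderings of the same bytes (assumed to mirror the hex/base64 formats).
hex_text = blob.hex()
b64_text = base64.b64encode(blob).decode("ascii")

print(len(blob))  # 12
print(hex_text)   # 0000803f000000400000003f
print(b64_text)
```

Each group of eight hex digits is one dimension, with the least significant byte first.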
encode / decode helpers
The following Python functions convert between a list of floats and the binary blob format. They are available as llm.encode() and llm.decode() in the public API:
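A minimal sketch of such helpers, written with the standard struct module. The function bodies are an illustration of the format described here and may differ in detail from the code LLM ships.

```python
import struct


def encode(values):
    """Pack a sequence of floats into little-endian 32-bit floats."""
    return struct.pack("<" + "f" * len(values), *values)


def decode(binary):
    """Unpack a blob of little-endian 32-bit floats into a tuple of floats."""
    # Each float occupies 4 bytes, so the count is the blob length divided by 4.
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)
```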
"<" is the struct format character for little-endian byte order, and "f" denotes a 4-byte (32-bit) float. Each call to encode() produces exactly 4 × len(values) bytes; decode() reverses the process.
Using the helpers
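A short round-trip sketch. To keep it runnable without LLM installed, it defines local stand-ins for encode() and decode() using struct; with LLM available, the library's own helpers would be used instead.

```python
import math
import struct


# Local stand-ins for the helpers (assumption: same little-endian float32 layout).
def encode(values):
    return struct.pack(f"<{len(values)}f", *values)


def decode(binary):
    return struct.unpack(f"<{len(binary) // 4}f", binary)


vector = [0.1, 0.25, -3.5]
blob = encode(vector)
restored = decode(blob)

print(len(blob))  # 12 bytes: 3 values x 4 bytes each
print(restored)

# Values survive the round trip only approximately, so compare with a tolerance.
for original, roundtripped in zip(vector, restored):
    assert math.isclose(original, roundtripped, rel_tol=1e-6)
```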
32-bit floats have less precision than Python’s native 64-bit floats, so you will see small rounding differences after a round-trip through encode → decode.
Decoding with NumPy
If you are working with NumPy, you can decode a blob directly without struct:
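A minimal sketch of that approach, assuming NumPy is installed. The blob here is built with struct purely so the example is self-contained; in practice it would come from the embedding column.

```python
import struct

import numpy as np

# Sample blob: three little-endian 32-bit floats.
blob = struct.pack("<3f", 1.0, 2.0, 0.5)

# Interpret the raw bytes as little-endian float32 without copying them.
vector = np.frombuffer(blob, dtype="<f4")

print(vector.shape)     # (3,)
print(vector.tolist())  # [1.0, 2.0, 0.5]
```

An array created from a bytes object this way is read-only; call vector.copy() if you need to modify it.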
"<f4" dtype string tells NumPy to interpret the buffer as little-endian 32-bit floats — exactly the format LLM writes.
SQLite schema
LLM stores all embeddings in a single SQLite database (by default embeddings.db in the LLM data directory, or a path you supply with -d). The database contains two tables, collections and embeddings.
The collections table
Each row represents a named collection of embeddings.
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Auto-incremented primary key |
| name | TEXT | Unique name of the collection (e.g. "quotations") |
| model | TEXT | Full model ID used for all embeddings in this collection (e.g. "text-embedding-3-small") |
The embeddings table
Each row stores one embedding vector alongside its metadata.
| Column | Type | Description |
|---|---|---|
| collection_id | INTEGER | Foreign key → collections.id |
| id | TEXT | Caller-supplied string identifier for this entry |
| embedding | BLOB | The vector encoded as little-endian 32-bit floats |
| content | TEXT | Original text content, if stored with --store (otherwise NULL) |
| content_blob | BLOB | Original binary content, if stored with --store (otherwise NULL) |
| content_hash | BLOB | MD5 digest of the original content, used to skip re-embedding identical inputs |
| metadata | TEXT | JSON string of arbitrary metadata, or NULL |
| updated | INTEGER | Unix timestamp of when this row was last written |
The composite primary key (collection_id, id) ensures IDs are unique within a collection but can be reused across different collections.
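The two tables can be approximated in an in-memory database to show how a row is written and read back with plain sqlite3. The CREATE TABLE statements below are a reconstruction from the column tables above, not a copy of the schema LLM actually creates, and the row values are invented for illustration.

```python
import sqlite3
import struct

db = sqlite3.connect(":memory:")
# Schema reconstructed from the documented columns (an assumption, not LLM's DDL).
db.executescript("""
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,
    model TEXT
);
CREATE TABLE embeddings (
    collection_id INTEGER REFERENCES collections(id),
    id TEXT,
    embedding BLOB,
    content TEXT,
    content_blob BLOB,
    content_hash BLOB,
    metadata TEXT,
    updated INTEGER,
    PRIMARY KEY (collection_id, id)
);
""")

cur = db.execute(
    "INSERT INTO collections (name, model) VALUES (?, ?)",
    ("quotations", "text-embedding-3-small"),
)
collection_id = cur.lastrowid

blob = struct.pack("<3f", 1.0, 2.0, 0.5)  # little-endian float32 vector
insert_sql = "INSERT INTO embeddings (collection_id, id, embedding) VALUES (?, ?, ?)"
db.execute(insert_sql, (collection_id, "quote-1", blob))

# Reusing the same ID inside the same collection violates the primary key.
try:
    db.execute(insert_sql, (collection_id, "quote-1", blob))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = db.execute(
    "SELECT e.id, e.embedding FROM embeddings e "
    "JOIN collections c ON c.id = e.collection_id WHERE c.name = ?",
    ("quotations",),
).fetchall()

# Decode each blob: 4 bytes per dimension.
vectors = {entry_id: struct.unpack(f"<{len(b) // 4}f", b) for entry_id, b in rows}
print(vectors)
print(duplicate_rejected)
```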
Content hashing and deduplication
Before computing a new embedding, LLM calculates an MD5 hash of the input content. If a row with the same collection_id and content_hash already exists in the embeddings table, the embedding call is skipped entirely. This avoids wasting API quota or compute time when the same content is submitted more than once.
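The check can be sketched as follows. The function names and the in-memory set are hypothetical stand-ins for a lookup against the embeddings table, and encoding text as UTF-8 before hashing is an assumption.

```python
import hashlib


def content_hash(content):
    """Return the 16-byte MD5 digest of text or binary content."""
    if isinstance(content, str):
        content = content.encode("utf-8")  # assumed text encoding
    return hashlib.md5(content).digest()


# Hypothetical stand-in for the (collection_id, content_hash) pairs already stored.
seen = set()


def needs_embedding(collection_id, content):
    """Return True only the first time this content is seen in a collection."""
    key = (collection_id, content_hash(content))
    if key in seen:
        return False  # identical content already embedded: skip the model call
    seen.add(key)
    return True


first = needs_embedding(1, "hello")
repeat = needs_embedding(1, "hello")
other_collection = needs_embedding(2, "hello")
print(first, repeat, other_collection)  # True False True
```

Because the key includes the collection, identical content is still embedded once per collection.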
Metadata storage
The metadata column stores a JSON-serialised dictionary. When you embed an item with metadata, the dictionary is written to this column as a JSON string.
When the item is later retrieved (via collection.similar() or a direct SQL query), the JSON string is parsed back into a Python dictionary automatically by the Collection class.
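The storage step amounts to a JSON round trip, sketched here with the standard json module. The metadata keys are invented for illustration, and the None handling mirrors the NULL case in the schema table as an assumption.

```python
import json

metadata = {"author": "Example Author", "year": 1984}  # hypothetical metadata

# What would be written to the TEXT column: a JSON string, or NULL when absent.
stored = json.dumps(metadata) if metadata is not None else None

# What a reader of the column would do to get the dictionary back.
restored = json.loads(stored) if stored is not None else None

print(stored)
print(restored == metadata)  # True
```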