LLM stores embedding vectors in a space-efficient binary format inside a SQLite database. Rather than saving a verbose JSON array of floating-point numbers, each vector is serialised as a sequence of little-endian 32-bit floats packed back-to-back — 4 bytes per dimension. This blob is written directly into a BLOB column in the embeddings table. Understanding this format is useful if you want to read LLM’s database directly, integrate with other tools, or implement your own embedding storage.

Binary encoding

The default output of llm embed to the terminal is a human-readable JSON array:
[0.123, 0.456, 0.789, ...]
Internally — and when you request --format blob, --format hex, or --format base64 — LLM uses a compact binary representation: a little-endian sequence of 32-bit (single-precision) floats, one per embedding dimension.
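If you capture the hex or base64 output, it can be turned back into floats with the standard library. The sketch below assumes those two formats are plain hexadecimal and standard base64 encodings of the same packed blob; the helper names are illustrative and not part of LLM.

```python
import base64
import struct

def floats_from_blob(blob: bytes) -> tuple:
    # Each dimension occupies 4 bytes: little-endian 32-bit floats
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)

def floats_from_hex(text: str) -> tuple:
    # Hypothetical helper: hex string -> bytes -> floats
    return floats_from_blob(bytes.fromhex(text.strip()))

def floats_from_base64(text: str) -> tuple:
    # Hypothetical helper: base64 string -> bytes -> floats
    return floats_from_blob(base64.b64decode(text.strip()))

# Build sample input locally so the example is self-contained
sample = struct.pack("<3f", 1.0, 0.5, -2.0)
print(floats_from_hex(sample.hex()))  # (1.0, 0.5, -2.0)
print(floats_from_base64(base64.b64encode(sample).decode("ascii")))  # (1.0, 0.5, -2.0)
```

The sample values are chosen because they are exactly representable as 32-bit floats, so the round-trip is exact.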

encode / decode helpers

The following Python functions convert between a list of floats and the binary blob format. They are available as llm.encode() and llm.decode() in the public API:
import struct

def encode(values):
    return struct.pack("<" + "f" * len(values), *values)

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)
"<" is the struct format character for little-endian byte order, and "f" denotes a 4-byte (32-bit) float. Each call to encode() produces exactly 4 × len(values) bytes; decode() reverses the process.

Using the helpers

import llm

# Encode a list of floats → bytes
blob = llm.encode([0.1, 0.2, 0.3])

# Decode bytes → tuple of floats
vector = llm.decode(blob)
print(vector)  # (0.10000000149011612, 0.20000000298023224, 0.30000001192092896)
32-bit floats have less precision than Python’s native 64-bit floats, so you will see small rounding differences after a round-trip through encode() and decode().
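The size and precision behaviour can be checked with a short sketch. It redefines the two helpers locally with struct so that it runs without LLM installed; the 1e-6 tolerance is an illustrative choice, not a value taken from LLM.

```python
import math
import struct

def encode(values):
    return struct.pack("<" + "f" * len(values), *values)

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

original = [0.1, 0.2, 0.3]
blob = encode(original)
restored = decode(blob)

# 4 bytes per dimension
print(len(blob))  # 12

# Exact equality fails for values that a 32-bit float cannot represent exactly
print(restored == tuple(original))  # False

# Compare with a tolerance instead
print(all(math.isclose(a, b, rel_tol=1e-6) for a, b in zip(original, restored)))  # True
```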

Decoding with NumPy

If you are working with NumPy, you can decode a blob directly without struct:
import numpy as np

numpy_array = np.frombuffer(blob, "<f4")
The "<f4" dtype string tells NumPy to interpret the buffer as little-endian 32-bit floats — exactly the format LLM writes.

SQLite schema

LLM stores all embeddings in a single SQLite database (by default embeddings.db in the LLM data directory, or a path you supply with -d). The database contains two tables:
CREATE TABLE [collections] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT,
   [model] TEXT
)
CREATE TABLE "embeddings" (
   [collection_id] INTEGER REFERENCES [collections]([id]),
   [id] TEXT,
   [embedding] BLOB,
   [content] TEXT,
   [content_blob] BLOB,
   [content_hash] BLOB,
   [metadata] TEXT,
   [updated] INTEGER,
   PRIMARY KEY ([collection_id], [id])
)

The collections table

Each row represents a named collection of embeddings.
| Column | Type | Description |
| --- | --- | --- |
| id | INTEGER | Auto-incremented primary key |
| name | TEXT | Unique name of the collection (e.g. "quotations") |
| model | TEXT | Full model ID used for all embeddings in this collection (e.g. "text-embedding-3-small") |
A collection’s model is fixed when its first embedding is inserted. Every subsequent entry must use the same model.

The embeddings table

Each row stores one embedding vector alongside its metadata.
| Column | Type | Description |
| --- | --- | --- |
| collection_id | INTEGER | Foreign key → collections.id |
| id | TEXT | Caller-supplied string identifier for this entry |
| embedding | BLOB | The vector encoded as little-endian 32-bit floats |
| content | TEXT | Original text content, if stored with --store (otherwise NULL) |
| content_blob | BLOB | Original binary content, if stored with --store (otherwise NULL) |
| content_hash | BLOB | MD5 digest of the original content, used to skip re-embedding identical inputs |
| metadata | TEXT | JSON string of arbitrary metadata, or NULL |
| updated | INTEGER | Unix timestamp of when this row was last written |
The composite primary key (collection_id, id) ensures IDs are unique within a collection but can be reused across different collections.
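Reading a vector straight from the database could look like the following sketch. It builds an in-memory database with the schema shown above and a made-up row so that it runs on its own; against a real database you would connect to the embeddings.db path instead. The collection name, entry ID and vector are sample data.

```python
import sqlite3
import struct

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [collections] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT,
   [model] TEXT
);
CREATE TABLE "embeddings" (
   [collection_id] INTEGER REFERENCES [collections]([id]),
   [id] TEXT,
   [embedding] BLOB,
   [content] TEXT,
   [content_blob] BLOB,
   [content_hash] BLOB,
   [metadata] TEXT,
   [updated] INTEGER,
   PRIMARY KEY ([collection_id], [id])
);
""")

# Sample data standing in for rows that LLM would normally write
conn.execute(
    "INSERT INTO collections (id, name, model) VALUES (?, ?, ?)",
    (1, "quotations", "text-embedding-3-small"),
)
conn.execute(
    "INSERT INTO embeddings (collection_id, id, embedding) VALUES (?, ?, ?)",
    (1, "hound", struct.pack("<3f", 1.0, 0.5, -2.0)),
)

# Look up entries by collection name and decode the stored blob
entry_id, model, blob = conn.execute(
    """
    SELECT embeddings.id, collections.model, embeddings.embedding
    FROM embeddings
    JOIN collections ON collections.id = embeddings.collection_id
    WHERE collections.name = ?
    """,
    ("quotations",),
).fetchone()

print(entry_id, model, decode(blob))
```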

Content hashing and deduplication

Before computing a new embedding, LLM calculates an MD5 hash of the input content:
import hashlib

def content_hash(input):
    if isinstance(input, str):
        input = input.encode("utf-8")
    return hashlib.md5(input).digest()
If a row with the same collection_id and content_hash already exists in the embeddings table, the embedding call is skipped entirely. This avoids wasting API quota or compute time when the same content is submitted more than once.
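The check described above could be sketched as follows. This is an illustration against a reduced, in-memory version of the embeddings table, not LLM's actual code; already_embedded is a hypothetical helper name.

```python
import hashlib
import sqlite3

def content_hash(value):
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.md5(value).digest()

def already_embedded(conn, collection_id, value):
    # Hypothetical helper: True if this collection already holds identical content
    row = conn.execute(
        "SELECT 1 FROM embeddings WHERE collection_id = ? AND content_hash = ? LIMIT 1",
        (collection_id, content_hash(value)),
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
# Reduced table: only the columns the check needs
conn.execute("CREATE TABLE embeddings (collection_id INTEGER, id TEXT, content_hash BLOB)")
conn.execute(
    "INSERT INTO embeddings VALUES (?, ?, ?)",
    (1, "hound", content_hash("my happy hound")),
)

print(already_embedded(conn, 1, "my happy hound"))  # True
print(already_embedded(conn, 1, "my grumpy cat"))   # False
print(already_embedded(conn, 2, "my happy hound"))  # False
print(len(content_hash("my happy hound")))          # 16
```

Because the lookup is scoped by collection_id, the same content in a different collection is not treated as a duplicate.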

Metadata storage

The metadata column stores a JSON-serialised dictionary. When you embed an item with metadata:
collection.embed(
    "hound",
    "my happy hound",
    metadata={"name": "Hound", "score": 9.5},
    store=True,
)
The metadata is written as:
{"name": "Hound", "score": 9.5}
When reading entries back via collection.similar(), the Collection class parses the JSON string back into a Python dictionary automatically. A direct SQL query returns the raw JSON string, which you can parse yourself with json.loads().
You can query the embeddings database directly with any SQLite tool (Datasette, the sqlite3 CLI, or sqlite-utils) without going through the LLM Python API. Use llm.decode() (or the equivalent struct.unpack call) to turn the embedding blob back into a tuple of floats for any custom analysis.
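As one example of custom analysis, the sketch below computes cosine similarity between two blobs and parses a raw metadata string, using only the standard library. The blobs and metadata are constructed inline as sample data; in practice they would come from the embedding and metadata columns. The cosine_similarity function is written here for illustration and is not LLM's own implementation.

```python
import json
import math
import struct

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sample blobs standing in for values read from the embedding column
blob_a = struct.pack("<2f", 1.0, 0.0)
blob_b = struct.pack("<2f", 0.0, 1.0)
# Sample string standing in for a value read from the metadata column
raw_metadata = '{"name": "Hound", "score": 9.5}'

print(cosine_similarity(decode(blob_a), decode(blob_a)))  # 1.0
print(cosine_similarity(decode(blob_a), decode(blob_b)))  # 0.0
print(json.loads(raw_metadata)["score"])  # 9.5
```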
