LLM Embedding Storage: Binary Format and SQLite Schema

LLM stores embedding vectors in a space-efficient binary format inside a SQLite database. Rather than saving a verbose JSON array of floating-point numbers, each vector is serialised as a sequence of little-endian 32-bit floats packed back-to-back — 4 bytes per dimension. This blob is written directly into a BLOB column in the embeddings table. Understanding this format is useful if you want to read LLM’s database directly, integrate with other tools, or implement your own embedding storage.
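To make the 4-bytes-per-dimension arithmetic concrete, here is a small sketch using only Python's struct module. The 1536-dimension vector is a hypothetical size chosen for illustration; the format itself does not require any particular length.

```python
import struct

# Hypothetical 1536-dimension vector of zeros, used only to measure the size.
dimensions = 1536
blob = struct.pack("<" + "f" * dimensions, *([0.0] * dimensions))
print(len(blob))       # 6144, i.e. 4 bytes × 1536 dimensions

# The dimension count can be recovered from the blob length alone.
print(len(blob) // 4)  # 1536

# Byte order: 1.0 is 0x3F800000 as a 32-bit float. Little-endian stores
# the least significant byte first, so the bytes appear reversed.
little = struct.pack("<f", 1.0)
big = struct.pack(">f", 1.0)
print(little.hex())    # 0000803f
print(big.hex())       # 3f800000
```

Because the blob carries no header, the length is the only record of the dimension count, so a blob whose length is not a multiple of 4 should be treated as corrupt.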

Binary encoding

The default output of llm embed to the terminal is a human-readable JSON array:

[0.123, 0.456, 0.789, ...]

Internally — and when you request --format blob, --format hex, or --format base64 — LLM uses a compact binary representation: a little-endian sequence of 32-bit (single-precision) floats, one per embedding dimension.
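As a rough illustration of how the hex and base64 forms relate to the raw blob, the sketch below packs a short vector and renders the same bytes both ways, then reverses each. It uses only the standard library and the variable names are illustrative; it does not reproduce the exact text that llm embed prints for each format.

```python
import base64
import struct

values = [1.0, -2.0, 0.5]

# Pack as little-endian 32-bit floats: 4 bytes per dimension.
blob = struct.pack("<" + "f" * len(values), *values)

# The same bytes rendered as text, in the spirit of the hex and base64 formats.
as_hex = blob.hex()
as_base64 = base64.b64encode(blob).decode("ascii")
print(as_hex)  # 0000803f000000c00000003f

# Either text form converts back to the identical blob.
assert bytes.fromhex(as_hex) == blob
assert base64.b64decode(as_base64) == blob

# From there the floats can be unpacked as usual.
print(struct.unpack("<" + "f" * (len(blob) // 4), blob))  # (1.0, -2.0, 0.5)
```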

encode / decode helpers

The following Python functions convert between a list of floats and the binary blob format. They are available as llm.encode() and llm.decode() in the public API:

import struct

def encode(values):
    return struct.pack("<" + "f" * len(values), *values)

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

"<" is the struct format character for little-endian byte order, and "f" denotes a 4-byte (32-bit) float. Each call to encode() produces exactly 4 × len(values) bytes; decode() reverses the process.

Using the helpers

import llm

# Encode a list of floats → bytes
blob = llm.encode([0.1, 0.2, 0.3])

# Decode bytes → tuple of floats
vector = llm.decode(blob)
print(vector)  # (0.10000000149011612, 0.20000000298023224, 0.30000001192092896)

32-bit floats have less precision than Python’s native 64-bit floats, so you will see small rounding differences after a round-trip through encode → decode.
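A short sketch of what this means in practice, using struct directly so it runs without LLM installed. The roundtrip() helper is a hypothetical name introduced only for this example.

```python
import math
import struct

def roundtrip(value):
    """Pack one float as a little-endian 32-bit float and unpack it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# 0.5 is exactly representable in binary, so it survives unchanged.
assert roundtrip(0.5) == 0.5

# 0.1 is not, so the 32-bit value differs from Python's 64-bit 0.1.
restored = roundtrip(0.1)
assert restored != 0.1
print(restored)  # 0.10000000149011612

# Compare round-tripped values with a tolerance rather than ==.
assert math.isclose(restored, 0.1, rel_tol=1e-6)
```

If you write tests against stored vectors, comparing with a tolerance like this avoids spurious failures.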

Decoding with NumPy

If you are working with NumPy, you can decode a blob directly without struct:

import numpy as np

numpy_array = np.frombuffer(blob, "<f4")

The "<f4" dtype string tells NumPy to interpret the buffer as little-endian 32-bit floats — exactly the format LLM writes.
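Two details are worth knowing when going this route, sketched below under the assumption that NumPy is installed: np.frombuffer() wraps the bytes without copying, so the resulting array is read-only, and the reverse direction (array to blob) is a cast to "<f4" followed by tobytes().

```python
import struct

import numpy as np

values = [0.1, 0.2, 0.3]
blob = struct.pack("<" + "f" * len(values), *values)

# Decode: frombuffer shares memory with the bytes object, so it is read-only.
arr = np.frombuffer(blob, dtype="<f4")
print(arr.shape, arr.flags.writeable)  # (3,) False

# Take a copy if you need to modify the vector, e.g. to normalise it.
editable = arr.copy()
editable /= np.linalg.norm(editable)

# Encode: cast to little-endian float32 and dump the raw bytes.
assert np.asarray(values, dtype="<f4").tobytes() == blob
```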

SQLite schema

LLM stores all embeddings in a single SQLite database (by default embeddings.db in the LLM data directory, or a path you supply with -d). The database contains two tables:

CREATE TABLE [collections] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT,
   [model] TEXT
)
CREATE TABLE "embeddings" (
   [collection_id] INTEGER REFERENCES [collections]([id]),
   [id] TEXT,
   [embedding] BLOB,
   [content] TEXT,
   [content_blob] BLOB,
   [content_hash] BLOB,
   [metadata] TEXT,
   [updated] INTEGER,
   PRIMARY KEY ([collection_id], [id])
)

The `collections` table

Each row represents a named collection of embeddings.

Column	Type	Description
`id`	`INTEGER`	Auto-incremented primary key
`name`	`TEXT`	Unique name of the collection (e.g. `"quotations"`)
`model`	`TEXT`	Full model ID used for all embeddings in this collection (e.g. `"text-embedding-3-small"`)

A collection’s model is fixed when its first embedding is inserted. Every subsequent entry must use the same model.

The `embeddings` table

Each row stores one embedding vector alongside its metadata.

Column	Type	Description
`collection_id`	`INTEGER`	Foreign key → `collections.id`
`id`	`TEXT`	Caller-supplied string identifier for this entry
`embedding`	`BLOB`	The vector encoded as little-endian 32-bit floats
`content`	`TEXT`	Original text content, if stored with `--store` (otherwise `NULL`)
`content_blob`	`BLOB`	Original binary content, if stored with `--store` (otherwise `NULL`)
`content_hash`	`BLOB`	MD5 digest of the original content — used to skip re-embedding identical inputs
`metadata`	`TEXT`	JSON string of arbitrary metadata, or `NULL`
`updated`	`INTEGER`	Unix timestamp of when this row was last written

The composite primary key (collection_id, id) ensures IDs are unique within a collection but can be reused across different collections.
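The effect of the composite key can be checked against an in-memory database built from the schema above. This is a sketch for experimentation only: the collection names and the "example-model" value are placeholders, and only the key columns are populated.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE [collections] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT,
   [model] TEXT
);
CREATE TABLE "embeddings" (
   [collection_id] INTEGER REFERENCES [collections]([id]),
   [id] TEXT,
   [embedding] BLOB,
   [content] TEXT,
   [content_blob] BLOB,
   [content_hash] BLOB,
   [metadata] TEXT,
   [updated] INTEGER,
   PRIMARY KEY ([collection_id], [id])
);
""")
db.executemany(
    "INSERT INTO collections (id, name, model) VALUES (?, ?, ?)",
    [(1, "quotations", "example-model"), (2, "articles", "example-model")],
)

insert = "INSERT INTO embeddings (collection_id, id) VALUES (?, ?)"
db.execute(insert, (1, "doc-1"))
db.execute(insert, (2, "doc-1"))  # same id, different collection: accepted

try:
    db.execute(insert, (1, "doc-1"))  # same id, same collection
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```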

Content hashing and deduplication

Before computing a new embedding, LLM calculates an MD5 hash of the input content:

import hashlib

def content_hash(input):
    if isinstance(input, str):
        input = input.encode("utf-8")
    return hashlib.md5(input).digest()

If a row with the same collection_id and content_hash already exists in the embeddings table, the embedding call is skipped entirely. This avoids wasting API quota or compute time when the same content is submitted more than once.
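The lookup described above could be sketched as follows. The already_embedded() helper and its SQL are assumptions written for illustration, not LLM's actual implementation, and the table is cut down to the columns the check needs.

```python
import hashlib
import sqlite3

def content_hash(value):
    # Strings are hashed as their UTF-8 bytes; bytes are hashed as-is.
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.md5(value).digest()

def already_embedded(db, collection_id, value):
    """Hypothetical helper: True if this content's hash is already stored."""
    row = db.execute(
        "SELECT 1 FROM embeddings"
        " WHERE collection_id = ? AND content_hash = ? LIMIT 1",
        (collection_id, content_hash(value)),
    ).fetchone()
    return row is not None

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE embeddings (collection_id INTEGER, id TEXT,"
    " content_hash BLOB, PRIMARY KEY (collection_id, id))"
)
db.execute(
    "INSERT INTO embeddings VALUES (?, ?, ?)",
    (1, "hound", content_hash("my happy hound")),
)

print(already_embedded(db, 1, "my happy hound"))  # True
print(already_embedded(db, 1, "my grumpy cat"))   # False
print(len(content_hash("my happy hound")))        # 16 (raw MD5 digest bytes)
```

Note that the column holds the raw 16-byte digest rather than a hex string, so comparisons in SQL need a bytes parameter (or a hex literal such as X'...').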

Metadata storage

The metadata column stores a JSON-serialised dictionary. When you embed an item with metadata:

collection.embed(
    "hound",
    "my happy hound",
    metadata={"name": "Hound", "score": 9.5},
    store=True,
)

The metadata is written as:

{"name": "Hound", "score": 9.5}

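When you read entries back through collection.similar(), the Collection class parses the JSON string into a Python dictionary for you. A direct SQL query bypasses that class and returns the raw JSON string, which you need to parse yourself (for example with json.loads()).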
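For the direct SQL route, parsing the column by hand might look like the sketch below. It uses a cut-down in-memory table, and load_metadata() is a hypothetical helper that also handles rows where the column is NULL.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE embeddings (collection_id INTEGER, id TEXT, metadata TEXT)")

metadata = {"name": "Hound", "score": 9.5}
db.execute("INSERT INTO embeddings VALUES (?, ?, ?)", (1, "hound", json.dumps(metadata)))
db.execute("INSERT INTO embeddings VALUES (?, ?, ?)", (1, "cat", None))  # no metadata

def load_metadata(raw):
    """Hypothetical helper: parse the JSON text, passing NULL through as None."""
    return json.loads(raw) if raw is not None else None

rows = db.execute("SELECT id, metadata FROM embeddings ORDER BY id").fetchall()
parsed = {row_id: load_metadata(raw) for row_id, raw in rows}
print(parsed)  # {'cat': None, 'hound': {'name': 'Hound', 'score': 9.5}}
```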

You can query the embeddings database directly with any SQLite tool (Datasette, the sqlite3 CLI, or sqlite-utils) without going through the LLM Python API. Use llm.decode(), or the equivalent struct.unpack() call, to turn each embedding blob back into a tuple of floats for custom analysis.
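A minimal sketch of that workflow with Python's built-in sqlite3 module follows. To keep it runnable anywhere, it builds a small in-memory stand-in with only the columns it reads; against a real database you would pass the path to the file instead of ":memory:" and skip the setup. The collection name and "example-model" value are placeholders.

```python
import sqlite3
import struct

def decode(binary):
    # Same logic as the decode() helper shown earlier.
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

db = sqlite3.connect(":memory:")  # stand-in for the real database file
db.executescript("""
CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT, model TEXT);
CREATE TABLE embeddings (
    collection_id INTEGER, id TEXT, embedding BLOB,
    PRIMARY KEY (collection_id, id)
);
""")
db.execute("INSERT INTO collections VALUES (1, 'quotations', 'example-model')")
db.execute(
    "INSERT INTO embeddings VALUES (?, ?, ?)",
    (1, "hound", struct.pack("<3f", 0.5, 0.25, -1.0)),
)

# Look up every entry in one named collection and decode its vector.
query = """
SELECT embeddings.id, embeddings.embedding
FROM embeddings
JOIN collections ON collections.id = embeddings.collection_id
WHERE collections.name = ?
"""
results = [(entry_id, decode(blob)) for entry_id, blob in db.execute(query, ("quotations",))]
print(results)  # [('hound', (0.5, 0.25, -1.0))]
```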
