LLM provides a suite of command-line utilities for computing and storing embedding vectors, managing named collections in SQLite, and performing semantic similarity searches — all without writing any Python code. This page documents every embedding-related command and its available options.

llm embed

The llm embed command calculates an embedding vector for a single piece of content. The result can be printed directly to the terminal, persisted into a named collection in a SQLite database, or both.

Returning embeddings to the terminal

Pass content with -c/--content and specify an embedding model with -m/--model:
llm embed -c 'This is some content' -m 3-small
-m 3-small targets the OpenAI text-embedding-3-small model. You must have configured an OpenAI API key with llm keys set openai for this to work. The command outputs a JSON array of floats:
[0.123, 0.456, 0.789, ...]
You can also use community plugins to run models locally. For example, llm-sentence-transformers lets you run models on your own machine:
llm install llm-sentence-transformers
llm embed -c 'This is some content' -m sentence-transformers/all-MiniLM-L6-v2
Set a default embedding model once so you do not have to repeat -m on every call:
llm embed-models default 3-small
Or export an environment variable to set a default for the current shell session:
export LLM_EMBEDDING_MODEL=3-small
llm embed -c 'This is some content'

Output formats

The default output is a JSON array. Use --format to emit the compact binary representation described in Storage Format instead, either as raw bytes or encoded as hexadecimal or base64:
# Raw binary bytes
llm embed -c 'This is some content' -m 3-small --format blob

# Hexadecimal string
llm embed -c 'This is some content' -m 3-small --format hex

# Base64 string
llm embed -c 'This is some content' -m 3-small --format base64
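To consume the hex or base64 output from another program, the string has to be turned back into floats. The Python sketch below assumes the binary representation is a sequence of little-endian 32-bit floats. That layout is an assumption here, since this page defers the details to Storage Format. The helper names are illustrative and not part of LLM.

```python
import base64
import struct


def encode_floats(values):
    """Pack floats as little-endian 32-bit values (assumed storage layout)."""
    return struct.pack("<" + "f" * len(values), *values)


def decode_floats(blob):
    """Unpack a blob of little-endian 32-bit floats into a list of floats."""
    return list(struct.unpack("<" + "f" * (len(blob) // 4), blob))


# Stand-in for real model output; these values are exactly representable
# as 32-bit floats, so the round trip is lossless.
vector = [0.5, -1.25, 2.0]
blob = encode_floats(vector)

hex_text = blob.hex()                              # comparable to --format hex
b64_text = base64.b64encode(blob).decode("ascii")  # comparable to --format base64

print(decode_floats(bytes.fromhex(hex_text)))
print(decode_floats(base64.b64decode(b64_text)))
```

Each value occupies four bytes, so a vector of N dimensions decodes from a blob of 4 × N bytes.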

Binary input

Some models can embed binary files such as images; one example is the clip model provided by the llm-clip plugin. Pass --binary together with -i <file>:
llm embed --binary -m clip -i image.jpg
Or pipe binary data from standard input:
cat image.jpg | llm embed --binary -m clip -i -

Storing embeddings in a collection

Embeddings are most useful when stored so you can run similarity queries later. LLM organises stored embeddings into collections — named groups inside a SQLite database where every entry has a unique ID and was created with the same embedding model. Store an embedding in the quotations collection under the key philkarlton-1:
llm embed quotations philkarlton-1 -c \
  'There are only two hard things in Computer Science: cache invalidation and naming things'
Pipe content from a file:
cat one.txt | llm embed files one
A collection is created automatically the first time you mention it. Its embedding model is fixed to whichever model was used for the first entry. By default, embeddings are written to embeddings.db in LLM’s user content directory. Find that path with:
llm collections path
Use -d/--database to write to a different SQLite file (it will be created if it does not exist):
llm embed phrases hound -d my-embeddings.db -c 'my happy hound'

Storing content and metadata

By default only the entry ID and embedding vector are stored. Add --store to persist the original text in the content column:
llm embed phrases hound -c 'my happy hound' --store
Add --metadata to attach arbitrary JSON metadata to the entry:
llm embed phrases hound \
  -m 3-small \
  -c 'my happy hound' \
  --metadata '{"name": "Hound"}' \
  --store
Stored content and metadata are returned by llm similar:
llm similar phrases -c 'hound'
{"id": "hound", "score": 0.8484683588631485, "content": "my happy hound", "metadata": {"name": "Hound"}}

llm embed-multi

llm embed-multi embeds multiple items in one command, exploiting any batch-processing efficiencies offered by the embedding model. It can read from CSV/TSV/JSON files, a SQLite database query, or directories of files. All input modes support these shared options:
Option                       Description
-m/--model model_id          Embedding model to use
-d/--database database.db    Target database file
--store                      Persist original content alongside the vector
--prefix PREFIX              Prepend a string to every stored ID
--prepend TEXT               Prepend a string to every piece of content before embedding
--batch-size N               Process embeddings in batches of N items
--format FORMAT              Input format: json, csv, tsv, or nl (newline-delimited JSON)
--attach ALIAS PATH          Attach an additional SQLite database under the given alias
The --prepend option is useful for models that require a special prefix on their input. For example, nomic-embed-text-v2-moe requires documents to be prefixed with 'search_document: ' and queries with 'search_query: ' (note the trailing space in each).
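The difference between --prefix and --prepend is easy to mix up: one changes the stored ID, the other changes the text sent to the model. The Python sketch below illustrates that distinction only; prepare is a hypothetical helper, not LLM's implementation, and the trailing space in the prepended string is an assumption about what the model expects.

```python
def prepare(rows, prefix="", prepend=""):
    """Return (stored_id, text_to_embed) pairs for a list of (id, content) rows.

    prefix is added to the ID only; prepend is added to the content only.
    """
    return [(prefix + str(row_id), prepend + content) for row_id, content in rows]


rows = [
    ("one", "This is the first item"),
    ("two", "This is the second item"),
]

prepared = prepare(rows, prefix="my-items/", prepend="search_document: ")
for stored_id, text in prepared:
    print(stored_id, "->", text)
```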

Embedding data from a CSV, TSV, or JSON file

Your file must have at least two columns: the first is treated as the entry ID, and subsequent columns are concatenated as the content to embed.
id,content
one,This is the first item
two,This is the second item
Pass the file as the second argument after the collection name:
llm embed-multi items mydata.csv
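The column rule described above (first column is the ID, remaining columns form the content) can be sketched in Python as follows. This is an illustration rather than LLM's own code: rows_to_items is a hypothetical helper, and joining the remaining columns with a single space is an assumption.

```python
import csv
import io

CSV_TEXT = """id,content
one,This is the first item
two,This is the second item
"""


def rows_to_items(csv_text, separator=" "):
    """Turn CSV text into (id, content) pairs.

    The first column is the ID; all later columns are joined into the content.
    The separator is an assumption made for this sketch.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    items = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        items.append((row[0], separator.join(row[1:])))
    return items


items = rows_to_items(CSV_TEXT)
print(items)
```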
Pipe from standard input using -. For newline-delimited JSON you must pass --format nl:
cat mydata.json | llm embed-multi items - --format nl
A full example combining several options:
llm embed-multi items mydata.json \
  -d docs.db \
  -m 3-small \
  --prefix my-items/ \
  --store
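For the newline-delimited JSON input mentioned above, each line must be one complete JSON object. A minimal Python sketch for producing such input from a list of dictionaries is shown here; to_ndjson and the sample records are illustrative only.

```python
import json


def to_ndjson(records):
    """Serialize each record as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


records = [
    {"id": "one", "content": "This is the first item"},
    {"id": "two", "content": "This is the second item"},
]

ndjson_text = to_ndjson(records)
print(ndjson_text, end="")
```

The resulting text can be written to a file or piped to standard input and read with --format nl.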

Embedding data from a SQLite database

Use --sql to embed results from a SQL query. When the source and destination are the same database:
llm embed-multi docs \
  -d docs.db \
  --sql 'select id, title, content from documents' \
  -m 3-small
To read from a different database, attach it with --attach:
llm embed-multi docs \
  -d embeddings.db \
  --attach other other.db \
  --sql 'select id, title, content from other.documents' \
  -m 3-small

Embedding data from files in directories

Use --files <directory> <glob> to embed all matching files in a directory tree. The file’s relative path is used as its ID:
llm embed-multi documentation \
  -m 3-small \
  --files docs '**/*.md' \
  -d documentation.db \
  --store
This produces IDs such as aliases.md, embeddings/cli.md, plugins/index.md, and so on. Add --prefix to namespace the IDs:
llm embed-multi documentation \
  -m 3-small \
  --files docs '**/*.md' \
  -d documentation.db \
  --store \
  --prefix llm-docs/
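To preview which IDs a given directory and glob would produce before embedding anything, the relative-path rule can be approximated in Python. This is a sketch using pathlib and a temporary directory; file_ids is a hypothetical helper, and LLM's own matching may differ in edge cases.

```python
import tempfile
from pathlib import Path


def file_ids(directory, pattern, prefix=""):
    """List IDs as prefix + path relative to the directory, for matching files."""
    root = Path(directory)
    return sorted(
        prefix + path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in for the docs directory.
    docs = Path(tmp) / "docs"
    (docs / "embeddings").mkdir(parents=True)
    (docs / "aliases.md").write_text("# Aliases\n")
    (docs / "embeddings" / "cli.md").write_text("# CLI\n")
    (docs / "notes.txt").write_text("not markdown\n")

    plain_ids = file_ids(docs, "**/*.md")
    prefixed_ids = file_ids(docs, "**/*.md", prefix="llm-docs/")

print(plain_ids)
print(prefixed_ids)
```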
Files are read as UTF-8 by default, with a fallback to latin-1. Specify alternative encodings with --encoding (tried in order):
llm embed-multi documentation \
  -m 3-small \
  --files docs '**/*.md' \
  -d documentation.db \
  --encoding utf-16 \
  --encoding mac_roman \
  --encoding latin-1
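Trying encodings in order amounts to attempting each decode until one succeeds. The following Python sketch shows that idea under the default order described above; decode_with_fallback is a hypothetical helper, not a function provided by LLM.

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Decode bytes with the first encoding that works.

    Returns the decoded text and the name of the encoding that succeeded.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next encoding in the list
    raise ValueError("none of the encodings could decode the data")


# Valid UTF-8 is decoded by the first encoding.
utf8_result = decode_with_fallback("h\u00e9llo".encode("utf-8"))

# A lone 0xE9 byte is invalid UTF-8, so the fallback is used.
latin_result = decode_with_fallback(b"caf\xe9")

print(utf8_result)
print(latin_result)
```

Because latin-1 maps every byte to a character, placing it last means decoding never fails outright, although the result may not be the intended text.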
For binary content such as images, add --binary:
llm embed-multi photos \
  -m clip \
  --files photos/ '*.jpeg' --binary

llm similar

llm similar searches a named collection for entries that are semantically closest to a given query, ranked by cosine similarity.
Similarity search currently uses a brute-force approach that computes scores against every document in the collection. This works well for small-to-medium collections but does not scale to very large ones. See issue 216 for plans to support vector index plugins.
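The brute-force approach can be sketched in a few lines of Python: score the query against every stored vector, then sort. This is an illustration of the technique rather than LLM's actual code; the function names and the toy two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, entries, n=10):
    """Score every entry against the query and return the top n (id, score) pairs."""
    scored = [(entry_id, cosine_similarity(query, vector))
              for entry_id, vector in entries.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy collection: real embeddings have hundreds or thousands of dimensions.
entries = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = most_similar([1.0, 0.0], entries, n=2)
print(top)
```

The cost grows linearly with the number of entries, which is why this approach suits small-to-medium collections.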
Search the quotations collection for entries similar to 'computer science':
llm similar quotations -c 'computer science'
Results are returned as newline-delimited JSON:
{"id": "philkarlton-1", "score": 0.8323904531677017, "content": null, "metadata": null}
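Because each result is a standalone JSON object on its own line, the output is straightforward to post-process. A minimal Python sketch follows, using the sample line above as input; parse_results is an illustrative helper, and in practice the text would come from the command's standard output.

```python
import json

SAMPLE_OUTPUT = (
    '{"id": "philkarlton-1", "score": 0.8323904531677017, '
    '"content": null, "metadata": null}\n'
)


def parse_results(text):
    """Parse newline-delimited JSON into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


results = parse_results(SAMPLE_OUTPUT)
for result in results:
    print(result["id"], round(result["score"], 3))
```

The content and metadata fields are null here because the entry was stored without --store or --metadata.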
Use -p/--plain to get a compact plain-text listing instead:
llm similar quotations -c 'computer science' -p
philkarlton-1 (0.8323904531677017)
Read query content from a file with -i <filename>, or from standard input with -i -:
llm similar quotations -i one.txt
echo 'computer science' | llm similar quotations -i -
For binary models like CLIP, pass --binary:
llm similar photos -i image.jpg --binary
Filter results to only IDs that begin with a given prefix using --prefix:
llm similar quotations --prefix 'movies/' -c 'star wars'
Use -n N to control how many results are returned (default is 10):
llm similar quotations -c 'computer science' -n 5

llm embed-models

List all embedding models that are currently available, including those provided by installed plugins:
llm embed-models
OpenAIEmbeddingModel: text-embedding-ada-002 (aliases: ada, ada-002)
OpenAIEmbeddingModel: text-embedding-3-small (aliases: 3-small)
OpenAIEmbeddingModel: text-embedding-3-large (aliases: 3-large)
...
Filter results with -q (can be repeated):
llm embed-models -q 3-small

Setting the default embedding model

Get the current default:
llm embed-models default
Set a new default (model aliases are accepted):
llm embed-models default 3-small
Remove the default so that -m/--model becomes required:
llm embed-models default --remove-default

llm collections list

List all collections stored in the embeddings database:
llm collections list
Add --json for machine-readable output:
llm collections list --json
Specify a different database file with -d/--database:
llm collections list -d my-embeddings.db

llm collections delete

Delete a collection and all of its embeddings:
llm collections delete collection-name
Pass -d to target a specific database file:
llm collections delete collection-name -d my-embeddings.db
