LLM provides a suite of command-line utilities for computing and storing embedding vectors, managing named collections in SQLite, and performing semantic similarity searches, all without writing any Python code. This page documents every embedding-related command and its available options.
llm embed
The llm embed command calculates an embedding vector for a single piece of content. The result can be printed directly to the terminal, persisted into a named collection in a SQLite database, or both.
Returning embeddings to the terminal
Pass content with -c/--content and specify an embedding model with -m/--model. For example, -m 3-small targets the OpenAI text-embedding-3-small model; you must have configured an OpenAI API key with llm keys set openai for this to work.
The command outputs a JSON array of floats.
Output formats
The default output is a JSON array. Use --format blob to switch to the compact binary representation described in Storage Format, or --format hex or --format base64 for text encodings of that binary data.
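To make the compact binary representation concrete, here is a minimal Python sketch. It assumes the layout described in Storage Format is a sequence of little-endian 32-bit floats, four bytes per value; the helper names `encode` and `decode` and the sample vector are chosen for this illustration only.

```python
import base64
import struct


def encode(values):
    """Pack floats as little-endian 32-bit floats (assumed storage layout)."""
    return struct.pack("<" + "f" * len(values), *values)


def decode(blob):
    """Unpack a blob produced by encode() back into a list of floats."""
    return list(struct.unpack("<" + "f" * (len(blob) // 4), blob))


# Illustrative vector; real embeddings have hundreds or thousands of values.
vector = [1.0, 2.0, 0.5]
blob = encode(vector)

print(len(blob))                        # four bytes per value
print(blob.hex())                       # hex text form of the same bytes
print(base64.b64encode(blob).decode())  # Base64 text form of the same bytes
print(decode(blob))                     # round trip back to floats
```

The round trip is exact here only because the sample values are exactly representable as 32-bit floats; arbitrary values lose some precision when packed.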
Binary input
Models such as llm-clip can embed binary files like images. Pass --binary together with -i <file>.
Storing embeddings in a collection
Embeddings are most useful when stored so you can run similarity queries later. LLM organises stored embeddings into collections: named groups inside a SQLite database where every entry has a unique ID and was created with the same embedding model.
To store an embedding in the quotations collection under the key philkarlton-1, pass the collection name and the key as the first two arguments to llm embed.
By default, collections are stored in embeddings.db in LLM’s user content directory. Find that path with llm collections path.
Use -d/--database to write to a different SQLite file (it will be created if it does not exist).
Storing content and metadata
By default only the entry ID and embedding vector are stored. Add --store to persist the original text in the content column.
Use --metadata to attach arbitrary JSON metadata to the entry.
Stored content and metadata are returned in the results of llm similar.
llm embed-multi
llm embed-multi embeds multiple items in one command, exploiting any batch-processing efficiencies offered by the embedding model. It can read from CSV/TSV/JSON files, a SQLite database query, or directories of files.
All input modes support these shared options:
| Option | Description |
|---|---|
| -m/--model model_id | Embedding model to use |
| -d/--database database.db | Target database file |
| --store | Persist original content alongside the vector |
| --prefix PREFIX | Prepend a string to every stored ID |
| --prepend TEXT | Prepend a string to every piece of content before embedding |
| --batch-size N | Process embeddings in batches of N items |
| --format FORMAT | Input format: json, csv, tsv, or nl (newline-delimited JSON) |
| --attach ALIAS PATH | Attach an additional SQLite database under the given alias |
The --prepend option is useful for models that require a special token. For example, nomic-embed-text-v2-moe requires documents to be prefixed with search_document: and queries with search_query:.
Embedding data from a CSV, TSV, or JSON file
Your file must have at least two columns: the first is treated as the entry ID, and subsequent columns are concatenated as the content to embed. Pass - in place of a filename to read from standard input. For newline-delimited JSON you must pass --format nl.
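The column rule above, together with the --prefix and --prepend options, can be pictured with a short Python sketch. This is an illustration, not LLM's own code: the sample data and the function name `rows_to_items` are hypothetical, and joining the content columns with a single space is an assumption about how the concatenation is done.

```python
import csv
import io

# Hypothetical sample data: the first column is the ID, the rest is content.
SAMPLE_CSV = """id,title,body
one,First item,Text of the first item
two,Second item,Text of the second item
"""


def rows_to_items(text, prefix="", prepend=""):
    """Turn CSV rows into (id, content) pairs.

    prefix is added to every ID and prepend to every piece of content.
    The single-space separator between columns is an assumption.
    """
    items = []
    for row in csv.DictReader(io.StringIO(text)):
        values = list(row.values())
        entry_id = prefix + values[0]
        content = prepend + " ".join(values[1:])
        items.append((entry_id, content))
    return items


items = rows_to_items(SAMPLE_CSV, prefix="docs/", prepend="search_document: ")
for entry_id, content in items:
    print(entry_id, "->", content)
```

Each pair corresponds to one entry in the collection: the ID identifies the entry, and the content string is what would be sent to the embedding model.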
Embedding data from a SQLite database
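Use --sql to embed results from a SQL query. When the source and destination are the same database, the query runs against the database that holds the embeddings, selected with -d.
To embed rows from a different database, attach it with --attach ALIAS PATH and reference its tables through that alias in the query.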
Embedding data from files in directories
Use --files <directory> <glob> to embed all matching files in a directory tree. The file’s relative path is used as its ID, which produces IDs such as aliases.md, embeddings/cli.md, plugins/index.md, and so on.
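To see how relative paths turn into IDs, the following Python sketch builds a throwaway directory tree and lists the IDs a glob would yield. It is a minimal illustration under stated assumptions: the function name `ids_for_files`, the file names, and the `llm-docs/` prefix are hypothetical, and the sorting is only there to make the output predictable.

```python
import tempfile
from pathlib import Path


def ids_for_files(directory, pattern, prefix=""):
    """Return prefix + relative path for every file matching the glob."""
    root = Path(directory)
    return sorted(
        prefix + path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )


# Build a small temporary tree (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "embeddings").mkdir()
    (root / "aliases.md").write_text("aliases", encoding="utf-8")
    (root / "embeddings" / "cli.md").write_text("cli", encoding="utf-8")
    (root / "notes.txt").write_text("not matched", encoding="utf-8")

    plain_ids = ids_for_files(root, "**/*.md")
    prefixed_ids = ids_for_files(root, "**/*.md", prefix="llm-docs/")

print(plain_ids)
print(prefixed_ids)
```

A prefix of this kind keeps IDs from different sources distinct when several directories are embedded into the same collection.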
Add --prefix to namespace the IDs.
Files are assumed to be UTF-8, with a fallback to latin-1. Specify alternative encodings with --encoding (tried in order).
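Trying encodings in order can be pictured as a simple loop. The Python sketch below is an illustration, not LLM's implementation: the function name `decode_with_fallback` is hypothetical, the default pair of utf-8 followed by latin-1 is taken as an assumption, and returning `None` when nothing works is a choice made for this example.

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Return data decoded with the first encoding that succeeds, else None."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    return None


utf8_text = decode_with_fallback("café".encode("utf-8"))
latin1_text = decode_with_fallback(b"caf\xe9")  # not valid UTF-8
undecodable = decode_with_fallback(b"\xff\xfe", encodings=("utf-8", "ascii"))

print(utf8_text, latin1_text, undecodable)
```

Order matters because latin-1 accepts any byte sequence: placing it first would prevent later encodings from ever being tried.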
To embed files as binary data rather than text, for example images, add --binary.
llm similar
llm similar searches a named collection for entries that are semantically closest to a given query, ranked by cosine similarity.
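For readers unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths. The Python sketch below ranks a few toy vectors against a query; the function names, entry IDs, and two-dimensional vectors are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, entries, n=10):
    """Return the n (id, score) pairs with the highest cosine similarity."""
    scored = [
        (entry_id, cosine_similarity(query, vector))
        for entry_id, vector in entries.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy collection: IDs mapped to two-dimensional vectors.
entries = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = most_similar([1.0, 0.0], entries, n=2)
print(results)
```

Scores near 1 mean the vectors point in almost the same direction, while a score of 0 means they are orthogonal; vector length does not affect the result.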
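To search the quotations collection for entries similar to 'computer science', pass the collection name to llm similar along with -c 'computer science'.
Use -p/--plain to get a compact plain-text listing instead of the default output.
Read the query content from a file with -i <filename>, or from standard input with -i -.
When the query content is binary data, such as an image, add --binary.
To restrict results to IDs that start with a given string, use --prefix.
Use -n N to control how many results are returned (default is 10).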
llm embed-models
List all embedding models that are currently available, including those provided by installed plugins. To filter the list by a search term, use -q (can be repeated).
Setting the default embedding model
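Get the current default with llm embed-models default. To set a default, pass a model ID as an argument to the same command. If the default is removed with --remove-default, -m/--model becomes required.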
llm collections list
List all collections stored in the embeddings database. Add --json for machine-readable output.
To list collections in a different database file, use -d/--database.
llm collections delete
Delete a collection and all of its embeddings by passing its name to llm collections delete. Add -d to target a specific database file.