LLM’s embedding system is fully extensible through plugins. Any Python library that produces vector embeddings (sentence transformers, CLIP, custom models) can be wrapped in a small plugin and made available to the llm embed command and the Python API. This page walks through every component you need to write an embedding model plugin, from the class interface to binary content support and batching.
The EmbeddingModel class
Your embedding model must extend llm.EmbeddingModel and implement the embed_batch() method:
model_id: the identifier users pass to -m (e.g. llm embed -m your-model-id).
embed_batch(items): takes an iterable of strings (or bytes if supports_binary = True) and returns an iterator over lists of floats, one list per input item.
supports_text: whether the model accepts text input (default True).
supports_binary: whether the model accepts binary input (default False).
batch_size: if set, embed_multi() automatically chunks large inputs into batches of this size before calling embed_batch().
register_embedding_models hook
Use the register_embedding_models hook to register your model with LLM:
Pass an aliases= tuple to give the model one or more short names:
A complete example plugin
The following plugin wraps the sentence-transformers library to provide the all-MiniLM-L6-v2 model. Notice that the SentenceTransformer model is loaded lazily inside embed_batch() rather than at __init__ time, which avoids the startup cost on every llm invocation:
Embedding binary content
Models that accept binary input (images, audio, and similar) set supports_binary = True. Models that accept both text and binary set both flags:
supports_text defaults to True, so you only need to set it explicitly when your model rejects text entirely.
When supports_binary = True, your embed_batch() method may receive a list containing Python bytes objects (for binary items), strings (for text items), or a mix of both if your model supports both types:
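One way to handle mixed input is to branch on each item's type. The helper below is a standalone sketch with a hypothetical name (embed_items) and toy vectors; an embed_batch() method could delegate to something like it.

```python
def embed_items(items):
    """Yield one toy vector per item, branching on the item's type."""
    for item in items:
        if isinstance(item, bytes):
            # Binary item, for example the raw bytes of an image file.
            yield [1.0, float(len(item))]
        elif isinstance(item, str):
            # Text item.
            yield [0.0, float(len(item))]
        else:
            raise TypeError(f"Unsupported item type: {type(item).__name__}")


vectors = list(embed_items(["hi", b"\x00\x01\x02"]))
print(vectors)  # [[0.0, 2.0], [1.0, 3.0]]
```

A real model would replace the two branches with calls to its text encoder and its image (or audio) encoder.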
Setting batch_size
If your embedding API or library has an optimal batch size, declare it as a class attribute. LLM’s embed_multi() method will then automatically split large inputs into chunks of that size before calling embed_batch():
batch_size = None (the default) passes all items to embed_batch() in a single call.
Real-world examples
llm-sentence-transformers
llm-sentence-transformers is a complete plugin that wraps the sentence-transformers library and supports many different pre-trained models. It demonstrates lazy loading, configurable model selection, and alias registration.
llm-clip
llm-clip embeds both text and images using OpenAI’s CLIP model. It demonstrates the supports_binary = True pattern and embedding images directly from file paths.
llm-embed-jina
“Execute Jina embeddings with a CLI using llm-embed-jina” is a tutorial that walks through building a plugin for the Jina embeddings API, including API key handling.