Writing LLM Embedding Model Plugins: A Developer Guide

LLM’s embedding system is fully extensible through plugins. Any Python library that produces vector embeddings — sentence transformers, CLIP, custom models — can be wrapped in a small plugin and made available to the llm embed command and the Python API. This page walks through every component you need to write an embedding model plugin, from the class interface to binary content support and batching.

Read the Plugin Tutorial first for a walkthrough of plugin packaging and installation. The concepts (entry points, @hookimpl, llm install -e .) are the same for embedding plugins.

The EmbeddingModel class

Your embedding model must extend llm.EmbeddingModel and implement the embed_batch() method:

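from typing import Iterable, Iterator, List, Optional, Union

class EmbeddingModel:  # llm.EmbeddingModel, abridged to the parts plugins use
    model_id: str
    supports_text: bool = True
    supports_binary: bool = False
    batch_size: Optional[int] = None

    def embed_batch(self, items: Iterable[Union[str, bytes]]) -> Iterator[List[float]]:
        ...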

model_id — the identifier users pass to -m (e.g. llm embed -m your-model-id).
embed_batch(items) — takes an iterable of strings (or bytes if supports_binary = True) and returns an iterator over lists of floats, one list per input item.
supports_text — whether the model accepts text input (default True).
supports_binary — whether the model accepts binary input (default False).
batch_size — if set, embed_multi() automatically chunks large inputs into batches of this size before calling embed_batch().
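The embed_batch() contract above can be tried out without downloading any model weights. The following is a dependency-free sketch. It does not import llm, and the class name ToyHashEmbedding, its 8-dimension vector size and its feature-hashing scheme are all hypothetical, chosen only so the example runs anywhere. In a real plugin the class would subclass llm.EmbeddingModel rather than object.

```python
import hashlib
from typing import Iterable, Iterator, List, Optional, Union


class ToyHashEmbedding:  # hypothetical; a real plugin subclasses llm.EmbeddingModel
    model_id: str = "toy-hash"
    supports_text: bool = True
    supports_binary: bool = False
    batch_size: Optional[int] = None
    dimensions: int = 8  # hypothetical vector size

    def _embed_one(self, text: str) -> List[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # Feature hashing: each token increments one stable bucket.
            # hashlib is used because Python's built-in hash() is randomized per process.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[digest[0] % self.dimensions] += 1.0
        return vector

    def embed_batch(self, items: Iterable[Union[str, bytes]]) -> Iterator[List[float]]:
        # Yield exactly one list of floats per input item, in input order
        for item in items:
            if isinstance(item, bytes):
                # Defensive check: this toy model is text-only
                raise ValueError("ToyHashEmbedding only accepts text")
            yield self._embed_one(item)


model = ToyHashEmbedding()
vectors = list(model.embed_batch(["hello world", "hello"]))
```

Because embed_batch() is written as a generator here, vectors are produced lazily as the caller iterates, which satisfies the "iterator over lists of floats" return type.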

register_embedding_models hook

Use the register_embedding_models hook to register your model with LLM:

import llm

@llm.hookimpl
def register_embedding_models(register):
    register(MyEmbeddingModel())

Pass an aliases= keyword argument, a tuple of strings, to give the model one or more short names:

@llm.hookimpl
def register_embedding_models(register):
    model_id = "sentence-transformers/all-MiniLM-L6-v2"
    register(
        SentenceTransformerModel(model_id, model_id),
        aliases=("all-MiniLM-L6-v2",),
    )
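Underneath the @llm.hookimpl marker, register_embedding_models is an ordinary function that receives a register callable. You can therefore check what it registers by passing in a recording stand-in. The sketch below omits the decorator and uses a placeholder model class so that it runs without llm installed. DemoModel and collect_registrations are hypothetical helpers, and the stand-in's (model, aliases=()) signature simply mirrors the calls shown above.

```python
from typing import Callable, List, Tuple


class DemoModel:  # hypothetical placeholder for SentenceTransformerModel
    def __init__(self, model_id: str) -> None:
        self.model_id = model_id


# In a real plugin this function is decorated with @llm.hookimpl
def register_embedding_models(register: Callable[..., None]) -> None:
    model_id = "sentence-transformers/all-MiniLM-L6-v2"
    register(DemoModel(model_id), aliases=("all-MiniLM-L6-v2",))


def collect_registrations(
    hook: Callable[[Callable[..., None]], None],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """Call a hook with a recording register() and return (model_id, aliases) pairs."""
    calls: List[Tuple[str, Tuple[str, ...]]] = []

    def fake_register(model, aliases=()):
        calls.append((model.model_id, tuple(aliases)))

    hook(fake_register)
    return calls


registrations = collect_registrations(register_embedding_models)
```

A check like this fits naturally in a plugin's test suite, because it verifies model IDs and aliases without loading any model.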

A complete example plugin

The following plugin wraps the sentence-transformers library to provide the all-MiniLM-L6-v2 model. Notice that the SentenceTransformer model is loaded lazily inside embed_batch() rather than at __init__ time — this avoids the startup cost on every llm invocation:

import llm


@llm.hookimpl
def register_embedding_models(register):
    model_id = "sentence-transformers/all-MiniLM-L6-v2"
    register(SentenceTransformerModel(model_id, model_id), aliases=("all-MiniLM-L6-v2",))


class SentenceTransformerModel(llm.EmbeddingModel):
    def __init__(self, model_id, model_name):
        self.model_id = model_id
        self.model_name = model_name
        self._model = None  # loaded on first use, not at registration time

    def embed_batch(self, texts):
        if self._model is None:
            # Import here as well, so the heavy sentence-transformers import
            # is also deferred until the model is first used
            from sentence_transformers import SentenceTransformer

            self._model = SentenceTransformer(self.model_name)
        # encode() needs a sized sequence such as a list, so materialize the iterable
        results = self._model.encode(list(texts))
        return (list(map(float, result)) for result in results)

Once installed, the model is accessible via its full ID or its alias:

cat file.txt | llm embed -m sentence-transformers/all-MiniLM-L6-v2
cat file.txt | llm embed -m all-MiniLM-L6-v2

Embedding binary content

Models that accept binary input — images, audio, and similar — set supports_binary = True. Models that accept both text and binary set both flags:

class ClipEmbeddingModel(llm.EmbeddingModel):
    model_id = "clip"
    supports_binary = True
    supports_text = True      # True by default, shown here for clarity

supports_text defaults to True, so you only need to set it explicitly when your model rejects text entirely. When supports_binary = True, the items passed to your embed_batch() method can be Python bytes objects (binary items), str objects (text items), or, if your model supports both types, a mix of the two:

def embed_batch(self, items):
    for item in items:
        if isinstance(item, bytes):
            # Process binary content
            ...
        else:
            # Process text content
            ...
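Models like CLIP encode text and images through separate code paths. For these models, one option is to split a mixed batch by type, embed each group in a single call, and then put the vectors back in input order. This is a design choice, not something LLM requires. embed_texts() and embed_images() below are hypothetical placeholders for your model's real encoders, and they return toy two-dimensional vectors.

```python
from typing import Iterable, Iterator, List, Union


def embed_texts(texts: List[str]) -> List[List[float]]:
    # Hypothetical text encoder: [0.0, character count]
    return [[0.0, float(len(t))] for t in texts]


def embed_images(blobs: List[bytes]) -> List[List[float]]:
    # Hypothetical image encoder: [1.0, byte count]
    return [[1.0, float(len(b))] for b in blobs]


def embed_mixed_batch(items: Iterable[Union[str, bytes]]) -> Iterator[List[float]]:
    items = list(items)
    text_idx = [i for i, item in enumerate(items) if not isinstance(item, bytes)]
    bin_idx = [i for i, item in enumerate(items) if isinstance(item, bytes)]
    results: List[List[float]] = [[] for _ in items]
    # One encoder call per type instead of one call per item
    if text_idx:
        for i, vec in zip(text_idx, embed_texts([items[i] for i in text_idx])):
            results[i] = vec
    if bin_idx:
        for i, vec in zip(bin_idx, embed_images([items[i] for i in bin_idx])):
            results[i] = vec
    # Vectors come back in the same order as the input items
    return iter(results)
```

In a plugin, the body of embed_mixed_batch() would become the body of embed_batch().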

llm-clip is a complete working example of a model that embeds both binary and text content.

Setting batch_size

If your embedding API or library has an optimal batch size, declare it as a class attribute. LLM’s embed_multi() method will then automatically split large inputs into chunks of that size before calling embed_batch():

class MyEmbeddingModel(llm.EmbeddingModel):
    model_id = "my-model"
    batch_size = 32

    def embed_batch(self, items):
        # Will be called with at most 32 items at a time
        ...

batch_size = None (the default) passes all items to embed_batch() in a single call.
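The chunking described above can be illustrated with a short sketch built on itertools.islice. It is a simplified illustration of splitting input into batches of at most batch_size items, not LLM's own implementation. chunked() and RecordingModel are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class RecordingModel:
    batch_size = 32

    def __init__(self) -> None:
        self.calls: List[int] = []  # size of each embed_batch() call

    def embed_batch(self, items: List[str]) -> Iterator[List[float]]:
        self.calls.append(len(items))
        return ([float(len(item))] for item in items)


model = RecordingModel()
vectors = [
    vec
    for batch in chunked((f"item {n}" for n in range(70)), model.batch_size)
    for vec in model.embed_batch(batch)
]
```

With 70 inputs and batch_size = 32, embed_batch() is called three times, with 32, 32 and 6 items. The input can be a lazy generator because islice() only pulls one batch at a time.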

Real-world examples

llm-sentence-transformers

llm-sentence-transformers is a complete plugin that wraps the sentence-transformers library and supports many different pre-trained models. It demonstrates lazy loading, configurable model selection, and alias registration.

llm-clip

llm-clip embeds both text and images using OpenAI’s CLIP model. It demonstrates the supports_binary = True pattern and embedding images directly from file paths.

llm-embed-jina

The tutorial “Execute Jina embeddings with a CLI using llm-embed-jina” walks through building a plugin for the Jina embeddings API, including API key handling.
