LLM’s plugin system lets you add support for entirely new language models, or anything that generates text, by shipping a Python package with a handful of required pieces. This tutorial builds llm-markov, a plugin that uses a Markov chain to generate words from an input string. Markov chains aren’t technically large language models, but they’re a perfect exercise for learning every layer of the plugin API before applying those skills to a real model provider.
import llm


@llm.hookimpl
def register_models(register):
    register(Markov())


class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        return ["hello world"]

The register_models() function is called by LLM’s plugin system (thanks to the @llm.hookimpl decorator). It calls register() with an instance of your new model class.

The Markov class extends llm.Model. The model_id attribute is the identifier users will pass to llm -m. The execute() method contains all the generation logic; for now it just returns a static list.

LLM discovers plugins through entry points. Create pyproject.toml in the same directory, declaring the package metadata and an entry point in the llm group that points at the llm_markov module.

From inside your llm-markov directory, install the plugin with llm install -e . so that live changes to llm_markov.py are reflected immediately (the -e flag makes the install editable).

Building the Markov chain
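A Markov chain generates text by building an index of which words follow which other words in a training sentence. For the phrase "the cat sat on the mat", that index maps "the" to ["cat", "mat"], "cat" to ["sat"], "sat" to ["on"], and "on" to ["the"].

Generation then walks this index, repeatedly choosing one of the recorded followers of the current word, and yields one word at a time.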
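
The two steps described above, building the index and walking it, can be sketched as follows. The function names build_markov_table and generate, the random choice among followers, and the fallback to any known word at a dead end are assumptions made for illustration, not details this page confirms.

```python
import random


def build_markov_table(text):
    """Map each word to the list of words seen directly after it."""
    words = text.split()
    transitions = {}
    # Pair every word except the last with its successor
    for word, next_word in zip(words, words[1:]):
        transitions.setdefault(word, []).append(next_word)
    return transitions


def generate(transitions, length, start_word=None):
    """Yield `length` words by following the transition table."""
    all_words = list(transitions)
    next_word = start_word or random.choice(all_words)
    for _ in range(length):
        yield next_word
        # Assumption: a word with no recorded followers restarts from any known word
        options = transitions.get(next_word) or all_words
        next_word = random.choice(options)


table = build_markov_table("the cat sat on the mat")
print(table)
print(" ".join(generate(table, 10)))
```

The first print shows the index for the example phrase. The second line is random, so it differs between runs.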
Executing the Markov chain
Update llm_markov.py to use the Markov chain logic in execute().

prompt.prompt contains the raw text the user supplied. The execute() method is a generator: each yield sends one token to the output stream.
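
As a hedged sketch of that update, the class below turns execute() into a generator over the chain. To keep it runnable without LLM installed, it omits the llm.Model base class and passes a stand-in prompt object. The helper names and the fixed length of 20 words are illustrative assumptions.

```python
import random
from types import SimpleNamespace


def build_markov_table(text):
    words = text.split()
    transitions = {}
    for word, next_word in zip(words, words[1:]):
        transitions.setdefault(word, []).append(next_word)
    return transitions


def generate(transitions, length):
    all_words = list(transitions)
    next_word = random.choice(all_words)
    for _ in range(length):
        yield next_word
        next_word = random.choice(transitions.get(next_word) or all_words)


class Markov:  # the real plugin would subclass llm.Model
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        # prompt.prompt holds the raw text supplied by the user
        transitions = build_markov_table(prompt.prompt)
        for word in generate(transitions, 20):
            # Each yield hands one token to the output stream
            yield word + " "


# Stand-in for the Prompt object that LLM would pass in
fake_prompt = SimpleNamespace(prompt="the cat sat on the mat")
print("".join(Markov().execute(fake_prompt, True, None, None)))
```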
Run it with llm -m markov "the cat sat on the mat".
Understanding execute()
The full signature of execute() is execute(self, prompt, stream, response, conversation). Its arguments are:

prompt
A Prompt object containing the user’s text (prompt.prompt), an optional system prompt (prompt.system), and any options the user passed (prompt.options).

stream
A boolean indicating whether the model was invoked in streaming mode. You can choose to behave differently based on this flag, for example by skipping per-token delays when stream is False.

response
The Response object being assembled. You can attach additional data to response.response_json (a dict) at any point during execute(), and it will be persisted in the log database.

conversation
The Conversation the prompt belongs to, or None for a one-off prompt. Models that support multi-turn conversations can walk conversation.responses to include prior context.

Prompts and responses are logged
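LLM automatically logs every prompt and response to a SQLite database. Inspect the most recent entry with llm logs -n 1.

To record extra data alongside an entry, assign a dict to response.response_json inside execute(), for example one holding the transitions table.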
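
A minimal sketch of that assignment follows. It uses a stand-in response object instead of the real Response class so that it runs on its own, and the key name "transitions" and the helper build_markov_table are assumptions.

```python
from types import SimpleNamespace


def build_markov_table(text):
    words = text.split()
    transitions = {}
    for word, next_word in zip(words, words[1:]):
        transitions.setdefault(word, []).append(next_word)
    return transitions


def execute(prompt, stream, response, conversation):
    transitions = build_markov_table(prompt.prompt)
    # Attach extra data; LLM would persist this dict with the logged response
    response.response_json = {"transitions": transitions}
    # Word generation is reduced to echoing the input so the sketch stays short
    for word in prompt.prompt.split():
        yield word + " "


# Stand-ins for the objects LLM would normally supply
prompt = SimpleNamespace(prompt="the cat sat on the mat")
response = SimpleNamespace(response_json=None)
text = "".join(execute(prompt, True, response, None))
print(response.response_json)
```

Because execute() is a generator, the assignment only happens once the caller starts consuming the output.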
Storing the transitions table here is redundant, since it can always be re-derived from the input. For larger prompts this can add significant bulk to the log. Use response.response_json for information that isn’t already captured in the prompt.

Adding Options
LLM models accept typed options passed via -o key value on the command line. Options are declared as an inner Options class on your model, extending llm.Options (which uses Pydantic 2 under the hood).

Add the imports the options need at the top of llm_markov.py, then add the Options class and update execute().
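
As an illustration of such a class, the sketch below declares two options with validators. It subclasses pydantic.BaseModel directly so it can run without LLM installed (in the plugin the base would be llm.Options), and the option names length and delay, along with their limits, are assumptions rather than details stated here. It requires Pydantic 2.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class Options(BaseModel):  # in the plugin: class Options(llm.Options)
    length: Optional[int] = Field(
        description="Number of words to generate", default=None
    )
    delay: Optional[float] = Field(
        description="Seconds to wait between words", default=None
    )

    @field_validator("length")
    @classmethod
    def validate_length(cls, length):
        # Hypothetical limit: a chain needs at least two words
        if length is not None and length < 2:
            raise ValueError("length must be >= 2")
        return length

    @field_validator("delay")
    @classmethod
    def validate_delay(cls, delay):
        # Hypothetical limit: keep pauses between 0 and 10 seconds
        if delay is not None and not 0 <= delay <= 10:
            raise ValueError("delay must be between 0 and 10")
        return delay


print(Options(length=5, delay=0.1))

try:
    Options(length=1)
except ValidationError as error:
    print(error.errors()[0]["msg"])
```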
Setting can_stream = True on the class tells LLM this model supports streaming. Validation errors from the Options class surface cleanly to the user.

Pass --no-stream to gather the full response before printing (the delay still applies while gathering).
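
How execute() might consume those options is sketched below. The option names length and delay, the default of 20 words, and the stand-in prompt object are assumptions, and the llm.Model base class is left out so the sketch runs without LLM installed.

```python
import random
import time
from types import SimpleNamespace


class Markov:  # the real plugin would subclass llm.Model
    model_id = "markov"
    can_stream = True  # signals that tokens can be emitted one at a time

    def execute(self, prompt, stream, response, conversation):
        words = prompt.prompt.split()
        # Index of followers for each word in the prompt
        transitions = {}
        for word, next_word in zip(words, words[1:]):
            transitions.setdefault(word, []).append(next_word)
        all_words = list(transitions)
        length = prompt.options.length or 20
        delay = prompt.options.delay
        word = random.choice(all_words)
        for _ in range(length):
            yield word + " "
            if delay:
                # The pause applies whether or not output is streamed
                time.sleep(delay)
            word = random.choice(transitions.get(word) or all_words)


options = SimpleNamespace(length=5, delay=0.01)
prompt = SimpleNamespace(prompt="the cat sat on the mat", options=options)
print("".join(Markov().execute(prompt, True, None, None)))
```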
The complete llm_markov.py
Here is the finished plugin file with all features included:

Distributing your plugin
Build wheel and sdist packages
Install the build tool and run it in your plugin directory with python -m build. This produces dist/llm-markov-0.1.tar.gz and dist/llm-markov-0.1-py3-none-any.whl. Either can be installed directly with llm install. Host the wheel somewhere online and share the URL. To uninstall during testing, run llm uninstall llm-markov -y.

Share via GitHub Gist

Share via GitHub repository

Publish to PyPI
First expand pyproject.toml with full metadata. Then upload with twine. Once on PyPI, your plugin is installable by name.

Recovery: what to do if the plugin breaks
If a syntax error in your plugin prevents llm itself from starting, you can uninstall the broken plugin by disabling plugin loading first, for example with LLM_LOAD_PLUGINS='' llm uninstall llm-markov.

The LLM_LOAD_PLUGINS environment variable controls which plugins are loaded. Setting it to an empty string skips all plugins, letting llm start cleanly so you can run uninstall.