Build an LLM Model Plugin: Step-by-Step Python Tutorial

LLM’s plugin system lets you add support for entirely new language models — or anything that generates text — by shipping a Python package with a handful of required pieces. This tutorial builds llm-markov, a plugin that uses a Markov chain to generate words from an input string. Markov chains aren’t technically large language models, but they’re a perfect exercise for learning every layer of the plugin API before applying those skills to a real model provider.

Create the plugin directory

Start by creating a directory named after your plugin and entering it:

mkdir llm-markov
cd llm-markov

Write the initial plugin file

Create llm_markov.py with a minimal model implementation:

import llm

@llm.hookimpl
def register_models(register):
    register(Markov())

class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        return ["hello world"]

The register_models() function is called by LLM’s plugin system because it is decorated with @llm.hookimpl. It calls register() with an instance of your new model class.

The Markov class extends llm.Model. The model_id attribute is the identifier users will pass to llm -m. The execute() method contains all the generation logic — for now it just returns a static list.
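The registration flow can be pictured with a toy stand-in. This is a minimal sketch of the callback pattern only, not LLM’s actual implementation; ToyModel, registry, and the local register() helper are hypothetical names used purely for illustration, and nothing here imports llm.

```python
# Toy illustration of the register_models() callback pattern.
# ToyModel, registry and register() are hypothetical stand-ins.


class ToyModel:
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        return ["hello world"]


def register_models(register):
    # The plugin hands an instance of its model to the host.
    register(ToyModel())


registry = {}


def register(model):
    # The host stores each model under its model_id.
    registry[model.model_id] = model


register_models(register)

# Look the model up by ID, as `-m markov` would, and run it.
output = "".join(registry["markov"].execute("ignored", False, None, None))
print(output)
```

The point of the pattern is that the plugin never needs to know how the host stores models; it only calls the register callable it was given.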

Create pyproject.toml

LLM discovers plugins through entry points. Create pyproject.toml in the same directory:

[project]
name = "llm-markov"
version = "0.1"

[project.entry-points.llm]
markov = "llm_markov"

This minimal configuration registers an entry point in the llm group: the name markov maps to the llm_markov module, which is how LLM finds and loads your plugin.
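To see what an entry-point declaration like this exposes, the standard library can list every entry point registered under the llm group in the current environment. This is a sketch for inspection only; list_llm_entry_points is a hypothetical helper name, and the result depends entirely on which packages are installed (it may be empty).

```python
from importlib.metadata import entry_points


def list_llm_entry_points(group="llm"):
    """Return (name, value) pairs for entry points in the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a select() method for filtering by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return [(ep.name, ep.value) for ep in selected]


for name, value in list_llm_entry_points():
    print(f"{name} -> {value}")
```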

Install in editable mode

From inside your llm-markov directory, install the plugin with the -e (editable) flag so live changes to llm_markov.py are reflected immediately:

llm install -e .

You can also pass a path to the directory:

llm install -e path/to/llm-markov

Confirm the installation succeeded:

llm plugins

[
  {
    "name": "llm-markov",
    "hooks": [
      "register_models"
    ],
    "version": "0.1"
  }
]

Test the stub model:

llm -m markov "the cat sat on the mat"

hello world

Building the Markov chain

A Markov chain generates text by building an index of which words follow which other words in a training sentence. For the phrase "the cat sat on the mat" that index looks like this:

{
  "the": ["cat", "mat"],
  "cat": ["sat"],
  "sat": ["on"],
  "on": ["the"]
}

Here is the Python function that builds this table:

def build_markov_table(text):
    words = text.split()
    transitions = {}
    # Loop through all but the last word
    for i in range(len(words) - 1):
        word = words[i]
        next_word = words[i + 1]
        transitions.setdefault(word, []).append(next_word)
    return transitions
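As a quick check, running the function on the example phrase reproduces the table shown earlier. The function is repeated here so the snippet runs on its own.

```python
def build_markov_table(text):
    words = text.split()
    transitions = {}
    # Pair each word with the word that follows it.
    for i in range(len(words) - 1):
        transitions.setdefault(words[i], []).append(words[i + 1])
    return transitions


table = build_markov_table("the cat sat on the mat")
print(table)
# {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the']}

# "mat" is the final word, so it never appears as a key.
print("mat" in table)  # False
```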

To generate output, start from a word, pick a random successor, and repeat. This implementation is a Python generator that yields one word at a time:

import random

def generate(transitions, length, start_word=None):
    all_words = list(transitions.keys())
    next_word = start_word or random.choice(all_words)
    for i in range(length):
        yield next_word
        options = transitions.get(next_word) or all_words
        next_word = random.choice(options)

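A word with no recorded successors (such as "mat", the final word of the example) falls back to a random choice among the table’s keys, that is, the words that do have at least one successor.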
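One edge case is worth noting: a prompt with fewer than two words produces an empty table, and random.choice() raises IndexError on an empty list. The variant below is one possible guard, not part of the tutorial’s plugin; the name generate_safe and the optional rng parameter are assumptions added for illustration and for reproducible testing.

```python
import random


def generate_safe(transitions, length, start_word=None, rng=None):
    # rng is a hypothetical extra parameter so callers can pass a seeded
    # random.Random instance and get repeatable output.
    rng = rng or random.Random()
    all_words = list(transitions.keys())
    if not all_words:
        return  # Nothing to sample from, so yield no words at all.
    next_word = start_word or rng.choice(all_words)
    for _ in range(length):
        yield next_word
        options = transitions.get(next_word) or all_words
        next_word = rng.choice(options)


table = {"the": ["cat", "mat"], "cat": ["sat"], "sat": ["on"], "on": ["the"]}
words = list(generate_safe(table, 10, rng=random.Random(0)))
print(" ".join(words))

# An empty table yields an empty sequence instead of raising IndexError.
empty = list(generate_safe({}, 10))
print(empty)  # []
```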

Executing the Markov chain

Update llm_markov.py to use the Markov chain logic in execute():

import llm
import random

@llm.hookimpl
def register_models(register):
    register(Markov())

def build_markov_table(text):
    words = text.split()
    transitions = {}
    for i in range(len(words) - 1):
        word = words[i]
        next_word = words[i + 1]
        transitions.setdefault(word, []).append(next_word)
    return transitions

def generate(transitions, length, start_word=None):
    all_words = list(transitions.keys())
    next_word = start_word or random.choice(all_words)
    for i in range(length):
        yield next_word
        options = transitions.get(next_word) or all_words
        next_word = random.choice(options)

class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        text = prompt.prompt
        transitions = build_markov_table(text)
        for word in generate(transitions, 20):
            yield word + ' '

prompt.prompt contains the raw text the user supplied. The execute() method is a generator — each yield sends one token to the output stream. Run it:
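The difference between consuming a generator chunk by chunk and gathering it first can be sketched without LLM at all. In this illustration, fake_execute stands in for a model’s execute() generator and consume is a hypothetical caller; neither reflects LLM’s internal code.

```python
def fake_execute(words):
    # Stand-in for execute(): yields one chunk per word.
    for word in words:
        yield word + " "


def consume(chunks, stream):
    if stream:
        # Streaming: show each chunk as soon as it is produced.
        pieces = []
        for chunk in chunks:
            print(chunk, end="", flush=True)
            pieces.append(chunk)
        print()
        return "".join(pieces)
    # Non-streaming: gather everything, then show it once.
    text = "".join(chunks)
    print(text)
    return text


streamed = consume(fake_execute(["the", "cat", "sat"]), stream=True)
buffered = consume(fake_execute(["the", "cat", "sat"]), stream=False)
```

Both paths end with the same text; only the timing of the output differs, which is why a generator-based execute() can serve both modes.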

llm -m markov "the cat sat on the mat"

the mat the cat sat on the cat sat on the mat cat sat on the mat cat sat on

Understanding execute()

The full signature of execute() is:

def execute(self, prompt, stream, response, conversation):

prompt

A Prompt object containing the user’s text (prompt.prompt), an optional system prompt (prompt.system), and any options the user passed (prompt.options).

stream

A boolean indicating whether the model was invoked in streaming mode. You can choose to behave differently based on this flag — for example, skipping per-token delays when stream is False.

response

The Response object being assembled. You can attach additional data to response.response_json (a dict) at any point during execute(), and it will be persisted in the log database.

conversation

The Conversation the prompt belongs to, or None for a one-off prompt. Models that support multi-turn conversations can walk conversation.responses to include prior context.
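For a Markov model, one way to use conversation history is to fold earlier prompts into the training text. The sketch below uses stand-in classes rather than LLM’s real objects: FakePrompt, FakeResponse, FakeConversation, and training_text are hypothetical, and the assumption that each prior response exposes its prompt text as response.prompt.prompt should be checked against the LLM version you are targeting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakePrompt:
    prompt: str


@dataclass
class FakeResponse:
    prompt: FakePrompt


@dataclass
class FakeConversation:
    responses: List[FakeResponse] = field(default_factory=list)


def training_text(prompt, conversation: Optional[FakeConversation]):
    """Join earlier prompt texts (if any) with the current prompt text."""
    parts = []
    if conversation is not None:
        parts.extend(r.prompt.prompt for r in conversation.responses)
    parts.append(prompt.prompt)
    return " ".join(parts)


history = FakeConversation([FakeResponse(FakePrompt("the cat sat"))])
print(training_text(FakePrompt("on the mat"), history))  # the cat sat on the mat
print(training_text(FakePrompt("on the mat"), None))     # on the mat
```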

Prompts and responses are logged

LLM automatically logs every prompt and response to a SQLite database. Inspect the most recent entry with:

llm logs -n 1

[
  {
    "id": "01h52s4yez2bd1qk2deq49wk8h",
    "model": "markov",
    "prompt": "the cat sat on the mat",
    "system": null,
    "prompt_json": null,
    "options_json": {},
    "response": "on the cat sat on the cat sat on the mat cat sat on the cat sat on the cat ",
    "response_json": null,
    "conversation_id": "01h52s4yey7zc5rjmczy3ft75g",
    "duration_ms": 0,
    "datetime_utc": "2023-07-11T15:29:34.685868"
  }
]

You can store extra data in the log by setting response.response_json inside execute():

def execute(self, prompt, stream, response, conversation):
    text = prompt.prompt
    transitions = build_markov_table(text)
    for word in generate(transitions, 20):
        yield word + ' '
    response.response_json = {"transitions": transitions}

Storing the transitions table here is redundant — it can always be re-derived from the input. For larger prompts this can add significant bulk to the log. Use response.response_json for information that isn’t already captured in the prompt.

Adding Options

LLM models accept typed options passed via -o key value on the command line. Options are declared as an inner Options class on your model, extending llm.Options (which uses Pydantic 2 under the hood). Add these imports at the top of llm_markov.py:

from typing import Optional
from pydantic import field_validator, Field
import time

Then define the Options class and update execute():

class Markov(llm.Model):
    model_id = "markov"
    can_stream = True

    class Options(llm.Options):
        length: Optional[int] = Field(
            description="Number of words to generate",
            default=None
        )
        delay: Optional[float] = Field(
            description="Seconds to delay between each token",
            default=None
        )

        @field_validator("length")
        def validate_length(cls, length):
            if length is None:
                return None
            if length < 2:
                raise ValueError("length must be >= 2")
            return length

        @field_validator("delay")
        def validate_delay(cls, delay):
            if delay is None:
                return None
            if not 0 <= delay <= 10:
                raise ValueError("delay must be between 0 and 10")
            return delay

    def execute(self, prompt, stream, response, conversation):
        text = prompt.prompt
        transitions = build_markov_table(text)
        length = prompt.options.length or 20
        for word in generate(transitions, length):
            yield word + ' '
            if prompt.options.delay:
                time.sleep(prompt.options.delay)

Setting can_stream = True on the class tells LLM this model supports streaming. Validation errors surface cleanly to the user:

llm -m markov "the cat sat on the mat" -o length -1

Error: length
  Value error, length must be >= 2
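The boundary conditions of the two validators can be checked in isolation. The sketch below restates the same rules as plain functions so it runs without Pydantic; check_length, check_delay, and is_valid are hypothetical names, and inside the plugin the field_validator methods remain the place for this logic.

```python
def check_length(length):
    if length is None:
        return None
    if length < 2:
        raise ValueError("length must be >= 2")
    return length


def check_delay(delay):
    if delay is None:
        return None
    if not 0 <= delay <= 10:
        raise ValueError("delay must be between 0 and 10")
    return delay


def is_valid(check, value):
    # True when the check accepts the value, False when it raises.
    try:
        check(value)
        return True
    except ValueError:
        return False


results = {
    "length=-1": is_valid(check_length, -1),
    "length=1": is_valid(check_length, 1),
    "length=2": is_valid(check_length, 2),
    "delay=0": is_valid(check_delay, 0),
    "delay=10": is_valid(check_delay, 10),
    "delay=10.5": is_valid(check_delay, 10.5),
}
for case, ok in results.items():
    print(case, "accepted" if ok else "rejected")
```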

Use the options:

llm -m markov "the cat sat on the mat" \
  -o length 20 -o delay 0.1

Pass --no-stream to gather the full response before printing (the delay still applies while gathering):

llm -m markov "the cat sat on the mat" \
  -o length 20 -o delay 0.1 --no-stream

Options are also stored in the log:

{
  "options_json": {
    "length": 20,
    "delay": 0.1
  }
}

The complete llm_markov.py

Here is the finished plugin file with all features included:

import llm
import random
import time
from typing import Optional
from pydantic import field_validator, Field


@llm.hookimpl
def register_models(register):
    register(Markov())


def build_markov_table(text):
    words = text.split()
    transitions = {}
    # Loop through all but the last word
    for i in range(len(words) - 1):
        word = words[i]
        next_word = words[i + 1]
        transitions.setdefault(word, []).append(next_word)
    return transitions


def generate(transitions, length, start_word=None):
    all_words = list(transitions.keys())
    next_word = start_word or random.choice(all_words)
    for i in range(length):
        yield next_word
        options = transitions.get(next_word) or all_words
        next_word = random.choice(options)


class Markov(llm.Model):
    model_id = "markov"
    can_stream = True

    class Options(llm.Options):
        length: Optional[int] = Field(
            description="Number of words to generate", default=None
        )
        delay: Optional[float] = Field(
            description="Seconds to delay between each token", default=None
        )

        @field_validator("length")
        def validate_length(cls, length):
            if length is None:
                return None
            if length < 2:
                raise ValueError("length must be >= 2")
            return length

        @field_validator("delay")
        def validate_delay(cls, delay):
            if delay is None:
                return None
            if not 0 <= delay <= 10:
                raise ValueError("delay must be between 0 and 10")
            return delay

    def execute(self, prompt, stream, response, conversation):
        text = prompt.prompt
        transitions = build_markov_table(text)
        length = prompt.options.length or 20
        for word in generate(transitions, length):
            yield word + " "
            if prompt.options.delay:
                time.sleep(prompt.options.delay)

Distributing your plugin

Build wheel and sdist packages

Install the build tool and run it in your plugin directory:

python -m pip install build
python -m build

This produces dist/llm-markov-0.1.tar.gz and dist/llm_markov-0.1-py3-none-any.whl (depending on your build backend version, the sdist may be named llm_markov-0.1.tar.gz instead). Either can be installed directly:

llm install dist/llm_markov-0.1-py3-none-any.whl
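The wheel filename uses llm_markov rather than llm-markov because wheel filenames replace runs of -, _, and . in the project name with a single underscore. A minimal sketch of that normalization follows; wheel_filename is a hypothetical helper, and the py3-none-any tag is assumed for a pure-Python package like this one.

```python
import re


def wheel_filename(project_name, version, tag="py3-none-any"):
    # Collapse runs of "-", "_" and "." and lowercase the result,
    # then use "_" as the separator inside the filename component.
    normalized = re.sub(r"[-_.]+", "-", project_name).lower().replace("-", "_")
    return f"{normalized}-{version}-{tag}.whl"


print(wheel_filename("llm-markov", "0.1"))  # llm_markov-0.1-py3-none-any.whl
```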

Host the wheel somewhere online and share the URL:

llm install 'https://.../llm_markov-0.1-py3-none-any.whl'

To uninstall during testing:

llm uninstall llm-markov -y

Share via GitHub Gist

GitHub Gists support multiple files and are free to create. Right-click the Download ZIP button on your Gist and copy the link. Users can install directly from that URL:

llm install 'https://gist.github.com/simonw/6e56d48dc2599bffba963cef0db27b6d/archive/cc50c854414cb4deab3e3ab17e7e1e07d45cba0c.zip'

Share via GitHub repository

Publish to PyPI

Once on PyPI, your plugin is installable by name:

llm install llm-markov

Before uploading, expand pyproject.toml with full metadata:

[project]
name = "llm-markov"
version = "0.1"
description = "Plugin for LLM adding a Markov chain generating model"
readme = "README.md"
authors = [{name = "Simon Willison"}]
license = {text = "Apache-2.0"}
classifiers = [
    "License :: OSI Approved :: Apache Software License"
]
dependencies = [
    "llm"
]
requires-python = ">3.7"

[project.urls]
Homepage = "https://github.com/simonw/llm-markov"
Changelog = "https://github.com/simonw/llm-markov/releases"
Issues = "https://github.com/simonw/llm-markov/issues"

[project.entry-points.llm]
markov = "llm_markov"

Rebuild the packages with python -m build so they include the new metadata, then upload with twine:

python -m pip install twine
python -m twine upload dist/*

Recovery: what to do if the plugin breaks

If a syntax error in your plugin prevents llm itself from starting, you can uninstall the broken plugin by disabling plugin loading first:

LLM_LOAD_PLUGINS='' llm uninstall llm-markov

The LLM_LOAD_PLUGINS environment variable controls which plugins are loaded. Setting it to an empty string skips all plugins, letting llm start cleanly so you can run uninstall.
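The behavior described here (unset means load everything, empty means load nothing) can be modeled with a small parser. This is an illustrative sketch, not LLM’s source code; plugins_to_load is a hypothetical helper, and treating a non-empty value as a comma-separated list of plugin names is an assumption to verify against the LLM documentation.

```python
def plugins_to_load(environ):
    """Interpret LLM_LOAD_PLUGINS from a mapping such as os.environ.

    Returns None when the variable is unset (no restriction), or a list
    of plugin names (possibly empty) when it is set.
    """
    value = environ.get("LLM_LOAD_PLUGINS")
    if value is None:
        return None
    # Assumption: names are comma-separated; blank entries are ignored.
    return [name.strip() for name in value.split(",") if name.strip()]


print(plugins_to_load({}))                                  # None
print(plugins_to_load({"LLM_LOAD_PLUGINS": ""}))            # []
print(plugins_to_load({"LLM_LOAD_PLUGINS": "llm-markov"}))  # ['llm-markov']
```

Distinguishing "unset" from "set but empty" is the key design point: an empty string is a deliberate request to skip every plugin.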
