The LLM Python API lets you run prompts against any installed model directly from Python code — the same models accessible via the CLI are available programmatically. You can stream responses, pass multi-modal attachments, call tools, extract structured data with schemas, and manage API keys, all without leaving Python.
Use llm.get_model() to retrieve a model by its ID or alias. The function is available as soon as you import llm — no separate configuration step is needed.
```python
import llm

model = llm.get_model("gpt-4o-mini")
```
Calling llm.get_model() with no argument uses the currently configured default model, which is gpt-4o-mini unless you have changed it. Passing an invalid model ID raises llm.UnknownModelError.
Call model.prompt() to send a prompt to the model. The response uses lazy evaluation — the network call does not happen until you actually consume the response (by calling .text(), iterating it, or casting it to str).
```python
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Five surprising names for a pet pelican")
print(response.text())
```
Before the response is evaluated, its repr looks like:
<Response prompt='Five surprising names for a pet pelican' text='... not yet done ...'>
Because Response.__str__ returns the text, you can also write:
print(llm.get_model().prompt("Five surprising names for a pet pelican"))
To check which MIME types a model accepts, inspect model.attachment_types:
```python
import llm

model = llm.get_model("gpt-4o-mini")
print(model.attachment_types)
# {'image/gif', 'image/png', 'image/jpeg', 'image/webp'}

if "image/jpeg" in model.attachment_types:
    # safe to send a JPEG
    ...
```
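Attachments are passed to `model.prompt()` as `attachments=[llm.Attachment(path=...)]` (or `url=` / `content=`). Before sending a batch of local files, you may want to drop the ones a model cannot accept. The helper below, `split_by_attachment_type()`, is a hypothetical sketch: it guesses each file's MIME type from its extension with the standard `mimetypes` module, which is only an approximation, since LLM's own type detection may differ.

```python
import mimetypes
from typing import Iterable, List, Set, Tuple


def split_by_attachment_type(
    paths: Iterable[str], attachment_types: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split file paths into (accepted, rejected) by guessed MIME type.

    Hypothetical helper: the MIME type is guessed from the file extension,
    which can differ from the type LLM detects from the file contents.
    """
    accepted: List[str] = []
    rejected: List[str] = []
    for path in paths:
        mime_type, _ = mimetypes.guess_type(path)
        # Unknown extensions give None, which is never in attachment_types
        if mime_type in attachment_types:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected


# Usage with a real model (requires LLM and an API key):
# import llm
# model = llm.get_model("gpt-4o-mini")
# accepted, rejected = split_by_attachment_type(
#     ["pelican.jpg", "notes.txt"], model.attachment_types
# )
# response = model.prompt(
#     "Describe these images",
#     attachments=[llm.Attachment(path=p) for p in accepted],
# )
```

Rejected paths can be reported to the user or converted to a supported format before retrying.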
Tools are Python functions the model can call as part of generating a response. Define a function with a docstring (LLM uses it as the tool description), then pass it in the tools= list.
```python
import llm

def upper(text: str) -> str:
    """Convert text to uppercase."""
    return text.upper()

model = llm.get_model("gpt-4.1-mini")
response = model.prompt("Convert panda to upper", tools=[upper])

# Inspect what the model wants to call
tool_calls = response.tool_calls()
# [ToolCall(name='upper', arguments={'text': 'panda'}, tool_call_id='...')]

# Execute the calls and get results
tool_results = response.execute_tool_calls()
# [ToolResult(name='upper', output='PANDA', tool_call_id='...')]

# Get the model's follow-up reply (auto-executes tool calls)
follow_up = response.reply()
print(follow_up.text())
# The word "panda" converted to uppercase is "PANDA".
```
Every tool call has a guaranteed-unique tool_call_id. LLM synthesizes one of the form tc_01... for providers that do not supply their own, so you can always correlate calls with results.
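Because both `ToolCall` and `ToolResult` carry `tool_call_id`, you can join them with a dictionary. The hypothetical helper below relies only on the `tool_call_id` attribute visible in the reprs of the tool-calling example, so it works with the real objects or with simple stand-ins.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def pair_calls_with_results(
    tool_calls: Iterable[Any], tool_results: Iterable[Any]
) -> List[Tuple[Any, Optional[Any]]]:
    """Match each tool call to its result via the shared tool_call_id.

    Calls with no matching result are paired with None.
    """
    results_by_id: Dict[str, Any] = {r.tool_call_id: r for r in tool_results}
    return [(call, results_by_id.get(call.tool_call_id)) for call in tool_calls]


# Usage (requires LLM and an API key; response as in the tool-calling example):
# pairs = pair_calls_with_results(response.tool_calls(), response.execute_tool_calls())
# for call, result in pairs:
#     print(call.name, call.arguments, "->", result.output if result else None)
```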
For an automatic loop that keeps calling tools until the model stops requesting them, use model.chain():
```python
chain_response = model.chain(
    "Convert panda to upper",
    tools=[upper],
)
print(chain_response.text())
# The word "panda" converted to uppercase is "PANDA".
```
Stream the chain output token by token:
```python
for chunk in model.chain("Convert panda to upper", tools=[upper]):
    print(chunk, end="", flush=True)
```
Iterate over individual responses inside the chain:
```python
chain = model.chain("Convert panda to upper", tools=[upper])
for response in chain.responses():
    print(response.prompt)
    for chunk in response:
        print(chunk, end="", flush=True)
```
Pass before_call= and after_call= to model.chain() to run code around each tool invocation. Raise llm.CancelToolCall in before_call to abort a specific call:
```python
import llm
from typing import Optional

def upper(text: str) -> str:
    "Convert text to uppercase."
    return text.upper()

def before_call(tool: Optional[llm.Tool], tool_call: llm.ToolCall):
    # tool can be None (e.g. the model asked for a tool that was not provided),
    # so read the name from tool_call and guard before touching tool
    print(f"About to call {tool_call.name} with {tool_call.arguments}")
    if tool is not None and tool.name == "upper" and "bad" in repr(tool_call.arguments):
        raise llm.CancelToolCall("Not allowed on text containing 'bad'")

def after_call(tool: llm.Tool, tool_call: llm.ToolCall, tool_result: llm.ToolResult):
    print(f"{tool.name} returned {tool_result.output}")

model = llm.get_model("gpt-4.1-mini")
response = model.chain(
    "Convert panda to upper and badger to upper",
    tools=[upper],
    before_call=before_call,
    after_call=after_call,
)
print(response.text())
```
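An `after_call` hook is a convenient place to keep an audit trail of every tool the model ran. The factory below, `make_audit_hook()`, is a hypothetical sketch: it returns a list plus an `after_call` function with the three-argument signature used above, and records only the `name`, `arguments`, `tool_call_id` and `output` attributes shown earlier on this page.

```python
from typing import Any, Callable, Dict, List, Tuple


def make_audit_hook() -> Tuple[List[Dict[str, Any]], Callable[[Any, Any, Any], None]]:
    """Return (log, after_call); after_call appends one entry per completed tool call."""
    log: List[Dict[str, Any]] = []

    def after_call(tool: Any, tool_call: Any, tool_result: Any) -> None:
        # Record what was called, with which arguments, and what it returned
        log.append(
            {
                "name": tool_call.name,
                "arguments": tool_call.arguments,
                "output": tool_result.output,
                "tool_call_id": tool_call.tool_call_id,
            }
        )

    return log, after_call


# Usage (requires LLM and an API key):
# log, after_call = make_audit_hook()
# response = model.chain("Convert panda to upper", tools=[upper], after_call=after_call)
# print(response.text())
# print(log)
```

Keeping the log outside the hook (rather than printing) lets you persist or inspect it after the chain finishes.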
Raise llm.PauseChain inside a tool to stop the chain cleanly — for example, when human approval is required:
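```python
import llm

def delete_files(path: str) -> str:
    """Delete files at the given path once a human has approved it."""
    # approval_already_recorded, record_approval_request and do_delete are
    # placeholders for your application's own approval and deletion logic
    if not approval_already_recorded(path):
        record_approval_request(path)
        raise llm.PauseChain("waiting for approval to delete " + path)
    do_delete(path)
    return "deleted"

model = llm.get_model("gpt-4.1-mini")
chain_response = model.chain("Delete the files in ./scratch", tools=[delete_files])
try:
    chain_response.text()
except llm.PauseChain as pause:
    print("Paused on", pause.tool_call.name, pause.tool_call.tool_call_id)
```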
Unlike other exceptions, PauseChain does not produce an error tool result — the framework propagates it with pause.tool_call and pause.tool_results (sibling calls that completed) attached.
For tools that share state or configuration, subclass llm.Toolbox. All public methods become tools automatically:
```python
import llm

class Memory(llm.Toolbox):
    _memory = None

    def _get_memory(self):
        if self._memory is None:
            self._memory = {}
        return self._memory

    def set(self, key: str, value: str):
        "Set something as a key"
        self._get_memory()[key] = value

    def get(self, key: str):
        "Get something from a key"
        return self._get_memory().get(key) or ""

    def keys(self):
        "Return a list of keys"
        return list(self._get_memory().keys())

model = llm.get_model("gpt-4.1-mini")
memory = Memory()
conversation = model.conversation(tools=[memory])
print(conversation.chain("Set name to Simon").text())
print(memory._memory)  # {'name': 'Simon'}
```
Use toolbox.add_tool(fn) to register additional tools after construction. Implement prepare() (or prepare_async()) on the class to run setup logic before the toolbox’s first use.
Pass pre-assembled text fragments to the prompt or system prompt using fragments= and system_fragments=:
```python
response = model.prompt(
    "What do these documents say about dogs?",
    fragments=[
        open("dogs1.txt").read(),
        open("dogs2.txt").read(),
    ],
    system_fragments=[
        "You answer questions like Snoopy",
    ],
)
```
Fragments are especially useful for plugin authors who want to tap into LLM’s fragment caching system (e.g. the llm-anthropic plugin uses them for Claude’s prompt caching).
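When the documents live in a folder, it can be simpler to load them all as fragments in one go. The helper below, `load_fragments()`, is a hypothetical sketch using only `pathlib`: it reads every file matching a glob pattern, in sorted filename order, and returns the list of strings that `fragments=` expects.

```python
from pathlib import Path
from typing import List


def load_fragments(directory: str, pattern: str = "*.txt") -> List[str]:
    """Read every file matching pattern in directory (sorted by name) as a fragment string."""
    return [
        path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob(pattern))
    ]


# Usage (requires LLM and an API key; "dog_docs" is a hypothetical folder):
# response = model.prompt(
#     "What do these documents say about dogs?",
#     fragments=load_fragments("dog_docs"),
# )
```

Sorting keeps the fragment order stable between runs, which matters if a plugin caches prompts by their content.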
Any model installed via a plugin is available through the same llm.get_model() call:
pip install llm-anthropic
```python
import llm

model = llm.get_model("claude-3.5-sonnet")
# Set the key here if you haven't run 'llm keys set anthropic'
model.key = "YOUR_API_KEY_HERE"
response = model.prompt("Five surprising names for a pet pelican")
print(response.text())
```
Most model plugins expose the raw provider JSON via response.json():
```python
import llm
from pprint import pprint

model = llm.get_model("gpt-4o-mini")
response = model.prompt("3 names for an otter")
pprint(response.json())
```
Example output from GPT-4o mini:
```
{'content': 'Sure! Here are three fun names for an otter:\n\n1. **Splash**\n2. **Bubbles**\n3. **Otto**',
 'created': 1739291215,
 'finish_reason': 'stop',
 'id': 'chatcmpl-AznO31yxgBjZ4zrzBOwJvHEWgdTaf',
 'model': 'gpt-4o-mini-2024-07-18',
 ...}
```
The JSON structure differs between providers. Code that reads response.json() directly will only work with one specific model provider.
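If you do need a field from the raw JSON, keeping provider-specific lookups in one small function limits how much code depends on a single provider. The hypothetical `finish_reason()` helper below reads `finish_reason`, which appears in the GPT-4o mini output above, and falls back to a `stop_reason` key; that second key is an assumption about other providers' JSON, so check the actual output of each plugin you use.

```python
from typing import Any, Dict, Optional


def finish_reason(raw: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return a finish/stop reason from provider-specific response JSON, or None.

    'finish_reason' matches the GPT-4o mini example; 'stop_reason' is an
    assumed key for other providers and may not exist in practice.
    """
    if not raw:
        return None
    for key in ("finish_reason", "stop_reason"):
        value = raw.get(key)
        if value is not None:
            return value
    return None


# Usage (requires LLM and an API key):
# print(finish_reason(response.json()))
```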
Use the user(), assistant(), and system() helper functions to build multi-turn message histories:
```python
import llm
from llm import user, assistant, system

model = llm.get_model("gpt-4o-mini")
response = model.prompt(messages=[
    system("You are a helpful pirate."),
    user("What is the capital of France?"),
    assistant("Paris, matey."),
    user("And Germany?"),
])
print(response.text())
```
response.stream_events() yields a StreamEvent object for every content block as it arrives, which is useful for UIs that show reasoning, tool calls, and text side by side.
response.to_dict() serializes a response to a JSON-safe dict; Response.from_dict() rehydrates it. Use response.reply() to continue from a rehydrated response:
```python
import json
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("What's 2+2?")
print(response.text())

payload = json.dumps(response.to_dict())
# Save payload wherever you like, then later:
rebuilt = llm.Response.from_dict(json.loads(payload))
followup = rebuilt.reply("Add 3 to that")
print(followup.text())
```
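Since `response.to_dict()` is JSON-safe, a plain file is enough for persistence between runs. The two helpers below, `save_response_payload()` and `load_response_payload()`, are hypothetical names for a minimal sketch built on `json` and `pathlib`; they do not depend on LLM themselves.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_response_payload(path: str, payload: Dict[str, Any]) -> None:
    """Write a response.to_dict() payload to disk as JSON."""
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_response_payload(path: str) -> Dict[str, Any]:
    """Read a payload previously written by save_response_payload()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Usage (requires LLM and an API key):
# save_response_payload("response.json", response.to_dict())
# rebuilt = llm.Response.from_dict(load_response_payload("response.json"))
# print(rebuilt.reply("Add 3 to that").text())
```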
Use response.on_done(callback) to run a function as soon as all tokens have been received — useful for token accounting, logging, or triggering downstream work:
```python
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("a poem about a hippo")
response.on_done(lambda r: print(r.usage()))
print(response.text())
```
The callback receives the response as its only argument (def callback(response)) and is called synchronously at the end of the response stream. For async models, on_done must be awaited; see the Async API page.
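For token accounting across many prompts, any callable that takes the response works as an `on_done` callback. The `UsageTotals` class below is a hypothetical sketch that assumes `response.usage()` returns an object with `input` and `output` token counts, either of which may be `None` when a provider does not report it.

```python
from typing import Any


class UsageTotals:
    """Callable for response.on_done() that accumulates token counts across responses."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0
        self.responses = 0

    def __call__(self, response: Any) -> None:
        usage = response.usage()
        # Treat missing counts (None) as zero
        self.input_tokens += usage.input or 0
        self.output_tokens += usage.output or 0
        self.responses += 1


# Usage (requires LLM and an API key):
# totals = UsageTotals()
# for prompt in ["a poem about a hippo", "a haiku about an otter"]:
#     response = model.prompt(prompt)
#     response.on_done(totals)
#     response.text()
# print(totals.input_tokens, totals.output_tokens)
```

Using a class instead of a lambda keeps the running totals in one place that the rest of your program can read.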
List the installed models with llm.get_models() and llm.get_async_models():

```python
import llm

# All synchronous models
for model in llm.get_models():
    print(model.model_id)

# All async models
for model in llm.get_async_models():
    print(model.model_id)
```
llm.get_models_with_aliases() returns a list of ModelWithAliases objects that pair each model with its registered aliases.
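A common use is a lookup table from alias to model ID. The hypothetical `alias_map()` helper below assumes each `ModelWithAliases` exposes `model`, `async_model` and `aliases` attributes, and that `model` may be `None` for a model that only has an async implementation; verify both assumptions against your installed version.

```python
from typing import Any, Dict, Iterable


def alias_map(models_with_aliases: Iterable[Any]) -> Dict[str, str]:
    """Map each registered alias to the model ID it points at."""
    mapping: Dict[str, str] = {}
    for entry in models_with_aliases:
        # Assumption: fall back to the async model when there is no sync model
        model = entry.model or entry.async_model
        for alias in entry.aliases:
            mapping[alias] = model.model_id
    return mapping


# Usage (requires LLM):
# import llm
# print(alias_map(llm.get_models_with_aliases()))
```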
Create and remove aliases with llm.set_alias() and llm.remove_alias():

```python
import llm

# Create an alias
llm.set_alias("mini", "gpt-4o-mini")

# Remove an alias (raises KeyError if it doesn't exist)
llm.remove_alias("turbo")
```
set_default_model / get_default_model
```python
import llm

# Set the default model globally (persisted to disk)
llm.set_default_model("claude-3.5-sonnet")

# Get the current default (returns "gpt-4o-mini" if unset)
model_id = llm.get_default_model()

# Detect whether a default has actually been configured
if llm.get_default_model(default=None) is None:
    print("No default has been set")
```
set_default_model() writes to the LLM configuration folder and affects all programs using LLM on the system, including the llm CLI.