LLM Schemas: Extract Structured JSON from Any Text
LLM schemas extract structured JSON from text and images. Use the concise field DSL, JSON Schema, or Pydantic models from either the CLI or Python API.
LLM’s schemas feature lets you define the exact structure of JSON data you want to receive from a model, forcing structured output instead of free-form text. This is useful for extracting entities from documents, building datasets, populating databases, and any task where you need consistently shaped data you can process programmatically.
Schemas are supported by OpenAI, Anthropic, Google Gemini, and other models via plugins. To see which models on your system support schemas, run:
LLMs excel at generating structured test data. Let’s use LLM’s concise schema syntax to invent a dog:
llm --schema 'name, age int, one_sentence_bio' 'invent a cool dog'
Output:
{ "name": "Ziggy", "age": 4, "one_sentence_bio": "Ziggy is a hyper-intelligent, bioluminescent dog who loves to perform tricks in the dark and guides his owner home using his glowing fur."}
The response matches the schema: name and one_sentence_bio are strings, and age is an integer.
To generate multiple items at once, use --schema-multi:
{ "items": [ { "name": "Echo", "age": 3, "one_sentence_bio": "Echo is a sleek, silvery-blue Siberian Husky with mesmerizing blue eyes and a talent for mimicking sounds, making him a natural entertainer." }, { "name": "Nova", "age": 2, "one_sentence_bio": "Nova is a vibrant, spotted Dalmatian with an adventurous spirit and a knack for agility courses, always ready to leap into action." }, { "name": "Pixel", "age": 4, "one_sentence_bio": "Pixel is a playful, tech-savvy Poodle with a rainbow-colored coat, known for her ability to interact with smart devices and her love for puzzle toys." } ]}
--schema-multi wraps the output in an {"items": [...]} envelope for broad model compatibility.
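Because multi-item output arrives inside the items envelope, downstream code usually unwraps it first. The following is a minimal sketch using only the Python standard library. The helper name `unwrap_items` is hypothetical, and the sample response is a shortened stand-in with placeholder bios.

```python
import json

# Shortened stand-in for a --schema-multi response (bios are placeholders).
raw = (
    '{"items": ['
    '{"name": "Echo", "age": 3, "one_sentence_bio": "A husky."}, '
    '{"name": "Nova", "age": 2, "one_sentence_bio": "A dalmatian."}'
    "]}"
)

# Python type expected for each field of 'name, age int, one_sentence_bio'.
EXPECTED_TYPES = {"name": str, "age": int, "one_sentence_bio": str}


def unwrap_items(text):
    """Parse a response and return the list stored under the "items" key."""
    items = json.loads(text)["items"]
    for item in items:
        for field, expected in EXPECTED_TYPES.items():
            if not isinstance(item.get(field), expected):
                raise TypeError(f"{field!r} should be {expected.__name__}")
    return items


dogs = unwrap_items(raw)
print([dog["name"] for dog in dogs])
```

Raising on a type mismatch is one possible design choice. Logging and skipping the bad record would work equally well for bulk extraction.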
Schemas become especially powerful when extracting structured data from unstructured text. Here’s how to build a reusable people-extractor that works on any news article.
Step 1: Define the schema. LLM’s DSL supports newline-separated fields with optional descriptions after a colon:
name: the person's name
organization: who they represent
role: their job title or role
learned: what we learned about them from this story
article_headline: the headline of the story
article_date: the publication date in YYYY-MM-DD
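To make the mapping from DSL to JSON Schema concrete, here is a toy parser. It is an illustrative approximation only, not LLM's implementation. The function name `sketch_dsl_to_schema` is hypothetical. It assumes that untyped fields are strings, that `str`, `int`, `float`, and `bool` are the only type suffixes, and that every field is required.

```python
import json

# Assumed mapping from DSL type suffixes to JSON Schema type names.
TYPE_NAMES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def sketch_dsl_to_schema(dsl):
    """Toy converter: 'name [type][: description]' fields to a schema dict."""
    # Newline-separated when the text spans lines, otherwise comma-separated.
    separator = "\n" if "\n" in dsl else ","
    properties = {}
    for raw_field in dsl.split(separator):
        field = raw_field.strip()
        if not field:
            continue
        # Everything after the first colon is the optional description.
        spec, _, description = field.partition(":")
        parts = spec.split()
        name = parts[0]
        type_name = parts[1] if len(parts) > 1 else "str"
        prop = {"type": TYPE_NAMES[type_name]}
        if description.strip():
            prop["description"] = description.strip()
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": list(properties)}


people_dsl = """name: the person's name
organization: who they represent
article_date: the publication date in YYYY-MM-DD"""
schema = sketch_dsl_to_schema(people_dsl)
print(json.dumps(schema, indent=2))
```

In real use, llm.schema_dsl() performs this conversion. The sketch only shows where each field's name, type, and description end up.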
Step 2: Run it against an article. Pipe in stripped HTML using strip-tags:
curl 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  uvx strip-tags | \
  llm --schema-multi "name: the person's name
organization: who they represent
role: their job title or role
learned: what we learned about them from this story
article_headline: the headline of the story
article_date: the publication date in YYYY-MM-DD" --system 'extract people mentioned in this article'
Output (truncated):
{ "items": [ { "name": "William Alsup", "organization": "U.S. District Court", "role": "Judge", "learned": "He ruled that the mass firings of probationary employees were likely unlawful.", "article_headline": "Judge finds mass firings of federal probationary workers were likely unlawful", "article_date": "2025-02-26" }, { "name": "Everett Kelley", "organization": "American Federation of Government Employees", "role": "National President", "learned": "He hailed the court's decision as a victory for employees who were illegally fired.", "article_headline": "Judge finds mass firings of federal probationary workers were likely unlawful", "article_date": "2025-02-26" } ]}
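A field description such as "the publication date in YYYY-MM-DD" is a hint to the model, so it can be worth checking the value before storing it. Here is a minimal sketch, assuming a hypothetical helper named `has_valid_date` and two records trimmed from the output above:

```python
import json
from datetime import datetime

# Two records from the extraction output, trimmed to the fields checked here.
raw = (
    '{"items": ['
    '{"name": "William Alsup", "article_date": "2025-02-26"}, '
    '{"name": "Everett Kelley", "article_date": "2025-02-26"}'
    "]}"
)


def has_valid_date(item):
    """Return True when article_date parses as a year-month-day calendar date."""
    try:
        datetime.strptime(item["article_date"], "%Y-%m-%d")
    except (KeyError, TypeError, ValueError):
        return False
    return True


people = json.loads(raw)["items"]
valid = [person for person in people if has_valid_date(person)]
print(f"{len(valid)} of {len(people)} records have a well-formed date")
```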
Step 3: Find the saved schema ID. LLM automatically logs every schema it uses. View them with:
llm schemas
Output:
- id: 3b7702e71da3dd791d9e17b76c88730e
  summary: |
    {items: [{name, organization, role, learned, article_headline, article_date}]}
  usage: |
    1 time, most recently 2025-02-28T04:50:02.032081+00:00
Step 4: Reuse the schema by ID. Pass the hex ID directly to --schema to run the same schema on a new article:
curl 'https://apnews.com/article/bezos-katy-perry-blue-origin-launch-4a074e534baa664abfa6538159c12987' | \
  uvx strip-tags | \
  llm --schema 3b7702e71da3dd791d9e17b76c88730e \
  --system 'extract people mentioned in this article'
Step 5: Save to a template. Use --save to give your schema+system-prompt combination a name:
llm --schema 3b7702e71da3dd791d9e17b76c88730e \
  --system 'extract people mentioned in this article' \
  --save people
Now run the extractor on any URL with just:
curl https://www.theguardian.com/commentisfree/2025/feb/27/billy-mcfarland-new-fyre-festival-fantasist | \ strip-tags | llm -t people
Step 6: Extract from images. The schema works on images too — pass an image URL using -a:
llm -t people -a https://static.simonwillison.net/static/2025/onion-zuck.jpg -m gpt-4o
Step 7: Load into a database. Retrieve all logged items for that schema and pipe them into SQLite:
sqlite-utils rows data.db people -t -c name -c organization -c role
Output:
name             organization        role
---------------  ------------------  -----------------------------------------
Katy Perry       Blue Origin         Singer
Gayle King       Blue Origin         TV Journalist
Lauren Sanchez   Blue Origin         Helicopter Pilot and former TV Journalist
Billy McFarland  Fyre Festival       Organiser
Mark Zuckerberg  Facebook            CEO
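For readers who would rather stay in Python, roughly the same load-and-query step can be sketched with the standard library's sqlite3 module. This is an illustrative alternative, not what sqlite-utils does internally. It uses an in-memory database and four rows copied from the table above.

```python
import sqlite3

# Rows copied from the table above: (name, organization, role).
people = [
    ("Katy Perry", "Blue Origin", "Singer"),
    ("Gayle King", "Blue Origin", "TV Journalist"),
    ("Billy McFarland", "Fyre Festival", "Organiser"),
    ("Mark Zuckerberg", "Facebook", "CEO"),
]

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, organization TEXT, role TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", people)

# Count extracted people per organization, sorted by organization name.
counts = conn.execute(
    "SELECT organization, COUNT(*) FROM people "
    "GROUP BY organization ORDER BY organization"
).fetchall()
print(counts)
conn.close()
```

Once the records are in a table, questions such as "who appears across several articles" become ordinary SQL queries.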
The llm.schema_dsl() function converts DSL strings to JSON schema dictionaries:
import llm

schema = llm.schema_dsl("name, age int, bio")
# Returns a JSON schema dict

# Pass multi=True for an items-array schema
multi_schema = llm.schema_dsl("name, age int, bio", multi=True)
For complex structures (nested objects, arrays, constraints, optional fields), pass a full JSON Schema directly.
The dogs DSL example name, age int, one_sentence_bio as a full JSON schema:
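As a sketch, that schema can be written as the Python dictionary below. It assumes that untyped DSL fields become strings and that every field is required, so confirm against the output of llm.schema_dsl() before relying on it. The `tricks` field in the second schema is an invented example of something the concise dogs DSL string does not express.

```python
import copy
import json

# Assumed JSON Schema equivalent of the DSL 'name, age int, one_sentence_bio'.
dog_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "one_sentence_bio": {"type": "string"},
    },
    "required": ["name", "age", "one_sentence_bio"],
}

# Hypothetical extension: an optional array-of-strings field named "tricks".
extended_schema = copy.deepcopy(dog_schema)
extended_schema["properties"]["tricks"] = {
    "type": "array",
    "items": {"type": "string"},
}
# "tricks" is deliberately left out of "required", which makes it optional.

print(json.dumps(dog_schema, indent=2))
```

Because llm.schema_dsl() is described as returning a JSON schema dictionary, a hand-written dictionary like this should fit the same schema= position in the Python API. Whether a given model honours arrays or optional fields is not covered here.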
{"name": "Bolt", "ten_word_bio": "Lightning-fast border collie, loves frisbee and outdoor adventures."}
{"name": "Luna", "ten_word_bio": "Mystical husky with mesmerizing blue eyes, enjoys snow and play."}
{"name": "Ziggy", "ten_word_bio": "Quirky pug who loves belly rubs and quirky outfits."}
{"name": "Robo", "ten_word_bio": "A cybernetic dog with laser eyes and super intelligence."}
[{"name": "Bolt", "ten_word_bio": "Lightning-fast border collie, loves frisbee and outdoor adventures."}, {"name": "Luna", "ten_word_bio": "Mystical husky with mesmerizing blue eyes, enjoys snow and play."}, {"name": "Ziggy", "ten_word_bio": "Quirky pug who loves belly rubs and quirky outfits."}, {"name": "Robo", "ten_word_bio": "A cybernetic dog with laser eyes and super intelligence."}]
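The two outputs above carry the same four records in two layouts: a sequence of separate JSON objects and a single JSON array. Here is a minimal sketch of reading both with the standard library. The helper `parse_object_stream` is hypothetical, and only two of the records are used for brevity.

```python
import json

# Two of the records above, rendered in both layouts.
records = [
    {"name": "Bolt", "ten_word_bio": "Lightning-fast border collie, loves frisbee and outdoor adventures."},
    {"name": "Luna", "ten_word_bio": "Mystical husky with mesmerizing blue eyes, enjoys snow and play."},
]
object_stream = "\n".join(json.dumps(record) for record in records)
json_array = json.dumps(records)


def parse_object_stream(text):
    """Decode back-to-back JSON objects separated by optional whitespace."""
    decoder = json.JSONDecoder()
    position, parsed = 0, []
    while position < len(text):
        # Skip whitespace (including newlines) between objects.
        if text[position].isspace():
            position += 1
            continue
        # raw_decode returns the object and the index where it ended.
        obj, position = decoder.raw_decode(text, position)
        parsed.append(obj)
    return parsed


from_stream = parse_object_stream(object_stream)
from_array = json.loads(json_array)
print(from_stream == from_array)
```

The object-per-line layout can be consumed incrementally, while the array layout loads with a single json.loads() call.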
Use llm.schema_dsl() to construct schemas from the concise DSL syntax:
import llm

# Load a schema-capable model (gpt-4o is the model used in the image example)
model = llm.get_model("gpt-4o")

# Single object
response = model.prompt(
    "Describe a nice dog with a surprising name",
    schema=llm.schema_dsl("name, age int, bio"),
)
print(response.text())

# Multiple items
response = model.prompt(
    "Describe 3 nice dogs with surprising names",
    schema=llm.schema_dsl("name, age int, bio", multi=True),
)
print(response.text())
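response.text() returns the JSON as a string, so the caller still has to decode it. Here is a minimal sketch of turning a single-object response into a typed value. It assumes a hypothetical `Dog` dataclass and a hard-coded stand-in for the response text with invented values, and no model is called.

```python
import json
from dataclasses import dataclass


@dataclass
class Dog:
    """Mirrors the fields of the DSL 'name, age int, bio'."""

    name: str
    age: int
    bio: str


# Stand-in for response.text() from the single-object prompt (invented values).
response_text = '{"name": "Spreadsheet", "age": 5, "bio": "A calm, friendly retriever."}'

# Unpack the decoded keys straight into the dataclass constructor.
dog = Dog(**json.loads(response_text))
print(dog.name, dog.age)
```

Unpacking with `**` fails loudly with a TypeError if the response contains an unexpected or missing key, which can be a useful early warning.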