LLM Schemas: Extract Structured JSON from Any Text

LLM’s schemas feature lets you define the exact structure of JSON data you want to receive from a model, forcing structured output instead of free-form text. This is useful for extracting entities from documents, building datasets, populating databases, and any task where you need consistently shaped data you can process programmatically. Schemas are supported by OpenAI, Anthropic, Google Gemini, and other models via plugins. To see which models on your system support schemas, run:

llm models --schemas

Tutorial: From Dogs to Databases

Inventing Structured Data

LLMs excel at generating structured test data. Let’s use LLM’s concise schema syntax to invent a dog:

llm --schema 'name, age int, one_sentence_bio' 'invent a cool dog'

Output:

{
  "name": "Ziggy",
  "age": 4,
  "one_sentence_bio": "Ziggy is a hyper-intelligent, bioluminescent dog who loves to perform tricks in the dark and guides his owner home using his glowing fur."
}

The response matches the schema: name and one_sentence_bio are strings, and age is an integer. To generate multiple items at once, use --schema-multi:

llm --schema-multi 'name, age int, one_sentence_bio' 'invent 3 really cool dogs'

Output:

{
  "items": [
    {
      "name": "Echo",
      "age": 3,
      "one_sentence_bio": "Echo is a sleek, silvery-blue Siberian Husky with mesmerizing blue eyes and a talent for mimicking sounds, making him a natural entertainer."
    },
    {
      "name": "Nova",
      "age": 2,
      "one_sentence_bio": "Nova is a vibrant, spotted Dalmatian with an adventurous spirit and a knack for agility courses, always ready to leap into action."
    },
    {
      "name": "Pixel",
      "age": 4,
      "one_sentence_bio": "Pixel is a playful, tech-savvy Poodle with a rainbow-colored coat, known for her ability to interact with smart devices and her love for puzzle toys."
    }
  ]
}

--schema-multi wraps the output in an {"items": [...]} envelope for broad model compatibility.
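Once a --schema-multi response has been captured, the envelope is straightforward to unwrap in Python. The following is a minimal sketch using only the standard library; the `raw` string is a shortened stand-in for the command output, and `unwrap_items` is a hypothetical helper name, not part of LLM.

```python
import json

# Shortened stand-in for text captured from `llm --schema-multi ...`
raw = """
{
  "items": [
    {"name": "Echo", "age": 3, "one_sentence_bio": "A silvery-blue Siberian Husky."},
    {"name": "Nova", "age": 2, "one_sentence_bio": "A spotted Dalmatian who loves agility."}
  ]
}
"""


def unwrap_items(text: str) -> list[dict]:
    """Parse a --schema-multi style response and return the list under "items"."""
    payload = json.loads(text)
    items = payload["items"]
    # Check the types the schema asked for: name is a string, age is an integer.
    for item in items:
        if not isinstance(item["name"], str) or not isinstance(item["age"], int):
            raise ValueError(f"unexpected field types in {item!r}")
    return items


dogs = unwrap_items(raw)
for dog in dogs:
    print(f"{dog['name']} is {dog['age']} years old")
```

Keeping the type check inside the helper means later code can assume every item has the shape the schema requested.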

Extracting People from News Articles

Schemas become especially powerful when extracting structured data from unstructured text. Here’s how to build a reusable people-extractor that works on any news article.

Step 1: Define the schema. LLM’s DSL supports newline-separated fields with optional descriptions after a colon:

name: the person's name
organization: who they represent
role: their job title or role
learned: what we learned about them from this story
article_headline: the headline of the story
article_date: the publication date in YYYY-MM-DD

Step 2: Run it against an article. Pipe in stripped HTML using strip-tags:

curl 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  uvx strip-tags | \
  llm --schema-multi "
name: the person's name
organization: who they represent
role: their job title or role
learned: what we learned about them from this story
article_headline: the headline of the story
article_date: the publication date in YYYY-MM-DD
" --system 'extract people mentioned in this article'

Output (truncated):

{
  "items": [
    {
      "name": "William Alsup",
      "organization": "U.S. District Court",
      "role": "Judge",
      "learned": "He ruled that the mass firings of probationary employees were likely unlawful.",
      "article_headline": "Judge finds mass firings of federal probationary workers were likely unlawful",
      "article_date": "2025-02-26"
    },
    {
      "name": "Everett Kelley",
      "organization": "American Federation of Government Employees",
      "role": "National President",
      "learned": "He hailed the court's decision as a victory for employees who were illegally fired.",
      "article_headline": "Judge finds mass firings of federal probationary workers were likely unlawful",
      "article_date": "2025-02-26"
    }
  ]
}
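Field descriptions such as "the publication date in YYYY-MM-DD" are hints to the model, so it can be worth checking extracted values before storing them. The sketch below is one possible check, using only the standard library. The second record carries a deliberately malformed date invented for illustration, and `has_iso_date` is a hypothetical helper name.

```python
import json
from datetime import date

# Stand-in for extractor output; the second article_date is intentionally
# malformed to show the check rejecting it (it is not real model output).
raw = """{"items": [
  {"name": "William Alsup", "role": "Judge", "article_date": "2025-02-26"},
  {"name": "Everett Kelley", "role": "National President", "article_date": "Feb 26, 2025"}
]}"""


def has_iso_date(item: dict) -> bool:
    """Return True if article_date parses as an ISO date such as 2025-02-26."""
    try:
        date.fromisoformat(item["article_date"])
        return True
    except (KeyError, TypeError, ValueError):
        return False


items = json.loads(raw)["items"]
valid = [item for item in items if has_iso_date(item)]
rejected = [item for item in items if not has_iso_date(item)]

print("valid:", [item["name"] for item in valid])
print("rejected:", [item["name"] for item in rejected])
```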

Step 3: Find the saved schema ID. LLM automatically logs every schema it uses. View them with:

llm schemas

Output:

- id: 3b7702e71da3dd791d9e17b76c88730e
  summary: |
    {items: [{name, organization, role, learned, article_headline, article_date}]}
  usage: |
    1 time, most recently 2025-02-28T04:50:02.032081+00:00

Step 4: Reuse the schema by ID. Pass the hex ID directly to --schema to run the same schema on a new article:

curl 'https://apnews.com/article/bezos-katy-perry-blue-origin-launch-4a074e534baa664abfa6538159c12987' | \
  uvx strip-tags | \
  llm --schema 3b7702e71da3dd791d9e17b76c88730e \
    --system 'extract people mentioned in this article'

Step 5: Save to a template. Use --save to give your schema+system-prompt combination a name:

llm --schema 3b7702e71da3dd791d9e17b76c88730e \
  --system 'extract people mentioned in this article' \
  --save people

Now run the extractor on any URL with just:

curl 'https://www.theguardian.com/commentisfree/2025/feb/27/billy-mcfarland-new-fyre-festival-fantasist' | \
  uvx strip-tags | llm -t people

Step 6: Extract from images. The schema works on images too — pass an image URL using -a:

llm -t people -a https://static.simonwillison.net/static/2025/onion-zuck.jpg -m gpt-4o

Step 7: Load into a database. Retrieve all logged items for that schema and pipe them into SQLite:

llm logs --schema t:people --data-key items --data-array | \
  sqlite-utils insert data.db people -

View the table:

sqlite-utils rows data.db people -t -c name -c organization -c role

Output:

name             organization        role
---------------  ------------------  -----------------------------------------
Katy Perry       Blue Origin         Singer
Gayle King       Blue Origin         TV Journalist
Lauren Sanchez   Blue Origin         Helicopter Pilot and former TV Journalist
Billy McFarland  Fyre Festival       Organiser
Mark Zuckerberg  Facebook            CEO
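For readers who prefer to stay in Python, the same load step can be sketched with the standard library sqlite3 module instead of sqlite-utils. The rows are copied from the table above. The fixed three-column table and the in-memory database are simplifying assumptions made for this example.

```python
import json
import sqlite3

# Stand-in for the JSON array emitted by `llm logs ... --data-array`
rows = json.loads("""[
  {"name": "Katy Perry", "organization": "Blue Origin", "role": "Singer"},
  {"name": "Gayle King", "organization": "Blue Origin", "role": "TV Journalist"},
  {"name": "Billy McFarland", "organization": "Fyre Festival", "role": "Organiser"}
]""")

# In-memory database for the example; pass a path such as "data.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, organization TEXT, role TEXT)")

# Named placeholders let each dict be inserted directly as one row.
conn.executemany(
    "INSERT INTO people (name, organization, role) "
    "VALUES (:name, :organization, :role)",
    rows,
)
conn.commit()

by_org = conn.execute(
    "SELECT organization, COUNT(*) FROM people "
    "GROUP BY organization ORDER BY organization"
).fetchall()
print(by_org)
```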

Explore the data visually with Datasette:

uvx datasette data.db

Then visit http://127.0.0.1:8001/data/people.

Ways to Specify a Schema

The --schema option accepts several formats:

Inline JSON Schema

Pass a raw JSON schema string directly on the command line.

Concise DSL

Use LLM’s shorthand syntax, for example 'name, age int, bio'.

File path

Point to a .json file on disk containing a JSON schema.

Schema ID

Use the hex ID from llm schemas to reference a previously used schema.

Template reference

Use t:template-name to reference a schema saved in a template.

Examples of all five:

# Inline JSON
llm --schema '{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}' 'a dog'

# Concise DSL
llm --schema 'name, age int, bio' 'a dog'

# File path
llm --schema dogs.schema.json 'a dog'

# Schema ID
llm --schema 520f7aabb121afd14d0c6c237b39ba2d 'a dog'

# Template reference
llm --schema t:dog 'a dog'
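The five formats differ enough in shape that they can usually be told apart by inspection. The sketch below illustrates those differences with a hypothetical `classify_schema_arg` helper. It is not how LLM resolves the argument internally, and the checks (a `t:` prefix, a `.json` suffix, 32 hex characters) are assumptions drawn from the examples above.

```python
import json
import re


def classify_schema_arg(value: str) -> str:
    """Guess which of the five formats a --schema argument uses (illustrative only)."""
    if value.startswith("t:"):
        return "template reference"
    try:
        # A raw JSON schema parses to a dict; DSL strings and IDs do not.
        if isinstance(json.loads(value), dict):
            return "inline JSON schema"
    except json.JSONDecodeError:
        pass
    if value.endswith(".json"):
        return "file path"
    if re.fullmatch(r"[0-9a-f]{32}", value):
        return "schema ID"
    # Anything else is treated as the concise DSL.
    return "concise DSL"


examples = [
    '{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}',
    "name, age int, bio",
    "dogs.schema.json",
    "520f7aabb121afd14d0c6c237b39ba2d",
    "t:dog",
]
for example in examples:
    print(f"{classify_schema_arg(example):20} {example[:40]}")
```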

LLM’s Concise Schema DSL

JSON Schema can be verbose to write by hand. LLM includes a concise alternative syntax that covers the most common cases.

Basic syntax

A schema with two string fields:

name, bio

Type annotations

Append the type after the field name, separated by a space. Supported types: str (default), int, float, bool.

name, bio, age int, score float, active bool

Field descriptions

Add a description after a colon. The description acts as a hint to the model about what to put in that field:

name: the person's full name, age int: their age in years, bio: a short bio

Newline-separated fields

For longer schemas — or when descriptions themselves contain commas — switch to newline separation:

name: the person's full name
age int: their age in years
bio: a short bio, no more than three sentences

Preview the generated JSON Schema

Use llm schemas dsl to see what a DSL string expands to:

llm schemas dsl 'name, age int'

Output:

{
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    }
  },
  "required": [
    "name",
    "age"
  ]
}
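To make the mapping from DSL to JSON Schema concrete, here is a small standard-library sketch of a parser for the syntax described above. It is an approximation written for illustration, not LLM's own implementation: `parse_dsl` is a hypothetical name, and both the `description` key and the float-to-`number` mapping are assumptions based on common JSON Schema conventions.

```python
import json

# DSL type names mapped to JSON Schema type names (float -> number is assumed).
TYPE_NAMES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def parse_dsl(dsl: str) -> dict:
    """Convert a 'name, age int: description' style string to a JSON Schema dict."""
    text = dsl.strip()
    # Newlines take priority as separators so descriptions may contain commas.
    separator = "\n" if "\n" in text else ","
    properties = {}
    for raw_field in text.split(separator):
        field = raw_field.strip()
        if not field:
            continue
        # Everything after the first colon is the field description.
        spec, _, description = field.partition(":")
        parts = spec.split()
        name = parts[0]
        type_name = parts[1] if len(parts) > 1 else "str"
        prop = {"type": TYPE_NAMES[type_name]}
        if description.strip():
            prop["description"] = description.strip()
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": list(properties)}


print(json.dumps(parse_dsl("name, age int"), indent=2))
```

For the input 'name, age int' this sketch produces the same structure as the llm schemas dsl output shown above.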

DSL in the Python API

The llm.schema_dsl() function converts DSL strings to JSON schema dictionaries:

import llm

schema = llm.schema_dsl("name, age int, bio")
# Returns a JSON schema dict

# Pass multi=True for an items-array schema
multi_schema = llm.schema_dsl("name, age int, bio", multi=True)

Using Full JSON Schema

For complex structures (nested objects, arrays, constraints, optional fields), pass a full JSON Schema directly. Here is the dog example from earlier, 'name, age int, one_sentence_bio', written out as a full JSON schema:

{
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    },
    "one_sentence_bio": {
      "type": "string"
    }
  },
  "required": [
    "name",
    "age",
    "one_sentence_bio"
  ]
}

Pass it inline:

llm --schema '{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"},
    "one_sentence_bio": {"type": "string"}
  },
  "required": ["name", "age", "one_sentence_bio"]
}' 'a surprising dog'

Or save it to a file and reference by path:

llm --schema dogs.schema.json 'a surprising dog'

LLM recommends the top level of a schema be an object, not an array, for maximum compatibility across models. Use {"items": [...]} to return arrays.
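The recommendation above can be followed by wrapping a per-item schema in an object with a single `items` array before saving it to disk. The sketch below shows one way to do that with the standard library. The wrapper shape is an assumption consistent with the {"items": [...]} output shown earlier, `wrap_in_items` is a hypothetical helper, and the file is written to a temporary directory for the example.

```python
import json
import tempfile
from pathlib import Path

# The dog schema from above, as a Python dict.
dog_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "one_sentence_bio": {"type": "string"},
    },
    "required": ["name", "age", "one_sentence_bio"],
}


def wrap_in_items(item_schema: dict) -> dict:
    """Return an object schema whose only property is an array of item_schema."""
    return {
        "type": "object",
        "properties": {"items": {"type": "array", "items": item_schema}},
        "required": ["items"],
    }


# Write the wrapped schema to a file that could be passed to --schema.
schema_path = Path(tempfile.mkdtemp()) / "dogs.schema.json"
schema_path.write_text(json.dumps(wrap_in_items(dog_schema), indent=2))
print(schema_path)
```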

Saving Reusable Schemas in Templates

The quickest way to name and reuse a schema is with --save:

llm --schema 'name, age int, one_sentence_bio' --save dog

Then reference it by name:

llm --schema t:dog 'invent a dog'
llm --schema-multi t:dog 'invent three dogs'

View the stored template to see its full YAML representation:

llm templates show dog

Output:

name: dog
schema_object:
    properties:
        name:
            type: string
        age:
            type: integer
        one_sentence_bio:
            type: string
    required:
    - name
    - age
    - one_sentence_bio
    type: object

Multi-Item Extraction: `--schema-multi`

The --schema-multi flag wraps your schema in an {"items": [...]} envelope, instructing the model to return a list of matching objects:

llm --schema-multi 'name, ten_word_bio' 'invent 3 cool dogs'

Output:

{
  "items": [
    {"name": "Bolt", "ten_word_bio": "Lightning-fast border collie, loves frisbee and outdoor adventures."},
    {"name": "Luna", "ten_word_bio": "Mystical husky with mesmerizing blue eyes, enjoys snow and play."},
    {"name": "Ziggy", "ten_word_bio": "Quirky pug who loves belly rubs and quirky outfits."}
  ]
}

Browsing Logged Schema Responses

All JSON produced via schemas is automatically logged to LLM’s SQLite database. The llm logs command has several options for working with this data.

Filter by schema

Use --schema or --schema-multi to filter responses to those that used a specific schema:

llm logs --schema-multi 'name, ten_word_bio' --data

Output (newline-delimited JSON):

{"items": [{"name": "Bolt", "ten_word_bio": "Lightning-fast border collie..."}, ...]}
{"items": [{"name": "Robo", "ten_word_bio": "A cybernetic dog with laser eyes..."}, ...]}

Flatten nested arrays with `--data-key`

Use --data-key items to unwrap the items array and emit one JSON object per line:

llm logs --schema-multi 'name, ten_word_bio' --data-key items

Output:

{"name": "Bolt", "ten_word_bio": "Lightning-fast border collie, loves frisbee and outdoor adventures."}
{"name": "Luna", "ten_word_bio": "Mystical husky with mesmerizing blue eyes, enjoys snow and play."}
{"name": "Ziggy", "ten_word_bio": "Quirky pug who loves belly rubs and quirky outfits."}
{"name": "Robo", "ten_word_bio": "A cybernetic dog with laser eyes and super intelligence."}

Output as a JSON array with `--data-array`

Add --data-array to emit a single JSON array instead of newline-delimited JSON:

llm logs --schema-multi 'name, ten_word_bio' --data-key items --data-array

Output:

[{"name": "Bolt", "ten_word_bio": "Lightning-fast border collie, loves frisbee and outdoor adventures."},
 {"name": "Luna", "ten_word_bio": "Mystical husky with mesmerizing blue eyes, enjoys snow and play."},
 {"name": "Ziggy", "ten_word_bio": "Quirky pug who loves belly rubs and quirky outfits."},
 {"name": "Robo", "ten_word_bio": "A cybernetic dog with laser eyes and super intelligence."}]
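The effect of --data-key and --data-array can be reproduced in a few lines of Python, which may help when post-processing saved --data output. This is a sketch of the transformation only, not LLM's implementation; `flatten` is a hypothetical helper and the records are trimmed to a single field for brevity.

```python
import json

# Stand-in for `llm logs --schema-multi ... --data`: one envelope per response.
ndjson = "\n".join([
    '{"items": [{"name": "Bolt"}, {"name": "Luna"}, {"name": "Ziggy"}]}',
    '{"items": [{"name": "Robo"}]}',
])


def flatten(lines: str, key: str) -> list[dict]:
    """Collect the objects stored under `key` across all newline-delimited records."""
    rows = []
    for line in lines.splitlines():
        if line.strip():
            rows.extend(json.loads(line)[key])
    return rows


rows = flatten(ndjson, "items")

# One object per line, comparable to --data-key items
print("\n".join(json.dumps(row) for row in rows))

# A single JSON array, comparable to --data-key items --data-array
print(json.dumps(rows))
```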

Include source IDs with `--data-ids`

Add --data-ids to include response_id and conversation_id fields in each row:

llm logs --schema-multi 'name, ten_word_bio' --data-key items --data-ids

Output:

{"name": "Nebula", "ten_word_bio": "A cosmic puppy with starry fur...", "response_id": "01jn4dawj8sq0c6t3emf4k5ryx", "conversation_id": "01jn4dawj8sq0c6t3emf4k5ryx"}
{"name": "Echo", "ten_word_bio": "A clever hound with extraordinary hearing...", "response_id": "01jn4dawj8sq0c6t3emf4k5ryx", "conversation_id": "01jn4dawj8sq0c6t3emf4k5ryx"}

Use --id-gt $ID or --id-gte $ID to skip logged schema data before a certain point, useful when processing new records incrementally.
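The response_id field added by --data-ids makes it possible to trace each row back to the response that produced it. As a small illustration, the sketch below groups rows by that field using the standard library. The IDs are shortened placeholders rather than real response IDs, and `group_by_response` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Stand-in for `llm logs ... --data-key items --data-ids` output.
# The IDs are shortened placeholders, not real response IDs.
ndjson = """\
{"name": "Nebula", "response_id": "resp-1", "conversation_id": "conv-1"}
{"name": "Echo", "response_id": "resp-1", "conversation_id": "conv-1"}
{"name": "Robo", "response_id": "resp-2", "conversation_id": "conv-2"}
"""


def group_by_response(lines: str) -> dict[str, list[str]]:
    """Map each response_id to the names of the rows it produced."""
    grouped = defaultdict(list)
    for line in lines.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        grouped[row["response_id"]].append(row["name"])
    return dict(grouped)


groups = group_by_response(ndjson)
print(groups)
```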

Python API

Pydantic Models

Pass a Pydantic BaseModel subclass to schema= on model.prompt():

import llm, json
from pydantic import BaseModel

class Dog(BaseModel):
    name: str
    age: int

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Describe a nice dog", schema=Dog)
dog = json.loads(response.text())
print(dog)
# Example output (will vary by run): {'name': 'Buddy', 'age': 3}

Raw JSON Schema Dict

Pass a Python dictionary directly:

import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Describe a nice dog", schema={
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "age": {"title": "Age", "type": "integer"},
    },
    "required": ["name", "age"],
    "title": "Dog",
    "type": "object",
})
print(response.text())
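When the schema is a plain dict rather than a Pydantic model, the response text can still be turned into a typed object. The sketch below uses a standard-library dataclass for this. The JSON string is a stand-in for response.text(), and `DogRecord` and `parse_dog` are hypothetical names introduced for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class DogRecord:
    name: str
    age: int


def parse_dog(text: str) -> DogRecord:
    """Build a DogRecord from JSON text shaped like the Dog schema."""
    data = json.loads(text)
    # Coerce explicitly so a missing key or a non-numeric age fails loudly here.
    return DogRecord(name=str(data["name"]), age=int(data["age"]))


# Stand-in for response.text() from a schema-constrained prompt.
dog = parse_dog('{"name": "Buddy", "age": 3}')
print(dog)
```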

Using the DSL from Python

Use llm.schema_dsl() to construct schemas from the concise DSL syntax:

import llm

model = llm.get_model("gpt-4o-mini")

# Single object
response = model.prompt(
    "Describe a nice dog with a surprising name",
    schema=llm.schema_dsl("name, age int, bio")
)
print(response.text())

# Multiple items
response = model.prompt(
    "Describe 3 nice dogs with surprising names",
    schema=llm.schema_dsl("name, age int, bio", multi=True)
)
print(response.text())

Comparing All Three Approaches

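import json

import llm
from pydantic import BaseModel

class Dog(BaseModel):
    name: str
    age: int
    one_sentence_bio: str

model = llm.get_model("gpt-4o-mini")

# The same schema expressed three ways: Pydantic model, raw dict, concise DSL
schemas = {
    "pydantic": Dog,
    "dict": Dog.model_json_schema(),  # Pydantic v2 method that returns a JSON schema dict
    "dsl": llm.schema_dsl("name, age int, one_sentence_bio"),
}
for label, schema in schemas.items():
    response = model.prompt("Invent a dog", schema=schema)
    print(label, json.loads(response.text()))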
