Structured output: enforce JSON schemas on LLMs

Structured output lets you specify exactly which fields an LLM must return and what types they must have. Instead of parsing free-form text after the fact, you embed a JSON schema directly in your SQL query and get back a validated object every time. This works with all six Flock LLM functions: llm_complete, llm_filter, llm_reduce, llm_rerank, llm_first, and llm_last.

To extract fields from structured responses using dot notation (e.g., response.category), load the DuckDB JSON extension first:

LOAD JSON;

Provider implementations

Each supported provider has its own schema syntax. The structure is different, but the effect is the same: the model is constrained to return a JSON object that matches your schema.

OpenAI
Ollama
Anthropic / Claude

OpenAI enforces schema compliance through the response_format parameter with type: "json_schema" and strict: true. The schema is nested inside a json_schema object that also carries a name field used for identification.

SELECT llm_complete(
  {
    'model_name': 'gpt-4o',
    'model_parameters': '{
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "product_categorization",
          "schema": {
            "type": "object",
            "properties": {
              "category": { "type": "string" },
              "confidence": { "type": "number" }
            },
            "required": ["category", "confidence"],
            "additionalProperties": false
          }
        },
        "strict": true
      }
    }'
  },
  { 'prompt': 'Categorize this product and provide a confidence score.' },
  { 'product_name': 'Wireless Bluetooth Headphones' }
) AS response;

For a full multi-field analysis across a table, use a CTE to capture the structured response before extracting individual fields:

WITH product_analysis AS (
  SELECT
    product_id,
    product_name,
    llm_complete(
      {
        'model_name': 'gpt-4o',
        'model_parameters': '{
          "response_format": {
            "type": "json_schema",
            "json_schema": {
              "name": "product_analysis",
              "schema": {
                "type": "object",
                "properties": {
                  "category": { "type": "string" },
                  "price_range": { "type": "string" },
                  "features": {
                    "type": "array",
                    "items": { "type": "string" }
                  },
                  "target_audience": { "type": "string" },
                  "rating_prediction": { "type": "number" }
                },
                "required": ["category", "price_range", "features", "target_audience", "rating_prediction"],
                "additionalProperties": false
              }
            },
            "strict": true
          }
        }'
      },
      { 'prompt_name': 'product-analysis', 'version': 1 },
      { 'product_name': product_name, 'product_description': product_description }
    ) AS analysis
  FROM products
)
SELECT
  product_id,
  product_name,
  analysis.category::VARCHAR,
  analysis.price_range::VARCHAR,
  analysis.features,
  analysis.target_audience::VARCHAR,
  analysis.rating_prediction::DOUBLE
FROM product_analysis;

Ollama uses a format field that takes the object schema directly — no wrapper object or strict flag required.

SELECT llm_complete(
  {
    'model_name': 'llama3.1',
    'model_parameters': '{
      "format": {
        "type": "object",
        "properties": {
          "summary": { "type": "string" },
          "sentiment": { "type": "string" }
        },
        "required": ["summary", "sentiment"],
        "additionalProperties": false
      }
    }'
  },
  { 'prompt': 'Summarize this review and determine its sentiment.' },
  { 'review_text': review_text }
) AS response
FROM customer_reviews;

Multi-field content generation across a table:

WITH content_generation AS (
  SELECT
    topic_id,
    topic_name,
    llm_complete(
      {
        'model_name': 'ollama-writer',
        'model_parameters': '{
          "format": {
            "type": "object",
            "properties": {
              "title": { "type": "string" },
              "content": { "type": "string" },
              "tags": {
                "type": "array",
                "items": { "type": "string" }
              },
              "word_count": { "type": "number" },
              "difficulty_level": { "type": "string" }
            },
            "required": ["title", "content", "tags", "word_count", "difficulty_level"],
            "additionalProperties": false
          }
        }'
      },
      { 'prompt_name': 'content-generator', 'version': 2 },
      { 'topic': topic_name, 'target_audience': target_audience }
    ) AS generated_content
  FROM content_topics
)
SELECT
  topic_id,
  topic_name,
  generated_content.title::VARCHAR,
  generated_content.content::VARCHAR,
  generated_content.tags,
  generated_content.word_count::INTEGER,
  generated_content.difficulty_level::VARCHAR
FROM content_generation;

Anthropic support is split across model generations. Claude 4.x uses an output_format parameter. Claude 3.x relies on the tool_use mechanism to enforce structure. See the Anthropic getting started guide for the full setup and version-specific syntax.

Make sure you are targeting the right Claude generation. Using output_format with a Claude 3.x model, or tool_use with Claude 4.x, will produce unexpected results.

Structured output with aggregate functions

The same schema syntax works with aggregate functions like llm_reduce. The model receives all rows in a group and returns a single structured object:

SELECT llm_reduce(
  {
    'model_name': 'gpt-4o',
    'model_parameters': '{
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "review_aggregation",
          "schema": {
            "type": "object",
            "properties": {
              "overall_summary": { "type": "string" },
              "key_themes": {
                "type": "array",
                "items": { "type": "string" }
              },
              "sentiment_distribution": {
                "type": "object",
                "properties": {
                  "positive": { "type": "number" },
                  "negative": { "type": "number" },
                  "neutral": { "type": "number" }
                }
              }
            },
            "required": ["overall_summary", "key_themes", "sentiment_distribution"],
            "additionalProperties": false
          }
        },
        "strict": true
      }
    }'
  },
  { 'prompt_name': 'review-aggregation' },
  { 'review_text': review_text }
) AS aggregated_analysis
FROM customer_reviews
GROUP BY product_id;

Extracting values with dot notation

After loading the JSON extension, you can access fields in the structured response using dot notation. Cast to the appropriate DuckDB type to use the values downstream:

SELECT
  response.text_field::VARCHAR,
  response.number_field::DOUBLE,
  response.integer_field::INTEGER,
  response.boolean_field::BOOLEAN
FROM results_table;

Array fields are returned as DuckDB lists and can be passed to array functions or unnested with UNNEST().

Common schema patterns

These reusable schema shapes cover the most frequent structured output use cases.

Classification

Returns a category label, a confidence score, and an optional reasoning field. Set required to exclude reasoning if you want a leaner response.

{
  "type": "object",
  "properties": {
    "category": { "type": "string" },
    "confidence": { "type": "number" },
    "reasoning": { "type": "string" }
  },
  "required": ["category", "confidence"],
  "additionalProperties": false
}

Analysis

Returns a free-text summary alongside structured lists of key points and topics.

{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "key_points": {
      "type": "array",
      "items": { "type": "string" }
    },
    "sentiment": { "type": "string" },
    "topics": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["summary", "sentiment"],
  "additionalProperties": false
}

Entity extraction

Returns a list of named entities, each with a type and confidence score.

{
  "type": "object",
  "properties": {
    "entities": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "type": { "type": "string" },
          "confidence": { "type": "number" }
        }
      }
    }
  },
  "required": ["entities"],
  "additionalProperties": false
}

Get Started

SQL Functions

Multimodal

Advanced Features

Development

Structured output: enforce JSON schemas on LLMs

Provider implementations

Structured output with aggregate functions

Extracting values with dot notation

Common schema patterns

Build docs developers (and LLMs) love

Get Started

SQL Functions

Multimodal

Advanced Features

Development

Documentation Index

​Provider implementations

​Structured output with aggregate functions

​Extracting values with dot notation

​Common schema patterns

Build docs developers (and LLMs) love

Provider implementations

Structured output with aggregate functions

Extracting values with dot notation

Common schema patterns