Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/dais-polymtl/flock/llms.txt

Use this file to discover all available pages before exploring further.

Structured output lets you specify exactly which fields an LLM must return and what types they must have. Instead of parsing free-form text after the fact, you embed a JSON schema directly in your SQL query and get back a validated object every time. This works with all six Flock LLM functions: llm_complete, llm_filter, llm_reduce, llm_rerank, llm_first, and llm_last.
To extract fields from structured responses using dot notation (e.g., response.category), load the DuckDB JSON extension first:
LOAD JSON;

Provider implementations

Each supported provider has its own schema syntax. The structure is different, but the effect is the same: the model is constrained to return a JSON object that matches your schema.
OpenAI enforces schema compliance through the response_format parameter with type: "json_schema" and strict: true. The schema is nested inside a json_schema object that also carries a name field used for identification.
SELECT llm_complete(
  {
    'model_name': 'gpt-4o',
    'model_parameters': '{
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "product_categorization",
          "schema": {
            "type": "object",
            "properties": {
              "category": { "type": "string" },
              "confidence": { "type": "number" }
            },
            "required": ["category", "confidence"],
            "additionalProperties": false
          }
        },
        "strict": true
      }
    }'
  },
  { 'prompt': 'Categorize this product and provide a confidence score.' },
  { 'product_name': 'Wireless Bluetooth Headphones' }
) AS response;
For a full multi-field analysis across a table, use a CTE to capture the structured response before extracting individual fields:
WITH product_analysis AS (
  SELECT
    product_id,
    product_name,
    llm_complete(
      {
        'model_name': 'gpt-4o',
        'model_parameters': '{
          "response_format": {
            "type": "json_schema",
            "json_schema": {
              "name": "product_analysis",
              "schema": {
                "type": "object",
                "properties": {
                  "category": { "type": "string" },
                  "price_range": { "type": "string" },
                  "features": {
                    "type": "array",
                    "items": { "type": "string" }
                  },
                  "target_audience": { "type": "string" },
                  "rating_prediction": { "type": "number" }
                },
                "required": ["category", "price_range", "features", "target_audience", "rating_prediction"],
                "additionalProperties": false
              }
            },
            "strict": true
          }
        }'
      },
      { 'prompt_name': 'product-analysis', 'version': 1 },
      { 'product_name': product_name, 'product_description': product_description }
    ) AS analysis
  FROM products
)
SELECT
  product_id,
  product_name,
  analysis.category::VARCHAR,
  analysis.price_range::VARCHAR,
  analysis.features,
  analysis.target_audience::VARCHAR,
  analysis.rating_prediction::DOUBLE
FROM product_analysis;

Structured output with aggregate functions

The same schema syntax works with aggregate functions like llm_reduce. The model receives all rows in a group and returns a single structured object:
SELECT llm_reduce(
  {
    'model_name': 'gpt-4o',
    'model_parameters': '{
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "review_aggregation",
          "schema": {
            "type": "object",
            "properties": {
              "overall_summary": { "type": "string" },
              "key_themes": {
                "type": "array",
                "items": { "type": "string" }
              },
              "sentiment_distribution": {
                "type": "object",
                "properties": {
                  "positive": { "type": "number" },
                  "negative": { "type": "number" },
                  "neutral": { "type": "number" }
                }
              }
            },
            "required": ["overall_summary", "key_themes", "sentiment_distribution"],
            "additionalProperties": false
          }
        },
        "strict": true
      }
    }'
  },
  { 'prompt_name': 'review-aggregation' },
  { 'review_text': review_text }
) AS aggregated_analysis
FROM customer_reviews
GROUP BY product_id;

Extracting values with dot notation

After loading the JSON extension, you can access fields in the structured response using dot notation. Cast to the appropriate DuckDB type to use the values downstream:
SELECT
  response.text_field::VARCHAR,
  response.number_field::DOUBLE,
  response.integer_field::INTEGER,
  response.boolean_field::BOOLEAN
FROM results_table;
Array fields are returned as DuckDB lists and can be passed to array functions or unnested with UNNEST().

Common schema patterns

These reusable schema shapes cover the most frequent structured output use cases.
Returns a category label, a confidence score, and an optional reasoning field. Set required to exclude reasoning if you want a leaner response.
{
  "type": "object",
  "properties": {
    "category": { "type": "string" },
    "confidence": { "type": "number" },
    "reasoning": { "type": "string" }
  },
  "required": ["category", "confidence"],
  "additionalProperties": false
}
Returns a free-text summary alongside structured lists of key points and topics.
{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "key_points": {
      "type": "array",
      "items": { "type": "string" }
    },
    "sentiment": { "type": "string" },
    "topics": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["summary", "sentiment"],
  "additionalProperties": false
}
Returns a list of named entities, each with a type and confidence score.
{
  "type": "object",
  "properties": {
    "entities": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "type": { "type": "string" },
          "confidence": { "type": "number" }
        }
      }
    }
  },
  "required": ["entities"],
  "additionalProperties": false
}

Build docs developers (and LLMs) love