
Flock’s scalar functions apply an LLM operation to each row independently, making them a natural fit for SELECT projections and WHERE clauses. Each function takes a model configuration struct and a prompt configuration struct as arguments, then returns a result for every row processed. All three functions share the same context_columns API for passing column data into the model. A context column entry looks like:
{'data': column_ref}                           -- basic text column
{'data': column_ref, 'name': 'alias'}          -- text column with a named alias for prompt interpolation
{'data': image_url_col, 'type': 'image'}       -- image column (llm_complete and llm_filter only)
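
For instance, an image column can be passed alongside a prompt (a sketch; assumes a registered gpt-4o model and a hypothetical product_images table with an image_url column):

```sql
-- Hypothetical table: product_images(product_id, image_url)
SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Write a one-sentence caption for this product photo.',
        'context_columns': [{'data': image_url, 'type': 'image'}]
    }
) AS caption
FROM product_images;
```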

llm_complete

llm_complete calls an LLM for each row and returns the model’s text response as JSON. Use it to generate, transform, or enrich text — product descriptions, summaries, translations, and more. Return type: JSON

Parameters

Model configuration (first argument)
model_name
string
required
The registered model name to use for generation.
secret_name
string
The DuckDB secret holding the API key for this model. Omit when using a model that does not require authentication (e.g., a local Ollama model).
Prompt configuration (second argument)
prompt
string
An inline prompt string. Use {alias} placeholders to reference named context columns. Mutually exclusive with prompt_name.
prompt_name
string
The name of a pre-configured prompt stored in Flock’s prompt registry. Mutually exclusive with prompt.
version
integer
The version of the named prompt to use. Only valid when prompt_name is set. Defaults to the latest version.
context_columns
array
List of column references passed to the model as context. Each item must include data and may include name and type.
  • data (required) — the SQL column expression
  • name (optional) — alias used to reference this column in prompt placeholders
  • type (optional) — "tabular" (default) or "image"

Examples

SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {'prompt': 'Explain the purpose of Flock.'}
) AS flock_purpose;
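
Named context columns can be interpolated into the prompt with {alias} placeholders. The query below is a sketch that assumes a gpt-4o model is registered:

```sql
SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Summarize this review in five words: {review}',
        'context_columns': [{'data': review_text, 'name': 'review'}]
    }
) AS summary
FROM (VALUES
    ('The headphones arrived quickly and sound fantastic for the price.')
) AS t(review_text);
```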

llm_filter

llm_filter evaluates a yes/no question for each row and returns TRUE or FALSE. Place it in a WHERE clause to keep only the rows the model considers a match for your condition. Return type: BOOLEAN

Parameters

Model configuration (first argument)
model_name
string
required
The registered model name to use for evaluation.
secret_name
string
The DuckDB secret holding the API key for this model.
Prompt configuration (second argument)
prompt
string
An inline yes/no question. Use {alias} placeholders to reference named context columns. Mutually exclusive with prompt_name.
prompt_name
string
The name of a pre-configured filter prompt in Flock’s prompt registry. Mutually exclusive with prompt.
version
integer
The version of the named prompt to use. Only valid when prompt_name is set.
context_columns
array
List of column references passed to the model as context. Each item must include data and may include name and type.
  • data (required) — the SQL column expression
  • name (optional) — alias used to reference this column in prompt placeholders
  • type (optional) — "tabular" (default) or "image"

Examples

SELECT *
FROM (VALUES
    (1, 'Eco-friendly bamboo toothbrush made from sustainable materials'),
    (2, 'Plastic water bottle made from recycled content'),
    (3, 'Organic cotton t-shirt with natural dyes')
) AS t(product_id, product_description)
WHERE llm_filter(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Is this product description eco-friendly?',
        'context_columns': [{'data': product_description, 'name': 'description'}]
    }
);
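
A stored prompt can be referenced by name instead of inlined. The call below is a sketch; it assumes a filter prompt named 'eco-check' already exists in Flock's prompt registry and a hypothetical products table:

```sql
SELECT *
FROM products
WHERE llm_filter(
    {'model_name': 'gpt-4o'},
    {
        'prompt_name': 'eco-check',
        'version': 2,   -- optional; omit to use the latest version
        'context_columns': [{'data': product_description, 'name': 'description'}]
    }
);
```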

llm_embedding

llm_embedding encodes one or more text columns into a dense vector that captures semantic meaning. The resulting embeddings can be compared with DuckDB’s built-in array_cosine_similarity function for similarity search, clustering, and RAG retrieval. Return type: DOUBLE[] (a list of double-precision floats)
llm_embedding supports text columns only. Passing type: 'image' in context_columns is not supported and will raise an error.

Parameters

Model configuration (first argument)
model_name
string
required
The registered embedding model name (e.g., text-embedding-3-small).
secret_name
string
The DuckDB secret holding the API key for this model.
Prompt configuration (second argument)
context_columns
array
required
List of text column references to embed. Multiple columns are concatenated before being sent to the model. Each item must include data and may include name.
  • data (required) — the SQL column expression
  • name (optional) — alias for this column

Examples

SELECT llm_embedding(
    {'model_name': 'text-embedding-3-small', 'secret_name': 'embedding_secret'},
    {'context_columns': [{'data': product_name}, {'data': product_description}]}
) AS product_embedding
FROM (VALUES
    ('Wireless Headphones', 'Premium noise-cancelling headphones with 30-hour battery life'),
    ('Gaming Laptop',       'High-performance laptop with RTX graphics and 16GB RAM'),
    ('Smart Watch',         'Fitness tracker with heart rate monitor and GPS')
) AS t(product_name, product_description);

Output format

The function returns a JSON array of floating-point numbers representing the embedding vector:
[0.342, -0.564, 0.123, ..., 0.789]
Cast to a typed array before calling distance functions:
embedding_col::DOUBLE[1536]   -- for text-embedding-3-small (1536 dimensions)
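
Putting the cast and array_cosine_similarity together, a nearest-neighbour lookup can be sketched as follows. This assumes a hypothetical products table whose embedding column was produced by llm_embedding with text-embedding-3-small (1536 dimensions):

```sql
-- Embed a query string, then rank stored product embeddings by cosine similarity.
WITH query AS (
    SELECT llm_embedding(
        {'model_name': 'text-embedding-3-small', 'secret_name': 'embedding_secret'},
        {'context_columns': [{'data': query_text}]}
    )::DOUBLE[1536] AS q
    FROM (VALUES ('wireless noise-cancelling headphones')) AS t(query_text)
)
SELECT p.product_name,
       array_cosine_similarity(p.embedding::DOUBLE[1536], query.q) AS similarity
FROM products AS p, query
ORDER BY similarity DESC
LIMIT 3;
```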
