
Flock’s scalar functions apply an LLM operation to each row independently, making them a natural fit for SELECT projections and WHERE clauses. Each function takes a model configuration struct and a prompt configuration struct as arguments, then returns a result for every row processed. All three functions share the same context_columns API for passing column data into the model. A context column entry looks like:
{'data': column_ref}                           -- basic text column
{'data': column_ref, 'name': 'alias'}          -- text column with a named alias for prompt interpolation
{'data': image_url_col, 'type': 'image'}       -- image column (llm_complete and llm_filter only)
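
For instance, an image column can be passed alongside a prompt (a sketch; assumes a registered gpt-4o model and a hypothetical product_images table with an image_url column):

```sql
-- Hypothetical table: product_images(product_id, image_url)
SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Write a one-sentence caption for this product photo.',
        'context_columns': [{'data': image_url, 'type': 'image'}]
    }
) AS caption
FROM product_images;
```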

llm_complete

llm_complete calls an LLM for each row and returns the model’s text response as JSON. Use it to generate, transform, or enrich text — product descriptions, summaries, translations, and more. Return type: JSON

Parameters

Model configuration (first argument)
model_name
string
required
The registered model name to use for generation.
secret_name
string
The DuckDB secret holding the API key for this model. Omit when using a model that does not require authentication (e.g., a local Ollama model).
Prompt configuration (second argument)
prompt
string
An inline prompt string. Use {alias} placeholders to reference named context columns. Mutually exclusive with prompt_name.
prompt_name
string
The name of a pre-configured prompt stored in Flock’s prompt registry. Mutually exclusive with prompt.
version
integer
The version of the named prompt to use. Only valid when prompt_name is set. Defaults to the latest version.
context_columns
array
List of column references passed to the model as context. Each item must include data and may include name and type.
  • data (required) — the SQL column expression
  • name (optional) — alias used to reference this column in prompt placeholders
  • type (optional) — "tabular" (default) or "image"

Examples

SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {'prompt': 'Explain the purpose of Flock.'}
) AS flock_purpose;
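
Named context columns can be interpolated into the prompt with {alias} placeholders. The query below is a sketch that assumes a gpt-4o model is registered:

```sql
SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Summarize this review in five words: {review}',
        'context_columns': [{'data': review_text, 'name': 'review'}]
    }
) AS summary
FROM (VALUES
    ('The headphones arrived quickly and sound fantastic for the price.')
) AS t(review_text);
```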

llm_filter

llm_filter evaluates a yes/no question for each row and returns TRUE or FALSE. Place it in a WHERE clause to keep only the rows the model considers a match for your condition. Return type: BOOLEAN

Parameters

Model configuration (first argument)
model_name
string
required
The registered model name to use for evaluation.
secret_name
string
The DuckDB secret holding the API key for this model.
Prompt configuration (second argument)
prompt
string
An inline yes/no question. Use {alias} placeholders to reference named context columns. Mutually exclusive with prompt_name.
prompt_name
string
The name of a pre-configured filter prompt in Flock’s prompt registry. Mutually exclusive with prompt.
version
integer
The version of the named prompt to use. Only valid when prompt_name is set.
context_columns
array
List of column references passed to the model as context. Each item must include data and may include name and type.
  • data (required) — the SQL column expression
  • name (optional) — alias used to reference this column in prompt placeholders
  • type (optional) — "tabular" (default) or "image"

Examples

SELECT *
FROM (VALUES
    (1, 'Eco-friendly bamboo toothbrush made from sustainable materials'),
    (2, 'Plastic water bottle made from recycled content'),
    (3, 'Organic cotton t-shirt with natural dyes')
) AS t(product_id, product_description)
WHERE llm_filter(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Is this product description eco-friendly?',
        'context_columns': [{'data': product_description, 'name': 'description'}]
    }
);
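
A stored prompt can be referenced by name instead of inlined. The call below is a sketch; it assumes a filter prompt named 'eco-check' already exists in Flock's prompt registry and a hypothetical products table:

```sql
SELECT *
FROM products
WHERE llm_filter(
    {'model_name': 'gpt-4o'},
    {
        'prompt_name': 'eco-check',
        'version': 2,   -- optional; omit to use the latest version
        'context_columns': [{'data': product_description, 'name': 'description'}]
    }
);
```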

llm_embedding

llm_embedding encodes one or more text columns into a dense vector that captures semantic meaning. The resulting embeddings can be compared with DuckDB’s built-in array_cosine_similarity function for similarity search, clustering, and RAG retrieval. Return type: DOUBLE[] (a list of double-precision floats)
llm_embedding supports text columns only. Passing type: 'image' in context_columns is not supported and will raise an error.

Parameters

Model configuration (first argument)
model_name
string
required
The registered embedding model name (e.g., text-embedding-3-small).
secret_name
string
The DuckDB secret holding the API key for this model.
Prompt configuration (second argument)
context_columns
array
required
List of text column references to embed. Multiple columns are concatenated before being sent to the model. Each item must include data and may include name.
  • data (required) — the SQL column expression
  • name (optional) — alias for this column

Examples

SELECT llm_embedding(
    {'model_name': 'text-embedding-3-small', 'secret_name': 'embedding_secret'},
    {'context_columns': [{'data': product_name}, {'data': product_description}]}
) AS product_embedding
FROM (VALUES
    ('Wireless Headphones', 'Premium noise-cancelling headphones with 30-hour battery life'),
    ('Gaming Laptop',       'High-performance laptop with RTX graphics and 16GB RAM'),
    ('Smart Watch',         'Fitness tracker with heart rate monitor and GPS')
) AS t(product_name, product_description);

Output format

The function returns a JSON array of floating-point numbers representing the embedding vector:
[0.342, -0.564, 0.123, ..., 0.789]
Cast to a typed array before calling distance functions:
embedding_col::DOUBLE[1536]   -- for text-embedding-3-small (1536 dimensions)
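
Putting the cast and array_cosine_similarity together, a nearest-neighbour lookup can be sketched as follows. This assumes a hypothetical products table whose embedding column was produced by llm_embedding with text-embedding-3-small (1536 dimensions):

```sql
-- Embed a query string, then rank stored product embeddings by cosine similarity.
WITH query AS (
    SELECT llm_embedding(
        {'model_name': 'text-embedding-3-small', 'secret_name': 'embedding_secret'},
        {'context_columns': [{'data': query_text}]}
    )::DOUBLE[1536] AS q
    FROM (VALUES ('wireless noise-cancelling headphones')) AS t(query_text)
)
SELECT p.product_name,
       array_cosine_similarity(p.embedding::DOUBLE[1536], query.q) AS similarity
FROM products AS p, query
ORDER BY similarity DESC
LIMIT 3;
```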
