Flock collects observability data for every LLM call at the database level. You can inspect token usage, API round-trip latency, and total execution time broken down by function, model, and provider — all from SQL, without any external monitoring infrastructure. Metrics are aggregated across both scalar and aggregate function calls, so a single
flock_get_metrics() query gives you a complete picture of what a workload consumed.
Core functions
Flock registers three scalar functions for metrics access:

flock_get_metrics()
Returns a compact JSON summary of LLM usage since the last reset.

flock_get_debug_metrics()
Returns a more verbose JSON payload useful for diagnosing unexpected behavior.

flock_reset_metrics()
Clears the in-memory metrics state and returns a confirmation string.
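All three are plain scalar functions, so you call them from an ordinary SELECT:

```sql
SELECT flock_reset_metrics();      -- clears counters, returns a confirmation string
-- ... run some LLM-backed queries here ...
SELECT flock_get_metrics();        -- compact JSON summary since the reset
SELECT flock_get_debug_metrics();  -- verbose payload for diagnosing odd behavior
```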
Metrics JSON structure
flock_get_metrics() returns a JSON object with a single invocations array. Each element represents one function/model combination that was called:
Function
The Flock function that produced this entry. One of llm_complete, llm_filter, llm_embedding, llm_reduce, llm_rerank, llm_first, or llm_last.

Model
The model name as configured in Flock (e.g., gpt-4o, llama3.1). This is the model_name key you pass to the function, not the underlying model identifier.

Provider
The provider that served the calls. One of openai, azure, ollama, or anthropic.

Prompt tokens
Total prompt tokens consumed across all calls in this invocation group.

Completion tokens
Total completion tokens generated across all calls in this invocation group.

Request count
Number of HTTP requests sent to the provider API. This can differ from the row count when batching is in effect.

API duration
Cumulative time spent waiting for the provider API to respond, in microseconds (the api_duration_us field). Divide by 1,000 for milliseconds.

Total duration
Total wall-clock time for the function invocation, including serialization, batching, and deserialization, in microseconds. This is always ≥ api_duration_us.
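Putting the fields together, a shape sketch of the result (every key name except api_duration_us is an illustrative assumption, not a confirmed spelling — check actual output):

```sql
SELECT flock_get_metrics();
-- Illustrative shape; keys other than api_duration_us are assumed names:
-- {
--   "invocations": [
--     {
--       "function": "llm_complete",     -- assumed key
--       "model": "gpt-4o",              -- assumed key
--       "provider": "openai",           -- assumed key
--       "prompt_tokens": 1840,          -- assumed key
--       "completion_tokens": 212,       -- assumed key
--       "request_count": 3,             -- assumed key
--       "api_duration_us": 941000,      -- documented field name
--       "total_duration_us": 1012000    -- assumed key; always >= api_duration_us
--     }
--   ]
-- }
```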
Basic workflow
The standard pattern is: reset, run, inspect.
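A minimal sketch of the pattern (the llm_complete call shape and the reviews table are illustrative assumptions, not part of the metrics API):

```sql
-- 1. Reset, so the next reading covers only this workload.
SELECT flock_reset_metrics();

-- 2. Run any LLM-backed query. The call shape is a sketch;
--    the reviews table and review column are hypothetical.
SELECT llm_complete({'model_name': 'gpt-4o'},
                    {'prompt': 'Summarize this review: ' || review})
FROM reviews;

-- 3. Inspect what the run consumed.
SELECT flock_get_metrics();
```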
Query-level workflow example
Because metrics are stored at the database level, you can reset, run, and inspect in a single script.
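For example (a sketch: the workload is hypothetical, and the JSON path assumes the invocations structure shown above; api_duration_us is the documented field name):

```sql
SELECT flock_reset_metrics();

-- Hypothetical workload: summarize every row of a reviews table.
CREATE OR REPLACE TABLE summaries AS
SELECT llm_complete({'model_name': 'gpt-4o'},
                    {'prompt': 'Summarize this review: ' || review}) AS summary
FROM reviews;

-- Pull a single number straight out of the metrics JSON.
SELECT json_extract(flock_get_metrics(),
                    '$.invocations[0].api_duration_us') AS api_us;
```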
When to use metrics
Benchmarking
Compare latency and token counts across providers or models running the same prompt to pick the most cost-effective option.
Cost monitoring
Track cumulative token usage across workloads to stay within API quotas and budget limits.
Prompt optimization
Measure how prompt rewrites affect input token counts and API latency before rolling changes out to production.
Query diagnosis
Use flock_get_debug_metrics() to identify which specific calls inside a complex query are slow or unexpectedly expensive, as in the sketch below.
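For example, after a suspicious run (both functions are documented above; nothing else is assumed):

```sql
-- Dump the verbose diagnostics for the run that just finished.
SELECT flock_get_debug_metrics();

-- Reset before the next experiment so the numbers stay attributable.
SELECT flock_reset_metrics();
```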