Hindsight provides comprehensive observability through Prometheus metrics, OpenTelemetry distributed tracing, and pre-built Grafana dashboards. Metrics are always enabled; tracing is opt-in.

Metrics endpoint

Hindsight exposes Prometheus metrics at /metrics on the API port (default: 8888). No configuration required.
```bash
curl http://localhost:8888/metrics
```

To scrape metrics with Prometheus, add a job to your Prometheus configuration:

```yaml
scrape_configs:
  - job_name: 'hindsight'
    static_configs:
      - targets: ['localhost:8888']
```
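For a quick check from Python, the sketch below fetches the endpoint and parses the Prometheus text format into a dictionary. It uses only the standard library, assumes the exported names start with `hindsight_` (dots become underscores, as in the PromQL examples further down), and ignores timestamps, exemplars, and escaped quotes in label values. `parse_metrics` and `fetch_metrics` are illustrative helper names, not part of Hindsight.

```python
import re
import urllib.request

# Simplified Prometheus text-format sample: name, optional {labels}, value.
# Timestamps, exemplars, and escaped quotes inside label values are not handled.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text, prefix="hindsight_"):
    """Return {(metric_name, frozenset of (label, value)): float} for matching samples."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        match = _SAMPLE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, value = match.groups()
        labels = frozenset(_LABEL.findall(raw_labels or ""))
        samples[(name, labels)] = float(value)  # float() also accepts +Inf and NaN
    return samples


def fetch_metrics(url="http://localhost:8888/metrics", prefix="hindsight_"):
    """Download and parse the live endpoint; needs a running Hindsight API."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"), prefix)
```

With the API running, `fetch_metrics()` returns every `hindsight_` sample keyed by metric name and label set.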

Grafana dashboards

Three pre-built dashboards are available in monitoring/grafana/dashboards/. Import the JSON files into any Grafana instance.
| Dashboard | Description |
|---|---|
| Hindsight Operations | Operation rates, latency percentiles, per-bank metrics |
| Hindsight LLM Metrics | LLM calls, token usage, latency by scope and provider |
| Hindsight API Service | HTTP requests, error rates, DB pool utilization, process metrics |
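Importing through the Grafana UI is enough for a one-off setup. To script it, a sketch like the following posts each JSON file to Grafana's dashboard HTTP API (`POST /api/dashboards/db`). It assumes the files are plain dashboard models rather than "export for sharing" files with `__inputs` placeholders, and that you have a Grafana service-account token with permission to write dashboards; `build_import_payload` and `import_dashboards` are hypothetical helper names.

```python
import json
import pathlib
import urllib.request


def build_import_payload(dashboard_json: dict) -> dict:
    """Wrap an exported dashboard model for Grafana's POST /api/dashboards/db."""
    dashboard = dict(dashboard_json)  # shallow copy; the caller's dict is not modified
    dashboard["id"] = None  # let Grafana assign its own internal id; uid is kept
    return {"dashboard": dashboard, "overwrite": True}


def import_dashboards(grafana_url: str, token: str,
                      directory: str = "monitoring/grafana/dashboards") -> None:
    """POST every *.json file in directory to Grafana (requires network access)."""
    for path in sorted(pathlib.Path(directory).glob("*.json")):
        payload = build_import_payload(json.loads(path.read_text(encoding="utf-8")))
        request = urllib.request.Request(
            grafana_url.rstrip("/") + "/api/dashboards/db",
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            print(path.name, response.status)
```

Run from the repository root, for example `import_dashboards("https://grafana.example.com", token)` with a hypothetical Grafana URL. Keeping each dashboard's `uid` and setting `overwrite` makes re-running the script update the existing dashboards instead of creating duplicates.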

Available metrics

Operations

| Metric | Type | Description |
|---|---|---|
| hindsight.operation.duration | Histogram | Duration of operations in seconds |
| hindsight.operation.total | Counter | Total operations executed |

Labels: operation (retain, recall, reflect), bank_id (only when HINDSIGHT_API_METRICS_INCLUDE_BANK_ID is enabled), source (api, reflect, internal), budget (low, mid, high), max_tokens, success

LLM calls

| Metric | Type | Description |
|---|---|---|
| hindsight.llm.duration | Histogram | Duration of LLM API calls in seconds |
| hindsight.llm.calls.total | Counter | Total LLM API calls |
| hindsight.llm.tokens.input | Counter | Input tokens consumed |
| hindsight.llm.tokens.output | Counter | Output tokens generated |
Labels: provider, model, scope (memory, reflect, consolidation, answer), success, token_bucket

HTTP requests

| Metric | Type | Description |
|---|---|---|
| hindsight.http.duration | Histogram | Duration of HTTP requests in seconds |
| hindsight.http.requests.total | Counter | Total HTTP requests |
| hindsight.http.requests.in_progress | UpDownCounter | Requests currently being processed |
Labels: method, endpoint (UUIDs normalized to {id}), status_code, status_class (2xx, 4xx, 5xx)
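Normalizing path parameters keeps the endpoint label bounded: every distinct label value creates a separate time series, so raw IDs in paths would add series for every new resource. The sketch below shows that kind of normalization with a regular expression; the exact pattern Hindsight applies is not documented here, and the example path is hypothetical.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID text form, any letter case.
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_endpoint(path: str) -> str:
    """Replace each UUID in a request path with the {id} placeholder."""
    return _UUID.sub("{id}", path)


# Hypothetical path used only for illustration.
print(normalize_endpoint("/v1/banks/3f2b1c9e-8a7d-4e6f-9b0a-1c2d3e4f5a6b/memories"))
# -> /v1/banks/{id}/memories
```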

Database pool

| Metric | Type | Description |
|---|---|---|
| hindsight.db.pool.size | Gauge | Current connections in pool |
| hindsight.db.pool.idle | Gauge | Idle connections in pool |
| hindsight.db.pool.min | Gauge | Minimum pool size |
| hindsight.db.pool.max | Gauge | Maximum pool size |
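The four gauges combine into the quantities usually worth alerting on: connections in use are size minus idle, and size divided by max (the utilization query among the PromQL examples) shows how close the pool is to its ceiling. A small sketch of that arithmetic, using a hypothetical `PoolSnapshot` type that holds one reading of each gauge:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSnapshot:
    """One reading of the hindsight.db.pool.* gauges (hypothetical helper type)."""

    size: int      # hindsight.db.pool.size: connections currently open
    idle: int      # hindsight.db.pool.idle: open but not checked out
    min_size: int  # hindsight.db.pool.min
    max_size: int  # hindsight.db.pool.max

    @property
    def in_use(self) -> int:
        """Connections currently checked out by requests."""
        return self.size - self.idle

    @property
    def utilization(self) -> float:
        """Open connections as a fraction of the maximum, like size / max in PromQL."""
        return self.size / self.max_size if self.max_size else 0.0

    @property
    def exhausted(self) -> bool:
        """True when the pool cannot open more connections and none are idle."""
        return self.size >= self.max_size and self.idle == 0
```

A pool at full size is not necessarily a problem; it is the combination of full size and zero idle connections that means new requests must wait for a connection.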

Process

| Metric | Type | Description |
|---|---|---|
| hindsight.process.cpu.seconds | Gauge | Process CPU time in seconds, by type (user or system) |
| hindsight.process.memory.bytes | Gauge | Process memory usage in bytes |
| hindsight.process.open_fds | Gauge | Open file descriptors |
| hindsight.process.threads | Gauge | Active threads |

Example PromQL queries

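```promql
# Average operation latency in seconds over the last 5 minutes
rate(hindsight_operation_duration_sum[5m]) / rate(hindsight_operation_duration_count[5m])

# p95 LLM call latency, per label set
histogram_quantile(0.95, rate(hindsight_llm_duration_bucket[5m]))

# LLM calls per minute
rate(hindsight_llm_calls_total[1m]) * 60

# Total tokens (input + output) per model since process start
sum by (model) (hindsight_llm_tokens_input_total + hindsight_llm_tokens_output_total)

# Fraction of HTTP requests that returned a 5xx status
sum(rate(hindsight_http_requests_total{status_class="5xx"}[5m])) / sum(rate(hindsight_http_requests_total[5m]))

# p95 HTTP request latency across all endpoints
histogram_quantile(0.95, sum by (le) (rate(hindsight_http_duration_seconds_bucket[5m])))

# Database pool utilization (open connections / maximum)
hindsight_db_pool_size / hindsight_db_pool_max

# Recall operations per second, by source
sum by (source) (rate(hindsight_operation_total{operation="recall"}[5m]))
```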
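The histogram_quantile() queries estimate percentiles from cumulative bucket counts rather than from raw observations. The sketch below reproduces the linear interpolation PromQL applies to classic histograms, which explains why a p95 can snap to a bucket boundary. It is a simplified illustration (it does not handle q outside [0, 1] or non-monotonic buckets), not Prometheus's own code, and the bucket layout in the example is made up.

```python
import math
from bisect import bisect_left


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 <= q <= 1) from cumulative (upper_bound, count) buckets.

    The last bucket must have upper bound +Inf, as in Prometheus *_bucket series.
    """
    buckets = sorted(buckets)
    bounds = [upper for upper, _ in buckets]
    counts = [count for _, count in buckets]
    if len(buckets) < 2 or bounds[-1] != math.inf or counts[-1] == 0:
        return math.nan
    rank = q * counts[-1]  # target position among all observations
    # First finite bucket whose cumulative count reaches the rank.
    b = bisect_left(counts[:-1], rank)
    if b == len(buckets) - 1:
        return bounds[-2]  # rank lies in the +Inf bucket: report the last finite bound
    if b == 0 and bounds[0] <= 0:
        return bounds[0]
    lower = bounds[b - 1] if b > 0 else 0.0
    below = counts[b - 1] if b > 0 else 0
    in_bucket = counts[b] - below
    if in_bucket == 0:
        return math.nan  # only reachable when q == 0 and the first bucket is empty
    # Assume observations are spread evenly inside the bucket and interpolate.
    return lower + (bounds[b] - lower) * (rank - below) / in_bucket


# Made-up layout: 10 calls <= 0.1 s, 60 <= 0.5 s, 90 <= 1 s, 100 in total.
latency = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print(histogram_quantile(0.5, latency))   # interpolated inside (0.1, 0.5]
print(histogram_quantile(0.95, latency))  # 1.0: rank falls in the +Inf bucket
```

Because the estimate is only as precise as the bucket boundaries, where the buckets sit matters more than the quantile chosen. In the HTTP query, sum by (le) adds the buckets of all series together before interpolating, which yields one overall percentile instead of one per label set.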

OpenTelemetry tracing

Hindsight supports distributed tracing for memory operations and LLM calls, following OpenTelemetry GenAI semantic conventions v1.37+.

Configuration

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_OTEL_TRACES_ENABLED | Enable distributed tracing | false |
| HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL | |
| HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS | OTLP exporter headers (key1=value1,key2=value2) | |
| HINDSIGHT_API_OTEL_SERVICE_NAME | Service name for traces | hindsight-api |
| HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT | Environment name (development, production, etc.) | development |
| HINDSIGHT_API_METRICS_INCLUDE_BANK_ID | Include bank_id in metric attributes. Enable only when you have few banks: each bank adds its own time series, and high cardinality causes unbounded memory growth. | false |
For example, to export traces to a hosted OTLP endpoint that requires an authorization header:

```bash
export HINDSIGHT_API_OTEL_TRACES_ENABLED=true
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.openlit.io
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer olit-xxx"
export HINDSIGHT_API_OTEL_SERVICE_NAME=hindsight-production
export HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT=production
```
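The headers variable holds a comma-separated list of key=value pairs. The sketch below shows one way to parse it and to read the variables with the defaults from the table. It assumes only the string true (any case) enables tracing, splits each pair on its first = so values such as base64 tokens survive, and does not URL-decode values; `parse_otlp_headers`, `OtelConfig`, and `load_otel_config` are illustrative names, not Hindsight's internals.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


def parse_otlp_headers(raw: str) -> Dict[str, str]:
    """Parse 'key1=value1,key2=value2', splitting each pair on its first '='."""
    headers: Dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty input and trailing commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed header pair: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


@dataclass
class OtelConfig:
    traces_enabled: bool = False
    endpoint: Optional[str] = None
    headers: Dict[str, str] = field(default_factory=dict)
    service_name: str = "hindsight-api"
    environment: str = "development"


def load_otel_config(env: Mapping[str, str] = os.environ) -> OtelConfig:
    """Read HINDSIGHT_API_OTEL_* variables, falling back to the documented defaults."""
    prefix = "HINDSIGHT_API_OTEL_"
    return OtelConfig(
        # Assumption: only "true" (case-insensitive) turns tracing on.
        traces_enabled=env.get(prefix + "TRACES_ENABLED", "false").strip().lower() == "true",
        endpoint=env.get(prefix + "EXPORTER_OTLP_ENDPOINT") or None,
        headers=parse_otlp_headers(env.get(prefix + "EXPORTER_OTLP_HEADERS", "")),
        service_name=env.get(prefix + "SERVICE_NAME", "hindsight-api"),
        environment=env.get(prefix + "DEPLOYMENT_ENVIRONMENT", "development"),
    )
```

With the example exports above, `load_otel_config()` would return tracing enabled, the OpenLIT endpoint, and `{"Authorization": "Bearer olit-xxx"}` as headers.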

Span hierarchy

Traces follow operation boundaries. Parent spans represent memory operations; child spans represent individual LLM calls.
```text
hindsight.retain
hindsight.recall
  ├── hindsight.recall_embedding
  ├── hindsight.recall_retrieval   (semantic, BM25, graph, temporal; in parallel)
  ├── hindsight.recall_fusion      (Reciprocal Rank Fusion)
  └── hindsight.recall_rerank
hindsight.reflect
  └── hindsight.reflect_tool_call  (recall, lookup, etc.)
hindsight.consolidation
hindsight.mental_model_refresh
```
LLM spans follow GenAI semantic conventions and include full prompts and completions as events, plus token usage and model information.
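The hindsight.recall_fusion span merges the parallel retrieval results with Reciprocal Rank Fusion. As a rough illustration of the technique, not Hindsight's implementation, each strategy contributes 1 / (k + rank) for every result it returns, and results are ordered by the summed score. The constant k = 60 is the value commonly used for RRF; whether Hindsight uses it, or weights strategies differently, is an assumption, and the memory IDs below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(ranked_lists: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists containing d of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-strategy rankings of memory IDs.
semantic = ["m1", "m2", "m3"]
bm25 = ["m2", "m1", "m4"]
graph = ["m2"]
temporal = ["m3", "m2"]
print(reciprocal_rank_fusion([semantic, bm25, graph, temporal]))  # ['m2', 'm1', 'm3', 'm4']
```

A result that ranks well in several strategies (m2 here) outranks one that tops a single list. Fusing rank positions instead of raw scores avoids comparing similarity scores, BM25 scores, and graph scores that live on different scales.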

Supported backends

Hindsight exports traces over standard OTLP/HTTP, so any OTLP-compatible backend works, including:

| Backend | Type |
|---|---|
| Grafana LGTM | All-in-one: traces, logs, metrics |
| Langfuse | LLM-focused observability |
| OpenLIT | LLM dashboards, cost tracking |
| Datadog | Commercial APM |
| New Relic | Commercial APM |
| Honeycomb | Commercial observability |
| Pydantic Logfire | Python-focused observability |

Local development monitoring

For local development, use the Grafana LGTM all-in-one stack (Loki, Grafana, Tempo, Mimir):
```bash
./scripts/dev/start-monitoring.sh
```

This starts a single Docker container providing:

| Service | URL |
|---|---|
| Grafana UI | http://localhost:3000 (anonymous admin access) |
| Tempo (traces) | OTLP HTTP on port 4318, gRPC on port 4317 |
| Mimir (metrics) | Scrapes http://localhost:8888/metrics automatically |
| Loki (logs) | Available for log aggregation |
The pre-built dashboards are provisioned automatically. To send traces to the local stack, enable tracing and point the exporter at its OTLP HTTP port:

```bash
export HINDSIGHT_API_OTEL_TRACES_ENABLED=true
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

Then open Grafana at http://localhost:3000 and navigate to Explore → Tempo to view traces.
The local monitoring stack is for development only. For production, deploy Grafana LGTM separately or use a commercial platform (Grafana Cloud, DataDog, New Relic, etc.).
