Monitor Hindsight with Prometheus and Grafana

Hindsight provides comprehensive observability through Prometheus metrics, OpenTelemetry distributed tracing, and pre-built Grafana dashboards. Metrics are always enabled; tracing is opt-in.

Metrics endpoint

Hindsight exposes Prometheus metrics at /metrics on the API port (default: 8888). No configuration is required.

curl http://localhost:8888/metrics
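To inspect the endpoint from a script rather than with curl, the following minimal sketch fetches the exposition text and lists the `hindsight_` samples. `fetch_metrics` and `parse_samples` are hypothetical helpers, and the parser is deliberately simplified (it does not unescape label values). For anything beyond quick inspection, the `prometheus_client` package's `text_string_to_metric_families` is a complete parser.

```python
from __future__ import annotations

import urllib.request


def fetch_metrics(url: str = "http://localhost:8888/metrics") -> str:
    """Download the Prometheus text exposition from a running Hindsight API."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str, prefix: str = "hindsight_") -> list[tuple[str, str, float]]:
    """Return (name, raw_labels, value) for every sample whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Label values may contain braces themselves (e.g. endpoint="/banks/{id}"),
            # so split on the first "{" and the last "}".
            name, rest = line.split("{", 1)
            labels, _, tail = rest.rpartition("}")
        else:
            name, _, tail = line.partition(" ")
            labels = ""
        if name.startswith(prefix):
            # The value comes first; an optional timestamp may follow it.
            samples.append((name, labels, float(tail.split()[0])))
    return samples
```

For example, `for name, labels, value in parse_samples(fetch_metrics()): print(name, labels, value)` prints one line per Hindsight series.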

To scrape metrics with Prometheus:

scrape_configs:
  - job_name: 'hindsight'
    static_configs:
      - targets: ['localhost:8888']

Grafana dashboards

Three pre-built dashboards are available in monitoring/grafana/dashboards/. Import the JSON files into any Grafana instance.

Dashboard	Description
Hindsight Operations	Operation rates, latency percentiles, per-bank metrics
Hindsight LLM Metrics	LLM calls, token usage, latency by scope and provider
Hindsight API Service	HTTP requests, error rates, DB pool utilization, process metrics
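If you prefer to script the import, a sketch like the following posts each dashboard file to Grafana's `POST /api/dashboards/db` HTTP API. It assumes the files are plain dashboard JSON (not Grafana's "export for sharing externally" format with an `__inputs` section, which goes through `/api/dashboards/import` instead), that you have a Grafana service account token, and that a Prometheus data source scraping Hindsight already exists in the target instance. `build_import_payload` and `import_dashboards` are hypothetical helpers.

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap dashboard JSON in the request body accepted by POST /api/dashboards/db."""
    dashboard = dict(dashboard)  # shallow copy so the caller's dict is not modified
    # A null id makes Grafana create the dashboard instead of looking up a numeric id
    # that only existed in the instance the JSON was exported from.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}


def import_dashboards(grafana_url: str, token: str,
                      directory: str = "monitoring/grafana/dashboards") -> None:
    """POST every *.json file in directory to Grafana using a service account token."""
    for path in sorted(Path(directory).glob("*.json")):
        body = json.dumps(build_import_payload(json.loads(path.read_text()))).encode("utf-8")
        request = urllib.request.Request(
            f"{grafana_url.rstrip('/')}/api/dashboards/db",
            data=body,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {token}"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{path.name}: HTTP {response.status}")
```

Setting `overwrite` to true lets you rerun the script after updating the JSON files without hitting version conflicts.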

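Available metrics

The tables below list metric names in OpenTelemetry's dotted form. In the Prometheus output at /metrics, dots become underscores, counter names end in `_total`, and each histogram is exposed as `_bucket`, `_sum`, and `_count` series; for example, `hindsight.operation.total` appears as `hindsight_operation_total`. Depending on the exporter, histograms may also gain a unit suffix such as `_seconds`, so check the exact names in the /metrics output before building queries.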

Operations

Metric	Type	Description
`hindsight.operation.duration`	Histogram	Duration of operations in seconds
`hindsight.operation.total`	Counter	Total operations executed

Labels: operation (retain, recall, reflect), bank_id (only when `HINDSIGHT_API_METRICS_INCLUDE_BANK_ID` is enabled), source (api, reflect, internal), budget (low, mid, high), max_tokens, success

LLM calls

Metric	Type	Description
`hindsight.llm.duration`	Histogram	Duration of LLM API calls in seconds
`hindsight.llm.calls.total`	Counter	Total LLM API calls
`hindsight.llm.tokens.input`	Counter	Input tokens consumed
`hindsight.llm.tokens.output`	Counter	Output tokens generated

Labels: provider, model, scope (memory, reflect, consolidation, answer), success, token_bucket

HTTP requests

Metric	Type	Description
`hindsight.http.duration`	Histogram	Duration of HTTP requests in seconds
`hindsight.http.requests.total`	Counter	Total HTTP requests
`hindsight.http.requests.in_progress`	UpDownCounter	Requests currently being processed

Labels: method, endpoint (UUIDs normalized to {id}), status_code, status_class (2xx, 4xx, 5xx)
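Normalizing the `endpoint` label keeps cardinality bounded: without it, every bank or memory UUID in a request path would create a new time series. The sketch below shows one way to derive `endpoint` and `status_class` values. It illustrates the idea and is not Hindsight's implementation; `normalize_endpoint`, `status_class`, and the example path are hypothetical.

```python
import re

# A canonical 8-4-4-4-12 hex UUID occupying a whole path segment.
_UUID_SEGMENT = re.compile(
    r"(?<=/)[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
)


def normalize_endpoint(path: str) -> str:
    """Replace UUID path segments with {id} so each route maps to one label value."""
    return _UUID_SEGMENT.sub("{id}", path)


def status_class(status_code: int) -> str:
    """Collapse an HTTP status code into its class, e.g. 404 -> "4xx"."""
    return f"{status_code // 100}xx"
```

For example, `/v1/banks/123e4567-e89b-12d3-a456-426614174000/memories` becomes `/v1/banks/{id}/memories`, and a 503 response is counted under `5xx`.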

Database pool

Metric	Type	Description
`hindsight.db.pool.size`	Gauge	Current connections in pool
`hindsight.db.pool.idle`	Gauge	Idle connections in pool
`hindsight.db.pool.min`	Gauge	Minimum pool size
`hindsight.db.pool.max`	Gauge	Maximum pool size

Process

Metric	Type	Description
`hindsight.process.cpu.seconds`	Gauge	Process CPU time in seconds (`type`: `user` or `system`)
`hindsight.process.memory.bytes`	Gauge	Process memory usage in bytes
`hindsight.process.open_fds`	Gauge	Open file descriptors
`hindsight.process.threads`	Gauge	Active threads
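To get a feel for what the process gauges measure, this sketch samples comparable values for the current Python process using only the standard library. It is not how Hindsight collects them: `process_snapshot` is a hypothetical helper, `threading.active_count()` counts Python threads only, and the open-descriptor and resident-memory readings rely on Linux's `/proc` and are `None` elsewhere.

```python
import os
import threading


def process_snapshot() -> dict:
    """Sample values comparable to the hindsight.process.* gauges for this process."""
    times = os.times()
    snapshot = {
        "cpu_seconds_user": times.user,      # like type="user"
        "cpu_seconds_system": times.system,  # like type="system"
        "threads": threading.active_count(),  # Python-level threads only
        "open_fds": None,
        "memory_rss_bytes": None,
    }
    # /proc is Linux-specific; other platforms keep these as None.
    if os.path.isdir("/proc/self/fd"):
        # The count includes the descriptor opened to list the directory itself.
        snapshot["open_fds"] = len(os.listdir("/proc/self/fd"))
    if os.path.exists("/proc/self/statm"):
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])  # second field: resident set size in pages
        snapshot["memory_rss_bytes"] = resident_pages * os.sysconf("SC_PAGE_SIZE")
    return snapshot
```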

Example PromQL queries

Average operation latency by type

sum by (operation) (rate(hindsight_operation_duration_sum[5m])) / sum by (operation) (rate(hindsight_operation_duration_count[5m]))
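Dividing the two rates works because both use the same 5-minute window: the per-second factor cancels, leaving the increase in total duration divided by the increase in observation count. The arithmetic for a single series, as a sketch (`average_latency` is a hypothetical helper, and it ignores the counter resets that `rate()` compensates for):

```python
from typing import Optional


def average_latency(sum_start: float, sum_end: float,
                    count_start: float, count_end: float) -> Optional[float]:
    """Mean duration of the observations recorded between two scrapes of a histogram."""
    observed = count_end - count_start
    if observed <= 0:
        # Nothing new was observed in the window (the PromQL division yields NaN here).
        return None
    # rate(_sum) / rate(_count): both rates divide by the same window length,
    # so that factor cancels and only the two increases matter.
    return (sum_end - sum_start) / observed
```

For example, if `_sum` grows from 10.0 to 13.0 seconds while `_count` grows from 100 to 106, the average is 3.0 / 6 = 0.5 seconds.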

P95 LLM latency

histogram_quantile(0.95, sum by (le) (rate(hindsight_llm_duration_bucket[5m])))
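`histogram_quantile()` never sees individual observations, only cumulative bucket counts, so it estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it. A simplified Python re-implementation of that approach follows; it is a sketch, not PromQL's code, and it skips PromQL's handling of non-positive bucket bounds and non-monotonic counts.

```python
from __future__ import annotations

import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative (upper_bound, count) buckets."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    # Like PromQL, require a +Inf bucket; its count is the total number of observations.
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank lands in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread evenly between lower and upper.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

With buckets `[(0.1, 10), (0.5, 50), (1.0, 90), (math.inf, 100)]`, the estimated median is 0.5, while the 0.95 quantile falls in the `+Inf` bucket and is capped at the highest finite bound, 1.0. Estimates are therefore only as precise as the bucket boundaries.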

LLM calls per minute by provider

sum by (provider) (rate(hindsight_llm_calls_total[5m])) * 60
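`rate()` needs at least two samples inside its range window, so the window should span several scrape intervals (Prometheus's default `scrape_interval` is 1m). It also treats any drop in a counter's value as a reset, for example after a restart. A sketch of that per-second calculation; `simple_rate` is a hypothetical helper, and it omits the extrapolation to the window edges that PromQL's `rate()` additionally performs.

```python
from __future__ import annotations


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the new value is all fresh increase.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For counter values 100, 130, 10, 40 taken a minute apart, the increase is 30 + 10 + 30 = 70 calls over 180 seconds, or about 23.3 calls per minute.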

Total tokens consumed by model

sum by (model) (hindsight_llm_tokens_input_total) + sum by (model) (hindsight_llm_tokens_output_total)

HTTP error rate (5xx)

sum(rate(hindsight_http_requests_total{status_class="5xx"}[5m])) / sum(rate(hindsight_http_requests_total[5m]))

P95 HTTP latency

histogram_quantile(0.95, sum by (le) (rate(hindsight_http_duration_seconds_bucket[5m])))
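Each combination of `method`, `endpoint`, and `status_code` is its own histogram, so `sum by (le)` first adds their bucket counts bound by bound to form one service-wide histogram. Averaging per-series P95 values instead would not give the P95 of the combined traffic. The merge step as a sketch (`merge_by_le` is a hypothetical helper):

```python
from __future__ import annotations

from collections import defaultdict


def merge_by_le(series: list[dict[float, float]]) -> list[tuple[float, float]]:
    """Add cumulative bucket counts from several series that share the same le bounds."""
    merged: dict[float, float] = defaultdict(float)
    for buckets in series:
        for le, count in buckets.items():
            merged[le] += count
    # Sorted by upper bound, so +Inf comes last.
    return sorted(merged.items())
```

The result is a sorted list of cumulative `(upper_bound, count)` pairs, ready for quantile estimation.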

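Database pool utilization (open connections as a fraction of the maximum)

hindsight_db_pool_size / hindsight_db_pool_max

To count only connections in use, subtract idle connections: (hindsight_db_pool_size - hindsight_db_pool_idle) / hindsight_db_pool_max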

Internal vs API recall operations

sum by (source) (rate(hindsight_operation_total{operation="recall"}[5m]))

OpenTelemetry tracing

Hindsight supports distributed tracing for memory operations and LLM calls, following OpenTelemetry GenAI semantic conventions v1.37+.

Configuration

Variable	Description	Default
`HINDSIGHT_API_OTEL_TRACES_ENABLED`	Enable distributed tracing	`false`
`HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT`	OTLP endpoint URL	—
`HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS`	OTLP exporter headers (`key1=value1,key2=value2`)	—
`HINDSIGHT_API_OTEL_SERVICE_NAME`	Service name for traces	`hindsight-api`
`HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT`	Environment name (`development`, `production`, etc.)	`development`
`HINDSIGHT_API_METRICS_INCLUDE_BANK_ID`	Include `bank_id` in metric attributes. Enable only when you have a small number of banks: each bank adds its own set of time series, and the resulting high cardinality causes unbounded memory growth.	`false`

For example, to send traces to OpenLIT:

export HINDSIGHT_API_OTEL_TRACES_ENABLED=true
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.openlit.io
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer olit-xxx"
export HINDSIGHT_API_OTEL_SERVICE_NAME=hindsight-production
export HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT=production
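The headers variable holds a `key1=value1,key2=value2` list. The sketch below parses it, assuming Hindsight follows the standard OpenTelemetry OTLP headers format (comma-separated pairs with percent-encoded values); `parse_otlp_headers` is a hypothetical helper, not part of Hindsight.

```python
from __future__ import annotations

from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'key1=value1,key2=value2' into a dict of HTTP headers."""
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty input and trailing commas
        # Split on the first "=" only, so values such as base64 tokens keep their padding.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[key.strip()] = unquote(value.strip())
    return headers
```

For example, `parse_otlp_headers(os.environ.get("HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS", ""))` turns the OpenLIT setting shown above into `{"Authorization": "Bearer olit-xxx"}`.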

Span hierarchy

Traces follow operation boundaries. Parent spans represent memory operations; child spans represent the steps within each operation, such as embedding, retrieval, tool calls, and individual LLM calls.

hindsight.retain
hindsight.recall
  ├── hindsight.recall_embedding
  ├── hindsight.recall_retrieval   (semantic, BM25, graph, temporal; run in parallel)
  ├── hindsight.recall_fusion      (Reciprocal Rank Fusion)
  └── hindsight.recall_rerank
hindsight.reflect
  └── hindsight.reflect_tool_call  (recall, lookup, etc.)
hindsight.consolidation
hindsight.mental_model_refresh
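The OpenTelemetry Python SDK tracks the active span through `contextvars`, which is what lets a span opened inside another become its child. The toy recorder below mimics only that parent/child bookkeeping for the recall hierarchy shown above; `span`, `recall`, and `finished` are hypothetical names, and this is neither the OpenTelemetry API nor Hindsight's code.

```python
from __future__ import annotations

import contextvars
import itertools
from contextlib import contextmanager
from typing import Iterator, Optional

# The active span id lives in a context variable, so nested blocks (and asyncio
# tasks, which copy the context when they are created) can find their parent.
_current_span: contextvars.ContextVar[Optional[int]] = contextvars.ContextVar(
    "current_span", default=None)
_next_id = itertools.count(1)
finished: list[tuple[int, Optional[int], str]] = []  # (span_id, parent_id, name)


@contextmanager
def span(name: str) -> Iterator[int]:
    parent_id = _current_span.get()
    span_id = next(_next_id)
    token = _current_span.set(span_id)  # spans opened inside this block become children
    try:
        yield span_id
    finally:
        _current_span.reset(token)  # restore the parent as the active span
        finished.append((span_id, parent_id, name))


def recall() -> None:
    with span("hindsight.recall"):
        for step in ("embedding", "retrieval", "fusion", "rerank"):
            with span(f"hindsight.recall_{step}"):
                pass  # the real work of each step would run here
```

After `recall()` runs, every `hindsight.recall_*` entry in `finished` records the `hindsight.recall` span as its parent, and the parent finishes last, matching how a trace viewer nests them.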

LLM spans follow GenAI semantic conventions and include full prompts and completions as events, plus token usage and model information.

Supported backends

Hindsight exports traces over standard OTLP/HTTP, so any backend that accepts OTLP over HTTP works:

Backend	Type
Grafana LGTM	All-in-one: traces, logs, metrics
Langfuse	LLM-focused observability
OpenLIT	LLM dashboards, cost tracking
Datadog	Commercial APM
New Relic	Commercial APM
Honeycomb	Commercial observability
Pydantic Logfire	Python-focused observability

Local development monitoring

For local development, use the Grafana LGTM all-in-one stack (Loki, Grafana, Tempo, Mimir):

./scripts/dev/start-monitoring.sh

This starts a single Docker container providing:

Service	URL
Grafana UI	http://localhost:3000 (anonymous admin access)
Tempo (traces)	OTLP HTTP on port 4318, gRPC on port 4317
Mimir (metrics)	Scrapes http://localhost:8888/metrics automatically
Loki (logs)	Available for log aggregation

The pre-built dashboards are provisioned automatically. To send traces to the local stack, enable tracing and point the exporter at its OTLP HTTP port:

export HINDSIGHT_API_OTEL_TRACES_ENABLED=true
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Then open Grafana at http://localhost:3000 and go to Explore → Tempo to view traces.

The local monitoring stack is for development only. For production, deploy Grafana LGTM separately or use a commercial platform (Grafana Cloud, Datadog, New Relic, etc.).
