Hindsight provides comprehensive observability through Prometheus metrics, OpenTelemetry distributed tracing, and pre-built Grafana dashboards. Metrics are always enabled; tracing is opt-in.
Metrics endpoint
Hindsight exposes Prometheus metrics at /metrics on the API port (default: 8888). No configuration is required.
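As a rough illustration of what a scrape of this endpoint involves, the following sketch parses Prometheus text-format samples in Python. The URL assumes an instance running on localhost with the default port, and `parse_metrics` and `fetch_metrics` are hypothetical helper names, not part of Hindsight.

```python
import re
import urllib.request

# Assumption: a local instance on the default API port (8888).
METRICS_URL = "http://localhost:8888/metrics"

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples.

    Label values are returned as written, without unescaping.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url=METRICS_URL):
    """Fetch and parse the metrics endpoint (needs a running instance)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running instance returns one tuple per exposed sample, which is a quick way to check the exact metric names before writing dashboards or alerts.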
Grafana dashboards
Three pre-built dashboards are available in monitoring/grafana/dashboards/. Import the JSON files into any Grafana instance.
| Dashboard | Description |
|---|---|
| Hindsight Operations | Operation rates, latency percentiles, per-bank metrics |
| Hindsight LLM Metrics | LLM calls, token usage, latency by scope and provider |
| Hindsight API Service | HTTP requests, error rates, DB pool utilization, process metrics |
Available metrics
Operations
| Metric | Type | Description |
|---|---|---|
| hindsight.operation.duration | Histogram | Duration of operations in seconds |
| hindsight.operation.total | Counter | Total operations executed |

Labels: operation (retain, recall, reflect), bank_id, source (api, reflect, internal), budget (low, mid, high), max_tokens, success
LLM calls
| Metric | Type | Description |
|---|---|---|
| hindsight.llm.duration | Histogram | Duration of LLM API calls in seconds |
| hindsight.llm.calls.total | Counter | Total LLM API calls |
| hindsight.llm.tokens.input | Counter | Input tokens consumed |
| hindsight.llm.tokens.output | Counter | Output tokens generated |

Labels: provider, model, scope (memory, reflect, consolidation, answer), success, token_bucket
HTTP requests
| Metric | Type | Description |
|---|---|---|
| hindsight.http.duration | Histogram | Duration of HTTP requests in seconds |
| hindsight.http.requests.total | Counter | Total HTTP requests |
| hindsight.http.requests.in_progress | UpDownCounter | Requests currently being processed |

Labels: method, endpoint (UUIDs normalized to {id}), status_code, status_class (2xx, 4xx, 5xx)
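Normalizing UUIDs in the endpoint label keeps its cardinality bounded, since every distinct path value would otherwise create a new time series. The sketch below shows one way such normalization could work; it is not Hindsight's actual implementation, and `normalize_endpoint` and the example path are hypothetical.

```python
import re

# Canonical 8-4-4-4-12 hex UUID, matched only as a whole path segment.
_UUID_SEGMENT = re.compile(
    r"(?<=/)[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
)


def normalize_endpoint(path):
    """Replace every UUID path segment with the literal placeholder {id}."""
    return _UUID_SEGMENT.sub("{id}", path)


# Hypothetical path, used only to illustrate the substitution.
example = normalize_endpoint(
    "/v1/banks/123e4567-e89b-12d3-a456-426614174000/memories"
)
```

With this approach, requests for different resource IDs are all counted under the same endpoint label value.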
Database pool
| Metric | Type | Description |
|---|---|---|
| hindsight.db.pool.size | Gauge | Current connections in pool |
| hindsight.db.pool.idle | Gauge | Idle connections in pool |
| hindsight.db.pool.min | Gauge | Minimum pool size |
| hindsight.db.pool.max | Gauge | Maximum pool size |
Process
| Metric | Type | Description |
|---|---|---|
| hindsight.process.cpu.seconds | Gauge | Process CPU time in seconds (type: user or system) |
| hindsight.process.memory.bytes | Gauge | Process memory usage in bytes |
| hindsight.process.open_fds | Gauge | Open file descriptors |
| hindsight.process.threads | Gauge | Active threads |
Example PromQL queries
Average operation latency by type
P95 LLM latency
LLM calls per minute by provider
Total tokens consumed by model
HTTP error rate (5xx)
P95 HTTP latency
Database pool utilization
Internal vs API recall operations
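The query titles above can be sketched as PromQL strings, built here in Python so the naming assumptions sit in one place. The exported metric names are assumptions: the sketch supposes that dots become underscores, that duration histograms carry a `_seconds` unit suffix with `_bucket`, `_sum`, and `_count` series, and that token counters end in `_total`. Check each name against the actual /metrics output before relying on these queries.

```python
def prom_name(otel_name, suffix=""):
    """Assumed name mapping: dots become underscores, plus an optional suffix."""
    return otel_name.replace(".", "_") + suffix


# Assumed exported names; verify against the real /metrics output.
OP_DUR = prom_name("hindsight.operation.duration", "_seconds")
OP_TOTAL = prom_name("hindsight.operation.total")
LLM_DUR = prom_name("hindsight.llm.duration", "_seconds")
LLM_CALLS = prom_name("hindsight.llm.calls.total")
TOK_IN = prom_name("hindsight.llm.tokens.input", "_total")
TOK_OUT = prom_name("hindsight.llm.tokens.output", "_total")
HTTP_DUR = prom_name("hindsight.http.duration", "_seconds")
HTTP_REQ = prom_name("hindsight.http.requests.total")
POOL_SIZE = prom_name("hindsight.db.pool.size")
POOL_IDLE = prom_name("hindsight.db.pool.idle")
POOL_MAX = prom_name("hindsight.db.pool.max")

QUERIES = {
    "Average operation latency by type": (
        f"sum by (operation) (rate({OP_DUR}_sum[5m])) / "
        f"sum by (operation) (rate({OP_DUR}_count[5m]))"
    ),
    "P95 LLM latency": (
        f"histogram_quantile(0.95, sum by (le) (rate({LLM_DUR}_bucket[5m])))"
    ),
    "LLM calls per minute by provider": (
        f"sum by (provider) (rate({LLM_CALLS}[1m])) * 60"
    ),
    "Total tokens consumed by model": (
        f"sum by (model) ({TOK_IN}) + sum by (model) ({TOK_OUT})"
    ),
    "HTTP error rate (5xx)": (
        f'sum(rate({HTTP_REQ}{{status_class="5xx"}}[5m])) / '
        f"sum(rate({HTTP_REQ}[5m]))"
    ),
    "P95 HTTP latency": (
        f"histogram_quantile(0.95, sum by (le) (rate({HTTP_DUR}_bucket[5m])))"
    ),
    # In-use connections (size minus idle) as a fraction of the maximum.
    "Database pool utilization": f"({POOL_SIZE} - {POOL_IDLE}) / {POOL_MAX}",
    "Internal vs API recall operations": (
        f'sum by (source) (rate({OP_TOTAL}{{operation="recall"}}[5m]))'
    ),
}

for title, query in QUERIES.items():
    print(f"{title}:\n  {query}")
```

The 5m and 1m rate windows are arbitrary starting points; they should be at least a few multiples of the scrape interval.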
OpenTelemetry tracing
Hindsight supports distributed tracing for memory operations and LLM calls, following OpenTelemetry GenAI semantic conventions v1.37+.
Configuration
| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_OTEL_TRACES_ENABLED | Enable distributed tracing | false |
| HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL | (none) |
| HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS | OTLP exporter headers (key1=value1,key2=value2) | (none) |
| HINDSIGHT_API_OTEL_SERVICE_NAME | Service name for traces | hindsight-api |
| HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT | Environment name (development, production, etc.) | development |
| HINDSIGHT_API_METRICS_INCLUDE_BANK_ID | Include bank_id in metric attributes. Enable only with few banks; high cardinality causes unbounded memory growth. | false |
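To make the header format concrete, here is a minimal sketch of how a `key1=value1,key2=value2` string could be split into individual headers, together with an example environment that enables tracing. `parse_otlp_headers` is a hypothetical helper rather than Hindsight's own parser, and the `true` value and the `http://localhost:4318` endpoint are assumptions for a local OTLP HTTP collector.

```python
def parse_otlp_headers(raw):
    """Split 'key1=value1,key2=value2' into a dict of header names to values."""
    headers = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


# Example environment; the enabling value and endpoint are assumptions.
TRACING_ENV = {
    "HINDSIGHT_API_OTEL_TRACES_ENABLED": "true",
    "HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    "HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS": "key1=value1,key2=value2",
    "HINDSIGHT_API_OTEL_SERVICE_NAME": "hindsight-api",
    "HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT": "development",
}

parsed = parse_otlp_headers(TRACING_ENV["HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS"])
```

Because the pairs are comma-separated, a header value that itself contains a comma cannot be expressed in this simple form.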
Span hierarchy
Traces follow operation boundaries. Parent spans represent memory operations; child spans represent individual LLM calls.
Supported backends
Hindsight uses standard OTLP HTTP, so any OTLP-compatible backend works:
| Backend | Type |
|---|---|
| Grafana LGTM | All-in-one: traces, logs, metrics |
| Langfuse | LLM-focused observability |
| OpenLIT | LLM dashboards, cost tracking |
| DataDog | Commercial APM |
| New Relic | Commercial APM |
| Honeycomb | Commercial observability |
| Pydantic Logfire | Python-focused observability |
Local development monitoring
For local development, use the Grafana LGTM all-in-one stack (Loki, Grafana, Tempo, Mimir):
| Service | URL |
|---|---|
| Grafana UI | http://localhost:3000 (anonymous admin access) |
| Tempo (traces) | OTLP HTTP on port 4318, gRPC on port 4317 |
| Mimir (metrics) | Scrapes http://localhost:8888/metrics automatically |
| Loki (logs) | Available for log aggregation |
The local monitoring stack is for development only. For production, deploy Grafana LGTM separately or use a commercial platform (Grafana Cloud, DataDog, New Relic, etc.).
