Configure Hindsight with environment variables

Hindsight is configured entirely through environment variables. There are two services, each with its own prefix: the API service handles all memory operations and uses HINDSIGHT_API_* variables, while the Control Plane (web UI) uses HINDSIGHT_CP_* variables.

Service	Prefix	Description
API Service	`HINDSIGHT_API_*`	Core memory engine (retain, recall, reflect)
Control Plane	`HINDSIGHT_CP_*`	Web UI for managing memory banks

Database

Variable	Description	Default
`HINDSIGHT_API_DATABASE_URL`	PostgreSQL connection string	`pg0` (embedded)
`HINDSIGHT_API_READ_DATABASE_URL`	Optional read-replica URL. Recall queries are routed through a separate pool against this URL, offloading the primary.	Unset (uses primary)
`HINDSIGHT_API_MIGRATION_DATABASE_URL`	Direct URL for running migrations, bypassing connection poolers (e.g. PgBouncer).	Falls back to `DATABASE_URL`
`HINDSIGHT_API_DATABASE_SCHEMA`	PostgreSQL schema name for tables	`public`
`HINDSIGHT_API_RUN_MIGRATIONS_ON_STARTUP`	Run database migrations on API startup	`true`

When no DATABASE_URL is set, Hindsight starts an embedded pg0 instance — convenient for development, but not recommended for production. DATABASE_SCHEMA is useful for multi-database setups, hosting platforms like Supabase where the public schema is shared, or organizational naming conventions. Migrations automatically create the schema if it doesn’t exist.

export HINDSIGHT_API_DATABASE_URL=postgresql://user:pass@host:5432/dbname
export HINDSIGHT_API_DATABASE_SCHEMA=hindsight
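
The three URL variables in the table above can be combined when a connection pooler and a read replica are in use. The following is a minimal sketch, not a recommended topology: the host names and credentials are hypothetical, and port 6432 is only PgBouncer's default listen port.

```shell
# Hypothetical hosts and credentials, for illustration only.
# Application traffic goes through a connection pooler (PgBouncer listens on 6432 by default).
export HINDSIGHT_API_DATABASE_URL=postgresql://user:pass@pgbouncer.internal:6432/dbname
# Migrations connect to PostgreSQL directly, bypassing the pooler.
export HINDSIGHT_API_MIGRATION_DATABASE_URL=postgresql://user:pass@db-primary.internal:5432/dbname
# Recall queries are routed to a read replica through a separate pool.
export HINDSIGHT_API_READ_DATABASE_URL=postgresql://user:pass@db-replica.internal:5432/dbname
```

If the migration URL is left unset, migrations fall back to DATABASE_URL, and if the read URL is left unset, recall queries use the primary.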

Connection pool

Variable	Description	Default
`HINDSIGHT_API_DB_POOL_MIN_SIZE`	Minimum connections in the primary pool	`5`
`HINDSIGHT_API_DB_POOL_MAX_SIZE`	Maximum connections in the primary pool	`100`
`HINDSIGHT_API_READ_DB_POOL_MIN_SIZE`	Minimum connections in the read-replica pool	Falls back to `DB_POOL_MIN_SIZE`
`HINDSIGHT_API_READ_DB_POOL_MAX_SIZE`	Maximum connections in the read-replica pool	Falls back to `DB_POOL_MAX_SIZE`
`HINDSIGHT_API_DB_COMMAND_TIMEOUT`	PostgreSQL command timeout in seconds (client-side)	`60`
`HINDSIGHT_API_DB_ACQUIRE_TIMEOUT`	Connection acquisition timeout in seconds	`30`
`HINDSIGHT_API_DB_STATEMENT_TIMEOUT`	Server-side `statement_timeout` for every pool connection, in seconds. Set to `0` to disable.	`600`

For high-concurrency workloads, increase DB_POOL_MAX_SIZE: each concurrent recall or reflect operation can use 2–4 connections.

To run migrations manually before starting the API:

# Migrate the base schema plus all discovered tenant schemas
hindsight-admin run-db-migration

# Or migrate a specific schema only
hindsight-admin run-db-migration --schema tenant_acme
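
Returning to pool sizing: the guidance that each concurrent recall or reflect operation can use 2–4 connections suggests a simple worst-case estimate. The sketch below assumes a hypothetical peak of 40 concurrent operations; that figure is an illustration, not a measured or recommended value.

```shell
# Hypothetical workload figure; replace it with your own measurement.
expected_concurrent_ops=40   # peak concurrent recall/reflect operations (assumption)
connections_per_op=4         # upper end of the 2-4 range stated above

# Worst-case number of connections the primary pool may need at peak.
pool_max=$((expected_concurrent_ops * connections_per_op))

export HINDSIGHT_API_DB_POOL_MAX_SIZE="$pool_max"
echo "HINDSIGHT_API_DB_POOL_MAX_SIZE=$HINDSIGHT_API_DB_POOL_MAX_SIZE"
```

The PostgreSQL server's own max_connections limit must also allow that many connections, otherwise the pool cannot grow to the configured size.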

LLM provider

Variable	Description	Default
`HINDSIGHT_API_LLM_PROVIDER`	Provider name (see examples below)	`openai`
`HINDSIGHT_API_LLM_API_KEY`	API key for the LLM provider	—
`HINDSIGHT_API_LLM_MODEL`	Model name	`gpt-5-mini`
`HINDSIGHT_API_LLM_BASE_URL`	Custom LLM endpoint URL	Provider default
`HINDSIGHT_API_LLM_MAX_CONCURRENT`	Max concurrent LLM requests	`32`
`HINDSIGHT_API_LLM_MAX_RETRIES`	Max retry attempts for LLM API calls	`3`
`HINDSIGHT_API_LLM_INITIAL_BACKOFF`	Initial retry backoff in seconds (exponential)	`1.0`
`HINDSIGHT_API_LLM_MAX_BACKOFF`	Max retry backoff cap in seconds	`60.0`
`HINDSIGHT_API_LLM_TIMEOUT`	LLM request timeout in seconds	`120`

Supported provider values: openai, openai-codex, claude-code, anthropic, gemini, groq, minimax, deepseek, zai, opencode-go, ollama, lmstudio, llamacpp, vertexai, bedrock, litellm, litellmrouter, volcano, openrouter, none.

export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b

Built-in llama.cpp

The llamacpp provider runs a llama.cpp server as a managed subprocess, so no external LLM server is needed. On first run it auto-downloads a default GGUF model (~3.5 GB). The provider requires the local-llm extra: pip install 'hindsight-api-slim[local-llm]'.

Variable	Description	Default
`HINDSIGHT_API_LLAMACPP_MODEL_PATH`	Path to a GGUF file. If unset, auto-downloads `gemma-4-E2B-it-Q4_K_M`.	Auto-download
`HINDSIGHT_API_LLAMACPP_GPU_LAYERS`	Layers to offload to GPU. `-1` = all (recommended), `0` = CPU only.	`-1`
`HINDSIGHT_API_LLAMACPP_CONTEXT_SIZE`	Context window size in tokens	`8192`
`HINDSIGHT_API_LLAMACPP_NO_GRAMMAR`	Disable JSON grammar enforcement (faster, less reliable output)	`false`
`HINDSIGHT_API_LLAMACPP_EXTRA_ARGS`	Extra CLI args passed to the llama.cpp server	—
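
As an illustration of how these variables fit together, the following sketch selects the llamacpp provider with a local model file and CPU-only inference. The model path is hypothetical; leaving HINDSIGHT_API_LLAMACPP_MODEL_PATH unset would trigger the auto-download described above instead.

```shell
export HINDSIGHT_API_LLM_PROVIDER=llamacpp
# Hypothetical path to a GGUF file that already exists on disk.
export HINDSIGHT_API_LLAMACPP_MODEL_PATH=/models/my-model.gguf
# 0 = CPU only; the default of -1 offloads all layers to the GPU.
export HINDSIGHT_API_LLAMACPP_GPU_LAYERS=0
export HINDSIGHT_API_LLAMACPP_CONTEXT_SIZE=8192
```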

Per-operation LLM configuration

Different operations have different requirements, so the default LLM can be overridden for each operation independently:

Retain: use models with strong structured output (e.g., GPT-4o, Claude) for accurate fact extraction
Reflect: use faster/cheaper models (e.g., GPT-4o-mini, Groq) for generation
Recall: does not use an LLM — no override needed

Variable	Description	Default
`HINDSIGHT_API_RETAIN_LLM_PROVIDER`	LLM provider for retain	Falls back to `LLM_PROVIDER`
`HINDSIGHT_API_RETAIN_LLM_API_KEY`	API key for retain LLM	Falls back to `LLM_API_KEY`
`HINDSIGHT_API_RETAIN_LLM_MODEL`	Model for retain	Falls back to `LLM_MODEL`
`HINDSIGHT_API_RETAIN_LLM_BASE_URL`	Base URL for retain LLM	Falls back to `LLM_BASE_URL`
`HINDSIGHT_API_RETAIN_LLM_MAX_CONCURRENT`	Max concurrent requests for retain	Falls back to `LLM_MAX_CONCURRENT`
`HINDSIGHT_API_RETAIN_LLM_MAX_RETRIES`	Max retries for retain	Falls back to `LLM_MAX_RETRIES`
`HINDSIGHT_API_RETAIN_LLM_TIMEOUT`	Timeout for retain requests (seconds)	Falls back to `LLM_TIMEOUT`
`HINDSIGHT_API_REFLECT_LLM_PROVIDER`	LLM provider for reflect	Falls back to `LLM_PROVIDER`
`HINDSIGHT_API_REFLECT_LLM_API_KEY`	API key for reflect LLM	Falls back to `LLM_API_KEY`
`HINDSIGHT_API_REFLECT_LLM_MODEL`	Model for reflect	Falls back to `LLM_MODEL`
`HINDSIGHT_API_REFLECT_LLM_BASE_URL`	Base URL for reflect LLM	Falls back to `LLM_BASE_URL`
`HINDSIGHT_API_REFLECT_LLM_MAX_CONCURRENT`	Max concurrent requests for reflect	Falls back to `LLM_MAX_CONCURRENT`
`HINDSIGHT_API_REFLECT_LLM_TIMEOUT`	Timeout for reflect requests (seconds)	Falls back to `LLM_TIMEOUT`
`HINDSIGHT_API_CONSOLIDATION_LLM_PROVIDER`	LLM provider for observation consolidation	Falls back to `LLM_PROVIDER`
`HINDSIGHT_API_CONSOLIDATION_LLM_MODEL`	Model for consolidation	Falls back to `LLM_MODEL`
`HINDSIGHT_API_CONSOLIDATION_LLM_MAX_CONCURRENT`	Max concurrent requests for consolidation	Falls back to `LLM_MAX_CONCURRENT`

Example: separate models for retain and reflect

# Default LLM
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Strong model for structured extraction
export HINDSIGHT_API_RETAIN_LLM_MODEL=gpt-4o

# Faster model for generation
export HINDSIGHT_API_REFLECT_LLM_PROVIDER=groq
export HINDSIGHT_API_REFLECT_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_REFLECT_LLM_MODEL=llama-3.3-70b-versatile

Example: tuning retries for rate-limited APIs

export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Reduce concurrency to stay within rate limits
export HINDSIGHT_API_RETAIN_LLM_MAX_CONCURRENT=3
export HINDSIGHT_API_RETAIN_LLM_INITIAL_BACKOFF=2.0
export HINDSIGHT_API_RETAIN_LLM_MAX_BACKOFF=120.0
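
To get a feel for what the backoff settings in this example imply, the sketch below prints a possible sequence of retry delays. It assumes the delay doubles after each failed attempt and is capped at the maximum; the documentation only states that the backoff is exponential, so the multiplier and the absence of jitter are assumptions.

```shell
# Values taken from the example above; max_retries is the documented default.
initial_backoff=2.0   # HINDSIGHT_API_RETAIN_LLM_INITIAL_BACKOFF
max_backoff=120.0     # HINDSIGHT_API_RETAIN_LLM_MAX_BACKOFF
max_retries=3         # HINDSIGHT_API_LLM_MAX_RETRIES default

# Assumption: the delay doubles on every retry and never exceeds max_backoff.
schedule=$(awk -v d="$initial_backoff" -v cap="$max_backoff" -v n="$max_retries" 'BEGIN {
  for (i = 1; i <= n; i++) {
    delay = (d < cap) ? d : cap
    printf "%s%g", (i > 1 ? " " : ""), delay
    d *= 2
  }
}')
echo "retry delays (s): $schedule"
```

Under these assumptions the three retries wait 2, 4, and 8 seconds, so the cap of 120 seconds would only matter with a larger retry count.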

Embeddings

Embedding variable names include a provider segment: HINDSIGHT_API_EMBEDDINGS_{PROVIDER}_{PARAMETER}. For example, with the openai provider the model variable is HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL, not HINDSIGHT_API_EMBEDDINGS_MODEL. Misnamed keys cause Hindsight to fall back to the default OpenAI settings, and requests then fail with authentication errors.

Variable	Description	Default
`HINDSIGHT_API_EMBEDDINGS_PROVIDER`	Provider: `local`, `tei`, `openai`, `openrouter`, `cohere`, `google`, `litellm`, `litellm-sdk`	`local`
`HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL`	Model for local provider	`BAAI/bge-small-en-v1.5`
`HINDSIGHT_API_EMBEDDINGS_LOCAL_FORCE_CPU`	Force CPU mode (avoids MPS/XPC issues on macOS)	`false`
`HINDSIGHT_API_EMBEDDINGS_TEI_URL`	TEI server URL	—
`HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY`	OpenAI API key (falls back to `LLM_API_KEY`)	—
`HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL`	OpenAI embedding model	`text-embedding-3-small`
`HINDSIGHT_API_EMBEDDINGS_OPENAI_BASE_URL`	Custom base URL (e.g., Azure OpenAI)	—
`HINDSIGHT_API_EMBEDDINGS_COHERE_API_KEY`	Cohere API key	—
`HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL`	Cohere embedding model	`embed-english-v3.0`
`HINDSIGHT_API_EMBEDDINGS_GEMINI_API_KEY`	Gemini API key (falls back to `LLM_API_KEY`)	—
`HINDSIGHT_API_EMBEDDINGS_GEMINI_MODEL`	Gemini embedding model	`gemini-embedding-001`
`HINDSIGHT_API_EMBEDDINGS_GEMINI_OUTPUT_DIMENSIONALITY`	Output embedding dimensions for Gemini	`768`
`HINDSIGHT_API_EMBEDDINGS_VERTEXAI_PROJECT_ID`	Vertex AI project ID for embeddings	—

export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5
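
For a hosted provider, the provider segment appears in every provider-specific variable name. A minimal sketch for the openai provider, using only variables from the table above and a placeholder key:

```shell
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
# Placeholder key; per the table, this falls back to LLM_API_KEY when unset.
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=sk-xxxxxxxxxxxx
# Note the OPENAI segment: HINDSIGHT_API_EMBEDDINGS_MODEL would be a misnamed key.
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small
```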

Once memories are stored, you cannot change the embedding dimension without losing data. On an empty database the schema is adjusted automatically at startup.

Reranker

Variable	Description	Default
`HINDSIGHT_API_RERANKER_PROVIDER`	Provider: `local`, `tei`, `cohere`, `openrouter`, `zeroentropy`, `siliconflow`, `google`, `flashrank`, `litellm`, `litellm-sdk`, `jina-mlx`, `rrf`	`local`
`HINDSIGHT_API_RERANKER_LOCAL_MODEL`	Model for local provider	`cross-encoder/ms-marco-MiniLM-L-6-v2`
`HINDSIGHT_API_RERANKER_LOCAL_MAX_CONCURRENT`	Max concurrent local reranking	`4`
`HINDSIGHT_API_RERANKER_LOCAL_FORCE_CPU`	Force CPU mode	`false`
`HINDSIGHT_API_RERANKER_LOCAL_FP16`	Half-precision inference (27–36% faster on MPS)	`false`
`HINDSIGHT_API_RERANKER_LOCAL_BUCKET_BATCHING`	Sort pairs by token length before batching (36–54% faster)	`false`
`HINDSIGHT_API_RERANKER_TEI_URL`	TEI server URL	—
`HINDSIGHT_API_RERANKER_COHERE_API_KEY`	Cohere API key	—
`HINDSIGHT_API_RERANKER_COHERE_MODEL`	Cohere rerank model	`rerank-english-v3.0`
`HINDSIGHT_API_RERANKER_COHERE_BASE_URL`	Custom base URL for any Cohere-compatible `/rerank` endpoint	—
`HINDSIGHT_API_RERANKER_ZEROENTROPY_API_KEY`	ZeroEntropy API key	—
`HINDSIGHT_API_RERANKER_ZEROENTROPY_MODEL`	ZeroEntropy model (`zerank-2`, `zerank-2-small`)	`zerank-2`
`HINDSIGHT_API_RERANKER_FLASHRANK_MODEL`	FlashRank model	`ms-marco-MiniLM-L-12-v2`

export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
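
Hosted rerankers follow the same naming pattern as the local one. As a sketch using only variables from the table above, with a placeholder key, a Cohere setup might look like this:

```shell
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
# Placeholder key for illustration.
export HINDSIGHT_API_RERANKER_COHERE_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0
```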

Authentication

By default, Hindsight runs without authentication. For production deployments, enable the built-in API key extension:

export HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
export HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

When enabled, all requests must include the API key in the Authorization header:

curl -H "Authorization: Bearer your-secret-api-key" \
  http://localhost:8888/v1/default/banks

Requests without a valid key receive 401 Unauthorized.
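
One way to confirm that authentication is active is to compare the status codes returned with and without the key. The helper below is a hypothetical convenience function, not part of Hindsight; it wraps the same curl call shown above and prints only the HTTP status code.

```shell
# hindsight_status is a hypothetical helper name, used for illustration only.
# $1: base URL of the API, $2: optional API key
hindsight_status() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}\n' \
      -H "Authorization: Bearer $2" "$1/v1/default/banks"
  else
    curl -s -o /dev/null -w '%{http_code}\n' "$1/v1/default/banks"
  fi
}

# Usage against a locally running API (not executed here):
#   hindsight_status http://localhost:8888                       # 401 expected when auth is enabled
#   hindsight_status http://localhost:8888 your-secret-api-key   # should not be 401
```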

For advanced authentication (JWT, OAuth, multi-tenant schemas), implement a custom TenantExtension. See the Extensions page for details.

Server

Variable	Description	Default
`HINDSIGHT_API_HOST`	Bind address	`0.0.0.0`
`HINDSIGHT_API_PORT`	Server port	`8888`
`HINDSIGHT_API_BASE_PATH`	Base path when behind a reverse proxy (e.g., `/hindsight`)	`""` (root)
`HINDSIGHT_API_WORKERS`	Number of uvicorn worker processes	`1`
`HINDSIGHT_API_LOG_LEVEL`	Log level: `debug`, `info`, `warning`, `error`	`info`
`HINDSIGHT_API_LOG_FORMAT`	Log format: `text` or `json`	`text`
`HINDSIGHT_API_MCP_ENABLED`	Enable MCP server at `/mcp/{bank_id}/`	`true`
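
As a sketch of how these settings combine, the following configures the API to sit behind a reverse proxy on the same host. The bind address, worker count, and base path are illustrative assumptions, not recommendations.

```shell
# Accept connections only from a proxy running on the same machine (assumption).
export HINDSIGHT_API_HOST=127.0.0.1
export HINDSIGHT_API_PORT=8888
# Path prefix under which the proxy exposes the API.
export HINDSIGHT_API_BASE_PATH=/hindsight
# Hypothetical value; tune to the available CPU cores.
export HINDSIGHT_API_WORKERS=4
# Structured logs are easier to ship to a log aggregator.
export HINDSIGHT_API_LOG_FORMAT=json
```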

Control Plane

Variable	Description	Default
`HINDSIGHT_CP_DATAPLANE_API_URL`	URL of the API service	`http://localhost:8888`
`HINDSIGHT_CP_ACCESS_KEY`	Access key to protect the UI. When set, users must log in.	(none)
`NEXT_PUBLIC_BASE_PATH`	Base path for the UI when behind a reverse proxy	`""` (root)

# Point Control Plane at a remote API
export HINDSIGHT_CP_DATAPLANE_API_URL=http://api.example.com:8888

# Protect the UI with an access key
export HINDSIGHT_CP_ACCESS_KEY=my-secret-key

Example .env file

# API Service
HINDSIGHT_API_DATABASE_URL=postgresql://hindsight:hindsight_dev@localhost:5432/hindsight
HINDSIGHT_API_LLM_PROVIDER=groq
HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx

# Authentication (recommended for production)
# HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
# HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

# Control Plane
HINDSIGHT_CP_DATAPLANE_API_URL=http://localhost:8888
