
Hindsight is configured entirely through environment variables. There are two services, each with its own prefix: the API service handles all memory operations and uses HINDSIGHT_API_* variables, while the Control Plane (web UI) uses HINDSIGHT_CP_* variables.

| Service | Prefix | Description |
|---|---|---|
| API Service | HINDSIGHT_API_* | Core memory engine (retain, recall, reflect) |
| Control Plane | HINDSIGHT_CP_* | Web UI for managing memory banks |
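Because every setting lives in the environment, it can help to check what each service will actually see before starting it. The sketch below assumes bash; show_config is a hypothetical helper (not part of Hindsight) that lists exported variables for one prefix, masks values whose names contain KEY or SECRET, and hides credentials embedded in URLs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hindsight): print the exported variables a
# service will read, selected by prefix, with likely secrets masked.
show_config() {   # $1 = prefix, e.g. HINDSIGHT_API_ or HINDSIGHT_CP_
  env | sort | awk -v p="$1" '
    index($0, p) != 1 { next }                       # keep only this prefix
    { name = substr($0, 1, index($0, "=") - 1) }
    name ~ /KEY|SECRET/ { print name "=****"; next } # hide API/access keys
    { sub("://[^@]*@", "://****@"); print }'         # hide URL credentials
}

show_config HINDSIGHT_API_
show_config HINDSIGHT_CP_
```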

Database

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_DATABASE_URL | PostgreSQL connection string | pg0 (embedded) |
| HINDSIGHT_API_READ_DATABASE_URL | Optional read-replica URL. Recall queries are routed through a separate pool against this URL, offloading the primary. | Unset (uses primary) |
| HINDSIGHT_API_MIGRATION_DATABASE_URL | Direct URL for running migrations, bypassing connection poolers (e.g. PgBouncer). | Falls back to DATABASE_URL |
| HINDSIGHT_API_DATABASE_SCHEMA | PostgreSQL schema name for tables | public |
| HINDSIGHT_API_RUN_MIGRATIONS_ON_STARTUP | Run database migrations on API startup | true |

When no DATABASE_URL is set, Hindsight starts an embedded pg0 instance — convenient for development, but not recommended for production. DATABASE_SCHEMA is useful for multi-database setups, hosting platforms like Supabase where the public schema is shared, or organizational naming conventions. Migrations automatically create the schema if it doesn’t exist.
```bash
export HINDSIGHT_API_DATABASE_URL=postgresql://user:pass@host:5432/dbname
export HINDSIGHT_API_DATABASE_SCHEMA=hindsight
```
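The read-replica and migration URLs are independent overrides of DATABASE_URL. As a sketch, a deployment that routes application traffic through PgBouncer (assumed here to listen on its default port, 6432) might point migrations straight at the primary and recall at a replica. All hostnames and credentials below are hypothetical.

```bash
# Application traffic goes through the pooler (hypothetical host "pgbouncer")
export HINDSIGHT_API_DATABASE_URL=postgresql://user:pass@pgbouncer:6432/hindsight
# Migrations bypass the pooler and connect to the primary directly
export HINDSIGHT_API_MIGRATION_DATABASE_URL=postgresql://user:pass@db-primary:5432/hindsight
# Recall queries use a separate pool against a read replica
export HINDSIGHT_API_READ_DATABASE_URL=postgresql://user:pass@db-replica:5432/hindsight
export HINDSIGHT_API_DATABASE_SCHEMA=hindsight
```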

Connection pool

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_DB_POOL_MIN_SIZE | Minimum connections in the primary pool | 5 |
| HINDSIGHT_API_DB_POOL_MAX_SIZE | Maximum connections in the primary pool | 100 |
| HINDSIGHT_API_READ_DB_POOL_MIN_SIZE | Minimum connections in the read-replica pool | Falls back to DB_POOL_MIN_SIZE |
| HINDSIGHT_API_READ_DB_POOL_MAX_SIZE | Maximum connections in the read-replica pool | Falls back to DB_POOL_MAX_SIZE |
| HINDSIGHT_API_DB_COMMAND_TIMEOUT | PostgreSQL command timeout in seconds (client-side) | 60 |
| HINDSIGHT_API_DB_ACQUIRE_TIMEOUT | Connection acquisition timeout in seconds | 30 |
| HINDSIGHT_API_DB_STATEMENT_TIMEOUT | Server-side statement_timeout for every pool connection, in seconds. Set to 0 to disable. | 600 |

For high-concurrency workloads, increase DB_POOL_MAX_SIZE: each concurrent recall or reflect operation can use 2–4 connections.

To run migrations manually before starting the API:
```bash
# Migrate the base schema plus all discovered tenant schemas
hindsight-admin run-db-migration

# Or migrate a specific schema only
hindsight-admin run-db-migration --schema tenant_acme
```
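The 2–4 connections-per-operation figure gives a rough way to size DB_POOL_MAX_SIZE. The sketch below takes the upper bound and assumes that each uvicorn worker (HINDSIGHT_API_WORKERS) keeps its own pool, so the database must accept roughly workers times pool size connections; stock PostgreSQL allows only 100 (max_connections) unless you raise it. The helper names and the peak-concurrency figure are hypothetical.

```bash
#!/usr/bin/env bash
# Rough pool sizing, using the upper end of 2-4 connections per recall/reflect
# operation. ASSUMPTION: each uvicorn worker process holds its own pool.
CONNS_PER_OP=4

# Hypothetical helper: pool size for one worker.
pool_max_for() {        # $1 = peak concurrent recall/reflect ops per worker
  echo $(( $1 * CONNS_PER_OP ))
}

# Hypothetical helper: worst-case connections PostgreSQL must accept.
total_connections() {   # $1 = pool max per worker, $2 = number of workers
  echo $(( $1 * $2 ))
}

peak_ops=30             # hypothetical: replace with your measured peak
workers=${HINDSIGHT_API_WORKERS:-1}
HINDSIGHT_API_DB_POOL_MAX_SIZE=$(pool_max_for "$peak_ops")
export HINDSIGHT_API_DB_POOL_MAX_SIZE
echo "DB_POOL_MAX_SIZE=$HINDSIGHT_API_DB_POOL_MAX_SIZE"
echo "worst case: $(total_connections "$HINDSIGHT_API_DB_POOL_MAX_SIZE" "$workers") connections"
```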

LLM provider

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_LLM_PROVIDER | Provider name (see supported values below) | openai |
| HINDSIGHT_API_LLM_API_KEY | API key for the LLM provider | |
| HINDSIGHT_API_LLM_MODEL | Model name | gpt-5-mini |
| HINDSIGHT_API_LLM_BASE_URL | Custom LLM endpoint URL | Provider default |
| HINDSIGHT_API_LLM_MAX_CONCURRENT | Max concurrent LLM requests | 32 |
| HINDSIGHT_API_LLM_MAX_RETRIES | Max retry attempts for LLM API calls | 3 |
| HINDSIGHT_API_LLM_INITIAL_BACKOFF | Initial retry backoff in seconds (exponential) | 1.0 |
| HINDSIGHT_API_LLM_MAX_BACKOFF | Max retry backoff cap in seconds | 60.0 |
| HINDSIGHT_API_LLM_TIMEOUT | LLM request timeout in seconds | 120 |

Supported provider values: openai, openai-codex, claude-code, anthropic, gemini, groq, minimax, deepseek, zai, opencode-go, ollama, lmstudio, llamacpp, vertexai, bedrock, litellm, litellmrouter, volcano, openrouter, none.
```bash
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b
```
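LLM_MAX_RETRIES, LLM_INITIAL_BACKOFF and LLM_MAX_BACKOFF together determine how long a failing call keeps retrying. The sketch below prints the delay before each retry under an assumption this page does not confirm: that the delay doubles each time and is capped at LLM_MAX_BACKOFF, with no jitter. Bash arithmetic is integer-only, so fractional settings are truncated; backoff_schedule is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate retry delays (seconds) from the global LLM
# retry settings. ASSUMPTION: doubling backoff capped at MAX_BACKOFF, no jitter.
backoff_schedule() {
  local retries=${HINDSIGHT_API_LLM_MAX_RETRIES:-3}
  local delay=${HINDSIGHT_API_LLM_INITIAL_BACKOFF:-1.0}
  local cap=${HINDSIGHT_API_LLM_MAX_BACKOFF:-60.0}
  delay=${delay%%.*}    # integer seconds: "1.0" -> "1"
  cap=${cap%%.*}
  local out=() i
  for (( i = 0; i < retries; i++ )); do
    if (( delay > cap )); then delay=$cap; fi
    out+=("$delay")
    delay=$(( delay * 2 ))
  done
  echo "${out[*]}"
}

backoff_schedule   # defaults: 1 2 4
```

With the defaults (3 retries, 1.0 s initial, 60.0 s cap) this prints 1 2 4, which is why the cap only matters once retries are raised well above the default.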

Built-in llama.cpp

The llamacpp provider runs a llama.cpp server as a managed subprocess — no external LLM server needed. On first run it auto-downloads a default GGUF model (~3.5 GB). Requires pip install 'hindsight-api-slim[local-llm]'.

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_LLAMACPP_MODEL_PATH | Path to a GGUF file. If unset, auto-downloads gemma-4-E2B-it-Q4_K_M. | Auto-download |
| HINDSIGHT_API_LLAMACPP_GPU_LAYERS | Layers to offload to GPU. -1 = all (recommended), 0 = CPU only. | -1 |
| HINDSIGHT_API_LLAMACPP_CONTEXT_SIZE | Context window size in tokens | 8192 |
| HINDSIGHT_API_LLAMACPP_NO_GRAMMAR | Disable JSON grammar enforcement (faster, less reliable output) | false |
| HINDSIGHT_API_LLAMACPP_EXTRA_ARGS | Extra CLI args passed to the llama.cpp server | |
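For example, running the built-in server on a machine without a GPU, using a GGUF file you already have, might be configured as below. The model path is hypothetical; leaving HINDSIGHT_API_LLAMACPP_MODEL_PATH unset falls back to the auto-downloaded default.

```bash
pip install 'hindsight-api-slim[local-llm]'

export HINDSIGHT_API_LLM_PROVIDER=llamacpp
# Hypothetical path to a local GGUF model; omit to auto-download the default
export HINDSIGHT_API_LLAMACPP_MODEL_PATH=/models/my-model.gguf
# 0 keeps every layer on the CPU
export HINDSIGHT_API_LLAMACPP_GPU_LAYERS=0
export HINDSIGHT_API_LLAMACPP_CONTEXT_SIZE=8192
```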

Per-operation LLM configuration

Different operations have different requirements, so each one can use its own LLM; any setting you don't override falls back to the global value.
  • Retain: use models with strong structured output (e.g., GPT-4o, Claude) for accurate fact extraction
  • Reflect: use faster/cheaper models (e.g., GPT-4o-mini, or models hosted on Groq) for generation
  • Recall: does not use an LLM, so no override is needed

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_RETAIN_LLM_PROVIDER | LLM provider for retain | Falls back to LLM_PROVIDER |
| HINDSIGHT_API_RETAIN_LLM_API_KEY | API key for retain LLM | Falls back to LLM_API_KEY |
| HINDSIGHT_API_RETAIN_LLM_MODEL | Model for retain | Falls back to LLM_MODEL |
| HINDSIGHT_API_RETAIN_LLM_BASE_URL | Base URL for retain LLM | Falls back to LLM_BASE_URL |
| HINDSIGHT_API_RETAIN_LLM_MAX_CONCURRENT | Max concurrent requests for retain | Falls back to LLM_MAX_CONCURRENT |
| HINDSIGHT_API_RETAIN_LLM_MAX_RETRIES | Max retries for retain | Falls back to LLM_MAX_RETRIES |
| HINDSIGHT_API_RETAIN_LLM_TIMEOUT | Timeout for retain requests (seconds) | Falls back to LLM_TIMEOUT |
| HINDSIGHT_API_REFLECT_LLM_PROVIDER | LLM provider for reflect | Falls back to LLM_PROVIDER |
| HINDSIGHT_API_REFLECT_LLM_API_KEY | API key for reflect LLM | Falls back to LLM_API_KEY |
| HINDSIGHT_API_REFLECT_LLM_MODEL | Model for reflect | Falls back to LLM_MODEL |
| HINDSIGHT_API_REFLECT_LLM_BASE_URL | Base URL for reflect LLM | Falls back to LLM_BASE_URL |
| HINDSIGHT_API_REFLECT_LLM_MAX_CONCURRENT | Max concurrent requests for reflect | Falls back to LLM_MAX_CONCURRENT |
| HINDSIGHT_API_REFLECT_LLM_TIMEOUT | Timeout for reflect requests (seconds) | Falls back to LLM_TIMEOUT |
| HINDSIGHT_API_CONSOLIDATION_LLM_PROVIDER | LLM provider for observation consolidation | Falls back to LLM_PROVIDER |
| HINDSIGHT_API_CONSOLIDATION_LLM_MODEL | Model for consolidation | Falls back to LLM_MODEL |
| HINDSIGHT_API_CONSOLIDATION_LLM_MAX_CONCURRENT | Max concurrent requests for consolidation | Falls back to LLM_MAX_CONCURRENT |

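The "Falls back to" column maps naturally onto shell default expansion: an operation-specific variable wins, then the global HINDSIGHT_API_LLM_* value, then the built-in default. effective_llm below is a hypothetical bash helper for checking which provider and model each operation will end up with; it only mirrors the table and does not query Hindsight.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hindsight): resolve one setting for one
# operation the way the "Falls back to" column describes.
effective_llm() {   # $1 = RETAIN|REFLECT|CONSOLIDATION, $2 = PROVIDER|MODEL|..., $3 = default
  local specific="HINDSIGHT_API_${1}_LLM_${2}"
  local global="HINDSIGHT_API_LLM_${2}"
  # ${!name} is bash indirect expansion: the value of the variable named by $name
  echo "${!specific:-${!global:-$3}}"
}

for op in RETAIN REFLECT CONSOLIDATION; do
  printf '%s: provider=%s model=%s\n' "$op" \
    "$(effective_llm "$op" PROVIDER openai)" \
    "$(effective_llm "$op" MODEL gpt-5-mini)"
done
```

The table does not list every parameter for every operation (there is no consolidation-specific API key, for instance), so only resolve combinations that appear in it.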
Example: separate models for retain and reflect
```bash
# Default LLM (also used by consolidation)
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o-mini

# Strong model for structured extraction
export HINDSIGHT_API_RETAIN_LLM_MODEL=gpt-4o

# Faster model for generation
export HINDSIGHT_API_REFLECT_LLM_PROVIDER=groq
export HINDSIGHT_API_REFLECT_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_REFLECT_LLM_MODEL=llama-3.3-70b-versatile
```
Example: tuning retries for rate-limited APIs
```bash
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Reduce retain concurrency and back off longer to stay within rate limits
export HINDSIGHT_API_RETAIN_LLM_MAX_CONCURRENT=3
export HINDSIGHT_API_RETAIN_LLM_INITIAL_BACKOFF=2.0
export HINDSIGHT_API_RETAIN_LLM_MAX_BACKOFF=120.0
```

Embeddings

Embedding variable names include a provider segment: HINDSIGHT_API_EMBEDDINGS_{PROVIDER}_{PARAMETER}. For example, when using openai, the model var is HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL — not HINDSIGHT_API_EMBEDDINGS_MODEL. Misnamed keys cause Hindsight to fall back to default OpenAI settings and fail with auth errors.

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_EMBEDDINGS_PROVIDER | Provider: local, tei, openai, openrouter, cohere, google, litellm, litellm-sdk | local |
| HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL | Model for local provider | BAAI/bge-small-en-v1.5 |
| HINDSIGHT_API_EMBEDDINGS_LOCAL_FORCE_CPU | Force CPU mode (avoids MPS/XPC issues on macOS) | false |
| HINDSIGHT_API_EMBEDDINGS_TEI_URL | TEI server URL | |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY | OpenAI API key (falls back to LLM_API_KEY) | |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL | OpenAI embedding model | text-embedding-3-small |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_BASE_URL | Custom base URL (e.g., Azure OpenAI) | |
| HINDSIGHT_API_EMBEDDINGS_COHERE_API_KEY | Cohere API key | |
| HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL | Cohere embedding model | embed-english-v3.0 |
| HINDSIGHT_API_EMBEDDINGS_GEMINI_API_KEY | Gemini API key (falls back to LLM_API_KEY) | |
| HINDSIGHT_API_EMBEDDINGS_GEMINI_MODEL | Gemini embedding model | gemini-embedding-001 |
| HINDSIGHT_API_EMBEDDINGS_GEMINI_OUTPUT_DIMENSIONALITY | Output embedding dimensions for Gemini | 768 |
| HINDSIGHT_API_EMBEDDINGS_VERTEXAI_PROJECT_ID | Vertex AI project ID for embeddings | |

```bash
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5
```
Once memories are stored, you cannot change the embedding dimension without losing data. On an empty database the schema is adjusted automatically at startup.
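To catch the naming mistake described at the start of this section, a hypothetical bash check can flag any HINDSIGHT_API_EMBEDDINGS_* variable that neither selects the provider nor carries the active provider's segment. It assumes the segment is simply the provider value in upper case. That holds for local, tei, openai and cohere in the table above, but not for google (whose variables use GEMINI_ and VERTEXAI_) or for hyphenated values such as litellm-sdk, so treat its warnings as hints rather than errors.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of Hindsight): warn about embedding variables
# that do not match HINDSIGHT_API_EMBEDDINGS_{PROVIDER}_* for the active provider.
check_embedding_vars() {
  local provider=${HINDSIGHT_API_EMBEDDINGS_PROVIDER:-local}
  local segment name status=0
  segment=$(printf '%s' "$provider" | tr '[:lower:]' '[:upper:]')   # openai -> OPENAI
  local prefix="HINDSIGHT_API_EMBEDDINGS_${segment}_"
  for name in $(compgen -v HINDSIGHT_API_EMBEDDINGS_); do
    if [[ $name == HINDSIGHT_API_EMBEDDINGS_PROVIDER || $name == "$prefix"* ]]; then
      continue
    fi
    echo "warning: $name does not match ${prefix}*" >&2
    status=1
  done
  return "$status"
}

check_embedding_vars || echo "check the variable names above" >&2
```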

Reranker

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_RERANKER_PROVIDER | Provider: local, tei, cohere, openrouter, zeroentropy, siliconflow, google, flashrank, litellm, litellm-sdk, jina-mlx, rrf | local |
| HINDSIGHT_API_RERANKER_LOCAL_MODEL | Model for local provider | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| HINDSIGHT_API_RERANKER_LOCAL_MAX_CONCURRENT | Max concurrent local reranking | 4 |
| HINDSIGHT_API_RERANKER_LOCAL_FORCE_CPU | Force CPU mode | false |
| HINDSIGHT_API_RERANKER_LOCAL_FP16 | Half-precision inference (27–36% faster on MPS) | false |
| HINDSIGHT_API_RERANKER_LOCAL_BUCKET_BATCHING | Sort pairs by token length before batching (36–54% faster) | false |
| HINDSIGHT_API_RERANKER_TEI_URL | TEI server URL | |
| HINDSIGHT_API_RERANKER_COHERE_API_KEY | Cohere API key | |
| HINDSIGHT_API_RERANKER_COHERE_MODEL | Cohere rerank model | rerank-english-v3.0 |
| HINDSIGHT_API_RERANKER_COHERE_BASE_URL | Custom base URL for any Cohere-compatible /rerank endpoint | |
| HINDSIGHT_API_RERANKER_ZEROENTROPY_API_KEY | ZeroEntropy API key | |
| HINDSIGHT_API_RERANKER_ZEROENTROPY_MODEL | ZeroEntropy model (zerank-2, zerank-2-small) | zerank-2 |
| HINDSIGHT_API_RERANKER_FLASHRANK_MODEL | FlashRank model | ms-marco-MiniLM-L-12-v2 |

```bash
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
```
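If local reranking is slow, the table lists two opt-in speedups. A configuration enabling both might look like the following. The quoted speedups come from the table and will vary with hardware (the FP16 figure is stated for MPS, i.e. Apple silicon), and the concurrency value of 8 is an arbitrary example.

```bash
export HINDSIGHT_API_RERANKER_PROVIDER=local
# Half-precision inference (the table reports the gain on MPS)
export HINDSIGHT_API_RERANKER_LOCAL_FP16=true
# Sort query/document pairs by token length before batching
export HINDSIGHT_API_RERANKER_LOCAL_BUCKET_BATCHING=true
# Example value: allow more parallel reranking than the default of 4
export HINDSIGHT_API_RERANKER_LOCAL_MAX_CONCURRENT=8
```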

Authentication

By default, Hindsight runs without authentication. For production deployments, enable the built-in API key extension:
```bash
export HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
export HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key
```
When enabled, all requests must include the API key in the Authorization header:
```bash
curl -H "Authorization: Bearer your-secret-api-key" \
  http://localhost:8888/v1/default/banks
```
Requests without a valid key receive 401 Unauthorized.
For advanced authentication (JWT, OAuth, multi-tenant schemas), implement a custom TenantExtension. See the Extensions page for details.
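Any unguessable string works as the API key. One way to generate one with standard tools is sketched below, reading /dev/urandom through od (openssl rand -hex 32 is a common alternative). gen_api_key is a hypothetical helper name, and every client that calls the API needs the same value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print 32 random bytes as 64 lowercase hex characters.
gen_api_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

HINDSIGHT_API_TENANT_API_KEY=$(gen_api_key)
export HINDSIGHT_API_TENANT_API_KEY
export HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension

# With the API running, an unauthenticated request should now print 401:
# curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8888/v1/default/banks
```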

Server

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_HOST | Bind address | 0.0.0.0 |
| HINDSIGHT_API_PORT | Server port | 8888 |
| HINDSIGHT_API_BASE_PATH | Base path when behind a reverse proxy (e.g., /hindsight) | "" (root) |
| HINDSIGHT_API_WORKERS | Number of uvicorn worker processes | 1 |
| HINDSIGHT_API_LOG_LEVEL | Log level: debug, info, warning, error | info |
| HINDSIGHT_API_LOG_FORMAT | Log format: text or json | text |
| HINDSIGHT_API_MCP_ENABLED | Enable MCP server at /mcp/{bank_id}/ | true |
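As an illustration of the server settings, the fragment below runs several workers with JSON logs behind a reverse proxy that forwards /hindsight to the API. It assumes the API's routes are mounted under HINDSIGHT_API_BASE_PATH, so the banks endpoint used in the Authentication section would move under /hindsight; the worker count is an arbitrary example.

```bash
export HINDSIGHT_API_BASE_PATH=/hindsight
export HINDSIGHT_API_WORKERS=4
export HINDSIGHT_API_LOG_FORMAT=json

# Assumed URL once the base path is applied:
# curl http://localhost:8888/hindsight/v1/default/banks
```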

Control Plane

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_CP_DATAPLANE_API_URL | URL of the API service | http://localhost:8888 |
| HINDSIGHT_CP_ACCESS_KEY | Access key to protect the UI. When set, users must log in. | (none) |
| NEXT_PUBLIC_BASE_PATH | Base path for the UI when behind a reverse proxy | "" (root) |

```bash
# Point Control Plane at a remote API
export HINDSIGHT_CP_DATAPLANE_API_URL=http://api.example.com:8888

# Protect the UI with an access key
export HINDSIGHT_CP_ACCESS_KEY=my-secret-key
```

Example .env file

```bash
# API Service
HINDSIGHT_API_DATABASE_URL=postgresql://hindsight:hindsight_dev@localhost:5432/hindsight
HINDSIGHT_API_LLM_PROVIDER=groq
HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx

# Authentication (recommended for production)
# HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
# HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

# Control Plane
HINDSIGHT_CP_DATAPLANE_API_URL=http://localhost:8888
```
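To load a file like this into your current shell (for example before running hindsight-admin), a common bash idiom is to source it with automatic export turned on. This executes the file as shell code, so only use it on files you trust, and keep values free of unquoted spaces. load_env is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export every assignment in a .env file into this shell.
# `set -a` marks all variables assigned while it is active for export, and
# sourcing the file performs the assignments (comments and blank lines are ignored).
load_env() {
  local file=${1:-./.env}
  set -a
  # shellcheck source=/dev/null
  . "$file"
  set +a
}

if [ -f ./.env ]; then
  load_env ./.env
fi
```

The default path is written as ./.env rather than .env because bash's source command searches PATH for names without a slash.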
