Hindsight exposes three operations — retain, recall, and reflect — through a lightweight HTTP API. This guide walks you through starting the server, installing a client, and making your first memory calls.
Hindsight requires an LLM API key for fact extraction and answer generation. The examples below use OpenAI, but Groq, Anthropic, Gemini, Ollama, and others are all supported. See LLM Providers for details.
1. Start the Hindsight server

Start the server using Docker (includes the web UI) or pip (API only):
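# Provide your LLM API key (OpenAI in this example)
export OPENAI_API_KEY=sk-xxx

# Publish ports 8888 and 9999 and mount ~/.hindsight-docker into the container
docker run --rm -it --pull always -p 8888:8888 -p 9999:9999 \
  -e HINDSIGHT_API_LLM_API_KEY="$OPENAI_API_KEY" \
  -v "$HOME/.hindsight-docker:/home/hindsight/.pg0" \
  ghcr.io/vectorize-io/hindsight:latest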
Once running, the API is available at http://localhost:8888 and the web UI at http://localhost:9999.
In production, set a stable HINDSIGHT_API_WORKER_ID (e.g., -e HINDSIGHT_API_WORKER_ID=hindsight-prod) so the worker keeps the same identity across container restarts.
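Before making the first client call, it can help to confirm that the published API port is accepting connections. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of Hindsight, and it only checks that a TCP connection succeeds, not that the server has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumes the docker command above published the API on port 8888):
#   wait_for_port("localhost", 8888)
```

A plain TCP check avoids assuming any particular health endpoint, which this guide does not document.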
2. Install the client

Install the SDK for your language:
pip install hindsight-client
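To check that the install landed in the interpreter you intend to use, you can look the module up without importing it. This is a small sketch with the standard library; `is_installed` is a hypothetical helper, and `hindsight_client` is the module name imported by the examples in this guide.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether the client module is visible to this interpreter.
print("hindsight_client installed:", is_installed("hindsight_client"))
```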
3. Retain: store a memory

Use retain to push information into a memory bank. Hindsight extracts facts, entities, and temporal data automatically.
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Store a simple fact
client.retain(bank_id="my-bank", content="Alice works at Google as a software engineer")

# Store with optional context and timestamp
client.retain(
    bank_id="my-bank",
    content="Alice got promoted to senior engineer",
    context="career update",
    timestamp="2025-06-15T10:00:00Z"
)
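The timestamp in the example above is an ISO 8601 string in UTC with a trailing `Z`. If your events start out as `datetime` objects, a sketch like the following produces that same shape. `to_utc_timestamp` is a hypothetical helper, and whether the API also accepts other timestamp formats is not covered here.

```python
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    if dt.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse rather than guess the zone.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 12:00 at UTC+02:00 is 10:00 in UTC.
local_time = datetime(2025, 6, 15, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_utc_timestamp(local_time))  # 2025-06-15T10:00:00Z
```

The resulting string can be passed as the `timestamp` argument shown in the retain example.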
4. Recall: search memories

Use recall to retrieve memories. Four search strategies run in parallel — semantic, keyword, graph, and temporal — and results are merged and reranked by relevance.
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Semantic search
results = client.recall(bank_id="my-bank", query="What does Alice do?")
print(results)

# Temporal search
results = client.recall(bank_id="my-bank", query="What happened in June?")
print(results)
5. Reflect: reason over memories

Use reflect to generate a contextual answer grounded in the bank’s memories. Unlike recall, reflect uses the bank’s mission, directives, and disposition to shape the response.
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

response = client.reflect(bank_id="my-bank", query="Tell me about Alice")
print(response)

What’s happening under the hood

| Operation | What Hindsight does |
| --- | --- |
| Retain | Extracts facts, entities, and relationships; builds time series and search indexes |
| Recall | Runs semantic, keyword, graph, and temporal search in parallel; merges via RRF; reranks with a cross-encoder |
| Reflect | Retrieves memories in priority order (Mental Models → Observations → Facts); generates a disposition-aware response |
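To make the RRF merge step concrete, here is a toy sketch of reciprocal rank fusion over four ranked lists. It illustrates the general technique only: the function name, the memory IDs, and the constant `k = 60` (a commonly used default) are assumptions, and Hindsight's actual parameters and implementation are not described in this guide.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each item scores the sum of 1 / (k + rank) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical results from the four strategies, best match first.
semantic = ["m1", "m2", "m3"]
keyword = ["m2", "m1"]
graph = ["m2", "m4"]
temporal = ["m3", "m2"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
print([item_id for item_id, _ in fused])  # ['m2', 'm1', 'm3', 'm4']
```

An item that ranks well in several lists, like `m2` here, rises above items that top only one list. RRF needs only rank positions, so the four strategies do not have to produce comparable scores.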

Next steps

Deploy Hindsight

Docker Compose with external PostgreSQL, Helm/Kubernetes, pip, and embedded Python.

Core concepts

Understand memory types, TEMPR retrieval, and observation consolidation in depth.

API reference

Full reference for retain, recall, reflect, memory banks, and more.

Python SDK

Async usage, batch retain, file uploads, and the full client API.
