Get started

Hindsight exposes three operations — retain, recall, and reflect — through a lightweight HTTP API. This guide walks you through starting the server, installing a client, and making your first memory calls.

Hindsight requires an LLM API key for fact extraction and answer generation. The examples below use OpenAI, but Groq, Anthropic, Gemini, Ollama, and others are all supported. See LLM Providers for details.

Start the Hindsight server

Start the server with Docker, which includes the web UI. A pip install (API only) is also supported; the command below uses Docker:

export OPENAI_API_KEY=sk-xxx

docker run --rm -it --pull always -p 8888:8888 -p 9999:9999 \
  -e HINDSIGHT_API_LLM_API_KEY="$OPENAI_API_KEY" \
  -v "$HOME/.hindsight-docker:/home/hindsight/.pg0" \
  ghcr.io/vectorize-io/hindsight:latest

Once running:

API: http://localhost:8888
Web UI (Docker only): http://localhost:9999

In production, set a stable HINDSIGHT_API_WORKER_ID (e.g., -e HINDSIGHT_API_WORKER_ID=hindsight-prod) so the worker keeps the same identity across container restarts.
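
The container can take a moment before it accepts connections, so a script that talks to the API may want to wait for the port first. The sketch below uses only the Python standard library and checks that something is listening on a TCP port; it does not call any Hindsight endpoint, and the helper name wait_for_port is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 8888) before the first client call returns True as soon as the API port from the list above is reachable. An open port only shows that the process is listening, not that the server has finished initializing.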

Install the client

Install the SDK for your language. For Python:

pip install hindsight-client

Retain: store a memory

Use retain to push information into a memory bank. Hindsight extracts facts, entities, and temporal data automatically.

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Store a simple fact
client.retain(bank_id="my-bank", content="Alice works at Google as a software engineer")

# Store with optional context and timestamp
client.retain(
    bank_id="my-bank",
    content="Alice got promoted to senior engineer",
    context="career update",
    timestamp="2025-06-15T10:00:00Z"
)
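
The timestamp in the example above is an ISO 8601 string in UTC. When the event time starts out as a Python datetime, a small helper can produce that exact shape. This is a sketch under the assumption that the same format is wanted; the helper name to_utc_timestamp is hypothetical, and this guide does not say which other formats the client accepts.

```python
from datetime import datetime, timezone


def to_utc_timestamp(dt: datetime) -> str:
    """Format an aware datetime as an ISO 8601 UTC string such as 2025-06-15T10:00:00Z."""
    if dt.tzinfo is None:
        # Refuse naive datetimes rather than guess their time zone
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


promoted_at = datetime(2025, 6, 15, 10, 0, tzinfo=timezone.utc)
print(to_utc_timestamp(promoted_at))  # 2025-06-15T10:00:00Z
```

The returned string can be passed as the timestamp argument of retain in place of the literal used above.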

Recall: search memories

Use recall to retrieve memories. Four search strategies run in parallel — semantic, keyword, graph, and temporal — and results are merged and reranked by relevance.

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Semantic search
results = client.recall(bank_id="my-bank", query="What does Alice do?")
print(results)

# Temporal search
results = client.recall(bank_id="my-bank", query="What happened in June?")
print(results)

Reflect: reason over memories

Use reflect to generate a contextual answer grounded in the bank’s memories. Unlike recall, reflect uses the bank’s mission, directives, and disposition to shape the response.

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

response = client.reflect(bank_id="my-bank", query="Tell me about Alice")
print(response)
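
The operations compose naturally: retain a few facts, then ask reflect about them. The sketch below wraps that flow in a hypothetical helper, remember_and_ask, which takes the client as an argument and uses only the retain and reflect keyword arguments shown in this guide. Whether newly retained content is available to reflect immediately is not stated here, so treat the ordering as an assumption.

```python
def remember_and_ask(client, bank_id, facts, question):
    """Retain each fact in the bank, then ask reflect a question about them."""
    for fact in facts:
        # Same keyword arguments as the retain example earlier in this guide
        client.retain(bank_id=bank_id, content=fact)
    # Whatever reflect returns is passed through unchanged
    return client.reflect(bank_id=bank_id, query=question)
```

With the client created as above, remember_and_ask(client, "my-bank", ["Alice works at Google as a software engineer"], "Tell me about Alice") stores the fact and returns the reflect response. Passing the client in also makes the helper easy to exercise with a stub in tests.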

What’s happening under the hood

Operation	What Hindsight does
Retain	Extracts facts, entities, and relationships; builds time series and search indexes
Recall	Runs semantic, keyword, graph, and temporal search in parallel; merges via RRF; reranks with a cross-encoder
Reflect	Retrieves memories in priority order (Mental Models → Observations → Facts); generates a disposition-aware response
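
RRF in the table above stands for reciprocal rank fusion, a common way to merge ranked lists that come from different retrievers. The sketch below illustrates the general technique and is not Hindsight's implementation: the constant k=60 is the conventional default from the RRF literature, and the memory ids and function name are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids; each list contributes 1 / (k + rank) per id."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from three of the search strategies
semantic = ["m1", "m2", "m3"]
keyword = ["m2", "m4"]
temporal = ["m2", "m1"]
print(reciprocal_rank_fusion([semantic, keyword, temporal]))  # ['m2', 'm1', 'm4', 'm3']
```

An item that ranks well in several lists, such as m2 here, rises to the top even though it is not first in every list. Because RRF uses only ranks, it needs no score normalization across strategies.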

Next steps

Deploy Hindsight

Docker Compose with external PostgreSQL, Helm/Kubernetes, pip, and embedded Python.

Core concepts

Understand memory types, TEMPR retrieval, and observation consolidation in depth.

API reference

Full reference for retain, recall, reflect, memory banks, and more.

Python SDK

Async usage, batch retain, file uploads, and the full client API.
