Python SDK — retain, recall, reflect in Python

The Hindsight Python SDK (hindsight-client) is a typed, high-level wrapper around the Hindsight HTTP API. Use it when you already have a Hindsight server running (locally, in Docker, or as a managed service) and want a clean Python interface. Every convenience method ships as both a synchronous call and an a-prefixed async variant (for example, retain and aretain), so the SDK works in scripts, REPLs, and async frameworks like FastAPI, LangGraph, and CrewAI.

If you want to run Hindsight inside your Python process with no external server, see Embedded Python instead.

Installation

pip install hindsight-client

Create the client

from hindsight_client import Hindsight

client = Hindsight(
    base_url="http://localhost:8888",
    api_key="your-api-key",   # optional — omit for unauthenticated servers
    timeout=300.0,            # request timeout in seconds (default: 300)
)

base_url (str, required): The base URL of your Hindsight API server, e.g. http://localhost:8888.

api_key (str): Bearer token sent in the Authorization header on every request. Omit for servers without authentication.

timeout (float): Per-request timeout in seconds. Defaults to 300.0.

user_agent (str): Override the default User-Agent header. Set this in integrations to identify the caller, e.g. "hindsight-crewai/1.2.0".
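In deployments these settings are often read from the environment rather than hard-coded. The following is a minimal sketch of that pattern. The variable names HINDSIGHT_BASE_URL, HINDSIGHT_API_KEY, and HINDSIGHT_TIMEOUT and the helper client_kwargs_from_env are illustrative assumptions, not names defined by the SDK.

```python
import os
from typing import Any, Mapping


def client_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for Hindsight(...) from environment variables.

    The variable names are hypothetical; adapt them to your deployment.
    """
    kwargs: dict[str, Any] = {
        # base_url is the only required argument.
        "base_url": env.get("HINDSIGHT_BASE_URL", "http://localhost:8888"),
        # Fall back to the documented default of 300 seconds.
        "timeout": float(env.get("HINDSIGHT_TIMEOUT", "300")),
    }
    api_key = env.get("HINDSIGHT_API_KEY")
    if api_key:
        # Only pass api_key when it is set, so unauthenticated servers keep working.
        kwargs["api_key"] = api_key
    return kwargs


# Usage (requires hindsight-client to be installed):
#   from hindsight_client import Hindsight
#   client = Hindsight(**client_kwargs_from_env())
```

Leaving api_key out of the dictionary when it is unset mirrors the "omit for unauthenticated servers" guidance above.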

Async vs sync

Sync methods (retain, recall, reflect, …) run the underlying coroutine with the event loop's run_until_complete() internally. They raise a RuntimeError if called from inside a running event loop, so always use the a-prefixed async variants inside async def functions.

# Async — preferred in frameworks (FastAPI, LangGraph, CrewAI, …)
await client.aretain(bank_id="alice", content="Alice loves AI")
response = await client.arecall(bank_id="alice", query="What does Alice like?")
answer   = await client.areflect(bank_id="alice", query="What are my interests?")

# Sync — scripts and REPLs only
client.retain(bank_id="alice", content="Alice loves AI")
response = client.recall(bank_id="alice", query="What does Alice like?")
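The failure mode can be reproduced with the standard library alone. This sketch uses stand-in functions (afetch and fetch are hypothetical, not SDK methods) and assumes the sync wrapper drives the coroutine with run_until_complete on its own loop; the SDK's actual internals may differ.

```python
import asyncio


async def afetch() -> str:
    """Stand-in for an a-prefixed SDK coroutine such as arecall."""
    await asyncio.sleep(0)
    return "ok"


def fetch() -> str:
    """Stand-in for a sync wrapper that drives the coroutine on its own loop."""
    loop = asyncio.new_event_loop()
    coro = afetch()
    try:
        return loop.run_until_complete(coro)
    except RuntimeError:
        coro.close()  # the coroutine never started; close it to avoid a warning
        raise
    finally:
        loop.close()


async def handler() -> tuple[str, str]:
    """Simulates a framework handler that is already inside a running loop."""
    try:
        fetch()  # wrong: sync wrapper inside a running event loop
        outcome = "no error"
    except RuntimeError:
        outcome = "RuntimeError"
    return outcome, await afetch()  # right: await the async variant


print(fetch())                 # works from plain synchronous code
print(asyncio.run(handler()))  # the sync call fails, the awaited call succeeds
```

asyncio refuses to run a second loop on a thread that already has one running, which is why the sync form only suits scripts and REPLs.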

Core operations

retain / aretain

Store a single memory in a bank.

from datetime import datetime
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Minimal
client.retain(bank_id="my-bank", content="Alice works at Google")

# With all options
client.retain(
    bank_id="my-bank",
    content="Alice got promoted to Staff Engineer",
    context="career update",
    timestamp=datetime(2024, 6, 1),
    document_id="conversation_001",
    metadata={"source": "slack", "channel": "#general"},
    tags=["career", "alice"],
    update_mode="replace",   # or "append"
    retain_async=False,      # True → background processing
)

bank_id (str, required): The memory bank to write to.

content (str, required): The text to store.

timestamp (datetime): Event time for the memory. Defaults to the current time when omitted.

context (str): Brief description of where the content came from, e.g. "slack message".

document_id (str): Groups one or more memories under a logical document. Useful for linking a conversation thread.

metadata (dict[str, str]): Arbitrary key-value pairs attached to the memory for later filtering.
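The dict[str, str] type means metadata values are expected to be strings. If your application metadata holds numbers or booleans, one option is to normalise it before calling retain. The helper below is a hypothetical sketch; whether the server rejects non-string values is not stated here.

```python
import json
from typing import Any


def to_str_metadata(raw: dict[str, Any]) -> dict[str, str]:
    """Coerce arbitrary values to strings for a dict[str, str] metadata field."""
    out: dict[str, str] = {}
    for key, value in raw.items():
        if value is None:
            continue  # drop unset values instead of storing the string "None"
        if isinstance(value, str):
            out[str(key)] = value
        else:
            # JSON keeps numbers, booleans, and lists unambiguous ("42", "true").
            out[str(key)] = json.dumps(value, sort_keys=True)
    return out


metadata = to_str_metadata({"source": "slack", "thread": 42, "pinned": True, "skip": None})
print(metadata)
# With a real client: client.retain(bank_id="my-bank", content="...", metadata=metadata)
```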

retain_batch / aretain_batch

Store multiple memories in one request.

client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Alice works at Google", "context": "career"},
        {"content": "Bob is a data scientist", "context": "career", "tags": ["bob"]},
    ],
    document_id="conversation_001",
    document_tags=["team-intro"],
)
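For large imports it can help to split the items into fixed-size groups and send one retain_batch request per group. The sketch below shows only the chunking step. The batched helper and the group size of 2 are illustrative assumptions; this page does not state a server-side batch limit.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


turns = ["Alice works at Google", "Bob is a data scientist", "Carol leads design"]
items = [{"content": text, "context": "career"} for text in turns]

for batch in batched(items, size=2):
    # With a real client: client.retain_batch(bank_id="my-bank", items=batch)
    print(len(batch), "items")
```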

recall / arecall

Search for memories using semantic similarity.

# Basic
results = client.recall(bank_id="my-bank", query="What does Alice do?")
for r in results.results:
    print(f"{r.text}  (type: {r.type})")

# With options
results = client.recall(
    bank_id="my-bank",
    query="What does Alice do?",
    types=["world", "observation"],   # fact types to include
    max_tokens=4096,
    budget="high",                    # "low" | "mid" | "high"
    tags=["alice"],
    tags_match="all_strict",          # "any" | "all" | "any_strict" | "all_strict"
    include_chunks=True,
    max_chunk_tokens=500,
)

for r in results.results:
    print(r.text)
    if r.chunks:
        print("  source:", r.chunks[0].text[:120])

bank_id (str, required): The memory bank to search.

query (str, required): Natural-language search query.

max_tokens (int): Maximum tokens in the combined result set. Default 4096.

budget (str): Retrieval effort: "low", "mid" (default), or "high". Higher budgets search more broadly.

types (list[str]): Limit results to specific fact types: world, experience, opinion, observation.
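A common next step is to turn recalled memories into a context block for an LLM prompt. The sketch below relies only on the text and type attributes used in the examples above. The helper format_memories is hypothetical, and SimpleNamespace objects stand in for real result items.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def format_memories(results: Iterable[Any], max_items: int = 5) -> str:
    """Render recalled memories as a bulleted block for an LLM prompt."""
    lines = []
    for r in list(results)[:max_items]:
        lines.append(f"- [{r.type}] {r.text}")
    return "\n".join(lines)


# Stand-ins for the items in response.results; real items come from client.recall().
fake_results = [
    SimpleNamespace(text="Alice works at Google", type="world"),
    SimpleNamespace(text="Alice seems to enjoy mentoring", type="observation"),
]
print(format_memories(fake_results))
```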

reflect / areflect

Generate a contextual answer by reasoning over stored memories.

# Basic
answer = client.reflect(
    bank_id="my-bank",
    query="What should I know about Alice?",
)
print(answer.text)

# With structured output
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "role": {"type": "string"},
    },
}
answer = client.reflect(
    bank_id="my-bank",
    query="Summarise Alice's role",
    budget="mid",
    context="preparing for a 1:1",
    response_schema=schema,
    include_facts=True,        # populate answer.based_on
    tags=["alice"],
)
print(answer.text)
print(answer.structured_output)
for fact in answer.based_on.facts:
    print(" •", fact.text)

bank_id (str, required): The memory bank to reason over.

query (str, required): The question or prompt.

budget (str): Retrieval effort: "low" (default), "mid", or "high".

context (str): Extra context injected alongside the query.

response_schema (dict): JSON Schema for structured output. When provided, answer.structured_output contains the parsed result.

include_facts (bool): If True, the response includes based_on listing the memories, mental models, and directives used. Default False.

exclude_mental_models (bool): If True, mental models are excluded from the reflection context. Default False.
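When you pass response_schema, it can be worth sanity-checking the structured output before using it downstream. The sketch below assumes structured_output is a plain dict and checks only top-level required keys and primitive types. The helper check_output is hypothetical and covers a small subset of JSON Schema; a full validator is the better choice for anything beyond that.

```python
from typing import Any

# Mapping from JSON Schema type names to Python types (subset only).
_JSON_TYPES: dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_output(schema: dict[str, Any], output: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the top-level keys of `output`."""
    problems = [f"missing: {name}" for name in schema.get("required", []) if name not in output]
    for name, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if name not in output or expected is None:
            continue
        value = output[name]
        # bool is a subclass of int in Python, so reject it unless a boolean is expected.
        if (isinstance(value, bool) and expected is not bool) or not isinstance(value, expected):
            problems.append(f"wrong type: {name}")
    return problems


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "role": {"type": "string"}},
    "required": ["name", "role"],
}
print(check_output(schema, {"name": "Alice", "role": "Staff Engineer"}))
```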

Bank management

# Create or update a bank
client.create_bank(
    bank_id="my-bank",
    reflect_mission="You are a helpful assistant tracking user preferences.",
    retain_mission="Focus on preferences, facts, and notable events.",
    retain_extraction_mode="concise",   # "concise" | "verbose" | "custom"
    enable_observations=True,
    background="Internal assistant for the engineering team.",
)

# Update config overrides (granular, patch-style)
client.update_bank_config(
    bank_id="my-bank",
    disposition_skepticism=2,    # 1 (trusting) – 5 (skeptical)
    disposition_literalism=3,
    disposition_empathy=4,
)

# List memories
memories = client.list_memories(
    bank_id="my-bank",
    type="world",          # optional fact-type filter
    search_query="Alice",  # optional text search
    limit=100,
    offset=0,
)

# Delete a bank
client.delete_bank("my-bank")
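The limit and offset parameters of list_memories allow paging through a large bank. The following sketch shows a generic offset-based loop. The helper iter_pages and the stand-in fetch function are hypothetical, and the loop assumes each page can be reduced to a plain list of items, since the response shape is not shown on this page.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], Sequence[T]], limit: int = 100) -> Iterator[T]:
    """Yield items page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break
        offset += limit


# Stand-in data source. With a real client, fetch_page would call
# client.list_memories(bank_id=..., limit=limit, offset=offset) and return
# the list of items extracted from that response.
data = [f"memory-{i}" for i in range(7)]


def fake_fetch(limit: int, offset: int) -> list[str]:
    return data[offset:offset + limit]


print(list(iter_pages(fake_fetch, limit=3)))
```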

Mental models

Mental models are pre-computed summaries derived from bank memories.

# Create
client.create_mental_model(
    bank_id="my-bank",
    name="User Preferences",
    source_query="What are the user's preferences and habits?",
    tags=["preferences"],
    trigger={"refresh_after_consolidation": True},
)

# List
models = client.list_mental_models(bank_id="my-bank")

# Get
model = client.get_mental_model(bank_id="my-bank", mental_model_id="model-id")

# Refresh (re-runs source_query against current memories)
client.refresh_mental_model(bank_id="my-bank", mental_model_id="model-id")

# Update metadata
client.update_mental_model(
    bank_id="my-bank",
    mental_model_id="model-id",
    name="Updated Preferences",
    source_query="New query",
)

# History of content changes
history = client.get_mental_model_history(bank_id="my-bank", mental_model_id="model-id")

# Delete
client.delete_mental_model(bank_id="my-bank", mental_model_id="model-id")

Context manager

from hindsight_client import Hindsight

with Hindsight(base_url="http://localhost:8888") as client:
    client.retain(bank_id="my-bank", content="Hello world")
    results = client.recall(bank_id="my-bank", query="Hello")
# Client is closed automatically on exit
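The with-statement guarantees cleanup even when the body raises. The sketch below illustrates that behaviour with a stand-in class. FakeClient and its close() method are assumptions made for illustration; this page does not name the SDK's explicit close method.

```python
class FakeClient:
    """Stand-in that mimics the context-manager behaviour described above."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "FakeClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions from the with-body


client = FakeClient()
try:
    with client:
        raise ValueError("request failed")
except ValueError:
    pass

print(client.closed)  # the client was closed despite the exception
```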

Low-level API access

For operations not covered by the convenience methods — documents, entities, webhooks, async operations, file uploads, and monitoring — use the typed sub-clients exposed as properties:

# Documents
docs = await client.documents.list_documents("my-bank")
await client.documents.delete_document("my-bank", "doc-123")

# Entities
entities = await client.entities.list_entities("my-bank")

# Async operation status
status = await client.operations.get_operation_status("my-bank", "op-456")

# Webhooks
await client.webhooks.create_webhook("my-bank", ...)

# Health check
health = await client.monitoring.health_check()

All low-level methods are async-only. Call them with await inside a coroutine, or wrap the coroutine in asyncio.run() from synchronous code.

Property	API surface
`client.memory`	Core memory operations
`client.banks`	Bank management
`client.documents`	Document CRUD
`client.entities`	Entity browsing
`client.mental_models`	Mental model management
`client.directives`	Directive management
`client.operations`	Async operation tracking
`client.webhooks`	Webhook management
`client.files`	File upload
`client.monitoring`	Health, version, metrics
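Because the low-level methods are async-only, synchronous code needs an entry point such as asyncio.run(), and independent calls can be issued concurrently with asyncio.gather(). The sketch below uses stand-in coroutines (list_documents and list_entities here are local placeholders, not the SDK methods) to show the shape of such a script.

```python
import asyncio


async def list_documents(bank_id: str) -> list[str]:
    """Stand-in for an async low-level call such as documents.list_documents."""
    await asyncio.sleep(0.01)
    return [f"{bank_id}/doc-1"]


async def list_entities(bank_id: str) -> list[str]:
    """Stand-in for an async low-level call such as entities.list_entities."""
    await asyncio.sleep(0.01)
    return [f"{bank_id}/alice"]


async def main(bank_id: str) -> dict[str, list[str]]:
    # Independent requests can run concurrently on the same event loop.
    docs, entities = await asyncio.gather(
        list_documents(bank_id),
        list_entities(bank_id),
    )
    return {"documents": docs, "entities": entities}


# asyncio.run() creates an event loop, runs main to completion, and closes the loop.
summary = asyncio.run(main("my-bank"))
print(summary)
```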
