The Hindsight Python SDK (hindsight-client) is a typed, high-level wrapper around the Hindsight HTTP API. Use it when you already have a Hindsight server running (locally, in Docker, or as a managed service) and want a clean Python interface. Every method ships in both a synchronous form and an a-prefixed async form (for example, retain and aretain), so the SDK works in scripts, REPLs, and async frameworks like FastAPI, LangGraph, and CrewAI.
If you want to run Hindsight inside your Python process with no external server, see Embedded Python instead.

Installation

pip install hindsight-client

Create the client

from hindsight_client import Hindsight

client = Hindsight(
    base_url="http://localhost:8888",
    api_key="your-api-key",   # optional — omit for unauthenticated servers
    timeout=300.0,            # request timeout in seconds (default: 300)
)

base_url (str, required): The base URL of your Hindsight API server, e.g. http://localhost:8888.
api_key (str): Bearer token sent in the Authorization header on every request. Omit for servers without authentication.
timeout (float): Per-request timeout in seconds. Defaults to 300.0.
user_agent (str): Override the default User-Agent header. Set this in integrations to identify the caller, e.g. "hindsight-crewai/1.2.0".
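
In deployed code the connection settings usually come from the environment rather than from literals. The sketch below gathers the constructor arguments described above into a dict. The helper name client_kwargs_from_env and the environment variable names (HINDSIGHT_BASE_URL, HINDSIGHT_API_KEY, HINDSIGHT_TIMEOUT) are illustrative assumptions; the SDK is not documented here as reading them itself.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build Hindsight(...) keyword arguments from environment-style settings.

    The variable names are hypothetical; rename them to match your deployment.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {
        "base_url": env.get("HINDSIGHT_BASE_URL", "http://localhost:8888"),
        "timeout": float(env.get("HINDSIGHT_TIMEOUT", "300")),
    }
    api_key = env.get("HINDSIGHT_API_KEY")
    if api_key:
        # Only pass api_key when one is configured (unauthenticated servers omit it).
        kwargs["api_key"] = api_key
    return kwargs
```

The result can then be unpacked into the constructor, as in Hindsight(**client_kwargs_from_env()).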

Async vs sync

Sync methods (retain, recall, reflect, …) call the event loop's run_until_complete internally, so they raise a RuntimeError if called from inside a running event loop. Always use the a-prefixed async variants inside async def functions.
# Async — preferred in frameworks (FastAPI, LangGraph, CrewAI, …)
await client.aretain(bank_id="alice", content="Alice loves AI")
response = await client.arecall(bank_id="alice", query="What does Alice like?")
answer   = await client.areflect(bank_id="alice", query="What are my interests?")

# Sync — scripts and REPLs only
client.retain(bank_id="alice", content="Alice loves AI")
response = client.recall(bank_id="alice", query="What does Alice like?")
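
The failure mode is easy to reproduce with the standard library alone. The sketch below does not use the SDK: fetch_value stands in for an async call such as arecall, and fetch_value_sync is a hypothetical sync wrapper built on run_until_complete, which is one plausible way such wrappers are written rather than the SDK's actual code.

```python
import asyncio


async def fetch_value() -> int:
    """Stand-in for an async SDK call such as arecall()."""
    await asyncio.sleep(0)
    return 42


def fetch_value_sync() -> int:
    """Sync wrapper: runs the coroutine to completion on a private event loop."""
    coro = fetch_value()
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning when the loop refuses to start
        raise
    finally:
        loop.close()


async def call_sync_from_async() -> str:
    """Calling the sync wrapper while another loop is running raises RuntimeError."""
    try:
        fetch_value_sync()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
    return "no error"
```

Called from a plain script, fetch_value_sync() returns normally. Called through asyncio.run(call_sync_from_async()), it hits the RuntimeError branch, which is why the a-prefixed variants are the ones to use inside async def functions.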

Core operations

retain / aretain

Store a single memory in a bank.
from datetime import datetime
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Minimal
client.retain(bank_id="my-bank", content="Alice works at Google")

# With all options
client.retain(
    bank_id="my-bank",
    content="Alice got promoted to Staff Engineer",
    context="career update",
    timestamp=datetime(2024, 6, 1),
    document_id="conversation_001",
    metadata={"source": "slack", "channel": "#general"},
    tags=["career", "alice"],
    update_mode="replace",   # or "append"
    retain_async=False,      # True → background processing
)

bank_id (str, required): The memory bank to write to.
content (str, required): The text to store.
timestamp (datetime): Event time for the memory. Defaults to the current time when omitted.
context (str): Brief description of where the content came from, e.g. "slack message".
document_id (str): Groups one or more memories under a logical document. Useful for linking a conversation thread.
metadata (dict[str, str]): Arbitrary key-value pairs attached to the memory for later filtering.
tags (list[str]): Tags for filtering memories during recall and reflect.
update_mode (str): How to handle an existing document with the same document_id. "replace" overwrites; "append" adds alongside existing content.
retain_async (bool): If True, processing happens in the background and the call returns immediately. Default False.
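
Because metadata is typed as dict[str, str], non-string values coming from an upstream system need converting before they are passed to retain. The sketch below maps a chat-message dict onto retain keyword arguments. The message shape (text, source, thread_id, extra) and the helper name to_retain_kwargs are assumptions made for illustration only.

```python
from typing import Any, Dict


def to_retain_kwargs(bank_id: str, message: Dict[str, Any]) -> Dict[str, Any]:
    """Map a (hypothetical) chat-message dict onto retain() keyword arguments."""
    return {
        "bank_id": bank_id,
        "content": message["text"],
        # context describes where the content came from, e.g. "slack message"
        "context": f"{message['source']} message",
        # one document per conversation thread
        "document_id": message["thread_id"],
        # metadata is documented as dict[str, str], so stringify every value
        "metadata": {key: str(value) for key, value in message.get("extra", {}).items()},
    }
```

A call would then look like client.retain(**to_retain_kwargs("my-bank", message)).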

retain_batch / aretain_batch

Store multiple memories in one request.
client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Alice works at Google", "context": "career"},
        {"content": "Bob is a data scientist", "context": "career", "tags": ["bob"]},
    ],
    document_id="conversation_001",
    document_tags=["team-intro"],
)
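
For large imports it can help to split the item list into several smaller retain_batch calls. The helper below is a generic sketch. The name chunked is hypothetical, and the batch size is yours to choose, since this page does not state a server-side limit.

```python
from typing import Dict, Iterator, List


def chunked(items: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage sketch (assumes an existing `client` and an `items` list):
# for batch in chunked(items, 100):
#     client.retain_batch(bank_id="my-bank", items=batch)
```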

recall / arecall

Search for memories using semantic similarity.
# Basic
results = client.recall(bank_id="my-bank", query="What does Alice do?")
for r in results.results:
    print(f"{r.text}  (type: {r.type})")

# With options
results = client.recall(
    bank_id="my-bank",
    query="What does Alice do?",
    types=["world", "observation"],   # fact types to include
    max_tokens=4096,
    budget="high",                    # "low" | "mid" | "high"
    tags=["alice"],
    tags_match="all_strict",          # "any" | "all" | "any_strict" | "all_strict"
    include_chunks=True,
    max_chunk_tokens=500,
)

for r in results.results:
    print(r.text)
    if r.chunks:
        print("  source:", r.chunks[0].text[:120])

bank_id (str, required): The memory bank to search.
query (str, required): Natural-language search query.
max_tokens (int): Maximum tokens in the combined result set. Default 4096.
budget (str): Retrieval effort: "low", "mid" (default), or "high". Higher budgets search more broadly.
types (list[str]): Limit results to specific fact types: world, experience, opinion, observation.
tags (list[str]): Filter by tags. How tags must match is controlled by tags_match.
tags_match (str): "any" (OR, includes untagged), "all" (AND, includes untagged), "any_strict" (OR, excludes untagged), "all_strict" (AND, excludes untagged). Default "any".
include_chunks (bool): Include the raw source text chunks that gave rise to each fact. Default False.
include_entities (bool): Include entity observations alongside facts. Default False.
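
The four tags_match modes are easier to compare as a small predicate. The function below is a local model of the semantics as described above, written for illustration. It is not the server's implementation, and it assumes "all" means a memory must carry every tag in the filter.

```python
from typing import Iterable


def tag_filter_matches(memory_tags: Iterable[str], filter_tags: Iterable[str], mode: str = "any") -> bool:
    """Return True if a memory with `memory_tags` passes the tag filter."""
    base, _, suffix = mode.partition("_")
    if base not in ("any", "all") or suffix not in ("", "strict"):
        raise ValueError(f"unknown tags_match mode: {mode!r}")
    have = set(memory_tags)
    want = set(filter_tags)
    if not have:
        # Untagged memories pass in the non-strict modes and are excluded in strict ones.
        return suffix != "strict"
    if base == "any":
        return bool(have & want)  # OR: at least one filter tag is present
    return want <= have           # AND: every filter tag is present
```

For example, an untagged memory passes "any" and "all" but is filtered out by "any_strict" and "all_strict".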

reflect / areflect

Generate a contextual answer by reasoning over stored memories.
# Basic
answer = client.reflect(
    bank_id="my-bank",
    query="What should I know about Alice?",
)
print(answer.text)

# With structured output
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "role": {"type": "string"},
    },
}
answer = client.reflect(
    bank_id="my-bank",
    query="Summarise Alice's role",
    budget="mid",
    context="preparing for a 1:1",
    response_schema=schema,
    include_facts=True,        # populate answer.based_on
    tags=["alice"],
)
print(answer.text)
print(answer.structured_output)
for fact in answer.based_on.facts:
    print(" •", fact.text)

bank_id (str, required): The memory bank to reason over.
query (str, required): The question or prompt.
budget (str): Retrieval effort: "low" (default), "mid", or "high".
context (str): Extra context injected alongside the query.
response_schema (dict): JSON Schema for structured output. When provided, answer.structured_output contains the parsed result.
include_facts (bool): If True, the response includes based_on listing the memories, mental models, and directives used. Default False.
exclude_mental_models (bool): If True, mental models are excluded from the reflection context. Default False.
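
Hand-written schemas tend to drift from the Python types that consume them. One option is to derive the response_schema dict from a dataclass. The sketch below handles flat dataclasses with str, int, float, and bool fields only. The helper schema_from_dataclass and the RoleSummary class are hypothetical, and whether the server enforces the "required" keyword is an assumption this page does not confirm.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

# Mapping from Python scalar types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_dataclass(cls: type) -> Dict[str, Any]:
    """Build a flat JSON Schema object from a dataclass with scalar fields."""
    properties = {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)}
    return {"type": "object", "properties": properties, "required": list(properties)}


@dataclass
class RoleSummary:
    name: str
    role: str


schema = schema_from_dataclass(RoleSummary)
```

The resulting dict has the same shape as the hand-written schema in the example above, plus a "required" list, and can be passed as response_schema=schema.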

Bank management

# Create or update a bank
client.create_bank(
    bank_id="my-bank",
    reflect_mission="You are a helpful assistant tracking user preferences.",
    retain_mission="Focus on preferences, facts, and notable events.",
    retain_extraction_mode="concise",   # "concise" | "verbose" | "custom"
    enable_observations=True,
    background="Internal assistant for the engineering team.",
)

# Update config overrides (granular, patch-style)
client.update_bank_config(
    bank_id="my-bank",
    disposition_skepticism=2,    # 1 (trusting) – 5 (skeptical)
    disposition_literalism=3,
    disposition_empathy=4,
)

# List memories
memories = client.list_memories(
    bank_id="my-bank",
    type="world",          # optional fact-type filter
    search_query="Alice",  # optional text search
    limit=100,
    offset=0,
)

# Delete a bank
client.delete_bank("my-bank")
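
The limit and offset arguments of list_memories suggest offset-based pagination. The generator below is a generic sketch of that pattern. It takes a caller-supplied fetch_page(limit, offset) function that returns a list, because the shape of the list_memories response is not described on this page. Both iter_pages and fetch_page are hypothetical names.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], List[T]], limit: int = 100) -> Iterator[T]:
    """Yield every item by requesting pages until a short page is returned."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return  # a short (or empty) page means there is nothing left
        offset += limit


# Usage sketch: adapt the lambda to however your SDK version exposes the
# list of memories on the response object (that attribute is an assumption).
# all_memories = list(iter_pages(
#     lambda limit, offset: extract_items(
#         client.list_memories(bank_id="my-bank", limit=limit, offset=offset)
#     )
# ))
```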

Mental models

Mental models are pre-computed summaries derived from bank memories.
# Create
client.create_mental_model(
    bank_id="my-bank",
    name="User Preferences",
    source_query="What are the user's preferences and habits?",
    tags=["preferences"],
    trigger={"refresh_after_consolidation": True},
)

# List
models = client.list_mental_models(bank_id="my-bank")

# Get
model = client.get_mental_model(bank_id="my-bank", mental_model_id="model-id")

# Refresh (re-runs source_query against current memories)
client.refresh_mental_model(bank_id="my-bank", mental_model_id="model-id")

# Update metadata
client.update_mental_model(
    bank_id="my-bank",
    mental_model_id="model-id",
    name="Updated Preferences",
    source_query="New query",
)

# History of content changes
history = client.get_mental_model_history(bank_id="my-bank", mental_model_id="model-id")

# Delete
client.delete_mental_model(bank_id="my-bank", mental_model_id="model-id")

Context manager

from hindsight_client import Hindsight

with Hindsight(base_url="http://localhost:8888") as client:
    client.retain(bank_id="my-bank", content="Hello world")
    results = client.recall(bank_id="my-bank", query="Hello")
# Client is closed automatically on exit

Low-level API access

For operations not covered by the convenience methods — documents, entities, webhooks, async operations, file uploads, and monitoring — use the typed sub-clients exposed as properties:
# Documents
docs = await client.documents.list_documents("my-bank")
await client.documents.delete_document("my-bank", "doc-123")

# Entities
entities = await client.entities.list_entities("my-bank")

# Async operation status
status = await client.operations.get_operation_status("my-bank", "op-456")

# Webhooks
await client.webhooks.create_webhook("my-bank", ...)

# Health check
health = await client.monitoring.health_check()

All low-level methods are async-only. Use await or asyncio.run().

Property               API surface
client.memory          Core memory operations
client.banks           Bank management
client.documents       Document CRUD
client.entities        Entity browsing
client.mental_models   Mental model management
client.directives      Directive management
client.operations      Async operation tracking
client.webhooks        Webhook management
client.files           File upload
client.monitoring      Health, version, metrics
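
Background work, such as a retain call made with retain_async=True, is typically followed by polling the operation status until it finishes. The helper below is a generic polling sketch. poll_until is a hypothetical name, and the caller supplies both the coroutine that fetches the status and the predicate that decides when it is done, because the fields of the status object are not described on this page.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> T:
    """Await fetch() repeatedly until is_done(status) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        await asyncio.sleep(interval)


# Usage sketch (the predicate is an assumption about the status object):
# status = await poll_until(
#     lambda: client.operations.get_operation_status("my-bank", "op-456"),
#     is_done=lambda s: looks_finished(s),
# )
```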