The Hindsight Python SDK (hindsight-client) is a typed, high-level wrapper around the Hindsight HTTP API. Use it when you already have a Hindsight server running (locally, in Docker, or as a managed service) and want a clean Python interface. Every method ships in two variants, a synchronous one and an async one prefixed with a (retain and aretain, for example), so the SDK works in scripts, REPLs, and async frameworks like FastAPI, LangGraph, and CrewAI.
If you want to run Hindsight inside your Python process with no external server, see Embedded Python instead.
Installation
Create the client
The base URL of your Hindsight API server, e.g. http://localhost:8888.
Bearer token sent in the Authorization header on every request. Omit for servers without authentication.
Per-request timeout in seconds. Defaults to 300.0.
Override the default User-Agent header. Set this in integrations to identify the caller, e.g. "hindsight-crewai/1.2.0".
Async vs sync
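As a rough illustration of the sync and a-prefixed async pairing, the sketch below uses a hypothetical stand-in class, not the real hindsight-client class. The class name, the bank_id and content parameter names, the integer return value, and the choice to implement the sync method by running the async one are all assumptions made for the example.

```python
import asyncio


class SketchClient:
    """Hypothetical stand-in for illustration; not the hindsight-client class."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    async def aretain(self, bank_id: str, content: str) -> int:
        """Async variant: awaitable, for use inside a running event loop."""
        await asyncio.sleep(0)  # placeholder for the HTTP round trip
        self._store.setdefault(bank_id, []).append(content)
        return len(self._store[bank_id])

    def retain(self, bank_id: str, content: str) -> int:
        """Sync variant (assumed design): run the async one on a fresh event loop."""
        return asyncio.run(self.aretain(bank_id, content))


client = SketchClient()

# In a script or REPL, call the synchronous method directly.
sync_count = client.retain("demo-bank", "Alice prefers email.")


async def handler() -> int:
    # Inside async code (for example a FastAPI route), await the a-prefixed method.
    return await client.aretain("demo-bank", "Bob prefers Slack.")


async_count = asyncio.run(handler())
print(sync_count, async_count)
```

In this sketch, calling retain from inside a running event loop would raise RuntimeError, because asyncio.run cannot be nested. That is the usual reason to prefer the a-prefixed variant in async code.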
Core operations
retain / aretain
Store a single memory in a bank.
The memory bank to write to.
The text to store.
Event time for the memory. Defaults to the current time when omitted.
Brief description of where the content came from, e.g. "slack message".
Groups one or more memories under a logical document. Useful for linking a conversation thread.
Arbitrary key-value pairs attached to the memory for later filtering.
Tags for filtering memories during recall and reflect.
How to handle an existing document with the same document_id. "replace" overwrites; "append" adds alongside existing content.
If True, processing happens in the background and the call returns immediately. Default False.
retain_batch / aretain_batch
Store multiple memories in one request.
recall / arecall
Search for memories using semantic similarity.
The memory bank to search.
Natural-language search query.
Maximum tokens in the combined result set. Default 4096.
Retrieval effort: "low", "mid" (default), or "high". Higher budgets search more broadly.
Limit results to specific fact types: world, experience, opinion, observation.
Filter by tags. How tags must match is controlled by tags_match.
"any" (OR, includes untagged), "all" (AND, includes untagged), "any_strict" (OR, excludes untagged), "all_strict" (AND, excludes untagged). Default "any".
Include the raw source text chunks that gave rise to each fact. Default False.
Include entity observations alongside facts. Default False.
reflect / areflect
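The four tags_match modes listed for recall can be read as a small predicate over a memory's tags and the filter tags. The sketch below is one interpretation of the descriptions ("OR" or "AND", with or without untagged memories) and is not the server's implementation. The function name, the sample memories, and the rule that an empty filter matches everything are assumptions.

```python
def matches(memory_tags: set[str], filter_tags: set[str], tags_match: str = "any") -> bool:
    """Return True if a memory passes the tag filter under the given mode."""
    if tags_match not in ("any", "all", "any_strict", "all_strict"):
        raise ValueError(f"unknown tags_match mode: {tags_match!r}")
    if not filter_tags:
        return True  # assumption: no filter tags means no restriction
    if not memory_tags:
        # Untagged memories pass only in the non-strict modes.
        return tags_match in ("any", "all")
    if tags_match in ("any", "any_strict"):
        return bool(memory_tags & filter_tags)  # OR: at least one shared tag
    return filter_tags <= memory_tags  # AND: every filter tag is present


# Hypothetical sample data: two tagged memories and one untagged memory.
memories = {
    "m1": {"work"},
    "m2": {"work", "urgent"},
    "m3": set(),
}
wanted = {"work", "urgent"}

results = {
    mode: sorted(name for name, tags in memories.items() if matches(tags, wanted, mode))
    for mode in ("any", "all", "any_strict", "all_strict")
}
for mode, names in results.items():
    print(mode, names)
```

Under this reading, the strict modes differ from their non-strict counterparts only in dropping the untagged memory m3.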
Generate a contextual answer by reasoning over stored memories.
The memory bank to reason over.
The question or prompt.
Retrieval effort: "low" (default), "mid", or "high".
Extra context injected alongside the query.
JSON Schema for structured output. When provided, answer.structured_output contains the parsed result.
If True, the response includes based_on listing the memories, mental models, and directives used. Default False.
If True, mental models are excluded from the reflection context. Default False.
Bank management
Mental models
Mental models are pre-computed summaries derived from bank memories.
Context manager
Low-level API access
For operations not covered by the convenience methods (documents, entities, webhooks, async operations, file uploads, and monitoring), use the typed sub-clients exposed as properties. Call their methods with await or asyncio.run().
| Property | API surface |
|---|---|
| client.memory | Core memory operations |
| client.banks | Bank management |
| client.documents | Document CRUD |
| client.entities | Entity browsing |
| client.mental_models | Mental model management |
| client.directives | Directive management |
| client.operations | Async operation tracking |
| client.webhooks | Webhook management |
| client.files | File upload |
| client.monitoring | Health, version, metrics |
