When you call retain(), Hindsight doesn’t simply store the text you pass it. It reads the content, extracts structured facts, recognizes entities, builds connections between them, and writes the result into your memory bank as a searchable knowledge graph. The raw content becomes memories that can be retrieved by meaning, keyword, entity, or time.

What happens during retain

1. Chunk the content

Hindsight splits the input into chunks (default 3,000 characters) so large documents are processed in parallel. Each chunk goes through the same extraction pipeline independently.

2. Extract facts with an LLM

For each chunk, an LLM extracts structured facts: not just what was said, but why, how, and what it means emotionally or causally. A sentence like “Alice joined Google last spring and was thrilled about the research opportunities” yields facts about the event, the date, her emotional state, and her motivation.

3. Recognize and resolve entities

Named entities (people, organizations, places, concepts) are identified and resolved across the bank. “Alice”, “Alice Chen”, and “Alice C.” are unified into one entity. Co-occurrence patterns disambiguate common names.

4. Build graph connections

Facts are linked by entity (all Alice facts), time (facts close in date), semantics (thematically related content), and causality (cause-effect pairs). These links power graph traversal during recall.

5. Trigger observation consolidation

Once retain completes, the consolidation engine runs asynchronously in the background. It compares new facts against existing observations and refines or creates consolidated beliefs. Your retain() call returns before this finishes.
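
The chunk-then-extract flow in steps 1 and 2 can be pictured with a minimal sketch. This is not Hindsight's implementation: the fixed-width split, the thread pool, and the `extract_facts` stand-in are all assumptions made for illustration. A real chunker may well split on sentence or paragraph boundaries, and the real extraction step calls an LLM.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 3000  # default chunk size in characters, as stated in step 1


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def extract_facts(chunk: str) -> dict:
    """Hypothetical stand-in for the LLM extraction step."""
    return {"chars": len(chunk), "preview": chunk[:40]}


def process(text: str) -> list[dict]:
    """Run the stand-in extractor over every chunk in parallel, keeping order."""
    chunks = chunk_text(text)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the input chunks
        return list(pool.map(extract_facts, chunks))


if __name__ == "__main__":
    results = process("x" * 7000)
    print([r["chars"] for r in results])  # [3000, 3000, 1000]
```

Because each chunk is handled independently, a 7,000-character input becomes three extraction jobs that can run side by side.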

Parameters

content (string, required)
The text to retain: a conversation turn, document, transcript, or any unstructured content.

timestamp (string)
ISO 8601 datetime for when the content occurred. Used for temporal retrieval (“What happened last spring?”) and recency ranking. Defaults to the current time if omitted.

context (string)
Additional framing passed to the extraction LLM but not stored as a memory itself. Use this to provide background the LLM needs to extract facts correctly, for example the name of the user whose conversation you are retaining.

metadata (object)
Arbitrary key-value pairs stored alongside each memory. Not used during extraction or retrieval, but returned in recall results and useful for downstream filtering.

document_id (string)
Groups all memories produced from this retain call under a single document ID. Useful for batch operations and for later deleting all memories from a specific source document.

retain_async (boolean, default: false)
When true, retain returns immediately without waiting for extraction to complete. Use for background ingestion where confirmation of storage is not needed before proceeding.

tags (string[])
Tags applied to every memory produced by this call. Use tags to scope visibility, for example tagging memories with a user ID so each user only retrieves their own memories during recall.
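
As a rough illustration of how these parameters fit together, the sketch below assembles a parameter dictionary for one conversation turn. The helpers `to_iso8601` and `build_retain_kwargs` are hypothetical and not part of the Hindsight client; the `user:<id>` tag format and the `source` metadata key are conventions chosen here for illustration. Treating naive datetimes as UTC is likewise an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_iso8601(moment: datetime) -> str:
    """Format a datetime as an ISO 8601 UTC string such as 2024-03-15T10:30:00Z."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_retain_kwargs(
    content: str,
    user_id: str,
    occurred_at: datetime,
    source: Optional[str] = None,
) -> dict:
    """Hypothetical helper: collect retain parameters for one user's content."""
    kwargs = {
        "content": content,
        "timestamp": to_iso8601(occurred_at),
        # context frames extraction but is not stored as a memory itself
        "context": f"This is a conversation with user {user_id}.",
        # tag convention assumed here: one "user:<id>" tag per user
        "tags": [f"user:{user_id}"],
    }
    if source is not None:
        kwargs["metadata"] = {"source": source}
    return kwargs


if __name__ == "__main__":
    local = datetime(2024, 3, 15, 12, 30, tzinfo=timezone(timedelta(hours=2)))
    print(build_retain_kwargs("I prefer Python.", "alice-123", local, source="chat"))
```

Normalizing timestamps to UTC before sending them keeps temporal retrieval consistent when content arrives from several time zones.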

Code examples

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Basic retain
result = client.retain(
    bank_id="my-agent",
    content="Alice mentioned she prefers Python over JavaScript, mainly because of its data science ecosystem.",
    timestamp="2024-03-15T10:30:00Z",
)

# Retain with context and tags
result = client.retain(
    bank_id="my-agent",
    content="User: I'm planning to switch to TypeScript for my next project. Assistant: That sounds like a solid choice given your team's background.",
    context="This is a conversation with user alice-123.",
    tags=["user:alice-123"],
)
print(result.document_id)
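
The remaining parameters (document_id, metadata, retain_async) could be combined along the following lines. This is a sketch under assumptions: the keyword names are taken from the parameter list above, but whether the Python client accepts them in exactly this form is not confirmed here. `RecordingClient` is a hypothetical stand-in that records calls so the example runs without a server, and `retain_document` is an illustrative wrapper, not a library function.

```python
from typing import Any, Protocol


class RetainClient(Protocol):
    """Anything exposing a retain(**kwargs) method, such as the real client."""

    def retain(self, **kwargs: Any) -> Any: ...


class RecordingClient:
    """Hypothetical stand-in: records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def retain(self, **kwargs: Any) -> dict[str, Any]:
        self.calls.append(kwargs)
        return {"document_id": kwargs.get("document_id")}


def retain_document(
    client: RetainClient,
    bank_id: str,
    document_id: str,
    text: str,
    source: str,
    background: bool = False,
) -> Any:
    """Retain one source document under a stable document ID."""
    return client.retain(
        bank_id=bank_id,
        content=text,
        document_id=document_id,       # groups all resulting memories
        metadata={"source": source},   # returned with recall results
        retain_async=background,       # True: do not wait for extraction
    )


if __name__ == "__main__":
    fake = RecordingClient()
    retain_document(fake, "my-agent", "handbook-v2", "Onboarding takes two weeks.",
                    source="handbook.md", background=True)
    print(fake.calls[0]["document_id"])  # handbook-v2
```

Using a stable document_id per source file makes it straightforward to remove that document's memories later.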

Batch retain

When you have multiple documents to ingest at once, pass a list of retain requests. Hindsight processes them in parallel and returns a list of results in the same order.
results = client.retain_batch(
    bank_id="my-agent",
    items=[
        {
            "content": "Alice joined Google last spring.",
            "timestamp": "2024-04-01T09:00:00Z",
            "document_id": "conv-001",
        },
        {
            "content": "Bob started his ML research project at MIT.",
            "timestamp": "2024-04-02T14:00:00Z",
            "document_id": "conv-002",
        },
    ],
)
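
When the source data lives in another shape, a small adapter can build the items list. The sketch below is illustrative only: `to_items` and `batched` are hypothetical helpers, the tuple layout of the input records is assumed, and this page does not state any limit on batch size, so the size of 2 is arbitrary.

```python
from itertools import islice
from typing import Iterable, Iterator


def to_items(records: Iterable[tuple[str, str, str]]) -> list[dict]:
    """Convert (document_id, timestamp, text) tuples into retain_batch items."""
    return [
        {"content": text, "timestamp": timestamp, "document_id": document_id}
        for document_id, timestamp, text in records
    ]


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


if __name__ == "__main__":
    records = [
        ("conv-001", "2024-04-01T09:00:00Z", "Alice joined Google last spring."),
        ("conv-002", "2024-04-02T14:00:00Z", "Bob started his ML research project at MIT."),
        ("conv-003", "2024-04-03T08:15:00Z", "Alice presented her first paper."),
    ]
    for batch in batched(to_items(records), size=2):
        # Each batch would be passed as items= to client.retain_batch(...)
        print([item["document_id"] for item in batch])
```

Splitting a large import into smaller batches keeps each request bounded and makes a failed batch easier to retry.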

Steering extraction

By default, retain() extracts all significant facts from the content. You can narrow this focus using a retain_mission on the memory bank — a plain-language description of what the bank should pay attention to.
For example:

    Always include technical decisions, API design choices, and architectural trade-offs.
    Ignore meeting logistics, greetings, and social exchanges.

You can also change the extraction mode:

Mode               When to use
concise (default)  General-purpose: selective, fast
verbose            When you need richer facts with full context and relationships
custom             When you want to write your own extraction rules entirely
Set retain_mission and retain_extraction_mode via the bank config API or the HINDSIGHT_API_RETAIN_MISSION environment variable.
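
For the environment-variable route, one possible approach is to compose the mission text in code and hand it to whatever process starts the Hindsight API. The sketch below only builds the environment mapping. It assumes the variable is read by the API server at startup, and `server_env` is a hypothetical helper; the command used to launch the server is not covered on this page, so it is left out.

```python
import os
from typing import Mapping, Optional, Sequence

MISSION_LINES = [
    "Always include technical decisions, API design choices, and architectural trade-offs.",
    "Ignore meeting logistics, greetings, and social exchanges.",
]


def server_env(
    mission_lines: Sequence[str],
    base: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with the retain mission set."""
    env = dict(os.environ if base is None else base)
    # Join the mission into a single line so it survives as one variable value.
    env["HINDSIGHT_API_RETAIN_MISSION"] = " ".join(
        line.strip() for line in mission_lines
    )
    return env


if __name__ == "__main__":
    env = server_env(MISSION_LINES)
    print(env["HINDSIGHT_API_RETAIN_MISSION"])
    # Pass env to the launcher of your choice, e.g. subprocess.run(..., env=env).
```

Building a copy rather than mutating os.environ keeps the mission scoped to the server process you start.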
Observation consolidation runs automatically after every retain() call. It runs in the background — your retain() call returns before it completes. See how Hindsight reflects on memories for how observations influence reflect() responses.