Retain — LLM-powered fact extraction and storage

When you call retain(), Hindsight doesn’t simply store the text you pass it. It reads the content, extracts structured facts, recognizes entities, builds connections between them, and writes the result into your memory bank as a searchable knowledge graph. The raw content becomes memories that can be retrieved by meaning, keyword, entity, or time.

What happens during retain

Chunk the content

Hindsight splits the input into chunks (default 3,000 characters) so large documents are processed in parallel. Each chunk goes through the same extraction pipeline independently.
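The chunk-then-process step can be sketched with the standard library. This is a minimal illustration, not Hindsight's splitter: it assumes a plain character limit with a break at the last space before the limit, and `extract_facts` is a hypothetical placeholder for the per-chunk LLM call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical sketch: not Hindsight's actual chunking implementation.
DEFAULT_CHUNK_SIZE = 3000  # characters, matching the documented default


def chunk_text(text: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> List[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring to break at the last space before the limit."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Back up to the last space so words are not cut in half.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]


def extract_facts(chunk: str) -> List[str]:
    # Hypothetical placeholder for the per-chunk LLM extraction step.
    return [chunk[:40]]


def process(text: str) -> List[List[str]]:
    # Chunks are independent, so they can be handled in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(extract_facts, chunk_text(text)))
```

Because each chunk is handled independently, `pool.map` can fan the work out; it also returns results in chunk order.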

Extract facts with an LLM

For each chunk, an LLM extracts structured facts — including not just what was said, but why, how, and what it means emotionally or causally. A sentence like “Alice joined Google last spring and was thrilled about the research opportunities” yields facts about the event, the date, her emotional state, and her motivation.
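To make "structured facts" concrete, here is one way the Alice sentence might be represented. The `Fact` class, its fields, and the `kind` labels are hypothetical, chosen for illustration; they are not Hindsight's storage schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    # Hypothetical schema for illustration only; not Hindsight's storage format.
    text: str
    kind: str  # e.g. "event", "emotion", "motivation"
    entities: List[str] = field(default_factory=list)
    when: Optional[str] = None  # temporal expression or ISO 8601 date, if known


# Facts one might expect from: "Alice joined Google last spring and was
# thrilled about the research opportunities."
facts = [
    Fact("Alice joined Google.", "event", ["Alice", "Google"], when="last spring"),
    Fact("Alice was thrilled about joining Google.", "emotion", ["Alice", "Google"]),
    Fact("Alice joined Google for its research opportunities.", "motivation", ["Alice", "Google"]),
]


def facts_about(entity: str, facts: List[Fact]) -> List[Fact]:
    """Return the facts that mention the given entity."""
    return [f for f in facts if entity in f.entities]
```

Splitting one sentence into separate event, emotion, and motivation facts is what lets a later query about "why Alice joined Google" match the motivation directly.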

Recognize and resolve entities

Named entities (people, organizations, places, concepts) are identified and resolved across the bank. “Alice”, “Alice Chen”, and “Alice C.” are unified into one entity, and co-occurrence patterns (which other entities tend to appear alongside a name) help disambiguate common names.
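A toy version of name unification helps show what "resolved" means. This sketch uses a simple first-name plus last-initial heuristic, which is an assumption for illustration; it does not model the co-occurrence disambiguation Hindsight describes, and `resolve` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical heuristic for illustration; Hindsight's resolver is LLM- and context-driven.


def name_key(name: str) -> Tuple[str, str]:
    """Return (first name, last initial), lowercased; initial is "" if absent."""
    parts = name.replace(".", "").split()
    first = parts[0].lower()
    last_initial = parts[1][0].lower() if len(parts) > 1 else ""
    return first, last_initial


def resolve(mentions: List[str]) -> Dict[str, str]:
    """Map each mention to the most complete compatible name seen."""
    canonical: List[str] = []
    mapping: Dict[str, str] = {}
    # Longest names first, so "Alice Chen" becomes the canonical form.
    for name in sorted(set(mentions), key=lambda n: (-len(n), n)):
        first, initial = name_key(name)
        match = None
        for c in canonical:
            c_first, c_initial = name_key(c)
            # Compatible if first names agree and initials don't conflict.
            if c_first == first and (not initial or not c_initial or initial == c_initial):
                match = c
                break
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping
```

With only string rules, a bare "Alice" becomes ambiguous as soon as two different Alices exist in the bank; that is the case where co-occurrence signals are needed.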

Build graph connections

Facts are linked by entity (for example, all facts about Alice), time (facts close in date), semantics (thematically related content), and causality (cause-effect pairs). These links power graph traversal during recall.
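Two of these link types, entity and temporal, can be sketched without an LLM. In this illustration the `Fact` shape, the 7-day window, and the helper names are assumptions. Semantic and causal links, which need embeddings or an LLM, are omitted.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# Hypothetical sketch of entity and temporal linking; not Hindsight's graph builder.


@dataclass(frozen=True)
class Fact:
    id: str
    entities: FrozenSet[str]
    day: date


def build_links(facts: List[Fact], max_days: int = 7) -> List[Tuple[str, str, str]]:
    """Return (fact_a, fact_b, link_type) triples for every linked pair."""
    links = []
    for a, b in combinations(facts, 2):
        if a.entities & b.entities:  # shared entity
            links.append((a.id, b.id, "entity"))
        if abs((a.day - b.day).days) <= max_days:  # close in time
            links.append((a.id, b.id, "temporal"))
    return links


def neighbors(fact_id: str, links: List[Tuple[str, str, str]]) -> Set[str]:
    """One hop of graph traversal: facts directly linked to fact_id."""
    return {b if a == fact_id else a for a, b, _ in links if fact_id in (a, b)}
```

The pairwise loop is quadratic and only suitable for illustration; a real store would use indexes per entity and per time bucket.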

Trigger observation consolidation

Once retain completes, the consolidation engine runs asynchronously in the background. It compares new facts against existing observations and refines or creates consolidated beliefs. Your retain() call returns before this finishes.
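The "return now, consolidate later" behavior follows a common producer/worker pattern. The sketch below is a hypothetical stand-in: the `retain` function, the queue, and the observation strings are illustrative names, not Hindsight's consolidation engine.

```python
import queue
import threading
from typing import List

# Hypothetical sketch of background consolidation; not Hindsight's engine.
consolidation_queue: "queue.Queue[List[str]]" = queue.Queue()
observations: List[str] = []


def consolidation_worker() -> None:
    while True:
        new_facts = consolidation_queue.get()
        # Stand-in for comparing new facts against existing observations.
        observations.extend(f"observation: {fact}" for fact in new_facts)
        consolidation_queue.task_done()


threading.Thread(target=consolidation_worker, daemon=True).start()


def retain(content: str) -> dict:
    facts = [content]  # stand-in for chunking and LLM extraction
    consolidation_queue.put(facts)  # hand off without waiting
    return {"facts": facts}
```

`retain` returns as soon as the facts are queued; calling `consolidation_queue.join()` is only needed when code must wait for the background work, as a test would.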

Parameters

content (string, required)

The text to retain — a conversation turn, document, transcript, or any unstructured content.

timestamp (string)

ISO 8601 datetime for when the content occurred. Used for temporal retrieval (“What happened last spring?”) and recency ranking. Defaults to the current time if omitted.
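Producing and parsing these timestamps needs only the standard library. This sketch formats a `datetime` in the same `2024-03-15T10:30:00Z` shape used in the examples on this page and shows a simple date-window check. Treating naive datetimes as UTC, and the window filter itself, are illustrative assumptions, not how Hindsight ranks temporal results.

```python
from datetime import datetime, timezone
from typing import Optional


def to_iso8601(dt: Optional[datetime] = None) -> str:
    """Format a datetime as ISO 8601 UTC with a trailing Z; defaults to now."""
    if dt is None:
        dt = datetime.now(timezone.utc)
    elif dt.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def in_window(ts: str, start: datetime, end: datetime) -> bool:
    """True if an ISO 8601 'Z' timestamp falls in [start, end)."""
    # Replace "Z" so fromisoformat also works on Python versions before 3.11.
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return start <= t < end
```

For example, with "spring" assumed to mean March through May, `in_window("2024-04-01T09:00:00Z", datetime(2024, 3, 1, tzinfo=timezone.utc), datetime(2024, 6, 1, tzinfo=timezone.utc))` is `True`.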

context (string)

Additional framing passed to the extraction LLM but not stored as a memory itself. Use this to provide background the LLM needs to extract facts correctly — for example, the name of the user whose conversation you are retaining.

metadata (object)

Arbitrary key-value pairs stored alongside each memory. Not used during extraction or retrieval, but returned in recall results and useful for downstream filtering.
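Downstream filtering on metadata might look like the sketch below. It assumes recall results are available as plain dicts with an optional `"metadata"` key; that shape and the `filter_by_metadata` helper are hypothetical, since this page does not show the recall result format.

```python
from typing import Any, Dict, List

# Hypothetical helper; assumes each recall result is a dict with an optional "metadata" dict.


def filter_by_metadata(results: List[Dict[str, Any]], **required: Any) -> List[Dict[str, Any]]:
    """Keep results whose metadata contains every required key/value pair."""
    return [
        r for r in results
        if all(r.get("metadata", {}).get(k) == v for k, v in required.items())
    ]
```

Because metadata is not used during retrieval, this kind of filter runs on your side after recall returns.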

document_id (string)

Groups all memories produced from this retain call under a single document ID. Useful for batch operations and for later deleting all memories from a specific source document.
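Grouping by `document_id` is what makes "delete everything from this source" a single operation. The in-memory `ToyStore` below is a hypothetical stand-in for a memory bank, used only to illustrate the idea.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Hypothetical in-memory stand-in for a memory bank; not Hindsight's storage.


class ToyStore:
    def __init__(self) -> None:
        self.by_document: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, document_id: str, memories: List[str]) -> None:
        self.by_document[document_id].extend(memories)

    def delete_document(self, document_id: str) -> int:
        """Remove every memory from one source document; return how many."""
        return len(self.by_document.pop(document_id, []))

    def all_memories(self) -> List[str]:
        return [m for ms in self.by_document.values() for m in ms]
```

Without a shared ID, removing one source would require finding its memories one by one.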

retain_async (boolean, default: false)

When true, retain returns immediately without waiting for extraction to complete. Use for background ingestion where confirmation of storage is not needed before proceeding.

Code examples

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Basic retain
result = client.retain(
    bank_id="my-agent",
    content="Alice mentioned she prefers Python over JavaScript, mainly because of its data science ecosystem.",
    timestamp="2024-03-15T10:30:00Z",
)

# Retain with context and tags
result = client.retain(
    bank_id="my-agent",
    content="User: I'm planning to switch to TypeScript for my next project. Assistant: That sounds like a solid choice given your team's background.",
    context="This is a conversation with user alice-123.",
    tags=["user:alice-123"],
)
print(result.document_id)

Batch retain

When you have multiple documents to ingest at once, call retain_batch() with a list of items. Hindsight processes them in parallel and returns a list of results in the same order.

results = client.retain_batch(
    bank_id="my-agent",
    items=[
        {
            "content": "Alice joined Google last spring.",
            "timestamp": "2024-04-01T09:00:00Z",
            "document_id": "conv-001",
        },
        {
            "content": "Bob started his ML research project at MIT.",
            "timestamp": "2024-04-02T14:00:00Z",
            "document_id": "conv-002",
        },
    ],
)
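When documents come from your own data, a small helper can build the `items` list in the shape shown above. `to_batch_items` is a hypothetical helper; the item keys (`content`, `timestamp`, `document_id`) are the ones used in the example.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical helper that builds retain_batch items from (document_id, content, datetime) tuples.


def to_batch_items(docs: Iterable[Tuple[str, str, datetime]]) -> List[Dict[str, str]]:
    return [
        {
            "content": content,
            # ISO 8601 in UTC with a trailing Z, as in the examples on this page.
            "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "document_id": doc_id,
        }
        for doc_id, content, when in docs
    ]
```

Since results come back in the same order as the items, `zip(items, results)` pairs each input with its result.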

Steering extraction

By default, retain() extracts all significant facts from the content. You can narrow this focus using a retain_mission on the memory bank — a plain-language description of what the bank should pay attention to.

For example, a mission for an engineering-focused bank might read:

     Always include technical decisions, API design choices, and architectural trade-offs.
     Ignore meeting logistics, greetings, and social exchanges.

You can also change the extraction mode:

Mode | When to use
--- | ---
`concise` (default) | General-purpose: selective and fast
`verbose` | When you need richer facts with full context and relationships
`custom` | When you want to write your own extraction rules entirely

Set retain_mission and retain_extraction_mode through the bank config API. The mission can also be set with the HINDSIGHT_API_RETAIN_MISSION environment variable.
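If you assemble these settings in your own deployment code, a small validation step catches typos in the mode name. This is a hypothetical sketch: `retain_settings` is not part of the Hindsight client, and the exact bank config API request is not shown here. Which source takes precedence when both are set is not stated on this page; falling back to the environment variable only when no mission is passed is an assumption.

```python
import os
from typing import Dict, Optional

# Mode names from the table above.
EXTRACTION_MODES = {"concise", "verbose", "custom"}


def retain_settings(mission: Optional[str] = None, mode: str = "concise") -> Dict[str, Optional[str]]:
    """Hypothetical helper: validate the mode and fill the mission from the env var if absent."""
    if mode not in EXTRACTION_MODES:
        raise ValueError(f"unknown extraction mode: {mode!r}")
    if mission is None:
        # Assumed fallback order: explicit argument first, then environment.
        mission = os.environ.get("HINDSIGHT_API_RETAIN_MISSION")
    return {"retain_mission": mission, "retain_extraction_mode": mode}
```

The returned dict uses the setting names from this page as keys; how you send them to the bank config API depends on your client.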

Observation consolidation runs automatically in the background after every retain() call, so retain() returns before consolidation completes. See how Hindsight reflects on memories to learn how observations influence reflect() responses.
