Memory architecture — how Hindsight stores knowledge

Most AI agents rely on simple vector search to retrieve context: embed the query, find the nearest stored text, and pass it to the model. This works when the answer sits in a single passage that resembles the query. It breaks down as soon as queries require temporal reasoning (“What did Alice do last spring?”), indirect connections (“Who does Alice work with?”), or synthesized understanding (“What patterns have emerged from our support tickets?”). A vector index has no concept of time, relationships, or accumulated learning. Hindsight replaces that single index with a structured memory system modeled on how human memory works: facts are stored, related facts are linked into a graph, and repeated patterns are consolidated into durable beliefs. The result is a system that can answer questions a flat embedding store cannot.

The four memory types

Hindsight organizes everything a memory bank knows into four distinct layers, each playing a different role in the retrieval and reasoning pipeline.

Type	What it stores	Example
World Facts	Objective facts about people, places, and things	“Alice works at Google as a software engineer”
Experience Facts	The bank’s own actions and interactions	“I recommended Python to Alice for her ML project”
Observations	Automatically consolidated patterns and beliefs	“Alice is a Python-focused developer who values readability”
Mental Models	User-curated summaries for high-frequency queries	“Team communication best practices and preferences”

World Facts and Experience Facts are the raw inputs: everything that flows through retain() lands here first. They preserve exact meaning, emotion, and context rather than fragmenting information into disconnected snippets.

Observations are the system’s synthesized understanding. After facts are retained, a background consolidation engine analyzes them for patterns and creates or refines observations automatically. An observation about Alice’s programming preferences, for example, is built from dozens of individual facts. It evolves as new contradicting or supporting evidence arrives, and it preserves the full history of change.

Mental Models sit at the top of the hierarchy. They are explicitly curated by you (not auto-generated) and represent ground truth for questions you know will be asked frequently. During reflect(), they are checked before anything else.

How memories flow through the system

When you call retain(), the content is chunked, passed through LLM-based fact extraction, and stored as structured world or experience facts in the knowledge graph. Entity recognition runs in parallel and links the new facts to entities (people, organizations, concepts) that are already in the bank. The call to retain() then returns without waiting for consolidation.

The consolidation engine runs asynchronously in the background. It compares the new facts against existing observations, creates new observations when patterns emerge, and refines existing ones with additional evidence.

When you call recall() or reflect(), the retrieval pipeline draws from all four memory types. recall() returns the matching memories ranked by relevance, and reflect() uses them as evidence for a generated response.
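The sketch below illustrates the "store now, consolidate later" split using asyncio. It is not the Hindsight SDK. ToyBank and its methods are hypothetical stand-ins: sentence splitting replaces LLM fact extraction, and counting facts per first word replaces real pattern detection. The example only shows that retain() stores raw facts and returns before consolidation has run.

```python
import asyncio


class ToyBank:
    """Hypothetical in-memory stand-in for a memory bank (not the Hindsight SDK)."""

    def __init__(self) -> None:
        self.facts: list[str] = []
        self.observations: dict[str, int] = {}  # toy "pattern" -> proof count
        self.pending: list[asyncio.Task] = []   # background consolidation jobs

    def _extract_facts(self, content: str) -> list[str]:
        # Stand-in for chunking + LLM fact extraction: one "fact" per sentence.
        return [s.strip() for s in content.split(".") if s.strip()]

    async def _consolidate(self, new_facts: list[str]) -> None:
        # Runs on the event loop after retain() has already returned.
        for fact in new_facts:
            subject = fact.split()[0]  # crude entity key: the sentence's first word
            self.observations[subject] = self.observations.get(subject, 0) + 1

    def retain(self, content: str) -> int:
        new_facts = self._extract_facts(content)
        self.facts.extend(new_facts)  # raw facts are stored before returning
        # Schedule consolidation without awaiting it, so retain() returns immediately.
        task = asyncio.get_running_loop().create_task(self._consolidate(new_facts))
        self.pending.append(task)
        return len(new_facts)


async def main() -> None:
    bank = ToyBank()
    stored = bank.retain("Alice prefers Python. Alice recommends Python to teammates.")
    print(stored, bank.observations)  # consolidation has not run yet
    await asyncio.gather(*bank.pending)
    print(bank.observations)


asyncio.run(main())
```

The caller can use the bank as soon as retain() returns. Observations catch up once the background task gets a turn on the event loop.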

TEMPR: four retrieval strategies in parallel

Each query runs through four complementary retrieval strategies in parallel, and their results are then fused. The table lists the four strategies, followed by the Rank step that combines them:

Strategy	What it finds
Temporal	Facts tied to specific times — “last spring”, date ranges, before/after relationships
Entity	Facts connected through the knowledge graph — indirect relationships, multi-hop traversal
Metric	Semantic meaning — conceptual matches, synonyms, paraphrasing
Pathway	Exact terms — proper nouns, technical identifiers, unique phrases (BM25)
Rank	RRF fusion + cross-encoder reranking — combines all four into a final scored list

The fusion step uses Reciprocal Rank Fusion (RRF): memories appearing across multiple strategies rank higher than those found by only one. A cross-encoder then reranks the top candidates by evaluating each query-memory pair directly, catching nuances that position-based fusion misses.
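Reciprocal Rank Fusion is simple enough to show in full. Each memory scores the sum of 1 / (k + rank) over every strategy that returned it. The constant k = 60 is the value commonly used for RRF and is an assumption here, since this page does not state Hindsight's setting. The memory IDs and strategy outputs are made up for illustration, and the cross-encoder reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-strategy rankings of memory IDs: score(m) = sum of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ids in ranked_lists.values():
        for rank, memory_id in enumerate(ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


results = {
    "temporal": ["m3", "m1"],
    "entity": ["m1", "m2"],
    "metric": ["m2", "m1", "m4"],
    "pathway": ["m5"],
}
for memory_id, score in reciprocal_rank_fusion(results):
    print(memory_id, round(score, 4))  # m1, found by three strategies, prints first
```

RRF only uses positions, so the raw scores from BM25, embeddings, and graph traversal never need to be put on a common scale.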

Observation consolidation in depth

Observations are not summaries the LLM invents. Each one is grounded in specific source memories, carries a proof count, and evolves rather than being overwritten.

Deduplication

Instead of accumulating dozens of overlapping facts like “Alice prefers Python”, “Alice uses Python for everything”, and “Alice recommends Python to teammates”, the consolidation engine merges them into a single durable observation: “Alice is a Python-focused developer who values readability and simplicity.” The raw facts are preserved as evidence; the observation is what surfaces during retrieval.

Evidence tracking

Every observation references the source memory IDs (with exact quotes) that support it, plus a running proof count. An observation backed by 50 facts has higher confidence than one backed by 2. The scoring pipeline applies a small proof-count boost to well-evidenced observations during ranking.
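The evidence model can be sketched as a small data structure. This is a minimal illustration, not Hindsight's schema. The class names, deriving the proof count from the evidence list, and the logarithmic boost with weight=0.05 are all hypothetical choices. They show how 50 supporting facts can outrank 2 without letting the boost swamp relevance.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Evidence:
    memory_id: str  # ID of the supporting raw fact
    quote: str      # exact text quoted from that fact


@dataclass
class Observation:
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    @property
    def proof_count(self) -> int:
        # Simplification: the proof count is the number of distinct supporting facts.
        return len(self.evidence)

    def add_evidence(self, memory_id: str, quote: str) -> None:
        # Ignore a fact that is already cited so duplicates don't inflate the count.
        if all(e.memory_id != memory_id for e in self.evidence):
            self.evidence.append(Evidence(memory_id, quote))


def boosted_score(base_score: float, obs: Observation, weight: float = 0.05) -> float:
    # Hypothetical boost: grows with log(1 + proof_count), so it stays small.
    return base_score * (1.0 + weight * math.log1p(obs.proof_count))


strong = Observation("Alice is a Python-focused developer who values readability")
for i in range(50):
    strong.add_evidence(f"fact-{i}", "Alice prefers Python")
weak = Observation("Alice recommends Python to teammates")
weak.add_evidence("fact-100", "Alice recommends Python to teammates")
weak.add_evidence("fact-101", "Alice uses Python for everything")
print(boosted_score(0.80, strong) > boosted_score(0.80, weak))  # True
```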

Freshness trends

Each observation carries a computed trend — stable, strengthening, weakening, new, or stale — based on when its supporting evidence arrived. During reflect(), the agent automatically verifies stale observations against current raw facts before relying on them.
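The page does not say how trends are computed, so the function below is only one plausible rule under stated assumptions. The 30-day window, the 180-day staleness cutoff, and the "compare the last window with the one before it" heuristic are all hypothetical. The sketch shows how the five labels can be derived purely from evidence timestamps.

```python
from datetime import datetime, timedelta


def compute_trend(evidence_times: list[datetime], now: datetime,
                  window: timedelta = timedelta(days=30),
                  stale_after: timedelta = timedelta(days=180)) -> str:
    """Hypothetical trend rule; the windows and thresholds are illustrative assumptions."""
    if not evidence_times:
        raise ValueError("an observation always has at least one supporting fact")
    newest, oldest = max(evidence_times), min(evidence_times)
    if now - newest > stale_after:
        return "stale"  # no supporting evidence for a long time
    if now - oldest <= window:
        return "new"  # all evidence arrived recently
    recent = sum(1 for t in evidence_times if now - t <= window)
    previous = sum(1 for t in evidence_times if window < now - t <= 2 * window)
    if recent > previous:
        return "strengthening"
    if recent < previous:
        return "weakening"
    return "stable"


now = datetime(2025, 6, 1)
days_ago = lambda *ds: [now - timedelta(days=d) for d in ds]
print(compute_trend(days_ago(100, 6, 5), now))  # strengthening
```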

Contradiction handling

When a new fact contradicts an existing observation, the engine reconciles the two rather than overwriting the old belief. The full evolution is preserved: “User was previously a React enthusiast who appreciated its component model, but has now switched to Vue and no longer uses React.” Your agent therefore knows not just the current state but also how it came about.
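The "reconcile, don't overwrite" idea can be sketched with a history list. This is a deliberately simplified, hypothetical model. The real engine writes a reconciled statement with an LLM, and EvolvingObservation and its methods are illustrative names only. The sketch shows that superseded beliefs stay available, so the current state can be described together with its past.

```python
from dataclasses import dataclass, field


@dataclass
class EvolvingObservation:
    current: str
    history: list[str] = field(default_factory=list)  # superseded beliefs, oldest first

    def reconcile(self, new_statement: str) -> None:
        # Keep the old belief in the history instead of discarding it.
        self.history.append(self.current)
        self.current = new_statement

    def describe(self) -> str:
        if not self.history:
            return self.current
        return f"Previously: {'; then '.join(self.history)}. Now: {self.current}"


obs = EvolvingObservation("User is a React enthusiast who appreciates its component model")
obs.reconcile("User has switched to Vue and no longer uses React")
print(obs.describe())
```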

Priority order during reflect

When the reflect() agent gathers evidence before generating a response, it follows a strict retrieval hierarchy:

1. Mental Models: user-curated summaries, checked first. If a mental model exists for the query, it takes priority over everything else because it represents explicitly managed knowledge.
2. Observations: consolidated, evidence-grounded beliefs with freshness awareness. Stale observations are verified against raw facts before use.
3. Raw Facts: world facts and experience facts. Used as ground truth and as the fallback when observations don’t cover the query.

This hierarchy means the agent produces consistent, synthesized responses rather than simply echoing back the nearest matching text.
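The hierarchy can be expressed as a short evidence-gathering function. The Memory record, the topic-equality relevance test, and the verify callback are all hypothetical simplifications. Real retrieval uses the TEMPR pipeline, and stale-observation checks are done by the reflect() agent. The sketch only encodes the order: a matching mental model wins outright; otherwise observations come first, with stale ones re-checked, followed by the raw facts.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str   # "mental_model", "observation", "world", or "experience"
    text: str
    topic: str  # crude stand-in for "relevant to this query"
    stale: bool = False


def gather_evidence(query_topic: str, memories: list[Memory],
                    verify: Callable[[Memory, list[Memory]], bool]) -> list[Memory]:
    relevant = [m for m in memories if m.topic == query_topic]
    # 1. A matching mental model takes priority over everything else.
    models = [m for m in relevant if m.kind == "mental_model"]
    if models:
        return models
    raw = [m for m in relevant if m.kind in ("world", "experience")]
    # 2. Observations; stale ones are kept only if the raw facts still support them.
    observations = [m for m in relevant if m.kind == "observation"
                    and (not m.stale or verify(m, raw))]
    # 3. Raw facts ground the answer and cover whatever observations miss.
    return observations + raw


def mentions_python(obs: Memory, facts: list[Memory]) -> bool:
    # Hypothetical verifier: a raw fact must still mention the observation's key term.
    return any("Python" in f.text for f in facts)


bank = [
    Memory("observation", "Alice is a Python-focused developer", "alice", stale=True),
    Memory("world", "Alice works at Google as a software engineer", "alice"),
    Memory("experience", "I recommended Python to Alice for her ML project", "alice"),
    Memory("mental_model", "Team communication best practices and preferences", "team"),
]
print([m.kind for m in gather_evidence("alice", bank, mentions_python)])
print([m.kind for m in gather_evidence("team", bank, mentions_python)])
```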
