Most AI agents rely on simple vector search to retrieve context: they embed a query, find the nearest stored text, and pass it to the model. This works for exact recall but breaks down as soon as queries require temporal reasoning (“What did Alice do last spring?”), indirect connections (“Who does Alice work with?”), or synthesized understanding (“What patterns have emerged from our support tickets?”). A vector index has no concept of time, relationships, or accumulated learning. Hindsight replaces that single index with a structured memory system modeled on how human memory works: facts are stored, related facts are linked into a graph, and repeated patterns are consolidated into durable beliefs. The result is a system that can answer questions a flat embedding store cannot.
The four memory types
Hindsight organizes everything a memory bank knows into four distinct layers, each serving a different role in the retrieval and reasoning pipeline.

| Type | What it stores | Example |
|---|---|---|
| World Facts | Objective facts about people, places, and things | “Alice works at Google as a software engineer” |
| Experience Facts | The bank’s own actions and interactions | “I recommended Python to Alice for her ML project” |
| Observations | Automatically consolidated patterns and beliefs | “Alice is a Python-focused developer who values readability” |
| Mental Models | User-curated summaries for high-frequency queries | “Team communication best practices and preferences” |
World Facts and Experience Facts form the base layer: everything you store with retain() lands here first. They preserve exact meaning, emotion, and context rather than fragmenting information into disconnected snippets.
Observations are the system’s synthesized understanding. After facts are retained, a background consolidation engine analyzes them for patterns and creates or refines observations automatically. An observation about Alice’s programming preferences, for example, is built from dozens of individual facts — and it evolves as new contradicting or supporting evidence arrives, preserving the full history of change.
Mental Models sit at the top of the hierarchy. They are explicitly curated by you (not auto-generated) and represent ground truth for questions you know will be asked frequently. During reflect(), they are checked before anything else.
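As a rough mental model, the four layers can be sketched as one tagged record type. This is a hypothetical Python data model (`MemoryType`, `Memory`, `evidence_ids`, and `is_auto_generated` are illustrative names, not Hindsight's schema); it captures only the distinctions described above: which layer a memory belongs to, that observations point back at supporting facts, and that mental models are the only hand-curated layer.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryType(Enum):
    """Hypothetical labels for the four layers."""
    WORLD_FACT = "world"
    EXPERIENCE_FACT = "experience"
    OBSERVATION = "observation"
    MENTAL_MODEL = "mental_model"


@dataclass
class Memory:
    id: str
    type: MemoryType
    text: str
    # Observations point back at the facts that support them.
    evidence_ids: list[str] = field(default_factory=list)

    @property
    def is_auto_generated(self) -> bool:
        # Facts come from extraction and observations from consolidation;
        # mental models are the only layer curated by hand.
        return self.type is not MemoryType.MENTAL_MODEL


memories = [
    Memory("f1", MemoryType.WORLD_FACT, "Alice works at Google as a software engineer"),
    Memory("f2", MemoryType.EXPERIENCE_FACT, "I recommended Python to Alice for her ML project"),
    Memory("o1", MemoryType.OBSERVATION,
           "Alice is a Python-focused developer who values readability",
           evidence_ids=["f2"]),
    Memory("m1", MemoryType.MENTAL_MODEL, "Team communication best practices and preferences"),
]

curated = [m.id for m in memories if not m.is_auto_generated]
print(curated)  # ['m1']
```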
How memories flow through the system
When you call retain(), the content is chunked, passed through LLM-based fact extraction, and stored as structured world or experience facts in the knowledge graph. Entity recognition runs in parallel, linking the new facts to entities (people, organizations, concepts) that already exist in the bank.
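The retain() steps above can be sketched in plain Python. Everything here is a stand-in: `chunk`, `extract_facts`, `link_entities`, and `retain_sketch` are hypothetical names, sentence splitting substitutes for LLM-based fact extraction, a leading “I” substitutes for the world/experience classification, and entity linking is a word-boundary match against entities already in the bank.

```python
import re
from concurrent.futures import ThreadPoolExecutor

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters."""
    chunks, current = [], ""
    for sentence in SENTENCE_END.split(text.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def extract_facts(chunk_text: str) -> list[dict]:
    # Stand-in for LLM fact extraction: one fact per sentence; a fact that
    # starts with "I " is treated as the bank's own experience.
    return [
        {"text": s, "type": "experience" if s.startswith("I ") else "world"}
        for s in SENTENCE_END.split(chunk_text) if s
    ]


def mentions(entity: str, text: str) -> bool:
    return re.search(rf"\b{re.escape(entity)}\b", text) is not None


def link_entities(chunk_text: str, known_entities: set[str]) -> set[str]:
    # Stand-in for entity recognition: find entities already in the bank.
    return {e for e in known_entities if mentions(e, chunk_text)}


def retain_sketch(text: str, known_entities: set[str]) -> list[dict]:
    stored = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for piece in chunk(text):
            # Extraction and entity recognition run concurrently per chunk.
            facts_job = pool.submit(extract_facts, piece)
            entities_job = pool.submit(link_entities, piece, known_entities)
            facts, entities = facts_job.result(), entities_job.result()
            for fact in facts:
                fact["entities"] = sorted(e for e in entities if mentions(e, fact["text"]))
                stored.append(fact)
    return stored


records = retain_sketch(
    "Alice joined Google last spring. She pairs with Bob on the ML team. "
    "I suggested Python to Alice.",
    known_entities={"Alice", "Bob", "Google", "Carol"},
)
for r in records:
    print(r)
```

Note that “She” is not resolved to Alice here; coreference is exactly the kind of work an LLM extraction step is meant to handle and a sentence splitter cannot.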
Once retain() completes, the consolidation engine runs asynchronously. It compares the new facts against existing observations, creates new observations when patterns emerge, and refines existing ones with additional evidence. Your retain() call returns immediately; consolidation happens in the background.
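The fire-and-forget behavior can be illustrated with a background worker. This is a minimal sketch assuming a single thread and an in-process queue (`ToyBank`, `wait_for_consolidation`, and grouping facts by their first word are all hypothetical); it shows only that the retain call enqueues work and returns without waiting for consolidation.

```python
import queue
import threading


class ToyBank:
    """retain() stores a fact and returns; a worker consolidates later."""

    def __init__(self) -> None:
        self.facts: list[str] = []
        self.observations: dict[str, list[str]] = {}  # topic -> supporting facts
        self._pending = queue.Queue()
        threading.Thread(target=self._consolidate_forever, daemon=True).start()

    def retain(self, fact: str) -> None:
        self.facts.append(fact)
        self._pending.put(fact)  # hand off to the worker; do not wait

    def _consolidate_forever(self) -> None:
        while True:
            fact = self._pending.get()
            # Stand-in for LLM consolidation: group facts by their first word.
            topic = fact.split()[0]
            self.observations.setdefault(topic, []).append(fact)
            self._pending.task_done()

    def wait_for_consolidation(self) -> None:
        """Block until every queued fact has been consolidated (handy in tests)."""
        self._pending.join()


bank = ToyBank()
bank.retain("Alice prefers Python")
bank.retain("Alice uses Python for everything")
bank.retain("Bob prefers Go")
bank.wait_for_consolidation()
print(bank.observations)
```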
When you call recall() or reflect(), the retrieval pipeline draws from all four memory types and returns them ranked by relevance.
TEMPR: four retrieval strategies in parallel
A single query reaches memories through four complementary strategies that run simultaneously; their results are then fused into one ranked list:

| Strategy | What it finds |
|---|---|
| Temporal | Facts tied to specific times: “last spring”, date ranges, before/after relationships |
| Entity | Facts connected through the knowledge graph: indirect relationships, multi-hop traversal |
| Metric | Semantic meaning: conceptual matches, synonyms, paraphrasing |
| Pathway | Exact terms: proper nouns, technical identifiers, unique phrases (BM25) |
| Rank | RRF fusion plus cross-encoder reranking, combining all four into a final scored list |
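Reciprocal Rank Fusion (RRF) is a standard way to merge several ranked lists: each item scores the sum of 1 / (k + rank) across the lists, so items ranked well by several strategies rise to the top. The sketch below assumes the common default k = 60 and stubs the cross-encoder as an optional scoring callable; the function names and list contents are hypothetical, not Hindsight's implementation.

```python
from collections import defaultdict
from typing import Callable, Optional


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def fuse_and_rerank(ranked_lists: list[list[str]],
                    rerank: Optional[Callable[[str], float]] = None,
                    top_n: int = 10) -> list[str]:
    shortlist = [memory_id for memory_id, _ in rrf_fuse(ranked_lists)][:top_n]
    if rerank is None:
        return shortlist
    # Stand-in for a cross-encoder: rescore the shortlist; sorted() is stable,
    # so ties keep their RRF order.
    return sorted(shortlist, key=rerank, reverse=True)


# Hypothetical results from the four strategies for one query.
temporal = ["f3", "f1"]
entity = ["f1", "f2"]
metric = ["f2", "f1", "f4"]   # semantic similarity
pathway = ["f1"]              # BM25 keyword match

lists = [temporal, entity, metric, pathway]
print(rrf_fuse(lists))
print(fuse_and_rerank(lists))
```

Here `f1` wins because every strategy surfaces it, even though only two of them rank it first; that is the property that makes rank-based fusion robust when the strategies' raw scores are not comparable.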
Observation consolidation in depth
Observations are not summaries the LLM invents. Each one is grounded in specific source memories, carries a proof count, and evolves rather than being overwritten.
Deduplication
Instead of accumulating dozens of overlapping facts like “Alice prefers Python”, “Alice uses Python for everything”, and “Alice recommends Python to teammates”, the consolidation engine merges them into a single durable observation: “Alice is a Python-focused developer who values readability and simplicity.” The raw facts are preserved as evidence; the observation is what surfaces during retrieval.
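Structurally, deduplication groups overlapping facts and emits one observation that keeps the IDs of every merged fact. In this sketch, `consolidate`, `topic_of`, and `summarize` are hypothetical; the last two stand in for the LLM judgments (which facts overlap, how to phrase the merged belief). Here the topic is the subject plus a key term, and the summary is simply the shortest fact in the group.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Observation:
    text: str
    evidence_ids: list[str] = field(default_factory=list)

    @property
    def proof_count(self) -> int:
        return len(self.evidence_ids)


def consolidate(facts: dict[str, str],
                topic_of: Callable[[str], tuple],
                summarize: Callable[[list[str]], str]) -> dict[tuple, Observation]:
    """Merge facts that share a topic into one observation per topic.

    The raw facts are left untouched; each observation only references them.
    """
    groups: dict[tuple, list[str]] = {}
    for fact_id, text in facts.items():
        groups.setdefault(topic_of(text), []).append(fact_id)
    return {
        topic: Observation(summarize([facts[i] for i in ids]), evidence_ids=ids)
        for topic, ids in groups.items()
    }


facts = {
    "f1": "Alice prefers Python",
    "f2": "Alice uses Python for everything",
    "f3": "Alice recommends Python to teammates",
    "f4": "Bob prefers Go",
}
KEY_TERMS = {"Python", "Go"}


def topic_of(text: str) -> tuple:
    # Stand-in: subject (first word) plus the first key term mentioned.
    words = text.split()
    return (words[0], next((w for w in words if w in KEY_TERMS), None))


def summarize(texts: list[str]) -> str:
    # Stand-in for the LLM: use the most concise fact as the observation text.
    return min(texts, key=len)


observations = consolidate(facts, topic_of, summarize)
for topic, obs in observations.items():
    print(topic, obs.text, obs.evidence_ids)
```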
Evidence tracking
Every observation references the source memory IDs (with exact quotes) that support it, plus a running proof count. An observation backed by 50 facts has higher confidence than one backed by 2. The scoring pipeline applies a small proof-count boost to well-evidenced observations during ranking.
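One way to keep such a boost “small” is a logarithmic term scaled by a coefficient. The formula and `alpha = 0.05` below are assumptions for illustration, not Hindsight's scoring; the log gives diminishing returns, so well-evidenced observations edge ahead of equally relevant but thinly supported ones without overriding a clear relevance gap.

```python
import math


def boosted_score(relevance: float, proof_count: int, alpha: float = 0.05) -> float:
    """Scale relevance by a small boost that grows with log(1 + proof_count).

    The log1p form and alpha=0.05 are illustrative assumptions.
    """
    return relevance * (1.0 + alpha * math.log1p(proof_count))


well_supported = boosted_score(0.80, proof_count=50)
thin = boosted_score(0.80, proof_count=2)
more_relevant_but_thin = boosted_score(0.95, proof_count=2)
print(round(well_supported, 3), round(thin, 3), round(more_relevant_but_thin, 3))
```

At equal relevance the 50-fact observation outranks the 2-fact one, but an observation that is clearly more relevant still wins despite thinner evidence.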
Freshness trends
Each observation carries a computed trend (stable, strengthening, weakening, new, or stale) based on when its supporting evidence arrived. During reflect(), the agent automatically verifies stale observations against current raw facts before relying on them.
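A trend label like this can be derived from evidence timestamps alone. The window sizes (30 days per window, stale after 180 days) and the ratio thresholds in this sketch are hypothetical; the idea is to compare how much evidence arrived recently against the average rate in the preceding windows.

```python
from datetime import datetime, timedelta


def freshness_trend(evidence_times: list[datetime], now: datetime,
                    window: timedelta = timedelta(days=30),
                    stale_after: timedelta = timedelta(days=180)) -> str:
    """Label an observation by when its supporting evidence arrived.

    All window sizes and ratio thresholds here are illustrative assumptions.
    """
    if not evidence_times or now - max(evidence_times) > stale_after:
        return "stale"          # nothing has supported it for a long time
    if now - min(evidence_times) <= window:
        return "new"            # all evidence is recent
    recent = sum(1 for t in evidence_times if now - t <= window)
    # Average count per window over the two windows before the recent one.
    earlier = sum(1 for t in evidence_times if window < now - t <= 3 * window) / 2
    if recent == 0 or recent < 0.5 * earlier:
        return "weakening"
    if recent > 1.5 * earlier:
        return "strengthening"
    return "stable"


now = datetime(2024, 6, 1)


def days_ago(*days: int) -> list[datetime]:
    return [now - timedelta(days=d) for d in days]


print(freshness_trend(days_ago(5, 10), now))                      # new
print(freshness_trend(days_ago(200, 300), now))                   # stale
print(freshness_trend(days_ago(1, 2, 3, 4, 40), now))             # strengthening
print(freshness_trend(days_ago(10, 35, 40, 45, 50, 60, 70), now)) # weakening
print(freshness_trend(days_ago(5, 40, 70), now))                  # stable
```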
Contradiction handling
When a new fact contradicts an existing observation, the engine reconciles the two rather than overwriting the old belief. The full evolution is preserved: “User was previously a React enthusiast who appreciated its component model, but has now switched to Vue and no longer uses React.” Your agent understands not just the current state but how it came about.
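The key structural point is that a contradiction appends to history instead of replacing it. This sketch (`EvolvingObservation`, `reconcile`, and `describe` are hypothetical names) keeps prior beliefs in a list and renders them with a fixed template; the reconciled sentence in the example above is natural language, which a template only approximates.

```python
from dataclasses import dataclass, field


@dataclass
class EvolvingObservation:
    text: str
    history: list[str] = field(default_factory=list)

    def reconcile(self, new_text: str) -> None:
        """Adopt a contradicting belief while keeping the one it replaces."""
        self.history.append(self.text)
        self.text = new_text

    def describe(self) -> str:
        # Fixed template (an assumption), not natural-language synthesis.
        if not self.history:
            return self.text
        return f"Previously: {'; then '.join(self.history)}. Now: {self.text}"


obs = EvolvingObservation("User is a React enthusiast who appreciates its component model")
obs.reconcile("User has switched to Vue and no longer uses React")
print(obs.describe())
```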
Priority order during reflect
When the reflect() agent gathers evidence before generating a response, it follows a strict retrieval hierarchy:
Mental Models
User-curated summaries checked first. If a mental model exists for the query, it takes priority over everything else — it represents explicitly managed knowledge.
Observations
Consolidated, evidence-grounded beliefs with freshness awareness. Stale observations are verified against raw facts before use.
