Most AI agents rely on simple vector search to retrieve context: they embed a query, find the nearest stored text, and pass it to the model. This works for exact recall but breaks down as soon as queries require temporal reasoning (“What did Alice do last spring?”), indirect connections (“Who does Alice work with?”), or synthesized understanding (“What patterns have emerged from our support tickets?”). A vector index has no concept of time, relationships, or accumulated learning.

Hindsight replaces that single index with a structured memory system built around how human memory actually works: facts are stored, related facts are linked into a graph, and repeated patterns are consolidated into durable beliefs. The result is a system that can answer questions a flat embedding store cannot.

The four memory types

Hindsight organizes everything a bank knows into four distinct layers, each serving a different role in the retrieval and reasoning pipeline.

| Type | What it stores | Example |
| --- | --- | --- |
| World Facts | Objective facts about people, places, and things | “Alice works at Google as a software engineer” |
| Experience Facts | The bank’s own actions and interactions | “I recommended Python to Alice for her ML project” |
| Observations | Automatically consolidated patterns and beliefs | “Alice is a Python-focused developer who values readability” |
| Mental Models | User-curated summaries for high-frequency queries | “Team communication best practices and preferences” |

World Facts and Experience Facts are the raw inputs: everything that flows through retain() lands here first. They preserve exact meaning, emotion, and context rather than fragmenting information into disconnected snippets.

Observations are the system’s synthesized understanding. After facts are retained, a background consolidation engine analyzes them for patterns and creates or refines observations automatically. An observation about Alice’s programming preferences, for example, is built from dozens of individual facts, and it evolves as new contradicting or supporting evidence arrives, preserving the full history of change.

Mental Models sit at the top of the hierarchy. They are explicitly curated by you (not auto-generated) and represent ground truth for questions you know will be asked frequently. During reflect(), they are checked before anything else.
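
As a rough illustration of how the four layers differ, the sketch below tags each example from the table with its type. The names `MemoryType`, `Memory`, `is_raw_input`, and `is_user_curated` are hypothetical and chosen for this example only; they are not taken from the Hindsight API.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    # Hypothetical labels for the four layers described above.
    WORLD_FACT = "world_fact"            # objective facts about people, places, things
    EXPERIENCE_FACT = "experience_fact"  # the bank's own actions and interactions
    OBSERVATION = "observation"          # consolidated automatically from facts
    MENTAL_MODEL = "mental_model"        # curated explicitly by the user


@dataclass(frozen=True)
class Memory:
    kind: MemoryType
    text: str

    @property
    def is_raw_input(self) -> bool:
        # World and experience facts are the raw inputs of the system.
        return self.kind in (MemoryType.WORLD_FACT, MemoryType.EXPERIENCE_FACT)

    @property
    def is_user_curated(self) -> bool:
        # Only mental models are written by the user rather than derived.
        return self.kind is MemoryType.MENTAL_MODEL


examples = [
    Memory(MemoryType.WORLD_FACT, "Alice works at Google as a software engineer"),
    Memory(MemoryType.EXPERIENCE_FACT, "I recommended Python to Alice for her ML project"),
    Memory(MemoryType.OBSERVATION, "Alice is a Python-focused developer who values readability"),
    Memory(MemoryType.MENTAL_MODEL, "Team communication best practices and preferences"),
]

raw_inputs = [m.text for m in examples if m.is_raw_input]
```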

How memories flow through the system

When you call retain(), the content is chunked, sent through LLM-based fact extraction, and stored as structured world or experience facts in the knowledge graph. Entity recognition fires in parallel, linking the new facts to existing entities (people, organizations, concepts) that are already in the bank. Once retain() completes, the consolidation engine runs asynchronously. It compares the new facts against existing observations, creates new observations when patterns emerge, and refines existing ones with additional evidence. Your retain() call returns immediately; consolidation happens in the background. When you call recall() or reflect(), the retrieval pipeline draws from all four memory types and returns them ranked by relevance.
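
The "return immediately, consolidate in the background" behavior can be pictured with a small in-memory toy. This is only a sketch of the pattern, not Hindsight's implementation: `ToyBank`, its `retain` signature, and `wait_for_consolidation` are hypothetical names, and the chunking, LLM fact extraction, and entity linking steps are skipped entirely. Consolidation here just counts facts per entity.

```python
import queue
import threading


class ToyBank:
    """Hypothetical stand-in for a memory bank; not the Hindsight client."""

    def __init__(self):
        self.facts = []          # raw facts, written by retain()
        self.observations = {}   # entity -> number of supporting facts
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._consolidate_loop, daemon=True)
        worker.start()

    def retain(self, entity, text):
        # Store the fact, hand it to the background worker, and return
        # without waiting for consolidation to finish.
        self.facts.append((entity, text))
        self._pending.put((entity, text))
        return len(self.facts) - 1

    def _consolidate_loop(self):
        # Background worker: fold each new fact into the per-entity tally.
        while True:
            entity, _text = self._pending.get()
            self.observations[entity] = self.observations.get(entity, 0) + 1
            self._pending.task_done()

    def wait_for_consolidation(self):
        # Block until every queued fact has been processed (handy in tests).
        self._pending.join()


bank = ToyBank()
bank.retain("Alice", "Alice prefers Python")
bank.retain("Alice", "Alice recommends Python to teammates")
bank.wait_for_consolidation()
```

The point of the queue is that the caller never pays for consolidation: reads issued right after `retain()` may see the new facts before the derived observations have caught up.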

TEMPR: four retrieval strategies in parallel

A single query reaches memories through four complementary strategies that run simultaneously; their results are then fused:

| Strategy | What it finds |
| --- | --- |
| Temporal | Facts tied to specific times: “last spring”, date ranges, before/after relationships |
| Entity | Facts connected through the knowledge graph: indirect relationships, multi-hop traversal |
| Metric | Semantic meaning: conceptual matches, synonyms, paraphrasing |
| Pathway | Exact terms: proper nouns, technical identifiers, unique phrases (BM25) |
| Rank | RRF fusion + cross-encoder reranking: combines all four into a final scored list |

The fusion step uses Reciprocal Rank Fusion (RRF): memories appearing across multiple strategies rank higher than those found by only one. A cross-encoder then reranks the top candidates by evaluating each query-memory pair directly, catching nuances that position-based fusion misses.
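
For readers unfamiliar with RRF, here is a minimal sketch of the standard formula, where each memory scores the sum of 1 / (k + rank) over every list it appears in. The function name, the memory IDs, and the constant k = 60 (a value commonly used in the RRF literature) are assumptions for illustration; the source does not state which constant Hindsight uses, and the cross-encoder reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one scored list.

    rankings maps a strategy name to its memory IDs, best first.
    k is the usual RRF smoothing constant (assumed, not from the source).
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-strategy results for one query.
rankings = {
    "temporal": ["m1", "m2"],
    "entity": ["m2", "m3"],
    "metric": ["m2", "m1", "m4"],
    "pathway": ["m5"],
}

fused = reciprocal_rank_fusion(rankings)
fused_ids = [memory_id for memory_id, _score in fused]
```

In this toy input, `m2` is returned by three strategies and `m1` by two, so they lead the fused list ahead of `m5`, which ranked first in only one strategy.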

Observation consolidation in depth

Observations are not summaries the LLM invents. Each one is grounded in specific source memories, carries a proof count, and evolves rather than being overwritten.
Instead of accumulating dozens of overlapping facts like “Alice prefers Python”, “Alice uses Python for everything”, and “Alice recommends Python to teammates”, the consolidation engine merges them into a single durable observation: “Alice is a Python-focused developer who values readability and simplicity.” The raw facts are preserved as evidence; the observation is what surfaces during retrieval.
Every observation references the source memory IDs (with exact quotes) that support it, plus a running proof count. An observation backed by 50 facts has higher confidence than one backed by 2. The scoring pipeline applies a small proof-count boost to well-evidenced observations during ranking.
When a new fact contradicts an existing observation, the engine reconciles rather than overwrites. The full evolution is preserved: “User was previously a React enthusiast who appreciated its component model, but has now switched to Vue and no longer uses React.” Your agent understands not just the current state but the deliberate journey that led to it.
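
The three properties above (evidence with quotes, a proof count, and reconciliation instead of overwriting) could be modeled roughly as follows. Everything in this sketch is hypothetical: the class and method names, and in particular the logarithmic boost and its `weight`, which is just one plausible way to express a "small proof-count boost". The source does not specify the actual formula.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Evidence:
    memory_id: str  # ID of the source memory
    quote: str      # exact quote supporting the observation


@dataclass
class Observation:
    text: str
    evidence: list[Evidence] = field(default_factory=list)
    history: list[str] = field(default_factory=list)  # earlier wordings

    @property
    def proof_count(self) -> int:
        return len(self.evidence)

    def support(self, memory_id: str, quote: str) -> None:
        # A confirming fact only adds evidence.
        self.evidence.append(Evidence(memory_id, quote))

    def revise(self, new_text: str, memory_id: str, quote: str) -> None:
        # A contradicting fact rewrites the text but keeps the old
        # wording in history rather than discarding it.
        self.history.append(self.text)
        self.text = new_text
        self.evidence.append(Evidence(memory_id, quote))


def boosted_score(base_score: float, proof_count: int, weight: float = 0.05) -> float:
    # Assumed form of the boost: grows slowly with the amount of evidence.
    return base_score * (1.0 + weight * math.log1p(proof_count))


obs = Observation("User is a React enthusiast who appreciates its component model")
obs.support("m1", "I love React's component model")
obs.revise(
    "User was previously a React enthusiast but has now switched to Vue",
    "m2",
    "I've moved everything over to Vue",
)
```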

Priority order during reflect

When the reflect() agent gathers evidence before generating a response, it follows a strict retrieval hierarchy:
1. Mental Models: User-curated summaries, checked first. If a mental model exists for the query, it takes priority over everything else; it represents explicitly managed knowledge.
2. Observations: Consolidated, evidence-grounded beliefs with freshness awareness. Stale observations are verified against raw facts before use.
3. Raw Facts: World facts and experience facts. Used as ground truth and as the fallback when observations don’t cover the query.

This hierarchy means the agent produces consistent, synthesized responses rather than simply echoing back the nearest matching text.
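
The priority order can be sketched as a first-match lookup over three layers. This is a deliberate simplification under stated assumptions: the real reflect() agent works on free-form queries, whereas this toy keys everything by a topic string, and `gather_evidence`, `stale_topics`, and the `verify` callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional


def gather_evidence(
    topic: str,
    mental_models: dict[str, str],
    observations: dict[str, str],
    raw_facts: dict[str, list[str]],
    stale_topics: set[str],
    verify: Callable[[str, list[str]], bool],
) -> Optional[tuple[str, list[str]]]:
    """Return (layer, texts) from the highest-priority layer covering the topic."""
    facts = raw_facts.get(topic, [])

    # 1. Mental models win whenever one exists.
    if topic in mental_models:
        return ("mental_model", [mental_models[topic]])

    # 2. Observations next; a stale one must pass verification against raw facts.
    if topic in observations:
        observation = observations[topic]
        if topic not in stale_topics or verify(observation, facts):
            return ("observation", [observation])

    # 3. Raw facts are the fallback.
    if facts:
        return ("raw_fact", facts)
    return None


def latest_fact_agrees(observation: str, facts: list[str]) -> bool:
    # Toy verification: the newest raw fact must appear inside the observation.
    return bool(facts) and facts[-1] in observation


mental_models = {"team-communication": "Team communication best practices and preferences"}
observations = {
    "alice-language": "Alice is a Python-focused developer who values readability",
    "user-framework": "User is a React enthusiast",
}
raw_facts = {
    "alice-language": ["Alice prefers Python"],
    "user-framework": ["User uses React", "User switched to Vue"],
    "alice-employer": ["Alice works at Google as a software engineer"],
}
stale_topics = {"user-framework"}


def lookup(topic: str):
    return gather_evidence(
        topic, mental_models, observations, raw_facts, stale_topics, latest_fact_agrees
    )
```

Here `lookup("user-framework")` falls through to raw facts because the stale observation fails the toy verification, while `lookup("alice-language")` is answered from the observation layer.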
