When you call recall(), Hindsight runs four retrieval strategies simultaneously against your memory bank, merges their results with Reciprocal Rank Fusion (RRF), and reranks the top candidates with a cross-encoder that evaluates each query-memory pair directly. The result is a ranked list of memories trimmed to fit your token budget, ready to pass directly to an LLM.
Four retrieval strategies
No single search method handles all the ways you might query a memory bank. Hindsight runs all four strategies in parallel and combines their results.
| Strategy | What it finds | Best for |
|---|---|---|
| Semantic | Conceptual matches by meaning | “Alice’s job” → “Alice works as a software engineer” |
| Keyword (BM25) | Exact terms and proper nouns | “Google”, “Alice Chen”, “PostgreSQL” |
| Graph | Entities connected through the knowledge graph | Indirect relationships, multi-hop traversal |
| Temporal | Facts tied to specific times | “last spring”, date ranges, before/after queries |
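A minimal sketch of the parallel fan-out described above, assuming each strategy is a function that takes the query and returns memory IDs ordered best-first. The four functions, their return values, and the memory IDs are hypothetical placeholders, not Hindsight internals. The sketch only shows that the strategies run concurrently and that each produces its own ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four strategies. Each returns memory IDs
# ordered best-first. Real strategies would query an index or a graph.
def semantic(query):
    return ["m2", "m1", "m4"]

def keyword(query):
    return ["m1", "m3"]

def graph(query):
    return ["m4", "m1"]

def temporal(query):
    return []  # the query contained no time expression

STRATEGIES = {
    "semantic": semantic,
    "keyword": keyword,
    "graph": graph,
    "temporal": temporal,
}

def run_all(query):
    """Run every strategy concurrently and collect one ranking per strategy."""
    with ThreadPoolExecutor(max_workers=len(STRATEGIES)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in STRATEGIES.items()}
        return {name: future.result() for name, future in futures.items()}

results = run_all("What does Alice do?")
for name, ranking in results.items():
    print(name, ranking)
```

Each ranking stays separate at this stage. A strategy that finds nothing simply contributes an empty list.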
Semantic search
The semantic strategy embeds the query and finds memories with similar meaning, even when the exact words differ. It handles paraphrasing, synonyms, and conceptual questions: “Bob’s expertise” matches “Bob specializes in machine learning.”
Keyword search (BM25)
The keyword strategy uses BM25 full-text search to find memories that contain specific terms. It is essential for proper nouns, technical identifiers, and unique phrases that semantic search might miss because they are lexically distinct from the query.
Graph traversal
The graph strategy follows entity and causal connections to surface memories that are structurally related to the query rather than textually similar. It can traverse multiple hops: a query about Alice can surface her manager’s decisions by following Alice → team → manager → decisions. Graph scoring combines three signals additively for each candidate:
| Signal | What it rewards |
|---|---|
| Entity overlap | Shared named entities between query and memory |
| Semantic link | Pre-computed similarity links in the knowledge graph |
| Causal link | Explicit cause-effect relationships |
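The multi-hop traversal and additive scoring described in this section can be sketched as follows. This is an illustration under assumptions, not the Hindsight implementation: the toy graph, the node names, the equal weighting of the three signals, and the `max_hops` parameter are all hypothetical.

```python
from collections import deque

# Hypothetical toy knowledge graph: node -> directly connected nodes.
GRAPH = {
    "Alice": ["Platform team"],
    "Platform team": ["Dana (manager)"],
    "Dana (manager)": ["Decision: migrate to PostgreSQL"],
    "Decision: migrate to PostgreSQL": [],
}

def reachable(start, max_hops):
    """Breadth-first walk. Returns {node: hop_count} for nodes within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in GRAPH.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops

def graph_score(entity_overlap, semantic_link, causal_link):
    """Additive combination of the three signals (equal weights are assumed)."""
    return entity_overlap + semantic_link + causal_link

hops = reachable("Alice", max_hops=3)
print(hops)
print(graph_score(entity_overlap=0.5, semantic_link=0.25, causal_link=0.25))
```

With a limit of three hops the manager's decision is reachable from Alice; with a limit of two it is not, which is why deeper traversal matters for indirect relationships.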
Temporal search
The temporal strategy parses time expressions in the query (“last spring”, “in 2023”, “before Alice joined Google”) and filters or boosts memories based on when they occurred. It combines semantic understanding with date filtering so historical queries remain precise.
Fusion and reranking
RRF fusion
Results from all four strategies are merged using Reciprocal Rank Fusion. Each memory’s score is computed as the sum of 1 / (60 + rank) across all strategies where it appears. Memories that rank well in multiple strategies score higher: the system rewards consensus without needing scores to be on a comparable scale.
Cross-encoder reranking
The top 300 candidates by RRF score are reranked by a cross-encoder that evaluates each query-memory pair together. This catches nuances that rank-based fusion misses — for example, a memory that ranked first in keyword search because it matched a common term but is actually irrelevant to the query’s intent.
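The two steps above, RRF fusion followed by reranking of the top candidates, can be sketched as follows. The constant 60 and the cutoff of 300 come from the text. Everything else is an assumption made for illustration: ranks are taken as 1-based, the memory IDs and texts are made up, and `toy_pair_scorer` is a word-overlap stand-in for a real cross-encoder model.

```python
from collections import defaultdict

RRF_K = 60  # constant from the formula 1 / (60 + rank)

def rrf_fuse(rankings):
    """Fuse {strategy: [memory_id, ...]} rankings (best-first) into one list.

    Ranks are 1-based here, which is an assumption.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (RRF_K + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def rerank(query, fused, pair_scorer, top_n=300):
    """Re-order the top_n fused candidates by a query-memory pair score."""
    candidates = [memory_id for memory_id, _ in fused[:top_n]]
    return sorted(candidates, key=lambda m: pair_scorer(query, m), reverse=True)

# Hypothetical memory texts, used only by the toy scorer below.
TEXTS = {
    "m1": "Alice works as a software engineer",
    "m2": "Alice likes hiking",
    "m3": "Alice Chen joined Google",
    "m4": "Bob manages the platform team",
}

def toy_pair_scorer(query, memory_id):
    """Stand-in for a cross-encoder: counts words shared by query and memory."""
    query_words = set(query.lower().split())
    return len(query_words & set(TEXTS[memory_id].lower().split()))

rankings = {
    "semantic": ["m2", "m1", "m4"],
    "keyword": ["m1", "m3"],
    "graph": ["m4", "m1"],
    "temporal": [],
}
fused = rrf_fuse(rankings)
order = rerank("Alice software engineer", fused, toy_pair_scorer)
print(fused)
print(order)
```

In this toy data m1 appears in three rankings and m4 in two, so they lead the fused list even though m2 ranked first in semantic search. The reranker then demotes m4 because it scores each candidate against the query directly rather than by rank.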
Scoring boosts
The normalized cross-encoder score is multiplied by three small boosts: recency (memories from the last year score higher), temporal proximity (for time-specific queries), and proof count (observations backed by more evidence score slightly higher). Boosts are capped at ±10% so they nudge rankings without overriding relevance.
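A minimal sketch of capped multiplicative boosts, under stated assumptions. The text does not say whether the ±10% cap applies to each boost or to their combined effect; this sketch assumes each boost is capped individually. The function names and the representation of a boost as a signed fraction are hypothetical.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))

def apply_boosts(relevance, recency, temporal, proof, cap=0.10):
    """Multiply a normalized relevance score by three capped boost factors.

    Each raw boost is a signed fraction (0.05 means +5%). Capping each one
    separately at +/- cap is an assumption, not documented behavior.
    """
    score = relevance
    for raw_boost in (recency, temporal, proof):
        score *= 1.0 + clamp(raw_boost, -cap, cap)
    return score

# An oversized recency boost is clipped to +10%, an oversized penalty to -10%.
print(apply_boosts(0.8, recency=0.5, temporal=0.0, proof=-0.5))
# With no boosts the relevance score passes through unchanged.
print(apply_boosts(0.8, recency=0.0, temporal=0.0, proof=0.0))
```

Because every factor stays between 0.9 and 1.1, the boosts can reorder memories with similar relevance but cannot lift a weak match above a much stronger one.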
Parameters
The question or search query. Can be a natural language question, a keyword, a time expression, or any combination.
budget sets the search depth. It controls how many candidates each strategy considers, how deep graph traversal goes, and how many candidates the cross-encoder reranks.
| Value | Recall budget | Best for |
|---|---|---|
| low | 100 | Fast chatbot responses, simple lookups |
| mid | 300 | Most queries: balanced coverage and speed |
| high | 1000 | Complex multi-hop queries, research tasks |
max_tokens sets the token budget for returned memory content. The pipeline fills this budget with the highest-scoring memories.
| Value | Approx. pages | Best for |
|---|---|---|
| 2048 | ~2 pages | Focused answers, fast downstream LLM |
| 4096 | ~4 pages | Balanced context |
| 8192 | ~8 pages | Comprehensive summaries |
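Filling a token budget with the highest-scoring memories can be sketched as a greedy pass over the ranked list. This is an illustration only: the whitespace-based token count is a rough assumption (real tokenizers count differently), and stopping at the first memory that does not fit is one possible policy, not necessarily the one Hindsight uses.

```python
def approx_token_count(text):
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())

def fill_token_budget(ranked_memories, max_tokens, count_tokens=approx_token_count):
    """Take memories in rank order until the next one would exceed max_tokens."""
    selected, used = [], 0
    for text in ranked_memories:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break  # assumed policy: stop at the first memory that does not fit
        selected.append(text)
        used += cost
    return selected, used

ranked = [
    "Alice works as a software engineer",  # 6 words
    "Alice joined Google in 2023",         # 5 words
    "Alice likes hiking on weekends",      # 5 words
]
selected, used = fill_token_budget(ranked, max_tokens=12)
print(selected, used)
```

With a budget of 12 the first two memories fit (11 tokens by this estimate) and the third is dropped, so lower-ranked content is what gets trimmed first.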
Filter by memory type. Accepts any combination of world, experience, and observation. Omit to return all types.
Filter memories by metadata key-value pairs set during retain(). Only memories matching all specified fields are returned.
Filter memories by visibility tags. Combine with tags_match to control whether all tags must match or any tag suffices.
Set include_chunks to true to return the raw source text that generated each memory alongside the distilled fact. Useful when verbatim quotes or additional nuance are needed.
Token budget for chunk content when include_chunks is true. Applied independently of max_tokens.
Code examples
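As a minimal sketch of the filter semantics described under Parameters, the functions below apply a type filter, an all-fields metadata filter, and an any-or-all tag filter to a list of in-memory records. This is not the Hindsight client API: the record layout, the function names, and the mode strings "any" and "all" are assumptions chosen for illustration, and the values that tags_match actually accepts are not stated here.

```python
def matches_types(memory_type, types=None):
    """Omitting types returns every memory type."""
    return types is None or memory_type in types

def matches_metadata(metadata, required=None):
    """Every requested key-value pair must be present on the memory."""
    return all(metadata.get(key) == value for key, value in (required or {}).items())

def matches_tags(memory_tags, wanted=None, mode="any"):
    """mode="all": every wanted tag is present. mode="any": at least one is."""
    if not wanted:
        return True
    overlap = set(wanted) & set(memory_tags)
    if mode == "all":
        return overlap == set(wanted)
    if mode == "any":
        return bool(overlap)
    raise ValueError(f"unknown mode: {mode!r}")

def apply_filters(memories, types=None, metadata=None, tags=None, mode="any"):
    """Return the IDs of memories that pass all three filters."""
    return [
        m["id"]
        for m in memories
        if matches_types(m["type"], types)
        and matches_metadata(m["metadata"], metadata)
        and matches_tags(m["tags"], tags, mode)
    ]

# Hypothetical records; field names are illustrative only.
memories = [
    {"id": "m1", "type": "world", "metadata": {"source": "crm"}, "tags": ["team-a", "public"]},
    {"id": "m2", "type": "experience", "metadata": {"source": "chat"}, "tags": ["team-a"]},
    {"id": "m3", "type": "observation", "metadata": {"source": "crm"}, "tags": ["team-b"]},
]

print(apply_filters(memories, types=["world", "observation"]))
print(apply_filters(memories, tags=["team-a", "public"], mode="any"))
print(apply_filters(memories, tags=["team-a", "public"], mode="all"))
```

The last two calls differ only in the match mode: with any-match both team-a memories pass, while all-match keeps only the memory that carries both tags.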
Response structure
The recall response includes the ranked memories and token usage metadata.
budget and max_tokens are independent controls. budget determines how thoroughly the bank is searched; max_tokens determines how much content is returned. A high budget with a low max_tokens means a deep search that returns only the best matches. A low budget with a high max_tokens means a fast search that returns everything it finds.