Recall — TEMPR multi-strategy memory retrieval

When you call recall(), Hindsight runs four retrieval strategies simultaneously against your memory bank, merges their results with Reciprocal Rank Fusion (RRF), and reranks the top candidates with a cross-encoder that evaluates each query-memory pair directly. The result is a ranked list of memories trimmed to fit your token budget — ready to pass directly to an LLM.

Four retrieval strategies

No single search method handles all the ways you might query a memory bank. Hindsight runs all four strategies in parallel and combines their results.

Strategy	What it finds	Best for
Semantic	Conceptual matches by meaning	“Alice’s job” → “Alice works as a software engineer”
Keyword (BM25)	Exact terms and proper nouns	“Google”, “Alice Chen”, “PostgreSQL”
Graph	Entities connected through the knowledge graph	Indirect relationships, multi-hop traversal
Temporal	Facts tied to specific times	“last spring”, date ranges, before/after queries

Semantic search

The semantic strategy embeds the query and finds memories with similar meaning, even when the exact words differ. It handles paraphrasing, synonyms, and conceptual questions: “Bob’s expertise” matches “Bob specializes in machine learning.”
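
As a rough illustration of ranking by meaning rather than wording, the sketch below ranks memories by cosine similarity between embedding vectors. The three-dimensional vectors are hand-made stand-ins, and `cosine_similarity` and `semantic_rank` are hypothetical helper names; this page does not say which embedding model or similarity metric Hindsight uses.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_rank(query_vec: list[float], memory_vecs: dict[str, list[float]]) -> list[str]:
    """Return memory ids ordered from most to least similar to the query."""
    return sorted(
        memory_vecs,
        key=lambda memory_id: cosine_similarity(query_vec, memory_vecs[memory_id]),
        reverse=True,
    )

# Toy vectors (assumed): a real system would get these from an embedding model.
query_vec = [0.9, 0.1, 0.0]  # "Bob's expertise"
memory_vecs = {
    "bob-ml": [0.8, 0.2, 0.1],       # "Bob specializes in machine learning"
    "alice-lunch": [0.0, 0.1, 0.9],  # unrelated memory
}
ranked = semantic_rank(query_vec, memory_vecs)
```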

Keyword search (BM25)

The keyword strategy uses BM25 full-text search to find memories that contain specific terms. It is essential for proper nouns, technical identifiers, and unique phrases that semantic search might miss because they are lexically distinct from the query.
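
To make the keyword behaviour concrete, here is a minimal sketch of the standard Okapi BM25 formula over a handful of memories. The whitespace tokenizer and the `k1` and `b` values are common defaults chosen for illustration, not settings documented for Hindsight, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (lowercased, whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores

docs = [
    "Alice joined Google in 2023",
    "Bob uses PostgreSQL at work",
    "Alice prefers Python",
]
scores = bm25_scores("PostgreSQL", docs)
```

Only the second memory contains the exact term, so it is the only one with a non-zero score, which is the property that makes this strategy useful for proper nouns and identifiers.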

Graph traversal

The graph strategy follows entity and causal connections to surface memories that are structurally related to the query rather than textually similar. It can traverse multiple hops: a query about Alice can surface her manager’s decisions by following Alice → team → manager → decisions. Graph scoring combines three signals additively for each candidate:

Signal	What it rewards
Entity overlap	Shared named entities between query and memory
Semantic link	Pre-computed similarity links in the knowledge graph
Causal link	Explicit cause-effect relationships
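
The additive combination can be sketched as follows. The page names the three signals but not their weights or value ranges, so the equal weights and the 0-to-1 signal values here are assumptions, and `GraphSignals` and `graph_score` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class GraphSignals:
    entity_overlap: float  # shared named entities between query and memory
    semantic_link: float   # pre-computed similarity link strength
    causal_link: float     # explicit cause-effect relationship strength

def graph_score(signals: GraphSignals, weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Add the three signals together; the weights are assumed, not documented."""
    w_entity, w_semantic, w_causal = weights
    return (
        w_entity * signals.entity_overlap
        + w_semantic * signals.semantic_link
        + w_causal * signals.causal_link
    )

candidate = GraphSignals(entity_overlap=0.5, semantic_link=0.25, causal_link=0.0)
score = graph_score(candidate)
```

Because the signals are added rather than multiplied, a candidate with no causal link can still score well on entity overlap alone.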

Temporal search

The temporal strategy parses time expressions in the query (“last spring”, “in 2023”, “before Alice joined Google”) and filters or boosts memories based on when they occurred. It combines semantic understanding with date filtering so historical queries remain precise.
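
The date-filtering half of this strategy might look like the sketch below, which assumes the time expression has already been resolved to a concrete range. Mapping “last spring” to 1 March through 31 May 2024 is an illustrative assumption, `filter_by_time_range` is a hypothetical helper, and the boosting variant is not shown. The `occurred_at` field name follows the response structure later on this page.

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; the 'Z' suffix is rewritten for Python < 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def filter_by_time_range(memories: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep memories whose occurred_at falls in [start, end); undated memories are dropped."""
    kept = []
    for memory in memories:
        occurred_at = memory.get("occurred_at")
        if occurred_at is None:
            continue
        if start <= parse_timestamp(occurred_at) < end:
            kept.append(memory)
    return kept

# Assumed resolution of "last spring" for a query asked later in 2024.
spring_start = datetime(2024, 3, 1, tzinfo=timezone.utc)
spring_end = datetime(2024, 6, 1, tzinfo=timezone.utc)

memories = [
    {"id": "mem-1", "occurred_at": "2024-03-15T10:30:00Z"},
    {"id": "mem-2", "occurred_at": "2023-11-02T09:00:00Z"},
    {"id": "mem-3"},  # no timestamp
]
in_spring = filter_by_time_range(memories, spring_start, spring_end)
```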

Fusion and reranking

RRF fusion

Results from all four strategies are merged using Reciprocal Rank Fusion. Each memory’s score is computed as the sum of 1 / (60 + rank) across all strategies where it appears. Memories that rank well in multiple strategies score higher — the system rewards consensus without needing scores to be on a comparable scale.
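
The formula above can be sketched in a few lines. This version assumes ranks are 1-based (the page does not say whether the top result has rank 0 or 1), and `rrf_fuse` is a hypothetical helper rather than part of the Hindsight client.

```python
from collections import defaultdict

RRF_K = 60  # the constant in 1 / (60 + rank)

def rrf_fuse(rankings: dict[str, list[str]], k: int = RRF_K) -> list[tuple[str, float]]:
    """Fuse per-strategy rankings (best first) into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):  # assumed 1-based ranks
            scores[memory_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

rankings = {
    "semantic": ["m1", "m2", "m3"],
    "keyword": ["m2", "m4"],
    "graph": ["m2", "m1"],
    "temporal": ["m5"],
}
fused = rrf_fuse(rankings)
```

Here `m2` comes out on top because it appears in three of the four lists, even though it is first in only two of them. Only ranks enter the calculation, so the raw BM25, similarity, and graph scores never need to be normalized against each other.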

Cross-encoder reranking

The top 300 candidates by RRF score are reranked by a cross-encoder that evaluates each query-memory pair together. This catches nuances that rank-based fusion misses — for example, a memory that ranked first in keyword search because it matched a common term but is actually irrelevant to the query’s intent.

Scoring boosts

The normalized cross-encoder score is multiplied by three small boosts: recency (memories from the last year score higher), temporal proximity (for time-specific queries), and proof count (observations backed by more evidence score slightly higher). Boosts are capped at ±10% so they nudge rankings without overriding relevance.
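
One way to read the boost step is sketched below. It assumes each boost is a signed fraction that is clamped to ±10% on its own and then applied as a multiplier of `1 + boost`; the page does not specify whether the cap applies per boost or to the combined effect, and `clamp_boost` and `final_score` are hypothetical names.

```python
BOOST_CAP = 0.10  # ±10% cap stated above

def clamp_boost(value: float, cap: float = BOOST_CAP) -> float:
    """Limit a raw boost to the range [-cap, +cap]."""
    return max(-cap, min(cap, value))

def final_score(cross_encoder_score: float, recency: float, temporal: float, proof: float) -> float:
    """Multiply the normalized cross-encoder score by three clamped boosts (assumed per-boost cap)."""
    multiplier = 1.0
    for boost in (recency, temporal, proof):
        multiplier *= 1.0 + clamp_boost(boost)
    return cross_encoder_score * multiplier

unboosted = final_score(0.8, recency=0.0, temporal=0.0, proof=0.0)
capped_up = final_score(0.5, recency=0.5, temporal=0.0, proof=0.0)     # recency clamps to +0.10
capped_down = final_score(1.0, recency=-1.0, temporal=0.0, proof=0.0)  # recency clamps to -0.10
```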

Token truncation

Results are sorted by final score and selected top-down until the max_tokens budget is exhausted. Only memory text counts toward the budget — metadata is free.
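
The selection step can be sketched as a greedy top-down fill. Two assumptions are made here: tokens are approximated by whitespace-separated words (the real tokenizer is not named on this page), and selection stops at the first memory that does not fit rather than skipping it and trying smaller ones. `truncate_to_budget` is a hypothetical helper.

```python
def count_tokens(text: str) -> int:
    """Stand-in token counter: whitespace-separated words (assumed, not the real tokenizer)."""
    return len(text.split())

def truncate_to_budget(memories: list[dict], max_tokens: int) -> list[dict]:
    """Take memories in descending score order until the next one would exceed max_tokens."""
    selected = []
    used = 0
    for memory in sorted(memories, key=lambda m: m["score"], reverse=True):
        cost = count_tokens(memory["text"])  # only the text counts, not metadata
        if used + cost > max_tokens:
            break  # assumed: stop at the first memory that does not fit
        selected.append(memory)
        used += cost
    return selected

memories = [
    {"id": "a", "text": "Alice prefers Python", "score": 0.9, "tags": ["user:alice-123"]},
    {"id": "b", "text": "Alice joined Google in 2023", "score": 0.8},
    {"id": "c", "text": "Remote", "score": 0.7},
]
selected = truncate_to_budget(memories, max_tokens=5)
```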

Parameters

query (string, required)

The question or search query. Can be a natural language question, a keyword, a time expression, or any combination.

budget (string, default: "mid")

Search depth. Controls how many candidates each strategy considers, how deep graph traversal goes, and how many candidates the cross-encoder reranks.

Value	Recall budget	Best for
`low`	100	Fast chatbot responses, simple lookups
`mid`	300	Most queries — balanced coverage and speed
`high`	1000	Complex multi-hop queries, research tasks

max_tokens (number, default: 4096)

Token budget for returned memory content. The pipeline fills this budget with the highest-scoring memories.

Value	Approx. pages	Best for
`2048`	~2 pages	Focused answers, fast downstream LLM
`4096`	~4 pages	Balanced context
`8192`	~8 pages	Comprehensive summaries

types (string[])

Filter by memory type. Accepts any combination of world, experience, observation. Omit to return all types.

metadata_filter (object)

Filter memories by metadata key-value pairs set during retain(). Only memories matching all specified fields are returned.
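
The “matching all specified fields” rule amounts to a logical AND over the filter's keys. The sketch below illustrates that rule locally, assuming exact equality on each value; `matches_filter` is a hypothetical helper, not a client function, and the metadata keys are made up for the example.

```python
def matches_filter(metadata: dict, metadata_filter: dict) -> bool:
    """True only if every key in the filter is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )

# Hypothetical metadata set during retain().
stored = {"source": "slack", "channel": "eng", "priority": "high"}

both_match = matches_filter(stored, {"source": "slack", "channel": "eng"})
one_mismatch = matches_filter(stored, {"source": "slack", "channel": "sales"})
missing_key = matches_filter(stored, {"author": "alice"})
```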

Code examples

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Basic recall
result = client.recall(
    bank_id="my-agent",
    query="What programming languages does Alice prefer?",
)
for memory in result.memories:
    print(memory.text, memory.type)

# Recall with budget and token control
result = client.recall(
    bank_id="my-agent",
    query="What happened during Alice's onboarding last spring?",
    budget="high",
    max_tokens=8192,
    types=["world", "experience"],
)

# Recall with tag filtering
result = client.recall(
    bank_id="my-agent",
    query="What are this user's preferences?",
    tags=["user:alice-123"],
    tags_match="all",
    max_tokens=4096,
)

# Recall with source chunks
result = client.recall(
    bank_id="my-agent",
    query="What exactly did Alice say about the API design?",
    include_chunks=True,
    max_chunk_tokens=2048,
)
for memory in result.memories:
    print(memory.text)
    if memory.chunk:
        print("Source:", memory.chunk.text)

Response structure

The recall response includes the ranked memories and token usage metadata:

{
  "memories": [
    {
      "id": "mem-123",
      "text": "Alice prefers Python over JavaScript for data science work",
      "type": "world",
      "score": 0.94,
      "occurred_at": "2024-03-15T10:30:00Z",
      "tags": ["user:alice-123"]
    },
    {
      "id": "obs-456",
      "text": "Alice is a Python-focused developer who values readability and simplicity",
      "type": "observation",
      "score": 0.91,
      "proof_count": 12
    }
  ],
  "usage": {
    "memory_tokens": 312,
    "total_tokens": 312
  }
}
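
A response shaped like the one above can be turned into a context block for a downstream LLM prompt. This is a minimal sketch that works on the raw JSON using only the fields shown in the sample; the bracketed line format and the `build_context` helper are illustrative choices, not part of the API.

```python
import json

raw_response = """
{
  "memories": [
    {"id": "mem-123", "text": "Alice prefers Python over JavaScript for data science work",
     "type": "world", "score": 0.94, "occurred_at": "2024-03-15T10:30:00Z"},
    {"id": "obs-456", "text": "Alice is a Python-focused developer who values readability and simplicity",
     "type": "observation", "score": 0.91, "proof_count": 12}
  ],
  "usage": {"memory_tokens": 312, "total_tokens": 312}
}
"""

def build_context(response: dict) -> str:
    """Render memories as one line each, in ranked order; optional fields may be absent."""
    lines = []
    for memory in response["memories"]:
        when = memory.get("occurred_at", "undated")  # observations may have no timestamp
        lines.append(f"[{memory['type']}] {memory['text']} ({when})")
    return "\n".join(lines)

response = json.loads(raw_response)
context = build_context(response)
tokens_used = response["usage"]["memory_tokens"]
```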

budget and max_tokens are independent controls. budget determines how thoroughly the bank is searched; max_tokens determines how much content is returned. A high budget with a low max_tokens means a deep search that returns only the best matches. A low budget with a high max_tokens means a fast search that returns as much of what it found as the token budget allows.
