Retrieval Strategies

REMem’s retrieval strategies combine dense passage retrieval with graph-based exploration to find relevant context for answering questions. Each extraction method is paired with its own retrieval strategy.

Retrieval Philosophy

Traditional RAG uses dense retrieval alone:

Embed the query
Find k-nearest passages by cosine similarity
Return top-k
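
As a point of reference, the three steps above can be sketched in a few lines. This is a minimal illustration using toy 2-d vectors in place of real embeddings; the function names cosine and dense_top_k are made up for this example and are not part of REMem.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dense_top_k(query_vec, passage_vecs, k):
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-d "embeddings"; a real system would get these from an embedding model.
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = dense_top_k([1.0, 0.1], passages, k=2)
```

Here top holds passages 0 and 2, in that order, because they point most nearly in the query's direction.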

REMem enhances this with graph navigation:

Seed selection: Dense retrieval finds initial facts/gists
Graph exploration: Navigate edges to related entities and passages
Ranking fusion: Combine dense scores with graph signals
Passage scoring: Personalized PageRank ranks final passages

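This enables multi-hop reasoning, where the answering passage is linked to the query only through an intermediate entity or fact, which pure dense retrieval tends to miss.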

Strategy Architecture

Each extraction method has a corresponding strategy (from rag_strategies/factory.py):

class RAGStrategyFactory:
    @staticmethod
    def create_strategy(extract_method: str, remem_instance):
        if extract_method == "openie":
            return DefaultRAGStrategy(remem_instance)
        elif extract_method in ["episodic_gist"]:
            return EpisodicGistStrategy(remem_instance)
        elif extract_method == "temporal":
            return TemporalStrategy(remem_instance)
        # ...

All strategies inherit from RAGStrategy (base_strategy.py) and implement:

index(): Build the graph
retrieve_each_query(): Retrieve for a single query
rag_for_qa(): Full RAG pipeline (retrieve + answer)
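
To make the three-method contract concrete, here is a rough sketch of what such a base class and a toy subclass could look like. StrategySketch and KeywordStrategy are hypothetical names, and the argument lists are assumptions; the real RAGStrategy signatures in base_strategy.py may differ.

```python
from abc import ABC, abstractmethod

class StrategySketch(ABC):
    """Hypothetical stand-in for RAGStrategy; not the library's class."""

    def __init__(self, remem_instance):
        self.remem = remem_instance

    @abstractmethod
    def index(self, docs): ...

    @abstractmethod
    def retrieve_each_query(self, query): ...

    @abstractmethod
    def rag_for_qa(self, queries): ...


class KeywordStrategy(StrategySketch):
    """Toy subclass: 'retrieval' is plain word overlap, no graph involved."""

    def index(self, docs):
        self.docs = list(docs)

    def retrieve_each_query(self, query):
        words = set(query.lower().split())
        scores = [len(words & set(d.lower().split())) for d in self.docs]
        order = sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
        return order, [scores[i] for i in order]

    def rag_for_qa(self, queries):
        # Retrieve only; a real strategy would also call an LLM to answer.
        return [self.retrieve_each_query(q) for q in queries]
```

Declaring the methods abstract means a subclass that forgets one of them fails at construction time instead of at query time.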

Default Strategy (OpenIE)

The default strategy for openie extraction combines fact retrieval with graph search.

Retrieval Pipeline

Step 1: Query-to-Fact Matching (remem.py:525)

Embed the query and find similar facts:

query_triple_scores = self.query_to_triple_scores(query)
# Returns scores for all facts based on embedding similarity

Step 2: Fact Reranking (remem.py:526)

Optionally rerank facts using a trained filter:

top_k_triple_indices, top_k_triples, rerank_log = self.rank_triples(query, query_triple_scores)

If no relevant facts are found after reranking:

if len(top_k_triples) == 0:
    logger.info("No triple found after reranking, return DPR results")
    sorted_chunk_ids, sorted_chunk_scores = self.dense_passage_retrieval(query)

Step 3: Graph Search (remem.py:531-538)

Navigate from facts to entities to passages:

sorted_chunk_ids, sorted_chunk_scores = self.graph_search_with_fact_entities(
    query=query,
    link_top_k=self.global_config.linking_top_k,
    query_triple_scores=query_triple_scores,
    top_k_triples=top_k_triples,
    top_k_triple_indices=top_k_triple_indices,
    passage_node_weight=self.global_config.passage_node_weight,
)

Graph Search Algorithm

The graph search uses Personalized PageRank to rank passages:

Build seed set: Top-k facts + their entities
Initialize PPR: Set seed weights based on query similarity
Propagate: Random walk with damping through graph edges
Extract passages: Collect passage nodes and their scores
Normalize: Adjust passage scores by passage_node_weight

Key parameters:

linking_top_k=5: How many neighbors to explore per node
damping=0.5: PPR damping factor (balances restarting at the seed nodes against walking further along graph edges)
passage_node_weight=0.05: Multiplicative factor for passage scores
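
The propagation step can be illustrated with a small power-iteration sketch of Personalized PageRank. This is not REMem's implementation: the function below is written for this example, it ignores passage_node_weight, and it assumes the common convention in which damping is the probability of following an edge and 1 - damping is the probability of restarting at a seed. The library may use a different convention or delegate to a graph package.

```python
def personalized_pagerank(adj, seeds, damping=0.5, iters=50):
    """Power iteration for PPR on a small graph.

    adj: dict node -> list of neighbor nodes
    seeds: dict node -> non-negative seed weight (normalized internally)
    damping: probability of following an edge; 1 - damping is the
             probability of restarting at a seed (convention assumed here).
    """
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in adj}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in adj}
        for node, neighbors in adj.items():
            if not neighbors:
                # Dangling node: return its mass to the seeds.
                for n in adj:
                    nxt[n] += damping * rank[node] * restart[n]
                continue
            share = damping * rank[node] / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        rank = nxt
    return rank

# Toy graph mirroring the entity/passage layout described above.
graph = {
    "Alan Turing": ["Turing Test", "passage_1"],
    "Turing Test": ["Alan Turing", "passage_2"],
    "passage_1": ["Alan Turing"],
    "passage_2": ["Turing Test"],
}
scores = personalized_pagerank(graph, seeds={"Alan Turing": 1.0})
passages = sorted(
    (n for n in graph if n.startswith("passage_")),
    key=lambda n: scores[n],
    reverse=True,
)
```

With the seed on "Alan Turing", passage_1 outranks passage_2 because it sits one hop from the seed rather than two.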

Example Trace

Query: “Who proposed the test that Turing created?”

1. Query-to-fact matching:
   Top fact: (Alan Turing, proposed, Turing Test) [score: 0.92]

2. Graph exploration:
   Fact → Entity "Alan Turing" → Entity "Turing Test"
                ↓                          ↓
           Passage 1 [0.85]           Passage 2 [0.78]

3. Passage ranking:
   Passage 1: "Alan Turing proposed the Turing Test in 1950." [final: 0.89]
   Passage 2: "The Turing Test is a measure of machine intelligence." [final: 0.74]

Episodic Gist Strategy

For episodic_gist extraction, the strategy retrieves through gists and verbatim nodes.

Key Differences from Default

Gist-based seeding: Initial retrieval uses gist summaries instead of facts
Multi-level exploration: Navigate through verbatim → gist → fact → entity
Agent-based QA: Uses tool-augmented reasoning for answer generation

Retrieval Pipeline

The episodic gist strategy delegates to an agent-based approach:

# From episodic_gist_strategy.py:875-877
sorted_chunk_ids, sorted_chunk_scores, agent_result = self._rag_each_query(
    remem, query, return_chunk, gold_answer=current_gold_answer, question_metadata=question_metadata_item
)

The agent can use different retrieval tools:

semantic_retrieve: Dense search over gists or verbatim
lexical_retrieve: BM25 search
fact_retrieve: Search over structured facts

Agent Configuration

Two modes are available for agent-based retrieval.

Fixed tools (config: agent_fixed_tools=True):

config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=True,
    agent_max_steps=2,  # 1=retrieve only, 2=retrieve+answer
    agent_fixed_retrieval_tool="semantic_retrieve",
)

The agent always uses the specified retrieval tool, then outputs the answer.

Flexible tools (config: agent_fixed_tools=False):

config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=False,
    agent_max_steps=5,  # Up to 5 reasoning steps
)

The agent chooses which tools to use at each step based on the question.
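
The difference between the two modes can be pictured with a small control-loop sketch. Everything in it is hypothetical: run_agent, choose_action, and the forced answer on the final step are assumptions made for illustration, with choose_action standing in for the LLM's decision. It is not REMem's agent code.

```python
def run_agent(question, tools, choose_action, max_steps, fixed_tool=None):
    """Hypothetical agent loop; not REMem's implementation.

    tools: dict name -> callable(question) returning a list of snippets
    choose_action: callable(question, evidence) -> tool name or "answer"
                   (stands in for the LLM's decision in flexible mode)
    fixed_tool: if set, call this tool on step 1 and answer afterwards
    """
    evidence, trace = [], []
    for step in range(1, max_steps + 1):
        if fixed_tool is not None:
            action = fixed_tool if step == 1 else "answer"
        else:
            action = choose_action(question, evidence)
        # Force an answer on the last step so multi-step runs end with one.
        if step == max_steps and max_steps > 1:
            action = "answer"
        trace.append(action)
        if action == "answer":
            break
        evidence.extend(tools[action](question))
    return evidence, trace

# Toy tools that each return a single labeled snippet.
tools = {
    "semantic_retrieve": lambda q: ["semantic hit"],
    "lexical_retrieve": lambda q: ["lexical hit"],
    "fact_retrieve": lambda q: ["fact hit"],
}
fixed_evidence, fixed_trace = run_agent(
    "What did the user ask about?", tools, None,
    max_steps=2, fixed_tool="semantic_retrieve",
)
```

In this sketch, fixed mode with max_steps=2 yields one retrieval followed by an answer, while max_steps=1 stops after the retrieval, mirroring the comment in the fixed-tools config above.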

Return Chunk Type

You can retrieve different node types:

# Return verbatim (original text with metadata)
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=["What did the user ask about?"],
    return_chunk="verbatim",
)

# Return gists (compressed summaries)
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=["What did the user ask about?"],
    return_chunk="gists",
)

From episodic_gist_strategy.py:880-918:

if return_chunk == "verbatim":
    hash_ids_to_fetch = [remem.entry_keys["verbatim"][idx] for idx in limited_chunk_ids]
    chunk_rows = remem.episodic_embedding_stores["verbatim"].get_rows(hash_ids_to_fetch)
    top_k_chunks_content = [row["content"] for row in chunk_rows.values()]
    top_k_chunks_metadata = [row.get("metadata", None) for row in chunk_rows.values()]
elif return_chunk == "gists":
    hash_ids_to_fetch = [remem.entry_keys["gists"][idx] for idx in limited_chunk_ids]
    chunk_rows = remem.episodic_embedding_stores["gists"].get_rows(hash_ids_to_fetch)
    top_k_chunks_content = [row["content"] for row in chunk_rows.values()]

When to use each:

verbatim: When you need exact quotes, speaker roles, or timestamps
gists: When you need compressed context that is shorter and faster for the LLM to read

Parallel Processing

Episodic gist supports parallel query processing:

query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=queries,
    parallel=True,
    max_workers=8,  # Process up to 8 queries concurrently
)

From episodic_gist_strategy.py:653-694:

if parallel:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {executor.submit(self._process_single_query, args): args[0] for args in args_list}
        for future in as_completed(future_to_idx):
            q_idx, query_solution, agent_result_dict, agent_answer = future.result()
            # ...
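
The excerpt above relies on a standard pattern: as_completed yields futures in completion order, so each result carries its query index and is written back to the matching slot. A self-contained sketch of that pattern, with a toy worker in place of the real _process_single_query (process_single_query and run_parallel are names invented here):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_single_query(args):
    """Toy worker: returns the query index together with a fake 'answer'."""
    q_idx, query = args
    return q_idx, query.upper()

def run_parallel(queries, max_workers=8):
    """Process queries concurrently and return results in input order."""
    results = [None] * len(queries)
    args_list = list(enumerate(queries))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(process_single_query, a): a[0] for a in args_list
        }
        for future in as_completed(future_to_idx):
            q_idx, answer = future.result()
            results[q_idx] = answer  # slot by index, not by completion order
    return results

answers = run_parallel(["first", "second", "third"], max_workers=2)
```

Threads suit this workload when each query spends most of its time waiting on network calls to an LLM or embedding service.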

Temporal Strategy

For temporal extraction, the strategy emphasizes temporal reasoning:

Temporal fact retrieval: Facts with time qualifiers are prioritized
Chronological ordering: Results can be sorted by time
Temporal graph edges: Navigate through time-connected events
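
As a rough picture of time-qualified facts and chronological ordering, consider the sketch below. The record shape (subject, relation, object, date) and the helper facts_in_range are assumptions for illustration with toy data; the temporal strategy's actual data model is not shown in this page.

```python
from datetime import date

# Hypothetical time-qualified facts: (subject, relation, object, date).
facts = [
    ("user", "moved to", "Berlin", date(2023, 9, 14)),
    ("user", "visited", "Paris", date(2023, 5, 1)),
    ("user", "started", "new job", date(2024, 1, 8)),
]

def facts_in_range(facts, start, end):
    """Keep facts dated within [start, end] and sort them oldest first."""
    selected = [f for f in facts if start <= f[3] <= end]
    return sorted(selected, key=lambda f: f[3])

# "What happened in 2023?" becomes a range filter plus a chronological sort.
timeline = facts_in_range(facts, date(2023, 1, 1), date(2023, 12, 31))
```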

Configuration Parameters

Control retrieval behavior with these config options:

config = BaseConfig(
    # Retrieval
    retrieval_top_k=200,  # How many passages to retrieve
    linking_top_k=5,  # How many neighbors to explore per node
    damping=0.5,  # PageRank damping factor
    
    # Ranking
    passage_node_weight=0.05,  # Weight for passage nodes in PPR
    
    # QA
    qa_top_k=5,  # How many passages to give to the LLM for answer generation
    qa_passage_prefix="Wikipedia Title: ",  # Prefix for passages in QA prompt
    
    # Agent (for episodic_gist)
    agent_fixed_tools=False,  # Use fixed tools or flexible tool selection?
    agent_max_steps=5,  # Max reasoning steps
    agent_fixed_retrieval_tool="semantic_retrieve",  # Which retrieval tool for fixed mode
)

Retrieval + QA Pipeline

The full RAG pipeline combines retrieval with answer generation:

solutions, responses, meta, retrieval_metrics, qa_metrics = rag.rag_for_qa(
    queries=["Who proposed the Turing Test?"],
    gold_docs=[["passage_123"]],  # For retrieval evaluation
    gold_answers=[["Alan Turing"]],  # For QA evaluation
    metrics=("qa_em", "qa_f1", "retrieval_recall"),
)

Pipeline steps:

Retrieval (if not using pre-retrieved QuerySolution objects):

query_solutions = self.remem.retrieve(queries=queries)

Retrieval evaluation (if gold_docs provided):

overall_retrieval_metrics = self.remem.evaluate_retrieval(gold_docs, query_solutions, retrieval_evaluators)

Answer generation:

query_solutions, all_response_message, all_metadata = self.remem.qa(query_solutions)

QA evaluation (if gold_answers provided):

overall_qa_metrics = self.remem.evaluate_qa(gold_answers, qa_evaluators, query_solutions, question_metadata)

Save results:

self.remem.save_rag_results(gold_answers, gold_docs, query_solutions, overall_qa_metrics, overall_retrieval_metrics)
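
The metric names qa_em and qa_f1 suggest the exact-match and token-level F1 scores common in QA benchmarks. The sketch below follows that widely used definition (lowercase, strip punctuation and articles, compare tokens); whether REMem's evaluators normalize answers in exactly this way is an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, golds):
    """1.0 if the prediction equals any gold answer after normalization."""
    return float(any(normalize(pred) == normalize(g) for g in golds))

def token_f1(pred, golds):
    """Best token-overlap F1 between the prediction and any gold answer."""
    best = 0.0
    pred_toks = normalize(pred).split()
    for g in golds:
        gold_toks = normalize(g).split()
        overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_toks)
        recall = overlap / len(gold_toks)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

em = exact_match("Turing", ["Alan Turing"])  # partial answer: no exact match
f1 = token_f1("Turing", ["Alan Turing"])     # precision 1.0, recall 0.5
```

A partially correct answer therefore scores zero on exact match but still earns credit under F1.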

Per-Sample Evaluation

For episodic gist, you can evaluate each sample as it’s processed:

query_solutions, _, _, _, qa_metrics = rag.rag_for_qa(
    queries=queries,
    gold_answers=gold_answers,
    evaluate_per_sample=True,  # Evaluate each query as it completes
    save_per_sample=True,  # Save each result individually
)

This enables real-time monitoring:

📊 Sample 0: qa_em: 1.0000, qa_f1: 1.0000 | Avg: qa_em: 1.0000, qa_f1: 1.0000 | Total: 1
📊 Sample 1: qa_em: 0.0000, qa_f1: 0.6667 | Avg: qa_em: 0.5000, qa_f1: 0.8333 | Total: 2
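
The "Avg" column in that log is a running mean over the samples seen so far. A minimal sketch of such a tracker, using a hypothetical RunningMetrics class (not a REMem API), reproduces the numbers in the two lines above:

```python
class RunningMetrics:
    """Keeps per-metric totals and reports the mean after each sample."""

    def __init__(self):
        self.totals = {}
        self.count = 0

    def update(self, sample_metrics):
        """Add one sample's metrics and return the current averages."""
        self.count += 1
        for name, value in sample_metrics.items():
            self.totals[name] = self.totals.get(name, 0.0) + value
        return {name: total / self.count for name, total in self.totals.items()}

tracker = RunningMetrics()
avg_after_0 = tracker.update({"qa_em": 1.0, "qa_f1": 1.0})
avg_after_1 = tracker.update({"qa_em": 0.0, "qa_f1": 0.6667})
```

After the second sample the averages are 0.5 for qa_em and roughly 0.8333 for qa_f1, matching the log.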

Dense Passage Retrieval Fallback

If graph search fails (no relevant facts found), REMem falls back to dense passage retrieval:

if len(top_k_triples) == 0:
    logger.info("No triple found after reranking, return DPR results")
    sorted_chunk_ids, sorted_chunk_scores = self.dense_passage_retrieval(query)

This ensures robustness even when extraction misses key information.

Advanced: Custom Retrieval Strategy

You can implement a custom retrieval strategy:

from remem.rag_strategies.base_strategy import RAGStrategy

class CustomStrategy(RAGStrategy):
    def index(self, docs):
        # Custom indexing logic
        pass
    
    def retrieve_each_query(self, query, return_chunk=None):
        # Custom retrieval logic
        # Return: (sorted_chunk_ids, sorted_chunk_scores, metadata)
        pass
    
    def rag_for_qa(self, queries, **kwargs):
        # Custom QA pipeline
        pass

Then use it:

from remem.rag_strategies.factory import RAGStrategyFactory

# Register your strategy
RAGStrategyFactory.register("custom", CustomStrategy)

# Use it
config = BaseConfig(extract_method="custom")
rag = ReMem(global_config=config)
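
For intuition about what a register call involves, here is a sketch of a registry-backed factory. StrategyRegistry and DummyStrategy are hypothetical; how RAGStrategyFactory stores and looks up strategies internally is not shown in this page, so treat this as one plausible design, not the library's code.

```python
class StrategyRegistry:
    """Hypothetical registry; REMem's factory may be implemented differently."""

    _strategies = {}

    @classmethod
    def register(cls, name, strategy_cls):
        """Map an extract_method name to a strategy class."""
        cls._strategies[name] = strategy_cls

    @classmethod
    def create_strategy(cls, extract_method, remem_instance):
        """Instantiate the strategy registered under extract_method."""
        try:
            strategy_cls = cls._strategies[extract_method]
        except KeyError:
            raise ValueError(f"Unknown extract_method: {extract_method!r}") from None
        return strategy_cls(remem_instance)


class DummyStrategy:
    """Placeholder strategy that only stores the instance it was given."""

    def __init__(self, remem_instance):
        self.remem = remem_instance


StrategyRegistry.register("custom", DummyStrategy)
strategy = StrategyRegistry.create_strategy("custom", remem_instance=None)
```

A dictionary lookup keeps new strategies out of the if/elif chain, so adding one does not require editing the factory itself.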

Performance Tuning

For speed:

config = BaseConfig(
    retrieval_top_k=50,  # Reduce from 200
    qa_top_k=3,  # Reduce from 5
    linking_top_k=3,  # Reduce from 5
)

For accuracy:

config = BaseConfig(
    retrieval_top_k=500,  # Increase
    qa_top_k=10,  # Increase
    linking_top_k=10,  # Increase
    damping=0.3,  # Lower damping = more exploration
)

For multi-hop questions:

config = BaseConfig(
    linking_top_k=10,  # More graph exploration
    passage_node_weight=0.01,  # Lower weight = more entity exploration
)

Next Steps

Review the Architecture to see how retrieval fits in
Understand the Memory Graph that retrieval navigates
Learn about Extraction Methods that determine retrieval behavior
