REMem's retrieval strategies combine dense passage retrieval with graph-based exploration to find relevant context for answering questions. Different extraction methods use different retrieval strategies.
Retrieval Philosophy
Traditional RAG uses dense retrieval alone:
- Embed the query
- Find k-nearest passages by cosine similarity
- Return top-k
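The three steps above can be sketched in a few lines of standard-library Python. This is only an illustration of the general technique: the toy 2-D vectors stand in for real embedding-model outputs, and `cosine` and `dense_top_k` are hypothetical helper names, not part of REMem.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_top_k(query_vec, passage_vecs, k):
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; a real system would obtain these from an embedding model.
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = dense_top_k([1.0, 0.1], passages, k=2)
top_indices = [i for i, _ in top]  # passage 0 first, then passage 2
```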
REMem enhances this with graph navigation:
- Seed selection: Dense retrieval finds initial facts/gists
- Graph exploration: Navigate edges to related entities and passages
- Ranking fusion: Combine dense scores with graph signals
- Passage scoring: Personalized PageRank ranks final passages
This enables multi-hop reasoning that pure dense retrieval misses.
Strategy Architecture
Each extraction method has a corresponding strategy (from rag_strategies/factory.py):
class RAGStrategyFactory:
    @staticmethod
    def create_strategy(extract_method: str, remem_instance):
        if extract_method == "openie":
            return DefaultRAGStrategy(remem_instance)
        elif extract_method in ["episodic_gist"]:
            return EpisodicGistStrategy(remem_instance)
        elif extract_method == "temporal":
            return TemporalStrategy(remem_instance)
        # ...
All strategies inherit from RAGStrategy (base_strategy.py) and implement:
index(): Build the graph
retrieve_each_query(): Retrieve for a single query
rag_for_qa(): Full RAG pipeline (retrieve + answer)
Default Strategy (OpenIE)
The default strategy for openie extraction combines fact retrieval with graph search.
Retrieval Pipeline
Step 1: Query-to-Fact Matching (remem.py:525)
Embed the query and find similar facts:
query_triple_scores = self.query_to_triple_scores(query)
# Returns scores for all facts based on embedding similarity
Step 2: Fact Reranking (remem.py:526)
Optionally rerank facts using a trained filter:
top_k_triple_indices, top_k_triples, rerank_log = self.rank_triples(query, query_triple_scores)
If no relevant facts are found after reranking:
if len(top_k_triples) == 0:
    logger.info("No triple found after reranking, return DPR results")
    sorted_chunk_ids, sorted_chunk_scores = self.dense_passage_retrieval(query)
Step 3: Graph Search (remem.py:531-538)
Navigate from facts to entities to passages:
sorted_chunk_ids, sorted_chunk_scores = self.graph_search_with_fact_entities(
    query=query,
    link_top_k=self.global_config.linking_top_k,
    query_triple_scores=query_triple_scores,
    top_k_triples=top_k_triples,
    top_k_triple_indices=top_k_triple_indices,
    passage_node_weight=self.global_config.passage_node_weight,
)
Graph Search Algorithm
The graph search uses Personalized PageRank to rank passages:
- Build seed set: Top-k facts + their entities
- Initialize PPR: Set seed weights based on query similarity
- Propagate: Random walk with damping through graph edges
- Extract passages: Collect passage nodes and their scores
- Normalize: Adjust passage scores by passage_node_weight
Key parameters:
linking_top_k=5: How many neighbors to explore per node
damping=0.5: PPR damping factor (how much weight stays at seed nodes)
passage_node_weight=0.05: Multiplicative factor for passage scores
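To make the propagation step concrete, here is a minimal power-iteration sketch of Personalized PageRank on a toy undirected graph. It is not REMem's implementation: the function name, the node labels, and the `restart_prob` parameter (the chance of jumping back to a seed at each step) are assumptions for illustration, and how `restart_prob` maps onto REMem's `damping` setting is not confirmed here.

```python
def personalized_pagerank(adj, seeds, restart_prob=0.5, iters=50):
    """Power-iteration PPR.

    adj:   dict mapping node -> list of neighbor nodes (undirected graph)
    seeds: dict mapping node -> non-negative seed weight
    """
    total = sum(seeds.get(n, 0.0) for n in adj)
    if total <= 0:
        raise ValueError("at least one seed must have positive weight")
    reset = {n: seeds.get(n, 0.0) / total for n in adj}
    rank = dict(reset)
    for _ in range(iters):
        # Every step, a fraction of the mass jumps back to the seeds.
        nxt = {n: restart_prob * reset[n] for n in adj}
        for node, neighbors in adj.items():
            walk_mass = (1 - restart_prob) * rank[node]
            if not neighbors:
                # Dangling node: hand its mass back to the seed distribution.
                for n in adj:
                    nxt[n] += walk_mass * reset[n]
                continue
            for nb in neighbors:
                nxt[nb] += walk_mass / len(neighbors)
        rank = nxt
    return rank


# Toy graph: two linked entities, each attached to one passage,
# plus an unrelated entity/passage pair in a separate component.
adj = {
    "entity:Alan Turing": ["passage:1", "entity:Turing Test"],
    "entity:Turing Test": ["entity:Alan Turing", "passage:2"],
    "passage:1": ["entity:Alan Turing"],
    "passage:2": ["entity:Turing Test"],
    "entity:Other": ["passage:3"],
    "passage:3": ["entity:Other"],
}
scores = personalized_pagerank(adj, seeds={"entity:Alan Turing": 1.0})

# Keep only passage nodes and rank them by score.
passage_ranking = sorted(
    (n for n in scores if n.startswith("passage:")),
    key=lambda n: scores[n],
    reverse=True,
)
```

In this toy run the passage one hop from the seed outranks the passage two hops away, and the passage in the disconnected component receives no score at all.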
Example Trace
Query: "Who proposed the test that Turing created?"
1. Query-to-fact matching:
Top fact: (Alan Turing, proposed, Turing Test) [score: 0.92]
2. Graph exploration:
Fact → Entity "Alan Turing" → Entity "Turing Test"
                ↓                      ↓
        Passage 1 [0.85]       Passage 2 [0.78]
3. Passage ranking:
Passage 1: "Alan Turing proposed the Turing Test in 1950." [final: 0.89]
Passage 2: "The Turing Test is a measure of machine intelligence." [final: 0.74]
Episodic Gist Strategy
For episodic_gist extraction, the strategy retrieves through gists and verbatim nodes.
Key Differences from Default
- Gist-based seeding: Initial retrieval uses gist summaries instead of facts
- Multi-level exploration: Navigate through verbatim → gist → fact → entity
- Agent-based QA: Uses tool-augmented reasoning for answer generation
Retrieval Pipeline
The episodic gist strategy delegates to an agent-based approach:
# From episodic_gist_strategy.py:875-877
sorted_chunk_ids, sorted_chunk_scores, agent_result = self._rag_each_query(
    remem, query, return_chunk, gold_answer=current_gold_answer, question_metadata=question_metadata_item
)
The agent can use different retrieval tools:
- semantic_retrieve: Dense search over gists or verbatim
- lexical_retrieve: BM25 search
- fact_retrieve: Search over structured facts
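For readers unfamiliar with the lexical option, the sketch below shows textbook Okapi BM25 scoring with a Lucene-style IDF. It is not the code behind `lexical_retrieve`; the function name, the whitespace tokenizer, and the default `k1` and `b` values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "alan turing proposed the turing test in 1950",
    "the turing test is a measure of machine intelligence",
    "graph search ranks passages",
]
lexical = bm25_scores("who proposed the test", docs)
```

Lexical scoring rewards exact term overlap, which is why it can complement dense retrieval for queries that hinge on rare names or identifiers.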
Agent Configuration
Two modes for agent-based retrieval:
Fixed tools (config: agent_fixed_tools=True):
config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=True,
    agent_max_steps=2,  # 1=retrieve only, 2=retrieve+answer
    agent_fixed_retrieval_tool="semantic_retrieve",
)
The agent always uses the specified retrieval tool, then outputs an answer.
Flexible tools (config: agent_fixed_tools=False):
config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=False,
    agent_max_steps=5,  # Up to 5 reasoning steps
)
The agent chooses which tools to use at each step based on the question.
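As a rough mental model of fixed-tool mode, the following sketch runs one retrieval step and, if the step budget allows, one answer step. Everything here is hypothetical: `run_fixed_agent`, the `tools` dictionary, and `answer_fn` are illustrative stand-ins and do not reflect how REMem's agent is actually implemented.

```python
def run_fixed_agent(question, tools, tool_name, max_steps, answer_fn):
    """Retrieve with one fixed tool, then answer if max_steps permits.

    tools:     dict mapping tool name -> callable(question) -> list of passages
    answer_fn: callable(question, passages) -> answer string
    """
    trace = []
    context = []
    answer = None
    if max_steps >= 1:
        context = tools[tool_name](question)  # step 1: retrieval
        trace.append(("retrieve", tool_name))
    if max_steps >= 2:
        answer = answer_fn(question, context)  # step 2: answer generation
        trace.append(("answer", None))
    return context, answer, trace


# Stub tool and answerer; a real setup would call a retriever and an LLM.
tools = {"semantic_retrieve": lambda q: ["Alan Turing proposed the Turing Test in 1950."]}
context, answer, trace = run_fixed_agent(
    "Who proposed the Turing Test?",
    tools,
    tool_name="semantic_retrieve",
    max_steps=2,
    answer_fn=lambda q, passages: passages[0],
)
```

A flexible-tool agent would differ mainly in that the tool name is chosen anew at each step instead of being passed in once.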
Return Chunk Type
You can retrieve different node types:
# Return verbatim (original text with metadata)
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=["What did the user ask about?"],
    return_chunk="verbatim",
)
# Return gists (compressed summaries)
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=["What did the user ask about?"],
    return_chunk="gists",
)
From episodic_gist_strategy.py:880-918:
if return_chunk == "verbatim":
    hash_ids_to_fetch = [remem.entry_keys["verbatim"][idx] for idx in limited_chunk_ids]
    chunk_rows = remem.episodic_embedding_stores["verbatim"].get_rows(hash_ids_to_fetch)
    top_k_chunks_content = [row["content"] for row in chunk_rows.values()]
    top_k_chunks_metadata = [row.get("metadata", None) for row in chunk_rows.values()]
elif return_chunk == "gists":
    hash_ids_to_fetch = [remem.entry_keys["gists"][idx] for idx in limited_chunk_ids]
    chunk_rows = remem.episodic_embedding_stores["gists"].get_rows(hash_ids_to_fetch)
    top_k_chunks_content = [row["content"] for row in chunk_rows.values()]
When to use each:
verbatim: When you need exact quotes, speaker roles, timestamps
gists: When you need compressed context that is faster for the LLM to read
Parallel Processing
Episodic gist supports parallel query processing:
query_solutions, _, _, _, _ = rag.rag_for_qa(
    queries=queries,
    parallel=True,
    max_workers=8,  # Process 8 queries at once
)
From episodic_gist_strategy.py:653-694:
if parallel:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {executor.submit(self._process_single_query, args): args[0] for args in args_list}
        for future in as_completed(future_to_idx):
            q_idx, query_solution, agent_result_dict, agent_answer = future.result()
            # ...
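The excerpt relies on each worker returning its query index, because `as_completed` yields futures in completion order, not submission order. A self-contained sketch of that pattern follows; `process_single_query` and `run_parallel` are hypothetical stand-ins, with a trivial string result in place of real retrieval and QA.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_single_query(args):
    """Hypothetical worker: returns (index, result) so order can be restored."""
    idx, query = args
    return idx, f"answer for: {query}"


def run_parallel(queries, max_workers=8):
    """Process queries concurrently and return results in input order."""
    results = [None] * len(queries)
    args_list = list(enumerate(queries))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process_single_query, a) for a in args_list]
        for future in as_completed(futures):
            idx, result = future.result()  # re-raises any worker exception
            results[idx] = result          # slot the result back by index
    return results


ordered = run_parallel(["first", "second", "third"], max_workers=2)
```

Threads suit this workload when the per-query work is dominated by network calls to an LLM or embedding service, since those release the interpreter while waiting.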
Temporal Strategy
For temporal extraction, the strategy emphasizes temporal reasoning:
- Temporal fact retrieval: Facts with time qualifiers are prioritized
- Chronological ordering: Results can be sorted by time
- Temporal graph edges: Navigate through time-connected events
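One simple way to picture the first two points is to place facts that carry a time qualifier ahead of undated ones and sort the dated facts chronologically. The sketch below does exactly that on toy data; `TemporalFact` and `order_facts` are hypothetical names, and the actual prioritization logic in REMem's temporal strategy may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    when: Optional[date] = None  # time qualifier, if extraction found one
    score: float = 0.0           # query-similarity score


def order_facts(facts: List[TemporalFact]) -> List[TemporalFact]:
    """Dated facts first (oldest to newest), then undated facts by score."""
    dated = sorted((f for f in facts if f.when is not None), key=lambda f: f.when)
    undated = sorted(
        (f for f in facts if f.when is None), key=lambda f: f.score, reverse=True
    )
    return dated + undated


# Toy facts, invented for illustration only.
facts = [
    TemporalFact("user", "booked", "flight", date(2024, 3, 2), score=0.6),
    TemporalFact("user", "likes", "window seats", None, score=0.9),
    TemporalFact("user", "renewed", "passport", date(2024, 1, 15), score=0.4),
]
ordered_facts = order_facts(facts)
predicates = [f.predicate for f in ordered_facts]
```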
Configuration Parameters
Control retrieval behavior with these config options:
config = BaseConfig(
    # Retrieval
    retrieval_top_k=200,  # How many passages to retrieve
    linking_top_k=5,  # How many neighbors to explore per node
    damping=0.5,  # PageRank damping factor
    # Ranking
    passage_node_weight=0.05,  # Weight for passage nodes in PPR
    # QA
    qa_top_k=5,  # How many passages to give to the LLM for answer generation
    qa_passage_prefix="Wikipedia Title: ",  # Prefix for passages in QA prompt
    # Agent (for episodic_gist)
    agent_fixed_tools=False,  # Use fixed tools or flexible tool selection?
    agent_max_steps=5,  # Max reasoning steps
    agent_fixed_retrieval_tool="semantic_retrieve",  # Which retrieval tool for fixed mode
)
Retrieval + QA Pipeline
The full RAG pipeline combines retrieval with answer generation:
solutions, responses, meta, retrieval_metrics, qa_metrics = rag.rag_for_qa(
    queries=["Who proposed the Turing Test?"],
    gold_docs=[["passage_123"]],  # For retrieval evaluation
    gold_answers=[["Alan Turing"]],  # For QA evaluation
    metrics=("qa_em", "qa_f1", "retrieval_recall"),
)
Pipeline steps:
- Retrieval (if not using pre-retrieved QuerySolution objects):
  query_solutions = self.remem.retrieve(queries=queries)
- Retrieval evaluation (if gold_docs provided):
  overall_retrieval_metrics = self.remem.evaluate_retrieval(gold_docs, query_solutions, retrieval_evaluators)
- Answer generation:
  query_solutions, all_response_message, all_metadata = self.remem.qa(query_solutions)
- QA evaluation (if gold_answers provided):
  overall_qa_metrics = self.remem.evaluate_qa(gold_answers, qa_evaluators, query_solutions, question_metadata)
- Save results:
  self.remem.save_rag_results(gold_answers, gold_docs, query_solutions, overall_qa_metrics, overall_retrieval_metrics)
Per-Sample Evaluation
For episodic gist, you can evaluate each sample as it's processed:
query_solutions, _, _, _, qa_metrics = rag.rag_for_qa(
    queries=queries,
    gold_answers=gold_answers,
    evaluate_per_sample=True,  # Evaluate each query as it completes
    save_per_sample=True,  # Save each result individually
)
This enables real-time monitoring:
Sample 0: qa_em: 1.0000, qa_f1: 1.0000 | Avg: qa_em: 1.0000, qa_f1: 1.0000 | Total: 1
Sample 1: qa_em: 0.0000, qa_f1: 0.6667 | Avg: qa_em: 0.5000, qa_f1: 0.8333 | Total: 2
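The per-sample and running-average figures in a log like this can be reproduced with common definitions of exact match and token-level F1. The sketch below is an assumption about how such numbers are typically computed, not REMem's evaluator: `exact_match`, `token_f1`, and `RunningAverage` are hypothetical names, and the normalization is deliberately minimal (lowercasing and whitespace splitting only).

```python
from collections import Counter


def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(pred.strip().lower() == gold.strip().lower())


def token_f1(pred, gold):
    """Harmonic mean of token precision and recall."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


class RunningAverage:
    """Tracks the mean of each metric over the samples seen so far."""

    def __init__(self):
        self.totals = {}
        self.count = 0

    def update(self, metrics):
        self.count += 1
        for name, value in metrics.items():
            self.totals[name] = self.totals.get(name, 0.0) + value
        return {name: total / self.count for name, total in self.totals.items()}


tracker = RunningAverage()
samples = [("Alan Turing", "Alan Turing"), ("Turing", "Alan Turing")]
history = []
for pred, gold in samples:
    per_sample = {"qa_em": exact_match(pred, gold), "qa_f1": token_f1(pred, gold)}
    history.append((per_sample, tracker.update(per_sample)))
```

With these two toy predictions, the second sample scores an exact match of 0 and an F1 of about 0.6667, bringing the running averages to 0.5 and about 0.8333.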
Dense Passage Retrieval Fallback
If graph search fails (no relevant facts found), REMem falls back to dense passage retrieval:
if len(top_k_triples) == 0:
    logger.info("No triple found after reranking, return DPR results")
    sorted_chunk_ids, sorted_chunk_scores = self.dense_passage_retrieval(query)
This ensures robustness even when extraction misses key information.
Advanced: Custom Retrieval Strategy
You can implement a custom retrieval strategy:
from remem.rag_strategies.base_strategy import RAGStrategy
class CustomStrategy(RAGStrategy):
    def index(self, docs):
        # Custom indexing logic
        pass

    def retrieve_each_query(self, query, return_chunk=None):
        # Custom retrieval logic
        # Return: (sorted_chunk_ids, sorted_chunk_scores, metadata)
        pass

    def rag_for_qa(self, queries, **kwargs):
        # Custom QA pipeline
        pass
Then use it:
from remem.rag_strategies.factory import RAGStrategyFactory
# Register your strategy
RAGStrategyFactory.register("custom", CustomStrategy)
# Use it
config = BaseConfig(extract_method="custom")
rag = ReMem(global_config=config)
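The factory excerpt earlier on this page shows only `create_strategy`, so it is worth checking that `register` exists with this signature in your installed version. For orientation, a registry-backed factory generally looks like the sketch below. `StrategyRegistry` and `DummyStrategy` are hypothetical and are not REMem classes.

```python
class StrategyRegistry:
    """Hypothetical registry-backed factory, for illustration only."""

    _strategies = {}

    @classmethod
    def register(cls, extract_method, strategy_cls):
        """Associate an extract_method name with a strategy class."""
        cls._strategies[extract_method] = strategy_cls

    @classmethod
    def create_strategy(cls, extract_method, remem_instance):
        """Instantiate the strategy registered under extract_method."""
        try:
            strategy_cls = cls._strategies[extract_method]
        except KeyError:
            raise ValueError(f"Unknown extract_method: {extract_method!r}") from None
        return strategy_cls(remem_instance)


class DummyStrategy:
    """Stand-in for a RAGStrategy subclass."""

    def __init__(self, remem_instance):
        self.remem = remem_instance


StrategyRegistry.register("custom", DummyStrategy)
strategy = StrategyRegistry.create_strategy("custom", remem_instance=object())
```

Compared with an if/elif chain, a registry lets third-party code add strategies without editing the factory itself.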
For speed:
config = BaseConfig(
    retrieval_top_k=50,  # Reduce from 200
    qa_top_k=3,  # Reduce from 5
    linking_top_k=3,  # Reduce from 5
)
For accuracy:
config = BaseConfig(
    retrieval_top_k=500,  # Increase
    qa_top_k=10,  # Increase
    linking_top_k=10,  # Increase
    damping=0.3,  # Lower damping = more exploration
)
For multi-hop questions:
config = BaseConfig(
    linking_top_k=10,  # More graph exploration
    passage_node_weight=0.01,  # Lower weight = more entity exploration
)