MMR (Maximal Marginal Relevance) balances relevance with diversity by penalizing documents too similar to those already selected. The result set covers more aspects of a topic instead of repeating similar content.

How it works

MMR iteratively selects documents by balancing query relevance against redundancy with already-selected documents.

MMR algorithm

At each iteration, every remaining candidate d is scored with:
MMR(d) = λ × sim(d, query) - (1-λ) × max_sim(d, selected)
Where:
  • λ (lambda_param) - Trade-off between relevance and diversity (0.0-1.0)
  • sim(d, query) - Cosine similarity between document and query embeddings
  • max_sim(d, selected) - Maximum similarity to any already-selected document
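As a minimal sketch of the formula, the snippet below scores a single candidate from raw embedding vectors using only the standard library. The helper names `cosine` and `mmr_score` are illustrative assumptions and are not part of the vectordb API; treating an empty selected set as zero penalty is also an assumption of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_score(candidate, query, selected, lambda_param=0.5):
    """Hypothetical helper: lambda * sim(d, query) - (1 - lambda) * max_sim(d, selected)."""
    relevance = cosine(candidate, query)
    # Assumption of this sketch: an empty selected set contributes no penalty.
    redundancy = max((cosine(candidate, s) for s in selected), default=0.0)
    return lambda_param * relevance - (1.0 - lambda_param) * redundancy


query = [1.0, 0.0]
candidate = [1.0, 1.0]             # 45 degrees away from the query
already_selected = [[1.0, 0.0]]    # one document pointing the same way as the query

score = mmr_score(candidate, query, already_selected, lambda_param=0.7)
print(f"{score:.3f}")
```

Here relevance and redundancy are both about 0.707, so the score works out to 0.7 × 0.707 - 0.3 × 0.707 ≈ 0.283.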

Selection process

  1. First document - Select most relevant to query
  2. Iterative selection - For remaining slots:
    • Calculate relevance to query for each candidate
    • Calculate redundancy (max similarity to selected docs)
    • Compute MMR score with lambda weighting
    • Select document with highest MMR score
  3. Repeat - Until k documents selected
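The selection steps above can be sketched in plain Python as follows. This is an illustrative greedy loop, not the internals of `MMRHelper`: the names `cosine` and `mmr_select` are hypothetical, and reporting the first pick's score as its plain relevance is a choice made for this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(embeddings, query_embedding, k, lambda_param=0.5):
    """Illustrative greedy MMR; returns (index, score) pairs in selection order."""
    relevance = [cosine(e, query_embedding) for e in embeddings]
    remaining = list(range(len(embeddings)))
    selected = []
    while remaining and len(selected) < k:
        if not selected:
            # Step 1: the first pick is simply the most relevant document.
            def score(i):
                return relevance[i]
        else:
            # Step 2: relevance minus the penalty for the closest selected document.
            def score(i):
                redundancy = max(cosine(embeddings[i], embeddings[j]) for j, _ in selected)
                return lambda_param * relevance[i] - (1 - lambda_param) * redundancy
        best = max(remaining, key=score)
        selected.append((best, score(best)))
        remaining.remove(best)  # Step 3: repeat until k documents are selected.
    return selected


query = [1.0, 0.0]
docs = [
    [0.9, 0.1],    # 0: most relevant
    [0.9, 0.12],   # 1: near-duplicate of document 0
    [0.8, -0.6],   # 2: less relevant, but covers a different direction
]

balanced = [i for i, _ in mmr_select(docs, query, k=2, lambda_param=0.5)]
relevance_only = [i for i, _ in mmr_select(docs, query, k=2, lambda_param=1.0)]
print(balanced, relevance_only)
```

With λ = 0.5 the near-duplicate is skipped in favour of document 2, while λ = 1.0 falls back to plain relevance order. The sketch recomputes redundancy from scratch each round for readability.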

Lambda parameter guidelines

lambda_param (float) - Controls the relevance-diversity trade-off:
  • λ = 1.0 - Pure relevance ranking (no diversity penalty)
  • λ = 0.7-0.8 - Emphasize relevance, mild diversity (recommended for precision)
  • λ = 0.5 - Balanced relevance and diversity (good default)
  • λ = 0.3-0.4 - Emphasize diversity (recommended for exploratory search)
  • λ = 0.0 - Pure diversity (minimum redundancy, ignores relevance)
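To see how these settings change an individual decision, the sketch below compares two candidates across several λ values. The similarity numbers are made up for illustration, and `mmr_score` is a hypothetical helper rather than a library function.

```python
def mmr_score(relevance, redundancy, lambda_param):
    """lambda * relevance - (1 - lambda) * redundancy."""
    return lambda_param * relevance - (1.0 - lambda_param) * redundancy


# Hypothetical (relevance to query, max similarity to selected docs) pairs.
candidates = {
    "near_duplicate": (0.95, 0.90),  # very relevant, but repeats what is already selected
    "fresh_angle": (0.70, 0.30),     # less relevant, but adds new material
}

winners = {}
for lam in (1.0, 0.8, 0.5, 0.3, 0.0):
    scores = {name: mmr_score(rel, red, lam) for name, (rel, red) in candidates.items()}
    winners[lam] = max(scores, key=scores.get)
    print(f"lambda={lam:.1f} -> {winners[lam]}")
```

For these particular numbers the preference flips at roughly λ ≈ 0.71: above it the near-duplicate wins on relevance, below it the less redundant candidate is chosen.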

Key features

  • Tune relevance vs diversity to fit the task
  • Uses cosine similarity for both relevance and diversity scoring
  • Particularly useful for summarization and exploratory search
  • Greedy selection avoids searching over all possible subsets, so cost grows with k × n rather than combinatorially

Implementation

from vectordb.langchain.utils import MMRHelper

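# `embedder`, `documents`, and `query` are assumed to be defined already;
# `embedder` is any object exposing embed_documents() and embed_query()
# Generate embeddings for documents and query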
doc_embeddings = embedder.embed_documents([doc.page_content for doc in documents])
query_embedding = embedder.embed_query(query)

# Apply MMR reranking
reranked = MMRHelper.mmr_rerank(
    documents=documents,
    embeddings=doc_embeddings,
    query_embedding=query_embedding,
    lambda_param=0.5,
    k=10,
)

# Returns list of (Document, MMR_score) tuples
for doc, score in reranked:
    print(f"MMR Score: {score:.3f} - {doc.page_content[:100]}")

Use cases

Exploratory search

When users need to understand different aspects of a topic:
# Lower lambda for more diversity
results = MMRHelper.mmr_rerank(
    documents=candidates,
    embeddings=embeddings,
    query_embedding=query_emb,
    lambda_param=0.3,  # Emphasize diversity
    k=10,
)

Multi-document summarization

Provide diverse context to LLMs:
# Balanced approach
diverse_docs = MMRHelper.mmr_rerank_simple(
    documents=retrieved_docs,
    embeddings=doc_embeddings,
    query_embedding=query_embedding,
    k=5,
    lambda_param=0.5,
)

# Use diverse docs for summarization (`llm.summarize` stands in for your own LLM call)
summary = llm.summarize(diverse_docs)

Reducing near-duplicates

When search returns many similar results:
# High diversity to remove redundancy
unique_results = MMRHelper.mmr_rerank_simple(
    documents=search_results,
    embeddings=result_embeddings,
    query_embedding=query_emb,
    k=10,
    lambda_param=0.4,  # Favor diversity
)

Precision-focused retrieval

When relevance is critical:
# High lambda for relevance focus
precise_results = MMRHelper.mmr_rerank(
    documents=candidates,
    embeddings=embeddings,
    query_embedding=query_emb,
    lambda_param=0.8,  # Emphasize relevance
    k=5,
)

Lambda parameter tuning

Task-specific recommendations

| Task type | Recommended λ | Rationale |
| --- | --- | --- |
| Q&A systems | 0.7-0.8 | Prioritize relevant answers |
| Exploratory search | 0.3-0.4 | Show diverse perspectives |
| Summarization | 0.4-0.6 | Balance coverage and relevance |
| Deduplication | 0.2-0.4 | Maximize uniqueness |
| Fact verification | 0.6-0.7 | Relevant but diverse sources |

Tuning guidelines

  1. Start with default - Begin with lambda_param=0.5 (balanced)
  2. Evaluate results - Check for redundancy or missing relevant docs
  3. Adjust based on metrics:
    • Too much redundancy? Decrease lambda (more diversity)
    • Missing relevant results? Increase lambda (more relevance)
  4. A/B test - Compare user engagement across lambda values
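The "Evaluate results" step is easier to act on with a number attached. One possible redundancy measure, offered here as an assumption rather than something the library provides, is the mean pairwise cosine similarity of the selected embeddings: the higher it is, the more the results repeat each other. `mean_pairwise_similarity` is a hypothetical helper name.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all pairs; higher means more redundant."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0  # zero or one document: nothing to compare
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings standing in for two result sets returned at different lambda values.
redundant_set = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2]]
diverse_set = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

redundant_value = mean_pairwise_similarity(redundant_set)
diverse_value = mean_pairwise_similarity(diverse_set)
print(f"redundant: {redundant_value:.3f}, diverse: {diverse_value:.3f}")
```

Tracking this value alongside a relevance metric while sweeping lambda_param gives a concrete basis for the adjustments in step 3.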

Example with full pipeline

from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline
from vectordb.langchain.utils import MMRHelper, EmbedderHelper

# Initial retrieval
pipeline = PineconeSemanticSearchPipeline("config.yaml")
candidates = pipeline.search(
    query="climate change mitigation strategies",
    top_k=50,  # Over-fetch for MMR
)

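# Generate embeddings. `embedder` is not created in this snippet; it is assumed to be an
# embeddings object (presumably built with EmbedderHelper) exposing embed_documents() and embed_query()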
doc_texts = [doc.page_content for doc in candidates["documents"]]
doc_embeddings = embedder.embed_documents(doc_texts)
query_embedding = embedder.embed_query(candidates["query"])

# Apply MMR for diversity
diverse_results = MMRHelper.mmr_rerank_simple(
    documents=candidates["documents"],
    embeddings=doc_embeddings,
    query_embedding=query_embedding,
    k=10,
    lambda_param=0.5,
)

print(f"Retrieved {len(diverse_results)} diverse documents")
for i, doc in enumerate(diverse_results, 1):
    print(f"{i}. {doc.page_content[:100]}...")

Performance characteristics

Time complexity

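  • First selection: O(n) to find the most relevant candidate
  • Subsequent selections: O((k-1) × n) candidate scorings for k selections from n candidates
  • Overall: O(k × n) candidate scorings, provided each candidate's maximum similarity to the selected set is updated incrementally rather than recomputed every round; each similarity itself costs O(d) for d-dimensional embeddings
For typical values (k=10, n=100), the reranking step itself takes on the order of milliseconds.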
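The O(k × n) bound relies on not recomputing redundancy from scratch in every round. A minimal sketch of one way to achieve that, assuming a per-candidate running maximum, is shown below; `mmr_select_cached` and the `comparisons` counter are hypothetical names used only for illustration and do not describe how `MMRHelper` is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select_cached(embeddings, query_embedding, k, lambda_param=0.5):
    """Greedy MMR that caches each candidate's max similarity to the selected set."""
    n = len(embeddings)
    relevance = [cosine(e, query_embedding) for e in embeddings]
    max_sim = [float("-inf")] * n   # running max similarity to any selected document
    remaining = list(range(n))
    order = []
    comparisons = 0                 # document-to-document similarity evaluations
    while remaining and len(order) < k:
        if not order:
            best = max(remaining, key=lambda i: relevance[i])
        else:
            best = max(
                remaining,
                key=lambda i: lambda_param * relevance[i] - (1 - lambda_param) * max_sim[i],
            )
        order.append(best)
        remaining.remove(best)
        # Only the newest pick can raise a candidate's maximum, so one comparison each.
        for i in remaining:
            max_sim[i] = max(max_sim[i], cosine(embeddings[i], embeddings[best]))
            comparisons += 1
    return order, comparisons


docs = [[0.9, 0.1], [0.9, 0.12], [0.8, -0.6]]
order, comparisons = mmr_select_cached(docs, [1.0, 0.0], k=2, lambda_param=0.5)
print(order, comparisons)
```

Each round performs at most n similarity evaluations, so the total stays within k × n; the price is one extra list of n floats.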

Space complexity

  • O(n × d) for storing embeddings (n docs, d dimensions)
  • Cosine similarity computed on-demand

Optimization tips

Pre-compute and cache embeddings for retrieved documents to avoid repeated embedding calls. Only re-embed when document content changes.
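One way to follow this tip is a small in-memory cache keyed by a hash of the document text, so unchanged content is never embedded twice. The `EmbeddingCache` class below is a hypothetical sketch, not part of vectordb; it wraps any callable that maps a list of texts to a list of vectors, and `fake_embed_documents` is a stand-in used only to make the example runnable.

```python
import hashlib


class EmbeddingCache:
    """Hypothetical in-memory cache keyed by a hash of the document text."""

    def __init__(self, embed_documents):
        self._embed_documents = embed_documents  # callable: list[str] -> list[list[float]]
        self._store = {}

    @staticmethod
    def _key(text):
        # Changed content produces a different key, which forces a re-embed.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts):
        # Embed only texts not seen before, de-duplicated while preserving order.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self._store]
        if missing:
            for text, vector in zip(missing, self._embed_documents(missing)):
                self._store[self._key(text)] = vector
        return [self._store[self._key(t)] for t in texts]


calls = []


def fake_embed_documents(texts):
    """Stand-in embedder that records what it was asked to embed."""
    calls.append(list(texts))
    return [[float(len(t)), 1.0] for t in texts]


cache = EmbeddingCache(fake_embed_documents)
first = cache.embed(["solar power", "wind power"])
second = cache.embed(["solar power", "carbon capture"])
print(calls)
```

In practice the wrapped callable would be something like `embedder.embed_documents`; the second call above only sends the text that had not been embedded yet.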

Comparison with other diversity methods

| Method | Approach | Speed | Use case |
| --- | --- | --- | --- |
| MMR | Query-aware greedy selection | Fast | General diversity with relevance |
| Clustering | K-means + sampling | Moderate | Topic coverage |
| Threshold filtering | Similarity cutoff | Fastest | Simple deduplication |
| Graph-based | Community detection | Slow | Complex relationships |
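For contrast with MMR, threshold filtering can be sketched in a few lines: walk the results in ranked order and drop anything too similar to a document already kept. The function name `threshold_filter` and the 0.95 cutoff are assumptions made for illustration. Unlike MMR, this approach never consults the query and has no relevance-diversity weighting.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def threshold_filter(embeddings, threshold=0.95):
    """Keep documents in their given (ranked) order, skipping near-duplicates of kept ones."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings, assumed to be ordered by relevance already.
ranked = [[0.9, 0.1], [0.9, 0.12], [0.8, -0.6]]
kept_indices = threshold_filter(ranked, threshold=0.95)
print(kept_indices)
```

The second document is dropped because it is nearly identical to the first, while the third survives because it points in a different direction.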

Integration with diversity filtering

MMR is one of the diversity methods available in the diversity filtering pipeline:
from vectordb.langchain.diversity_filtering import PineconeDiversityFilteringSearchPipeline

pipeline = PineconeDiversityFilteringSearchPipeline({
    "pinecone": {"api_key": "...", "index_name": "..."},
    "diversity": {
        "method": "mmr",
        "lambda_param": 0.5,
        "max_documents": 10,
        "candidate_multiplier": 3,
    },
})

results = pipeline.search(query="machine learning", top_k=10)
See Diversity filtering for more details.

  • Diversity filtering - Complete diversity pipeline with MMR and clustering
  • Semantic search - Initial retrieval before MMR
  • Reranking - Cross-encoder scoring alternative
  • Contextual compression - Reduce retrieved context