

Reusable components for building custom LangChain RAG applications.

AgenticRouter

Routes a query to one of three actions (search, reflect, or generate) using LLM reasoning, for agentic RAG patterns.

Constructor

AgenticRouter(llm: ChatGroq)
llm (ChatGroq, required): ChatGroq LLM instance used for routing decisions. Configure it with a low temperature (0.0 to 0.3) so routing stays consistent across calls.

Methods

route

Route a query to the appropriate action based on the current pipeline state.
route(
    query: str,
    has_documents: bool = False,
    current_answer: str | None = None,
    iteration: int = 1,
    max_iterations: int = 3
) -> dict[str, Any]
query (str, required): The user's original query text.
has_documents (bool, default False): Whether documents have already been retrieved in previous iterations.
current_answer (str | None, default None): The answer generated so far, if any. Used to assess whether reflection or generation is appropriate.
iteration (int, default 1): Current iteration number (1-indexed). Used to track progress and enforce the iteration limit.
max_iterations (int, default 3): Maximum number of routing iterations allowed. Prevents infinite loops.
Returns a dict with the following keys:
action (str): One of 'search', 'reflect', or 'generate'.
reasoning (str): Human-readable explanation of the routing decision.
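The route() contract implies a driver loop. You call the router, act on its decision, and feed the updated state back in until it returns 'generate' or the iteration budget runs out. The sketch below is a hypothetical driver and is not part of vectordb. run_agentic_loop, the rule_router stub, and the search/reflect/generate callables are all illustrative assumptions, and so is what "reflect" does here (revising the current answer). In real use you would pass router.route as route_fn. The stub lets the sketch run without an LLM.

```python
from typing import Any, Callable

# Hypothetical callable types. route_fn uses the same keyword interface as
# AgenticRouter.route, so router.route could be passed in directly.
RouteFn = Callable[..., dict[str, Any]]
SearchFn = Callable[[str], list[str]]
GenerateFn = Callable[[str, list[str]], str]
ReflectFn = Callable[[str, list[str], str], str]


def run_agentic_loop(
    query: str,
    route_fn: RouteFn,
    search_fn: SearchFn,
    generate_fn: GenerateFn,
    reflect_fn: ReflectFn,
    max_iterations: int = 3,
) -> tuple[str, list[str]]:
    """Act on routing decisions until 'generate' or the iteration limit."""
    documents: list[str] = []
    answer: str | None = None
    trace: list[str] = []  # actions taken, in order
    for iteration in range(1, max_iterations + 1):  # 1-indexed, like route()
        decision = route_fn(
            query,
            has_documents=bool(documents),
            current_answer=answer,
            iteration=iteration,
            max_iterations=max_iterations,
        )
        action = decision["action"]
        trace.append(action)
        if action == "search":
            documents.extend(search_fn(query))
        elif action == "reflect":
            # Assumption: reflection revises the current answer using the documents
            answer = reflect_fn(query, documents, answer or "")
        else:  # 'generate': produce the final answer and stop
            answer = generate_fn(query, documents)
            break
    if answer is None:
        # Budget exhausted without any answer: fall back to generating one
        answer = generate_fn(query, documents)
    return answer, trace


def rule_router(query, has_documents=False, current_answer=None,
                iteration=1, max_iterations=3):
    """Stand-in for AgenticRouter.route that needs no LLM."""
    if not has_documents:
        return {"action": "search", "reasoning": "No documents retrieved yet"}
    return {"action": "generate", "reasoning": "Documents are available"}


answer, trace = run_agentic_loop(
    "What is quantum computing?",
    route_fn=rule_router,
    search_fn=lambda q: ["Passage about qubits"],
    generate_fn=lambda q, docs: f"Answer grounded in {len(docs)} document(s)",
    reflect_fn=lambda q, docs, current: current,
)
print(trace)   # ['search', 'generate']
print(answer)  # Answer grounded in 1 document(s)
```

Passing the router in as a callable keeps the loop testable. It also makes explicit that max_iterations is enforced by the caller's loop as well as reported to the router.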

ContextCompressor

Compress retrieved context using reranking or LLM-based extraction to reduce token usage.

Constructor

ContextCompressor(
    mode: str = "reranking",
    llm: ChatGroq | None = None,
    reranker: HuggingFaceCrossEncoder | None = None
)
mode (str, default "reranking"): Compression mode, either "reranking" or "llm_extraction".
llm (ChatGroq | None, default None): ChatGroq instance used in LLM extraction mode. Required when mode is "llm_extraction".
reranker (HuggingFaceCrossEncoder | None, default None): HuggingFaceCrossEncoder instance used in reranking mode. Required when mode is "reranking".
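The constructor parameters imply a pairing rule: each mode needs its own dependency. The check below is a hypothetical sketch of that rule. This page does not say whether ContextCompressor itself validates at construction or at call time, or which exception it raises.

```python
from typing import Any

VALID_MODES = ("reranking", "llm_extraction")


def check_compressor_config(
    mode: str = "reranking", llm: Any = None, reranker: Any = None
) -> None:
    """Hypothetical check: raise ValueError if mode and its dependency don't match."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    if mode == "reranking" and reranker is None:
        raise ValueError("mode='reranking' requires a reranker, e.g. HuggingFaceCrossEncoder")
    if mode == "llm_extraction" and llm is None:
        raise ValueError("mode='llm_extraction' requires an llm, e.g. ChatGroq")


check_compressor_config("reranking", reranker=object())  # passes: dependency supplied
try:
    check_compressor_config("llm_extraction")  # no llm supplied
except ValueError as err:
    print(err)
```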

Methods

compress

Compress documents using the configured compression strategy.
compress(
    query: str,
    documents: list[Document],
    top_k: int = 5
) -> list[Document]
query (str, required): The user's query text, used to determine relevance.
documents (list[Document], required): LangChain Document objects to compress.
top_k (int, default 5): Number of documents to return. Used only in reranking mode.
Returns:
compressed (list[Document]): The compressed documents. Their structure depends on the mode:
  • reranking: the top_k Document objects, sorted by relevance
  • llm_extraction: a list containing a single synthesized Document

compress_reranking

Compress documents using cross-encoder reranking.
compress_reranking(
    query: str,
    documents: list[Document],
    top_k: int = 5
) -> list[Document]
query (str, required): Query text used for relevance scoring.
documents (list[Document], required): Documents to rerank.
top_k (int, default 5): Number of top documents to return.
Returns:
reranked (list[Document]): The top_k documents sorted by relevance score, highest first.
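Cross-encoder reranking scores each (query, document) pair jointly and keeps the highest-scoring documents. Below is a minimal sketch of that step. It assumes a scorer object with a score(text_pairs) method, which LangChain's HuggingFaceCrossEncoder provides. Doc is a small stand-in for LangChain's Document. OverlapScorer is a toy word-overlap scorer that lets the sketch run offline. This illustrates the technique; it is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class PairScorer(Protocol):
    def score(self, text_pairs: list[tuple[str, str]]) -> list[float]: ...


def rerank_top_k(query: str, documents: list[Doc], scorer: PairScorer, top_k: int = 5) -> list[Doc]:
    """Return the top_k documents by cross-encoder score, highest first."""
    if not documents:
        return []
    pairs = [(query, doc.page_content) for doc in documents]
    scores = list(scorer.score(pairs))  # list() in case the scorer returns an iterator
    # sorted() is stable even with reverse=True, so ties keep retrieval order
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in order[:top_k]]


class OverlapScorer:
    """Toy scorer: number of lowercase words shared by query and document."""
    def score(self, text_pairs):
        return [float(len(set(q.lower().split()) & set(d.lower().split())))
                for q, d in text_pairs]


docs = [Doc("cooking pasta tonight"), Doc("ai is a field"), Doc("what is ai exactly")]
top = rerank_top_k("what is ai", docs, OverlapScorer(), top_k=2)
print([d.page_content for d in top])  # ['what is ai exactly', 'ai is a field']
```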

compress_llm_extraction

Compress documents using LLM-based passage extraction.
compress_llm_extraction(
    query: str,
    documents: list[Document]
) -> list[Document]
query (str, required): Query text that guides extraction.
documents (list[Document], required): Documents to extract passages from.
Returns:
extracted (list[Document]): A list containing a single Document with the extracted passages. Its metadata includes 'source': 'compressed' and 'original_doc_count'.
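LLM extraction collapses many documents into one. The sketch below shows one plausible shape for that step. It assumes a complete(prompt) -> str callable; for ChatGroq, something like lambda p: llm.invoke(p).content would fit. Several details are assumptions rather than documented behavior: the prompt wording, the numbering of passages, and treating original_doc_count as the number of input documents.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical prompt; the real component's wording is not documented here.
EXTRACTION_PROMPT = (
    "Extract only the passages relevant to the question below. "
    "Copy them verbatim and omit everything else.\n\n"
    "Question: {query}\n\nDocuments:\n{context}"
)


def llm_extract(query: str, documents: list[Doc], complete: Callable[[str], str]) -> list[Doc]:
    """Ask an LLM for the relevant passages and wrap them in one Doc."""
    context = "\n\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(documents, start=1)
    )
    extracted = complete(EXTRACTION_PROMPT.format(query=query, context=context)).strip()
    metadata = {
        "source": "compressed",
        # Assumption: original_doc_count records how many documents went in
        "original_doc_count": len(documents),
    }
    return [Doc(page_content=extracted, metadata=metadata)]


fake_llm = lambda prompt: "  Neural networks learn weights from data.  "
result = llm_extract(
    "Explain neural networks",
    [Doc("Neural networks learn weights from data."), Doc("Paris is in France.")],
    complete=fake_llm,
)
print(result[0].page_content, result[0].metadata)
```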

QueryEnhancer

Enhance queries using multi-query generation, HyDE (Hypothetical Document Embeddings), and step-back techniques.

Constructor

QueryEnhancer(llm: ChatGroq)
llm (ChatGroq, required): ChatGroq LLM instance used for query enhancement.

Methods

generate_multi_queries

Generate multiple query variations for better retrieval coverage.
generate_multi_queries(
    query: str,
    num_queries: int = 3
) -> list[str]
query (str, required): The original query.
num_queries (int, default 3): Number of query variations to generate.
Returns:
queries (list[str]): Query variations, including the original query.
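Multi-query generation usually comes down to prompting the LLM for rephrasings and parsing its reply into a clean list. The sketch below assumes a complete(prompt) -> str callable. The prompt, the list-marker cleanup, and returning the original plus up to num_queries variations are all illustrative assumptions. The exact count the library returns is not specified above, and sketch_multi_queries is a hypothetical name.

```python
import re
from typing import Callable

# Hypothetical prompt wording.
MULTI_QUERY_PROMPT = (
    "Write {n} alternative phrasings of the question below, one per line.\n"
    "Question: {query}"
)
# Leading list markers LLMs often add: "1.", "2)", "-", "*", "•"
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def sketch_multi_queries(query: str, complete: Callable[[str], str], num_queries: int = 3) -> list[str]:
    """Return the original query followed by up to num_queries distinct variations."""
    reply = complete(MULTI_QUERY_PROMPT.format(n=num_queries, query=query))
    queries = [query]
    seen = {query.strip().lower()}  # case-insensitive de-duplication
    for line in reply.splitlines():
        if len(queries) > num_queries:  # original + num_queries variations collected
            break
        candidate = _MARKER.sub("", line).strip()
        if candidate and candidate.lower() not in seen:
            seen.add(candidate.lower())
            queries.append(candidate)
    return queries


reply = (
    "1. How is AI applied in industry?\n"
    "2. What is AI used for?\n"
    "3. what are the applications of ai?\n"
    "- Where is AI deployed?"
)
print(sketch_multi_queries("What are the applications of AI?", lambda p: reply))
```

The duplicate third line is dropped because it matches the original query case-insensitively.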

generate_hyde_document

Generate a hypothetical document that would answer the query.
generate_hyde_document(query: str) -> str
query (str, required): Query to generate a hypothetical document for.
Returns:
document (str): Hypothetical document text that can be embedded and used for retrieval.
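HyDE searches with the embedding of the hypothetical answer rather than the embedding of the question. The premise is that a fabricated answer lies closer to real answers in embedding space. Below is a minimal sketch of that retrieval step, assuming an embed(text) -> list[float] callable; the embed_query method of a LangChain embeddings object has this shape. toy_embed is a bag-of-words stand-in so the example runs without a model, and hyde_search is a hypothetical helper name.

```python
import math
import re
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_search(hypothetical_doc: str, corpus: list[str],
                embed: Callable[[str], Vector], k: int = 3) -> list[str]:
    """Rank corpus texts by similarity to the hypothetical document's embedding."""
    query_vec = embed(hypothetical_doc)  # embed the fake answer, not the question
    return sorted(corpus, key=lambda text: cosine(query_vec, embed(text)), reverse=True)[:k]


VOCAB = ["model", "data", "training", "weights", "recipe", "flour"]


def toy_embed(text: str) -> Vector:
    """Toy embedding: counts of each VOCAB word (illustration only)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]


hyde_doc = "Machine learning trains a model on data, adjusting weights during training."
corpus = [
    "A recipe needs flour.",
    "Training updates model weights using data.",
    "Data pipelines feed models.",
]
print(hyde_search(hyde_doc, corpus, toy_embed, k=2))
```

In a real pipeline, hyde_doc would come from generate_hyde_document and embed from your embedding model.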

generate_step_back_query

Generate a step-back query that asks a more general question.
generate_step_back_query(query: str) -> str
query (str, required): Specific query to generalize.
Returns:
step_back (str): A more general query, useful for retrieving background context.
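A step-back query is typically used alongside the original query. You retrieve for both, then give the LLM the specific hits plus the broader background. The sketch below shows one way to merge the two result lists. The retrieve callable, the ordering (specific results first), de-duplication by exact text, and the step_back_retrieve name are all assumptions for illustration.

```python
from typing import Callable


def step_back_retrieve(query: str, step_back_query: str,
                       retrieve: Callable[[str], list[str]], k: int = 5) -> list[str]:
    """Merge results for the specific and step-back queries, specific first, no duplicates."""
    merged: list[str] = []
    seen: set[str] = set()
    for text in retrieve(query) + retrieve(step_back_query):
        if text not in seen:
            seen.add(text)
            merged.append(text)
    return merged[:k]


# Hypothetical in-memory retriever keyed by query text
index = {
    "What is the training process for GPT-4?": ["Passage A (specific)"],
    "What are the general principles of training large language models?": [
        "Passage B (background)",
        "Passage A (specific)",
    ],
}
results = step_back_retrieve(
    "What is the training process for GPT-4?",
    "What are the general principles of training large language models?",
    retrieve=lambda q: index.get(q, []),
)
print(results)
```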

MMRHelper

Maximal Marginal Relevance utilities for diversity-optimized retrieval.

Methods

mmr_rerank

Rerank documents using the MMR algorithm to balance relevance and diversity.
MMRHelper.mmr_rerank(
    documents: list[Document],
    embeddings: list[list[float]],
    query_embedding: list[float],
    k: int = 10,
    lambda_param: float = 0.5
) -> list[Document]
documents (list[Document], required): Documents to rerank.
embeddings (list[list[float]], required): Document embeddings, in the same order as documents.
query_embedding (list[float], required): Query embedding vector.
k (int, default 10): Number of documents to return.
lambda_param (float, default 0.5): Trade-off between relevance and diversity. A value of 1.0 optimizes purely for relevance and 0.0 purely for diversity. The default 0.5 weights both equally.
Returns:
reranked (list[Document]): Documents reranked to balance relevance and diversity.
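MMR selects documents greedily. At each step it picks the candidate d that maximizes lambda * sim(query, d) - (1 - lambda) * max sim(d, s), where s ranges over the documents already selected. With lambda_param = 1.0 this reduces to plain relevance ranking. The pure-Python sketch below implements that standard formulation with cosine similarity. mmr_select is a hypothetical name, and MMRHelper's exact similarity measure and tie-breaking are not documented here.

```python
import math
from typing import Sequence, TypeVar

T = TypeVar("T")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(documents: Sequence[T], embeddings: Sequence[Sequence[float]],
               query_embedding: Sequence[float], k: int = 10,
               lambda_param: float = 0.5) -> list[T]:
    """Greedy MMR: trade relevance to the query against redundancy with earlier picks."""
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")
    relevance = [cosine(query_embedding, e) for e in embeddings]
    remaining = list(range(len(documents)))
    selected: list[int] = []

    def mmr_score(i: int) -> float:
        # Highest similarity to any already-selected document (0 when none selected)
        redundancy = max((cosine(embeddings[i], embeddings[j]) for j in selected), default=0.0)
        return lambda_param * relevance[i] - (1 - lambda_param) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [documents[i] for i in selected]


docs = ["A", "B (near-duplicate of A)", "C"]
embs = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
query = [1.0, 0.0]
print(mmr_select(docs, embs, query, k=2, lambda_param=1.0))  # ['A', 'B (near-duplicate of A)']
print(mmr_select(docs, embs, query, k=2, lambda_param=0.3))  # ['A', 'C']
```

With lambda_param = 1.0 the near-duplicate B is chosen second because only relevance counts. At 0.3, its high similarity to A outweighs its relevance, so the more distinct C is chosen instead.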

Usage Examples

Agentic routing

from langchain_groq import ChatGroq
from vectordb.langchain.components import AgenticRouter

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)
router = AgenticRouter(llm)

# Initial routing: no documents yet, so 'search' is the expected action
decision = router.route(
    "What is quantum computing?",
    has_documents=False
)
print(decision)
# Example output (the reasoning text is LLM-generated and will vary):
# {'action': 'search', 'reasoning': 'No documents retrieved yet'}

# After retrieval, the router may choose 'reflect' or 'generate'
decision = router.route(
    "What is quantum computing?",
    has_documents=True,
    current_answer="Quantum computing uses qubits...",
    iteration=2
)

Context compression with reranking

from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from vectordb.langchain.components import ContextCompressor

reranker = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = ContextCompressor(mode="reranking", reranker=reranker)

# retrieved_documents: a list[Document] from an earlier retrieval step (e.g. 10 results)
# Keep only the 3 most relevant documents
compressed = compressor.compress(
    query="What is AI?",
    documents=retrieved_documents,
    top_k=3
)

Context compression with LLM extraction

from langchain_groq import ChatGroq
from vectordb.langchain.components import ContextCompressor

llm = ChatGroq(model="llama-3.3-70b-versatile")
compressor = ContextCompressor(mode="llm_extraction", llm=llm)

# retrieved_documents: a list[Document] from an earlier retrieval step
# Returns a list holding one Document with the extracted passages
compressed = compressor.compress(
    query="Explain neural networks",
    documents=retrieved_documents
)

Query enhancement

from langchain_groq import ChatGroq
from vectordb.langchain.components import QueryEnhancer

llm = ChatGroq(model="llama-3.3-70b-versatile")
enhancer = QueryEnhancer(llm)

# Generate multiple query variations
queries = enhancer.generate_multi_queries(
    "What are the applications of AI?",
    num_queries=3
)

# Generate hypothetical document
hyde_doc = enhancer.generate_hyde_document(
    "How does machine learning work?"
)

# Generate step-back query
step_back = enhancer.generate_step_back_query(
    "What is the training process for GPT-4?"
)
# Example output (LLM-generated, so wording varies):
# "What are the general principles of training large language models?"
