The LangChain integration provides retrieval chains and pipelines for building RAG applications on top of vector databases.

Chain Types

VectorDB provides several pre-built chain types for LangChain:
  • Semantic Search: Dense vector retrieval using embedding models
  • Hybrid Indexing: Combined dense and sparse vector indexing
  • Sparse Indexing: BM25-style keyword-based retrieval
  • MMR (Maximal Marginal Relevance): Diversity-optimized retrieval
  • Parent Document Retrieval: Hierarchical chunking with parent-child relationships
  • Query Enhancement: Multi-query and HyDE (Hypothetical Document Embeddings)
  • Reranking: Cross-encoder reranking of retrieved results
  • Contextual Compression: Token optimization through context compression
  • Agentic RAG: Self-reflective retrieval with routing decisions
  • Multi-tenancy: Namespace-based data isolation
  • Metadata Filtering: Advanced filtering on document metadata
  • JSON Indexing: Indexing and filtering on nested JSON fields
  • Diversity Filtering: MMR-based diversity in retrieval
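
To make the MMR and Diversity Filtering entries above concrete, here is a minimal pure-Python sketch of Maximal Marginal Relevance selection. It is an illustration of the general technique only, not VectorDB's implementation; the function names `cosine` and `mmr_select` and the `lambda_mult` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
    lambda_mult: float = 0.5,  # hypothetical name: 1.0 = pure relevance, 0.0 = pure diversity
) -> List[int]:
    """Return indices of documents chosen greedily by the MMR criterion."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []

    while candidates and len(selected) < top_k:
        def score(i: int) -> float:
            # Penalize similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


docs = [[0.9, 0.1], [0.9, 0.12], [0.5, -0.5]]
print(mmr_select([1.0, 0.0], docs, top_k=2, lambda_mult=0.5))  # [0, 2]
print(mmr_select([1.0, 0.0], docs, top_k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=0.5` the near-duplicate second document is skipped in favor of the less similar third one; with `lambda_mult=1.0` the selection reduces to plain relevance ranking.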

Supported Vector Databases

All LangChain chains support these vector databases:
  • Chroma: ChromaSemanticSearchPipeline, ChromaMmrSearchPipeline, etc.
  • Milvus: MilvusSemanticSearchPipeline, MilvusHybridSearchPipeline, etc.
  • Pinecone: PineconeSemanticSearchPipeline, PineconeHybridSearchPipeline, etc.
  • Qdrant: QdrantSemanticSearchPipeline, QdrantMmrSearchPipeline, etc.
  • Weaviate: WeaviateSemanticSearchPipeline, WeaviateHybridSearchPipeline, etc.

Common Chain Methods

All LangChain chains share common initialization patterns and methods:

Constructor Pattern

Pipeline(
    config_path: str,
    collection_name: Optional[str] = None,
    embedding_model: Optional[str] = None,
    **kwargs
)
  • config_path (str, required): Path to the YAML configuration file containing database credentials and settings
  • collection_name (str, optional): Overrides the collection name from the config
  • embedding_model (str, optional): Overrides the embedding model from the config (e.g., "sentence-transformers/all-MiniLM-L6-v2")
  • **kwargs (Any): Additional chain-specific parameters

search

Performs a retrieval search and returns LangChain Documents.
search(
    query: str,
    top_k: int = 10,
    filters: Optional[Dict[str, Any]] = None,
    **kwargs
) -> List[Document]
  • query (str, required): Query text to search for
  • top_k (int, default: 10): Number of results to return
  • filters (Dict[str, Any], optional): Metadata filters to apply
  • **kwargs (Any): Chain-specific search parameters

Returns:
  • documents (List[Document]): Retrieved LangChain Document objects, ordered by relevance

as_retriever

Converts the pipeline to the LangChain retriever interface.
as_retriever(**kwargs) -> BaseRetriever
  • **kwargs (Any): Retriever configuration parameters

Returns:
  • retriever (BaseRetriever): LangChain BaseRetriever instance for use in chains

Example Usage

Semantic Search

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from vectordb.langchain.semantic_search import ChromaSemanticSearchPipeline

# Initialize the pipeline; collection_name and embedding_model override the config values
pipeline = ChromaSemanticSearchPipeline(
    config_path="config.yaml",
    collection_name="my_docs",
    embedding_model="text-embedding-3-small"
)

# Search: returns the top 5 LangChain Documents for the query
results = pipeline.search(
    query="What is machine learning?",
    top_k=5
)

# Use the pipeline as a retriever in a question-answering chain
retriever = pipeline.as_retriever()

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    retriever=retriever
)
result = qa_chain.invoke({"query": "Explain quantum computing"})

Hybrid Search

from vectordb.langchain.hybrid_indexing import MilvusHybridSearchPipeline

pipeline = MilvusHybridSearchPipeline(
    config_path="config.yaml",
    collection_name="hybrid_docs"
)

# Hybrid search combines dense and sparse vectors
results = pipeline.search(
    query="quantum computing applications",
    top_k=10,
    ranker_type="rrf"  # Reciprocal Rank Fusion
)
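
The `ranker_type="rrf"` option above refers to Reciprocal Rank Fusion. As a rough illustration of how RRF merges a dense ranking and a sparse ranking, here is a self-contained sketch. The function name `reciprocal_rank_fusion` and the constant `k=60` are assumptions for this example (60 is a commonly used value); how the pipeline configures its ranker internally is not shown in this document.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["a", "b", "c"]   # hypothetical result order from dense retrieval
sparse_ranking = ["b", "d", "a"]  # hypothetical result order from sparse retrieval
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists ("a" and "b") rise to the top, while documents found by only one retriever still appear, ordered by their single-list rank.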

Agentic RAG

from langchain_groq import ChatGroq
from vectordb.langchain.agentic_rag import ChromaAgenticRAGPipeline

llm = ChatGroq(model="llama-3.3-70b-versatile")
pipeline = ChromaAgenticRAGPipeline(
    config_path="config.yaml",
    llm=llm
)

# Agentic search with self-reflection
result = pipeline.search(
    query="Complex multi-hop question",
    max_iterations=3
)

Multi-tenancy

from langchain_core.documents import Document
from vectordb.langchain.multi_tenancy import PineconeMultiTenancyPipeline

pipeline = PineconeMultiTenancyPipeline(config_path="config.yaml")

# Index documents for tenant A
documents = [
    Document(page_content="Financial report Q1"),
    Document(page_content="Financial report Q2")
]
pipeline.index(documents, namespace="tenant_a")

# Search within tenant A only
results = pipeline.search(
    query="financial reports",
    namespace="tenant_a",
    top_k=5
)
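
The isolation property that the namespace argument is meant to provide can be pictured with a toy in-memory store. This is only a conceptual sketch: the class `NamespacedStore` is hypothetical, it uses word overlap instead of vector similarity, and it does not reflect how Pinecone or the pipeline above store data.

```python
from collections import defaultdict
from typing import Dict, List


class NamespacedStore:
    """Toy store that keeps each tenant's documents in a separate bucket."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[str]] = defaultdict(list)

    def index(self, documents: List[str], namespace: str) -> None:
        # Documents are only ever appended to their own namespace bucket.
        self._docs[namespace].extend(documents)

    def search(self, query: str, namespace: str, top_k: int = 5) -> List[str]:
        # Only the requested namespace is scanned; other tenants are never read.
        terms = set(query.lower().split())
        scored = []
        for text in self._docs.get(namespace, []):
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append((overlap, text))
        scored.sort(key=lambda pair: -pair[0])  # stable sort keeps insertion order on ties
        return [text for _, text in scored[:top_k]]


store = NamespacedStore()
store.index(["Financial report Q1", "Financial report Q2"], namespace="tenant_a")
store.index(["Financial forecast"], namespace="tenant_b")

print(store.search("financial reports", namespace="tenant_a"))
# ['Financial report Q1', 'Financial report Q2']
print(store.search("financial reports", namespace="tenant_b"))
# ['Financial forecast']
```

A query issued under one namespace never returns another tenant's documents, even when the query text matches them.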
