VectorDB uses YAML configuration files to define pipeline behavior. This reference documents all available configuration sections and options.
Configuration file structure
A complete configuration file includes these sections:
# Data loading
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100

# Embedding models
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

# Vector database connection
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "my-index"
  dimension: 384
  metric: "cosine"

# Search settings
search:
  top_k: 10

# RAG generation
rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

# Logging
logging:
  name: "my_pipeline"
  level: "INFO"
Environment variable substitution
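VectorDB supports two environment variable syntaxes, ${VAR} and ${VAR:-default}. A configuration can use several substitutions at once.
Simple substitution
api_key: "${PINECONE_API_KEY}"
# Returns empty string if PINECONE_API_KEY is not set
With default value
uri: "${MILVUS_URI:-http://localhost:19530}"
# Falls back to http://localhost:19530 if MILVUS_URI is not set
Multiple substitutions
milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"
  token: "${MILVUS_TOKEN:-}"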
See environment variables for the complete list of supported variables.
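The substitution rules can be sketched in a few lines of Python. This is an illustration only, not VectorDB's actual loader: the helper names substitute_env and resolve are hypothetical, and the choice to treat an empty variable like an unset one (as POSIX shells do for the :- form) is an assumption.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(value, env=None):
    """Expand ${VAR} and ${VAR:-default} placeholders in a string."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name, "")
        if current:
            return current
        # Unset or empty (assumption): use the default, else an empty string.
        return default if default is not None else ""

    return _ENV_PATTERN.sub(replace, value)


def resolve(node, env=None):
    """Recursively expand placeholders in nested dicts, lists, and strings."""
    if isinstance(node, dict):
        return {key: resolve(item, env) for key, item in node.items()}
    if isinstance(node, list):
        return [resolve(item, env) for item in node]
    if isinstance(node, str):
        return substitute_env(node, env)
    return node  # Numbers, booleans, and None pass through unchanged.
```

Calling resolve on a parsed configuration dictionary returns a copy with every placeholder expanded, so the rest of a pipeline never sees raw ${...} strings.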
Dataloader configuration
Controls dataset loading and preprocessing.
dataloader:
  type: "triviaqa"  # Dataset type
  dataset_name: "trivia_qa"  # HuggingFace dataset name (optional)
  config: "rc"  # Dataset config variant (optional)
  split: "test"  # Dataset split: train, test, validation
  limit: 100  # Max documents to load (null for all)
  use_text_splitter: false  # Enable chunking for long documents
Supported datasets
TriviaQA: Open-domain question-answering dataset.
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
ARC: AI2 Reasoning Challenge for science questions.
dataloader:
  type: "arc"
  split: "test"
  limit: 200
PopQA: Popular entity factoid questions.
dataloader:
  type: "popqa"
  split: "test"
  limit: 100
FactScore: Atomic facts for verification.
dataloader:
  type: "factscore"
  split: "test"
  limit: 100
Earnings calls: Financial transcript Q&A.
dataloader:
  type: "earnings_calls"
  split: "test"
  limit: 50
Embeddings configuration
Defines the embedding models for vector generation.
Dense embeddings
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"  # HuggingFace model
  device: "cpu"  # cpu or cuda
  batch_size: 32  # Embedding batch size
Model aliases
VectorDB provides convenient aliases for common models:
embeddings:
  model: "qwen3"  # Alias for Qwen/Qwen3-Embedding-0.6B
  # model: "minilm"  # Alias for sentence-transformers/all-MiniLM-L6-v2
  # model: "mpnet"  # Alias for sentence-transformers/all-mpnet-base-v2
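Alias resolution amounts to a dictionary lookup with a pass-through for full model names. The sketch below assumes exactly that; the table holds only the three aliases listed above, and the names MODEL_ALIASES and resolve_model_name are hypothetical rather than part of VectorDB's API.

```python
# Alias table copied from the comments above; the real table may hold more entries.
MODEL_ALIASES = {
    "qwen3": "Qwen/Qwen3-Embedding-0.6B",
    "minilm": "sentence-transformers/all-MiniLM-L6-v2",
    "mpnet": "sentence-transformers/all-mpnet-base-v2",
}


def resolve_model_name(name):
    """Return the full model ID for a known alias, or the name unchanged."""
    return MODEL_ALIASES.get(name, name)
```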
Hybrid embeddings (dense + sparse)
embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"  # Dense model
  sparse_model: "prithivida/Splade_PP_en_v2"  # Sparse model
  device: "cpu"
  batch_size: 32
Vector database configuration
Each database has specific connection and indexing settings.
Pinecone
pinecone:
  api_key: "${PINECONE_API_KEY}"  # API key (required)
  index_name: "my-index"  # Index name (required)
  namespace: ""  # Namespace for isolation (optional)
  dimension: 384  # Vector dimension (required); must match the embedding model's output size (384 for all-MiniLM-L6-v2)
  metric: "cosine"  # Distance metric: cosine, euclidean, dotproduct
  recreate: false  # Recreate index if exists (default: false)
Namespaces for multi-tenancy:
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "production"
  namespace: "tenant-123"  # Tenant-specific namespace
Weaviate
weaviate:
  cluster_url: "${WEAVIATE_URL}"  # Cluster URL (required)
  api_key: "${WEAVIATE_API_KEY}"  # API key (required)
  collection_name: "Documents"  # Collection name (required)
  timeout: 30  # Request timeout in seconds
  connection_pool_size: 10  # Connection pool size
Multi-tenancy:
weaviate:
  cluster_url: "${WEAVIATE_URL}"
  api_key: "${WEAVIATE_API_KEY}"
  collection_name: "Documents"
  tenant: "customer-123"  # Native tenant isolation
Milvus
milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"  # Connection URI
  token: "${MILVUS_TOKEN:-}"  # Auth token (optional)
  collection_name: "documents"  # Collection name (required)
  dimension: 384  # Vector dimension (required)
  recreate: false  # Recreate collection if exists
  batch_size: 100  # Insert batch size
Partition-based multi-tenancy:
milvus:
  uri: "${MILVUS_URI}"
  collection_name: "documents"
  partition_key: "tenant_id"  # Field for partitioning
  num_partitions: 1000  # Max partitions
Qdrant
qdrant:
  url: "${QDRANT_URL:-http://localhost:6333}"  # Server URL
  api_key: "${QDRANT_API_KEY:-}"  # API key (optional)
  collection_name: "documents"  # Collection name (required)
  timeout: 30  # Request timeout
  prefer_grpc: true  # Use gRPC (faster)
With quantization:
qdrant:
  url: "${QDRANT_URL}"
  collection_name: "documents"
  quantization:
    enabled: true
    method: "scalar"  # scalar or binary
    compression_ratio: 4.0  # Target compression
Chroma
chroma:
  path: "./chroma_data"  # Local persistence path
  # OR for client/server mode:
  host: "${CHROMA_HOST:-localhost}"
  port: ${CHROMA_PORT:-8000}
  tenant: "default"  # Tenant name
  database: "default"  # Database name
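The fields marked required in the comments above can be checked before a pipeline tries to connect. The sketch below is a hypothetical stand-in for that kind of check, not the behavior of ConfigLoader.validate: the REQUIRED_KEYS table is built only from the comments in this section, and missing_keys is an illustrative name.

```python
# Required fields per backend, taken from the "(required)" comments above.
REQUIRED_KEYS = {
    "pinecone": ("api_key", "index_name", "dimension"),
    "weaviate": ("cluster_url", "api_key", "collection_name"),
    "milvus": ("collection_name", "dimension"),
    "qdrant": ("collection_name",),
}


def missing_keys(config, backend):
    """List required keys that are absent, null, or empty for one backend."""
    section = config.get(backend) or {}
    return [
        key
        for key in REQUIRED_KEYS.get(backend, ())
        if section.get(key) in (None, "")
    ]
```

An empty string counts as missing here because an unset ${VAR} placeholder resolves to an empty string.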
Search configuration
Controls retrieval behavior.
search:
  top_k: 10  # Number of results to return
  candidate_pool_size: 50  # Initial retrieval pool (before reranking)
  rrf_k: 60  # RRF parameter for hybrid search
  retrieval_mode: "with_parents"  # Parent doc retrieval mode
  max_parent_docs: 3  # Max unique parent documents
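The rrf_k setting is the constant in reciprocal rank fusion (RRF), where each document scores the sum of 1 / (rrf_k + rank) over the ranked lists it appears in. A minimal sketch of that formula follows; the function name and the tie-break on document ID are illustrative choices, and how VectorDB wires this into each backend is not shown in this reference.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rrf_k=60, top_k=10):
    """Fuse several ranked lists of document IDs into one list with RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


dense_hits = ["a", "b", "c"]   # e.g. ranking from the dense retriever
sparse_hits = ["b", "d", "a"]  # e.g. ranking from the sparse retriever
fused = reciprocal_rank_fusion([dense_hits, sparse_hits], rrf_k=60, top_k=10)
```

A larger rrf_k flattens the difference between adjacent ranks, so documents found by both retrievers gain relatively more weight than a single top hit.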
With metadata filters:
search:
  top_k: 10
  filters:
    must:  # All conditions must match
      - key: "category"
        match:
          value: "science"
      - key: "year"
        range:
          gte: 2020
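The semantics of the must, match, and range conditions above can be illustrated with an in-memory check against a document's metadata. This is only a sketch: real filtering happens inside the vector database, the function name matches_filters is hypothetical, and only the gte bound is handled because it is the only one shown here.

```python
def matches_filters(metadata, filters):
    """Return True if the metadata satisfies every condition under 'must'."""
    for condition in filters.get("must", []):
        value = metadata.get(condition["key"])
        if "match" in condition and value != condition["match"]["value"]:
            return False
        if "range" in condition:
            bounds = condition["range"]
            # A missing field can never satisfy a range condition.
            if value is None:
                return False
            if "gte" in bounds and value < bounds["gte"]:
                return False
    return True


example_filters = {
    "must": [
        {"key": "category", "match": {"value": "science"}},
        {"key": "year", "range": {"gte": 2020}},
    ]
}
```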
RAG configuration
Controls answer generation with LLMs.
rag:
  enabled: true  # Enable answer generation
  model: "llama-3.3-70b-versatile"  # LLM model name
  api_key: "${GROQ_API_KEY}"  # API key
  api_base_url: "https://api.groq.com/openai/v1"  # API endpoint
  temperature: 0.7  # Sampling temperature (0.0-1.0)
  max_tokens: 2048  # Max tokens in response
  provider: "groq"  # Provider: groq or openai
Groq configuration
rag:
  enabled: true
  provider: "groq"
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048
OpenAI configuration
rag:
  enabled: true
  provider: "openai"
  model: "gpt-4-turbo-preview"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.7
  max_tokens: 2048
Reranking configuration
Improves precision with cross-encoder models.
reranker:
  type: "cross_encoder"  # Reranker type
  model: "BAAI/bge-reranker-v2-m3"  # Model name
  top_k: 5  # Final result count after reranking
Cohere reranking
reranker:
  type: "cohere"
  cohere_api_key: "${COHERE_API_KEY}"
  model: "rerank-english-v3.0"
  top_k: 5
Evaluation metrics
reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness
Advanced features
Query enhancement
Generate multiple query variations for better recall.
query_enhancement:
  enabled: true
  method: "multi_query"  # multi_query, hyde, or step_back
  num_queries: 3  # Number of variations to generate
  llm_model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
Methods:
multi_query: Generate N paraphrases of the query
hyde: Generate hypothetical answer, then search for similar documents
step_back: Generate broader conceptual query
Parent document retrieval
Index small chunks, return large context.
indexing:
  parent_chunk_size: 512  # Size of parent chunks (tokens)
  child_chunk_size: 128  # Size of child chunks (tokens)
  chunk_overlap: 20  # Overlap between chunks

search:
  top_k: 5  # Child chunks to retrieve
  retrieval_mode: "with_parents"  # Return mode
  max_parent_docs: 3  # Max unique parents
Retrieval modes:
children_only: Return only child chunks
with_parents: Return full parent documents
context_window: Return parent with surrounding context
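The with_parents mode can be pictured as mapping ranked child hits back to their parents, deduplicating, and stopping at max_parent_docs. The sketch below assumes each child hit carries a parent_id field and that parents live in a dictionary keyed by that ID; both are illustrative assumptions, as is the function name.

```python
def collect_parents(child_hits, parents, max_parent_docs=3):
    """Map ranked child hits to unique parent documents, keeping rank order."""
    parent_ids = []
    for hit in child_hits:
        if len(parent_ids) >= max_parent_docs:
            break
        parent_id = hit["parent_id"]  # Hypothetical field name.
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


hits = [
    {"id": "c1", "parent_id": "p1"},
    {"id": "c2", "parent_id": "p1"},  # Same parent as c1, so it adds nothing.
    {"id": "c3", "parent_id": "p2"},
    {"id": "c4", "parent_id": "p3"},
    {"id": "c5", "parent_id": "p4"},
]
parent_store = {"p1": "Parent 1", "p2": "Parent 2", "p3": "Parent 3", "p4": "Parent 4"}
```

Because several child chunks often share one parent, top_k child hits usually yield fewer than top_k parents, which is why the two limits are configured separately.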
Contextual compression
Reduce retrieved context to save LLM tokens.
compression:
  enabled: true
  strategy: "extractive"  # extractive or llm_extraction
  num_sentences: 5  # Max sentences per document
  reranker_model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
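Extractive compression keeps the num_sentences sentences most relevant to the query and drops the rest. The sketch below shows that selection step under a loud assumption: a simple word-overlap score stands in for the cross-encoder named in reranker_model, so the ranking quality is not representative. The function names are hypothetical.

```python
import re


def overlap_score(query, sentence):
    """Fraction of query words found in the sentence (stand-in scorer)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    sentence_words = set(re.findall(r"\w+", sentence.lower()))
    return len(query_words & sentence_words) / len(query_words) if query_words else 0.0


def compress(query, document, num_sentences=5, score=overlap_score):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    ranked = sorted(
        range(len(sentences)),
        key=lambda index: score(query, sentences[index]),
        reverse=True,
    )
    kept = sorted(ranked[:num_sentences])  # Restore document order.
    return " ".join(sentences[index] for index in kept)


text = (
    "Paris is the capital of France. Bananas are yellow. "
    "France borders Spain. The sky is blue."
)
summary = compress("capital of France", text, num_sentences=2)
```

Passing a different score callable, for example one backed by a cross-encoder, changes the ranking without touching the selection logic.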
Agentic RAG
Iterative retrieval with self-reflection.
agentic:
  max_iterations: 3  # Max refinement iterations
  quality_threshold: 75  # Quality score threshold (0-100)
  router_model: "llama-3.3-70b-versatile"
  compression_mode: "reranking"  # reranking or llm
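The interplay of max_iterations and quality_threshold can be sketched as a retrieve, generate, grade loop that stops early once an answer scores high enough. Everything here is a hypothetical outline: the four callables (retrieve, generate, grade, refine) are placeholders supplied by the caller, and router_model and compression_mode are not modeled.

```python
def run_agentic_loop(question, retrieve, generate, grade, refine,
                     max_iterations=3, quality_threshold=75):
    """Iterate until an answer reaches the threshold or iterations run out."""
    query = question
    best_answer, best_score, used = "", -1.0, 0
    for used in range(1, max_iterations + 1):
        documents = retrieve(query)
        answer = generate(question, documents)
        score = grade(question, answer, documents)  # Expected range: 0-100.
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= quality_threshold:
            break  # Good enough: stop refining.
        query = refine(question, answer)  # Rewrite the query for the next pass.
    return {"answer": best_answer, "score": best_score, "iterations": used}
```

The loop returns the best answer seen rather than the last one, so a weaker final iteration cannot replace an earlier, better result.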
Cost optimization
Balance quality and cost.
cost_optimization:
  context_budget: 2000  # Max tokens for LLM context
  model_tiering:
    routing: "llama-3.1-8b-instant"  # Cheaper model for routing
    generation: "llama-3.3-70b-versatile"  # Capable model for answers

compression:
  enabled: true
  strategy: "extractive"
  num_sentences: 5
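A context budget can be enforced by adding ranked documents to the prompt until the next one would exceed context_budget. The sketch below assumes whitespace-separated words as a rough token count, which a real tokenizer would replace; fit_to_budget is a hypothetical name.

```python
def fit_to_budget(documents, context_budget=2000, count_tokens=None):
    """Keep the longest prefix of ranked documents that fits the token budget."""
    if count_tokens is None:
        # Rough approximation (assumption): one token per whitespace-separated word.
        count_tokens = lambda text: len(text.split())
    selected, used = [], 0
    for document in documents:
        cost = count_tokens(document)
        if used + cost > context_budget:
            break  # Stop at the first document that does not fit.
        selected.append(document)
        used += cost
    return selected
```

Stopping at the first overflow keeps the highest-ranked documents together; skipping the oversized document and continuing would fill the budget more tightly at the cost of reordering relevance.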
Chunking configuration
chunking:
  chunk_size: 1000  # Max chunk size (characters)
  chunk_overlap: 200  # Overlap between chunks
  separators:  # Split on these separators (in order)
    - "\n\n"
    - "\n"
    - " "
    - ""
Logging configuration
Control logging output.
logging:
  name: "vectordb_pipeline"  # Logger name
  level: "INFO"  # Log level: DEBUG, INFO, WARNING, ERROR
  format: "text"  # Format: text or json
Log levels by environment
logging:
  name: "vectordb_production"
  level: "${LOG_LEVEL:-WARNING}"  # Use env var with fallback
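The three logging keys map naturally onto Python's standard logging module. The sketch below shows one way to do that with the standard library only; build_logger and JsonFormatter are hypothetical names, the JSON field layout is an assumption, and VectorDB's own logger setup may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a one-line JSON object (assumed field layout)."""

    def format(self, record):
        return json.dumps(
            {"name": record.name, "level": record.levelname, "message": record.getMessage()}
        )


def build_logger(cfg):
    """Create a logger from a mapping with 'name', 'level', and 'format' keys."""
    logger = logging.getLogger(cfg.get("name", "vectordb_pipeline"))
    level_name = str(cfg.get("level", "INFO")).upper()
    logger.setLevel(getattr(logging, level_name, logging.INFO))
    handler = logging.StreamHandler()
    if cfg.get("format", "text") == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.handlers = [handler]  # Replace handlers so repeated calls do not duplicate output.
    logger.propagate = False
    return logger
```

The level is read after environment substitution, so a value such as ${LOG_LEVEL:-WARNING} arrives here as a plain string like WARNING.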
Collection configuration
Defines collection metadata (used by some features).
collection:
  name: "documents"  # Collection/index name
  description: "Product documentation corpus"
Complete configuration examples
Semantic search (Pinecone)
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-semantic-search-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: false
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "lc_semantic_search_pinecone"
  level: "INFO"
Hybrid search (Milvus)
dataloader:
  type: "triviaqa"
  dataset_name: "trivia_qa"
  config: "rc"
  split: "test"
  limit: null

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  sparse_model: "prithivida/Splade_PP_en_v2"
  device: "cpu"
  batch_size: 32

milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"
  token: "${MILVUS_TOKEN:-}"
  collection_name: "triviaqa_hybrid"
  dimension: 384
  recreate: false
  batch_size: 100

logging:
  name: "milvus_hybrid"
  level: "INFO"
Reranking pipeline (Haystack)
pinecone:
  api_key: "${PINECONE_API_KEY:-}"

collection:
  name: "triviaqa_reranking"

dataloader:
  type: "triviaqa"
  dataset_name: "trivia_qa"
  config: "rc"
  split: "test"
  limit: null

generator:
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY:-}"

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  batch_size: 32

reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness

logging:
  name: "pinecone_reranking"
  level: "INFO"
Agentic RAG (Pinecone)
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-agentic-rag-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

reranker:
  model: "cross-encoder/ms-marco-MiniLM-L-6-v2"

agentic:
  router_model: "llama-3.3-70b-versatile"
  max_iterations: 3
  compression_mode: "reranking"

logging:
  name: "lc_agentic_rag_pinecone_triviaqa"
  level: "INFO"
Cost-optimized RAG (Pinecone)
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

chunking:
  chunk_size: 1000
  chunk_overlap: 200
  separators:
    - "\n\n"
    - "\n"
    - " "
    - ""

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-cost-optimized-rag-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10
  rrf_k: 60

rag:
  enabled: false
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "lc_cost_optimized_rag_pinecone"
  level: "INFO"
Loading configurations in code
VectorDB provides multiple ways to load configurations:
From YAML file
from vectordb.utils.config_loader import ConfigLoader

config = ConfigLoader.load("configs/production.yaml")
ConfigLoader.validate(config, "pinecone")
From dictionary
import os

from vectordb.utils.config_loader import ConfigLoader

config = {
    "dataloader": {"type": "triviaqa", "split": "test"},
    "embeddings": {"model": "minilm", "batch_size": 32},
    "pinecone": {
        "api_key": os.getenv("PINECONE_API_KEY"),
        "index_name": "my-index",
        "dimension": 384,
    },
}
resolved_config = ConfigLoader.load(config)
With pipeline classes
from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline

pipeline = PineconeSemanticSearchPipeline("configs/production.yaml")
Next steps
Environment variables: Reference for all environment variables
Building RAG pipelines: Step-by-step tutorial using these configurations
Benchmarking: Evaluate different configurations
Production deployment: Deploy your configured pipelines