VectorDB uses YAML configuration files to define pipeline behavior. This reference documents all available configuration sections and options.

Configuration file structure

A complete configuration file includes these sections:
# Data loading
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100

# Embedding models
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

# Vector database connection
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "my-index"
  dimension: 384
  metric: "cosine"

# Search settings
search:
  top_k: 10

# RAG generation
rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

# Logging
logging:
  name: "my_pipeline"
  level: "INFO"

Environment variable substitution

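VectorDB supports two environment variable syntaxes. The plain form substitutes the variable's value:
api_key: "${PINECONE_API_KEY}"
# Returns empty string if PINECONE_API_KEY is not set
The fallback form supplies a default value after ":-":
uri: "${MILVUS_URI:-http://localhost:19530}"
# Uses http://localhost:19530 if MILVUS_URI is not set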
See environment variables for the complete list of supported variables.
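As an illustration of the two syntaxes, here is a minimal sketch of how such placeholders could be resolved. The function name resolve_env and the regular expression are assumptions made for this example, not VectorDB's actual implementation; the sketch applies the default only when the variable is unset.

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; the ":-default" part is optional.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_env(value, env=None):
    """Replace ${VAR} and ${VAR:-default} placeholders in a string (hypothetical helper)."""
    env = os.environ if env is None else env

    def _replace(match):
        name, default = match.group(1), match.group(2)
        # Use the default when given, otherwise fall back to an empty string.
        return env.get(name, default if default is not None else "")

    return _ENV_PATTERN.sub(_replace, value)


# With no MILVUS_URI in the environment, the fallback after ":-" is used.
print(resolve_env("${MILVUS_URI:-http://localhost:19530}", env={}))
```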

Dataloader configuration

Controls dataset loading and preprocessing.
dataloader:
  type: "triviaqa"           # Dataset type
  dataset_name: "trivia_qa"  # HuggingFace dataset name (optional)
  config: "rc"               # Dataset config variant (optional)
  split: "test"              # Dataset split: train, test, validation
  limit: 100                 # Max documents to load (null for all)
  use_text_splitter: false   # Enable chunking for long documents

Supported datasets

TriviaQA: Open-domain question-answering dataset.
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
ARC: AI2 Reasoning Challenge for science questions.
dataloader:
  type: "arc"
  split: "test"
  limit: 200
PopQA: Popular entity factoid questions.
dataloader:
  type: "popqa"
  split: "test"
  limit: 100
FactScore: Atomic facts for verification.
dataloader:
  type: "factscore"
  split: "test"
  limit: 100
Earnings calls: Financial transcript Q&A.
dataloader:
  type: "earnings_calls"
  split: "test"
  limit: 50

Embeddings configuration

Defines the embedding models for vector generation.

Dense embeddings

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"  # HuggingFace model
  device: "cpu"                                     # cpu or cuda
  batch_size: 32                                    # Embedding batch size

Model aliases

VectorDB provides convenient aliases for common models:
embeddings:
  model: "qwen3"     # Alias for Qwen/Qwen3-Embedding-0.6B
  # model: "minilm" # Alias for sentence-transformers/all-MiniLM-L6-v2
  # model: "mpnet"  # Alias for sentence-transformers/all-mpnet-base-v2
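Alias resolution can be pictured as a simple lookup that leaves unknown names untouched. The sketch below copies the three aliases listed above; the table and function names are hypothetical and may not match how VectorDB stores its aliases.

```python
# Alias table copied from the comments above; names here are hypothetical.
MODEL_ALIASES = {
    "qwen3": "Qwen/Qwen3-Embedding-0.6B",
    "minilm": "sentence-transformers/all-MiniLM-L6-v2",
    "mpnet": "sentence-transformers/all-mpnet-base-v2",
}


def resolve_model_name(name):
    """Return the full model name for a known alias, or the name unchanged."""
    return MODEL_ALIASES.get(name, name)


print(resolve_model_name("minilm"))
```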

Hybrid embeddings (dense + sparse)

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"          # Dense model
  sparse_model: "prithivida/Splade_PP_en_v2"  # Sparse model
  device: "cpu"
  batch_size: 32

Vector database configuration

Each database has specific connection and indexing settings.
pinecone:
  api_key: "${PINECONE_API_KEY}"  # API key (required)
  index_name: "my-index"          # Index name (required)
  namespace: ""                   # Namespace for isolation (optional)
  dimension: 384                  # Vector dimension (required)
  metric: "cosine"                # Distance metric: cosine, euclidean, dotproduct
  recreate: false                 # Recreate index if exists (default: false)
Namespaces for multi-tenancy:
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "production"
  namespace: "tenant-123"  # Tenant-specific namespace
weaviate:
  cluster_url: "${WEAVIATE_URL}"      # Cluster URL (required)
  api_key: "${WEAVIATE_API_KEY}"      # API key (required)
  collection_name: "Documents"         # Collection name (required)
  timeout: 30                          # Request timeout in seconds
  connection_pool_size: 10             # Connection pool size
Multi-tenancy:
weaviate:
  cluster_url: "${WEAVIATE_URL}"
  api_key: "${WEAVIATE_API_KEY}"
  collection_name: "Documents"
  tenant: "customer-123"  # Native tenant isolation
milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"  # Connection URI
  token: "${MILVUS_TOKEN:-}"                    # Auth token (optional)
  collection_name: "documents"                  # Collection name (required)
  dimension: 384                                # Vector dimension (required)
  recreate: false                               # Recreate collection if exists
  batch_size: 100                               # Insert batch size
Partition-based multi-tenancy:
milvus:
  uri: "${MILVUS_URI}"
  collection_name: "documents"
  partition_key: "tenant_id"    # Field for partitioning
  num_partitions: 1000          # Max partitions
qdrant:
  url: "${QDRANT_URL:-http://localhost:6333}"  # Server URL
  api_key: "${QDRANT_API_KEY:-}"               # API key (optional)
  collection_name: "documents"                 # Collection name (required)
  timeout: 30                                  # Request timeout
  prefer_grpc: true                            # Use gRPC (faster)
With quantization:
qdrant:
  url: "${QDRANT_URL}"
  collection_name: "documents"
  quantization:
    enabled: true
    method: "scalar"        # scalar or binary
    compression_ratio: 4.0  # Target compression
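To give a sense of where a 4x ratio can come from, here is a generic sketch of scalar quantization: each 32-bit float is mapped to one 8-bit code. This shows the general idea only and is not Qdrant's implementation; the function names are made up for the example.

```python
from array import array


def scalar_quantize(vector):
    """Map each float to an unsigned 8-bit code using the vector's min/max range."""
    low, high = min(vector), max(vector)
    # Step between adjacent codes; fall back to 1.0 for constant vectors.
    scale = (high - low) / 255 or 1.0
    codes = bytes(round((x - low) / scale) for x in vector)
    return codes, low, scale


def dequantize(codes, low, scale):
    """Approximately reconstruct the original floats from the 8-bit codes."""
    return [low + code * scale for code in codes]


vector = [-1.0, 0.0, 0.5, 1.0]
codes, low, scale = scalar_quantize(vector)
# float32 uses 4 bytes per value, the codes use 1 byte per value.
ratio = len(array("f", vector).tobytes()) / len(codes)
print(ratio, dequantize(codes, low, scale))
```

The reconstruction error per component is bounded by half a quantization step, which is the precision traded for the smaller footprint.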
chroma:
  path: "./chroma_data"                      # Local persistence path
  # OR for client/server mode (set host and port instead of path):
  host: "${CHROMA_HOST:-localhost}"
  port: ${CHROMA_PORT:-8000}
  tenant: "default"                          # Tenant name
  database: "default"                        # Database name

Search configuration

Controls retrieval behavior.
search:
  top_k: 10                      # Number of results to return
  candidate_pool_size: 50        # Initial retrieval pool (before reranking)
  rrf_k: 60                      # RRF parameter for hybrid search
  retrieval_mode: "with_parents" # Parent doc retrieval mode
  max_parent_docs: 3             # Max unique parent documents
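The rrf_k parameter refers to Reciprocal Rank Fusion, which merges ranked lists by summing 1 / (rrf_k + rank) for each document. The following is a minimal sketch of that formula; the function name and the tie-breaking rule are assumptions for illustration, not VectorDB's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rrf_k=60, top_k=10):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks in each list.
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Sort by fused score, breaking ties by document ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([dense_hits, sparse_hits], rrf_k=60, top_k=3))
```

A larger rrf_k flattens the difference between top and lower ranks, so documents that appear in several lists gain relative weight over documents that rank first in only one.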

Metadata filtering

search:
  top_k: 10
  filters:
    must:                        # All conditions must match
      - key: "category"
        match:
          value: "science"
      - key: "year"
        range:
          gte: 2020
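The semantics of a must filter can be sketched as a check that every condition holds for a document's metadata. The helper below is hypothetical and evaluates the filter in plain Python; in practice the vector database applies the filter server-side. Only match and range with gte appear in the example above; lte is an assumed addition.

```python
def matches_filters(metadata, filters):
    """Return True if the metadata satisfies every condition under 'must'."""
    for condition in filters.get("must", []):
        value = metadata.get(condition["key"])
        if "match" in condition and value != condition["match"]["value"]:
            return False
        if "range" in condition:
            bounds = condition["range"]
            if value is None:
                return False
            if "gte" in bounds and value < bounds["gte"]:
                return False
            # "lte" is an assumption; it is not shown in the configuration above.
            if "lte" in bounds and value > bounds["lte"]:
                return False
    return True


filters = {
    "must": [
        {"key": "category", "match": {"value": "science"}},
        {"key": "year", "range": {"gte": 2020}},
    ]
}
print(matches_filters({"category": "science", "year": 2021}, filters))
```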

RAG configuration

Controls answer generation with LLMs.
rag:
  enabled: true                          # Enable answer generation
  model: "llama-3.3-70b-versatile"       # LLM model name
  api_key: "${GROQ_API_KEY}"             # API key
  api_base_url: "https://api.groq.com/openai/v1"  # API endpoint
  temperature: 0.7                       # Sampling temperature (0.0-1.0)
  max_tokens: 2048                       # Max tokens in response
  provider: "groq"                       # Provider: groq or openai

Groq configuration

rag:
  enabled: true
  provider: "groq"
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

OpenAI configuration

rag:
  enabled: true
  provider: "openai"
  model: "gpt-4-turbo-preview"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

Reranking configuration

Improves precision with cross-encoder models.
reranker:
  type: "cross_encoder"                  # Reranker type
  model: "BAAI/bge-reranker-v2-m3"       # Model name
  top_k: 5                               # Final result count after reranking

Cohere reranking

reranker:
  type: "cohere"
  cohere_api_key: "${COHERE_API_KEY}"
  model: "rerank-english-v3.0"
  top_k: 5

Evaluation metrics

reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness

Advanced features

Query enhancement

Generate multiple query variations for better recall.
query_enhancement:
  enabled: true
  method: "multi_query"  # multi_query, hyde, or step_back
  num_queries: 3         # Number of variations to generate
  llm_model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
Methods:
  • multi_query: Generate N paraphrases of the query
  • hyde: Generate hypothetical answer, then search for similar documents
  • step_back: Generate broader conceptual query

Parent document retrieval

Index small chunks, return large context.
indexing:
  parent_chunk_size: 512     # Size of parent chunks (tokens)
  child_chunk_size: 128      # Size of child chunks (tokens)
  chunk_overlap: 20          # Overlap between chunks

search:
  top_k: 5                   # Child chunks to retrieve
  retrieval_mode: "with_parents"  # Return mode
  max_parent_docs: 3         # Max unique parents
Retrieval modes:
  • children_only: Return only child chunks
  • with_parents: Return full parent documents
  • context_window: Return parent with surrounding context
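A rough sketch of the with_parents mode: child hits are walked in score order and mapped to their unique parents until max_parent_docs is reached. The data layout (tuples of child ID, parent ID, score) and the function name are assumptions for this example.

```python
def collect_parents(child_hits, parents, max_parent_docs=3):
    """Map ranked child hits to at most max_parent_docs unique parent documents.

    child_hits: list of (child_id, parent_id, score), best score first.
    parents: dict mapping parent_id to the parent document text.
    """
    parent_ids = []
    for _child_id, parent_id, _score in child_hits:
        if len(parent_ids) >= max_parent_docs:
            break
        # Several children may share one parent; keep each parent once.
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


hits = [("c1", "p1", 0.9), ("c2", "p1", 0.8), ("c3", "p2", 0.7), ("c4", "p3", 0.6)]
parents = {"p1": "Parent one", "p2": "Parent two", "p3": "Parent three"}
print(collect_parents(hits, parents, max_parent_docs=2))
```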

Contextual compression

Reduce retrieved context to save LLM tokens.
compression:
  enabled: true
  strategy: "extractive"     # extractive or llm_extraction
  num_sentences: 5           # Max sentences per document
  reranker_model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
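The extractive strategy keeps only the sentences most relevant to the query. The sketch below illustrates the idea with a plain word-overlap score standing in for the cross-encoder named in reranker_model, so it is a simplification and not the scoring VectorDB uses.

```python
import re


def compress_extractive(query, document, num_sentences=5):
    """Keep the num_sentences sentences that share the most words with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = []
    for index, sentence in enumerate(sentences):
        terms = set(re.findall(r"\w+", sentence.lower()))
        # Word overlap is a stand-in for a cross-encoder relevance score.
        scored.append((len(query_terms & terms), index))
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    # Restore the original sentence order so the result reads naturally.
    keep = sorted(index for _, index in best)
    return " ".join(sentences[i] for i in keep)


text = "Paris is the capital of France. Bananas are yellow. France borders Spain."
print(compress_extractive("capital of France", text, num_sentences=2))
```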

Agentic RAG

Iterative retrieval with self-reflection.
agentic:
  max_iterations: 3                      # Max refinement iterations
  quality_threshold: 75                  # Quality score threshold (0-100)
  router_model: "llama-3.3-70b-versatile"
  compression_mode: "reranking"          # reranking or llm
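The interplay of max_iterations and quality_threshold can be sketched as a loop that stops as soon as an answer scores high enough. The retrieve, generate, and score callables are placeholders supplied by the caller, and the query refinement step is a deliberately naive assumption; none of this reflects VectorDB's internal agent logic.

```python
def agentic_answer(question, retrieve, generate, score,
                   max_iterations=3, quality_threshold=75):
    """Retrieve and answer repeatedly until the quality score reaches the threshold."""
    best_answer, best_score, iterations_used = "", -1, 0
    query = question
    for iteration in range(1, max_iterations + 1):
        iterations_used = iteration
        documents = retrieve(query)
        answer = generate(question, documents)
        quality = score(question, answer)  # expected on a 0-100 scale
        if quality > best_score:
            best_answer, best_score = answer, quality
        if quality >= quality_threshold:
            break
        # Naive refinement (assumption): append the weak answer as a retrieval hint.
        query = f"{question} {answer}"
    return best_answer, best_score, iterations_used
```

If no attempt reaches the threshold, the sketch returns the best-scoring answer seen within the iteration budget.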

Cost optimization

Balance quality and cost.
cost_optimization:
  context_budget: 2000                   # Max tokens for LLM context
  model_tiering:
    routing: "llama-3.1-8b-instant"      # Cheaper model for routing
    generation: "llama-3.3-70b-versatile" # Capable model for answers
  compression:
    enabled: true
    strategy: "extractive"
    num_sentences: 5
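One way to picture context_budget is as a cap on how many retrieved documents are passed to the LLM. The sketch below counts whitespace-separated words as a rough proxy for tokens, which is an assumption; a real pipeline would use the model's tokenizer.

```python
def fit_to_budget(documents, context_budget=2000):
    """Keep documents in ranked order until the next one would exceed the budget."""
    selected, used = [], 0
    for document in documents:
        cost = len(document.split())  # rough token estimate (assumption)
        if used + cost > context_budget:
            break
        selected.append(document)
        used += cost
    return selected


ranked = ["a b c", "d e", "f g h i"]
print(fit_to_budget(ranked, context_budget=5))
```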

Chunking configuration

chunking:
  chunk_size: 1000           # Max chunk size (characters)
  chunk_overlap: 200         # Overlap between chunks
  separators:                # Split on these separators (in order)
    - "\n\n"
    - "\n"
    - " "
    - ""
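The ordered separators suggest a recursive strategy: split on the coarsest separator first and fall back to finer ones only for pieces that are still too long. Below is a simplified sketch of that idea. It ignores chunk_overlap for brevity and is not the splitter VectorDB actually uses.

```python
def split_text(text, chunk_size=1000, separators=("\n\n", "\n", " ", "")):
    """Recursively split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    separator, finer = separators[0], separators[1:]
    if separator == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too long: retry this piece with the finer separators.
            chunks.extend(split_text(piece, chunk_size, finer or ("",)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaaa bbbb\n\ncccc dddd eeee", chunk_size=10))
```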

Logging configuration

Control logging output.
logging:
  name: "vectordb_pipeline"   # Logger name
  level: "INFO"               # Log level: DEBUG, INFO, WARNING, ERROR
  format: "text"              # Format: text or json

Log levels by environment

logging:
  name: "vectordb_production"
  level: "${LOG_LEVEL:-WARNING}"  # Use env var with fallback
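For orientation, this is roughly how a resolved logging section could be turned into a standard-library logger. The build_logger helper is hypothetical, and the sketch covers only the name and level keys, not format.

```python
import logging

# Levels listed in the logging configuration above.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def build_logger(logging_config):
    """Create a logger from a dict with 'name' and 'level' keys (hypothetical helper)."""
    level_name = str(logging_config.get("level", "INFO")).upper()
    if level_name not in _LEVELS:
        raise ValueError(f"Unsupported log level: {level_name}")
    logger = logging.getLogger(logging_config.get("name", "vectordb_pipeline"))
    logger.setLevel(_LEVELS[level_name])
    return logger


logger = build_logger({"name": "vectordb_production", "level": "WARNING"})
print(logger.name, logging.getLevelName(logger.level))
```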

Collection configuration

Defines collection metadata (used by some features).
collection:
  name: "documents"           # Collection/index name
  description: "Product documentation corpus"

Complete configuration examples

Semantic search with Pinecone:
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-semantic-search-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: false
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "lc_semantic_search_pinecone"
  level: "INFO"

Hybrid search with Milvus:
dataloader:
  type: "triviaqa"
  dataset_name: "trivia_qa"
  config: "rc"
  split: "test"
  limit: null

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  sparse_model: "prithivida/Splade_PP_en_v2"
  device: "cpu"
  batch_size: 32

milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"
  token: "${MILVUS_TOKEN:-}"
  collection_name: "triviaqa_hybrid"
  dimension: 384  # Must match the dense embedding model's output dimension
  recreate: false
  batch_size: 100

logging:
  name: "milvus_hybrid"
  level: "INFO"

Reranking with Pinecone and evaluation:
pinecone:
  api_key: "${PINECONE_API_KEY:-}"

collection:
  name: "triviaqa_reranking"

dataloader:
  type: "triviaqa"
  dataset_name: "trivia_qa"
  config: "rc"
  split: "test"
  limit: null

generator:
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY:-}"

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  batch_size: 32

reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness

logging:
  name: "pinecone_reranking"
  level: "INFO"

Agentic RAG with Pinecone:
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-agentic-rag-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

reranker:
  model: "cross-encoder/ms-marco-MiniLM-L-6-v2"

agentic:
  router_model: "llama-3.3-70b-versatile"
  max_iterations: 3
  compression_mode: "reranking"

logging:
  name: "lc_agentic_rag_pinecone_triviaqa"
  level: "INFO"

Cost-optimized RAG with Pinecone:
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

chunking:
  chunk_size: 1000
  chunk_overlap: 200
  separators:
    - "\n\n"
    - "\n"
    - " "
    - ""

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-cost-optimized-rag-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10
  rrf_k: 60

rag:
  enabled: false
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "lc_cost_optimized_rag_pinecone"
  level: "INFO"

Loading configurations in code

VectorDB provides multiple ways to load configurations:

From YAML file

from vectordb.utils.config_loader import ConfigLoader

config = ConfigLoader.load("configs/production.yaml")
ConfigLoader.validate(config, "pinecone")

From dictionary

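import os

from vectordb.utils.config_loader import ConfigLoader

# Build the configuration in code instead of reading it from a YAML file
config = {
    "dataloader": {"type": "triviaqa", "split": "test"},
    "embeddings": {"model": "minilm", "batch_size": 32},
    "pinecone": {
        "api_key": os.getenv("PINECONE_API_KEY"),
        "index_name": "my-index",
        "dimension": 384
    }
}

resolved_config = ConfigLoader.load(config)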

With pipeline classes

from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline

pipeline = PineconeSemanticSearchPipeline("configs/production.yaml")

Next steps

Environment variables

Reference for all environment variables

Building RAG pipelines

Step-by-step tutorial using these configurations

Benchmarking

Evaluate different configurations

Production deployment

Deploy your configured pipelines
