Parent document retrieval solves the chunk size tradeoff: small chunks match queries precisely, but lack surrounding context needed for good answers. This technique indexes small chunks for accurate retrieval, then returns their larger parent documents for generation.
The chunk size dilemma
When indexing documents, you face a trade-off:
Small chunks (128-256 tokens)
Pros:
Precise semantic matching
Low false positive rate
Better retrieval accuracy
Cons:
Missing surrounding context
Incomplete information
Poor for generation
Large chunks (512-1024 tokens)
Pros:
Rich context for generation
Complete information
Better for Q&A
Cons:
Noisy retrieval
Higher false positives
Worse precision
Parent document retrieval combines the two: small chunks drive the matching, and their larger parents supply the context passed to the generator.
How it works
Split documents into parent and child chunks
Parents: Large chunks (512-1024 tokens) with full context
Children: Small chunks (128-256 tokens) for precise matching
Index only child chunks
Store child embeddings in vector database
Each child has metadata linking to its parent ID
Store parent documents separately
ParentDocumentStore maintains parent text in memory
Maps child IDs to parent IDs and full parent text
During search
Retrieve top-k child chunks from vector DB
Map child IDs to parent IDs
Return unique parent documents (deduplicated)
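The steps above can be sketched end to end in plain Python. This is a toy illustration, not the library's implementation: the word-based splitter, the word-overlap score standing in for vector similarity, and all function names (`split_words`, `overlap_score`, `build_index`, `search`) are assumptions made for the example.

```python
def split_words(text, size):
    """Split text into consecutive chunks of `size` words (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def overlap_score(query, chunk):
    """Toy relevance score: how many query words appear in the chunk."""
    chunk_words = set(chunk.lower().split())
    return sum(1 for word in query.lower().split() if word in chunk_words)


def build_index(documents, parent_size=8, child_size=4):
    """Split documents into parents, then split each parent into children."""
    parents = {}   # parent_id -> parent text (the "parent store")
    children = []  # (child_id, child text, parent_id) (the "child index")
    for d, doc in enumerate(documents):
        for p, parent_text in enumerate(split_words(doc, parent_size)):
            parent_id = f"d{d}-p{p}"
            parents[parent_id] = parent_text
            for c, child_text in enumerate(split_words(parent_text, child_size)):
                children.append((f"{parent_id}-c{c}", child_text, parent_id))
    return parents, children


def search(query, parents, children, top_k=2):
    """Rank children, then return their unique parents in rank order."""
    ranked = sorted(children, key=lambda ch: overlap_score(query, ch[1]), reverse=True)
    seen, results = set(), []
    for _, _, parent_id in ranked:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
        if len(results) == top_k:
            break
    return results


docs = ["the cat sat on the mat while the dog slept near the warm fire all night long"]
parents, children = build_index(docs)
top = search("dog slept", parents, children, top_k=1)
print(top)  # ['dog slept near the warm fire all night']
```

The query matches only the four-word child "dog slept near the", yet the caller receives the full eight-word parent that contains it.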
Basic usage
Indexing (LangChain)
from vectordb.langchain.parent_document_retrieval.indexing import (
    PineconeParentDocumentRetrievalIndexingPipeline,
)

pipeline = PineconeParentDocumentRetrievalIndexingPipeline(
    "configs/pinecone_parent_doc.yaml"
)

# Load and index documents
pipeline.load_dataset()
pipeline.index_documents()

# ParentDocumentStore is automatically saved to disk
Configuration
pinecone:
  api_key: ${PINECONE_API_KEY}
  index_name: parent-child-index
  namespace: default
  dimension: 384
embedding:
  provider: sentence_transformers
  model: all-MiniLM-L6-v2
parent_document:
  parent_chunk_size: 1000  # Tokens per parent document
  parent_overlap: 100      # Overlap between parents
  child_chunk_size: 200    # Tokens per child chunk
  child_overlap: 20        # Overlap between children
parent_store:
  cache_dir: ./cache                    # Where to save parent store
  store_path: ./cache/parent_store.pkl  # For loading in search
llm:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}
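To see how the chunk size and overlap settings interact, here is a minimal sliding-window sketch. It is an assumption-laden simplification: it counts characters rather than tokens, ignores sentence and paragraph boundaries, and `split_with_overlap` is a hypothetical helper, not a function from the library.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Cut text into fixed-size windows; each window starts
    (chunk_size - overlap) characters after the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 2,500-character document with recognizable content
text = "".join(str(i % 10) for i in range(2500))

parents = split_with_overlap(text, chunk_size=1000, overlap=100)
children = split_with_overlap(parents[0], chunk_size=200, overlap=20)

print(len(parents))   # 3
print(len(children))  # 6
# Consecutive parents share exactly `overlap` characters
print(parents[0][-100:] == parents[1][:100])  # True
```

With the values from the configuration above, each full-size parent yields about six children, so several of the top-ranked children can easily point to the same parent.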
ParentDocumentStore
The parent store maintains chunk-to-parent mappings in memory:
from vectordb.langchain.parent_document_retrieval.parent_store import (
    ParentDocumentStore,
)

# Initialize with persistence
store = ParentDocumentStore(cache_dir="./cache")

# Add a parent document
store.add_parent(
    parent_id="parent_1",
    parent_doc={
        "text": "Complete document text with full context...",
        "metadata": {"source": "article.txt", "author": "Jane"},
        "source_index": 0,
    },
)

# Map child chunks to this parent
for i in range(5):
    store.add_chunk_mapping(
        chunk_id=f"chunk_{i}",
        parent_id="parent_1",
    )

# Retrieve parent from any child
parent = store.get_parent("chunk_2")
print(parent["text"])  # Full parent document

# Batch retrieval with deduplication
chunk_ids = ["chunk_1", "chunk_2", "chunk_3"]  # May share parents
parents = store.get_parents_for_chunks(chunk_ids)
print(f"Retrieved {len(parents)} unique parents")

# Save to disk for later use
store.save("parent_store.pkl")

# Load from disk during search
loaded_store = ParentDocumentStore.load("./cache/parent_store.pkl")
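For intuition about what such a store has to do internally, the following is a minimal stand-in built on two dictionaries and `pickle`. It mirrors the method names used above, but the class `MiniParentStore` and everything inside it are assumptions for illustration; the real ParentDocumentStore may be implemented differently.

```python
import os
import pickle
import tempfile


class MiniParentStore:
    """Hypothetical stand-in, not the library's ParentDocumentStore."""

    def __init__(self):
        self.parents = {}          # parent_id -> parent document dict
        self.chunk_to_parent = {}  # chunk_id -> parent_id

    def add_parent(self, parent_id, parent_doc):
        self.parents[parent_id] = parent_doc

    def add_chunk_mapping(self, chunk_id, parent_id):
        self.chunk_to_parent[chunk_id] = parent_id

    def get_parent(self, chunk_id):
        """Return the parent of a chunk, or None if the chunk is unknown."""
        return self.parents.get(self.chunk_to_parent.get(chunk_id))

    def get_parents_for_chunks(self, chunk_ids):
        """Return unique parents, keeping the order of first appearance."""
        seen, result = set(), []
        for chunk_id in chunk_ids:
            parent_id = self.chunk_to_parent.get(chunk_id)
            if parent_id in seen or parent_id not in self.parents:
                continue
            seen.add(parent_id)
            result.append(self.parents[parent_id])
        return result

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump((self.parents, self.chunk_to_parent), f)

    @classmethod
    def load(cls, path):
        store = cls()
        with open(path, "rb") as f:
            store.parents, store.chunk_to_parent = pickle.load(f)
        return store


store = MiniParentStore()
store.add_parent("parent_1", {"text": "Full parent text", "metadata": {"source": "article.txt"}})
for i in range(3):
    store.add_chunk_mapping(f"chunk_{i}", "parent_1")

# Round-trip through a temporary pickle file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parent_store.pkl")
    store.save(path)
    restored = MiniParentStore.load(path)

unique = restored.get_parents_for_chunks(["chunk_0", "chunk_1", "chunk_2"])
print(len(unique))  # 1
```

Because `pickle.load` can execute arbitrary code, a pickle-based store should only be loaded from files you created yourself.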
Indexing pipeline internals
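Here's a simplified view of how the LangChain indexing pipeline works. Note that RecursiveCharacterTextSplitter measures chunk_size in characters by default, so the configured sizes are token counts only if a token-based length function is supplied: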
import uuid

from langchain.text_splitter import RecursiveCharacterTextSplitter

from vectordb.langchain.parent_document_retrieval.parent_store import (
    ParentDocumentStore,
)


class ParentDocumentIndexingPipeline:
    def __init__(self, config):
        # Initialize parent and child text splitters
        self.parent_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config["parent_chunk_size"],
            chunk_overlap=config["parent_overlap"],
        )
        self.child_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config["child_chunk_size"],
            chunk_overlap=config["child_overlap"],
        )

        # Initialize parent store
        cache_dir = config["parent_store"]["cache_dir"]
        self.parent_store = ParentDocumentStore(cache_dir=cache_dir)

        # self.embedder and self.vector_db are set up elsewhere (omitted here)

    def index_documents(self, documents):
        all_child_chunks = []

        for doc_idx, document in enumerate(documents):
            # Split into parent chunks
            parent_chunks = self.parent_splitter.split_text(document.content)

            for parent_chunk in parent_chunks:
                # Generate unique parent ID
                parent_id = str(uuid.uuid4())

                # Store parent in ParentDocumentStore
                self.parent_store.add_parent(
                    parent_id=parent_id,
                    parent_doc={
                        "text": parent_chunk,
                        "metadata": document.meta,
                        "source_index": doc_idx,
                    },
                )

                # Split parent into child chunks
                child_chunks = self.child_splitter.split_text(parent_chunk)

                for child_chunk in child_chunks:
                    # Generate unique child ID
                    child_id = str(uuid.uuid4())

                    # Map child to parent
                    self.parent_store.add_chunk_mapping(child_id, parent_id)

                    # Prepare child for indexing
                    all_child_chunks.append({
                        "id": child_id,
                        "content": child_chunk,
                        "metadata": {"parent_id": parent_id},
                    })

        # Embed and index only child chunks (embed the chunk text, not the dicts)
        child_texts = [chunk["content"] for chunk in all_child_chunks]
        embeddings = self.embedder.embed_documents(child_texts)
        self.vector_db.index(all_child_chunks, embeddings)

        # Save parent store to disk
        self.parent_store.save("parent_store.pkl")
Search pipeline internals
How search retrieves children but returns parents:
class ParentDocumentSearchPipeline:
    def __init__(self, config):
        # Load parent store from disk
        store_path = config["parent_store"]["store_path"]
        self.parent_store = ParentDocumentStore.load(store_path)

        # self.embedder and self.vector_db are set up elsewhere (omitted here)

    def search(self, query, top_k=10):
        # Embed query
        query_embedding = self.embedder.embed_query(query)

        # Search for child chunks (retrieve 2x for deduplication)
        child_documents = self.vector_db.query(
            query_embedding=query_embedding,
            top_k=top_k * 2,
        )

        # Extract child IDs from results
        chunk_ids = [
            doc.id if hasattr(doc, "id") else doc.metadata["id"]
            for doc in child_documents
        ]

        # Map child IDs to parent documents (deduplicated)
        parent_documents = self.parent_store.get_parents_for_chunks(
            chunk_ids
        )

        # Limit to requested top_k
        parent_documents = parent_documents[:top_k]

        return {"parent_documents": parent_documents, "query": query}
Why over-fetch children?
The search pipeline retrieves top_k * 2 children because:
Multiple children may belong to the same parent
After deduplication, you might have fewer than top_k unique parents
Over-fetching makes it more likely that top_k unique parents remain, although it cannot guarantee it
Example:
# Retrieve 10 children, might get:
children = [
    {"id": "c1", "parent_id": "p1"},
    {"id": "c2", "parent_id": "p1"},  # Same parent as c1
    {"id": "c3", "parent_id": "p2"},
    {"id": "c4", "parent_id": "p1"},  # Same parent again
    {"id": "c5", "parent_id": "p3"},
    # ...
]
# After deduplication: only 3 unique parents (p1, p2, p3)
# Over-fetching compensates for this
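The effect can be checked with a small order-preserving deduplication helper. This is a sketch: `unique_parents` is a hypothetical function, and the sixth child (`c6`) is added here only to fill out a 2x fetch for `top_k = 3`.

```python
def unique_parents(children, top_k):
    """Collapse ranked children to unique parent IDs, keeping rank order."""
    seen = set()
    ordered = []
    for child in children:
        parent_id = child["parent_id"]
        if parent_id not in seen:
            seen.add(parent_id)
            ordered.append(parent_id)
    return ordered[:top_k]


ranked_children = [
    {"id": "c1", "parent_id": "p1"},
    {"id": "c2", "parent_id": "p1"},
    {"id": "c3", "parent_id": "p2"},
    {"id": "c4", "parent_id": "p1"},
    {"id": "c5", "parent_id": "p3"},
    {"id": "c6", "parent_id": "p2"},  # hypothetical extra child
]

top_k = 3
without_overfetch = unique_parents(ranked_children[:top_k], top_k)
with_overfetch = unique_parents(ranked_children[:top_k * 2], top_k)

print(without_overfetch)  # ['p1', 'p2']
print(with_overfetch)     # ['p1', 'p2', 'p3']
```

Fetching exactly `top_k` children leaves the result one parent short, while the 2x fetch fills all three slots. A heavily duplicated result list could still come up short, which is why the multiplier is worth tuning.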
Chunk size recommendations
General purpose
parent_chunk_size: 1000
parent_overlap: 100
child_chunk_size: 200
child_overlap: 20
Works well for articles, documentation, and general content.

Technical docs
parent_chunk_size: 1500
parent_overlap: 150
child_chunk_size: 300
child_overlap: 30
Larger chunks preserve code blocks and technical context.

Short-form content
parent_chunk_size: 600
parent_overlap: 50
child_chunk_size: 150
child_overlap: 15
For tweets, chat messages, or brief articles.

Long-form content
parent_chunk_size: 2000
parent_overlap: 200
child_chunk_size: 400
child_overlap: 40
For books, research papers, or extensive reports.
Trade-offs
Storage overhead
Child chunks are indexed in the vector DB (normal storage)
Parent documents are stored in ParentDocumentStore (held in memory, persisted as a pickle file)
Total storage: ~1.5-2x standard indexing
Mitigation: Parent store can be compressed or moved to Redis or another database

Memory usage
ParentDocumentStore loads into memory during search
For 1M parents with 1KB of text each: ~1GB RAM for the raw text alone, before Python object overhead
Mitigation: Use a database-backed parent store for production

Deduplication complexity
Need to track which parents have already been returned
Over-fetching is needed to end up with enough unique parents
Benefit: Handled automatically by ParentDocumentStore
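As one way to picture the database-backed mitigation, the sketch below keeps parents on disk in SQLite (from the Python standard library) instead of in process memory. It stands in for the PostgreSQL or Redis options mentioned above; the class name `SQLiteParentStore` and its schema are assumptions, not part of the library.

```python
import json
import sqlite3


class SQLiteParentStore:
    """Hypothetical disk-backed parent store; not part of the library."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS parents "
            "(parent_id TEXT PRIMARY KEY, doc TEXT NOT NULL)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks "
            "(chunk_id TEXT PRIMARY KEY, parent_id TEXT NOT NULL)"
        )

    def add_parent(self, parent_id, parent_doc):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO parents VALUES (?, ?)",
                (parent_id, json.dumps(parent_doc)),
            )

    def add_chunk_mapping(self, chunk_id, parent_id):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO chunks VALUES (?, ?)",
                (chunk_id, parent_id),
            )

    def get_parents_for_chunks(self, chunk_ids):
        """Look up each chunk's parent and return unique parents in order."""
        seen, result = set(), []
        for chunk_id in chunk_ids:
            row = self.conn.execute(
                "SELECT p.parent_id, p.doc FROM chunks c "
                "JOIN parents p ON p.parent_id = c.parent_id "
                "WHERE c.chunk_id = ?",
                (chunk_id,),
            ).fetchone()
            if row is None or row[0] in seen:
                continue
            seen.add(row[0])
            result.append(json.loads(row[1]))
        return result


db_store = SQLiteParentStore()  # pass a file path to persist across runs
db_store.add_parent("p1", {"text": "First parent"})
db_store.add_parent("p2", {"text": "Second parent"})
db_store.add_chunk_mapping("c1", "p1")
db_store.add_chunk_mapping("c2", "p1")
db_store.add_chunk_mapping("c3", "p2")

found = db_store.get_parents_for_chunks(["c2", "c3", "c1", "unknown"])
print([doc["text"] for doc in found])  # ['First parent', 'Second parent']
```

Only the parents that a query actually touches are read into memory, so RAM use no longer grows with the size of the corpus. The trade-off is one lookup per retrieved chunk.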
Production considerations
Persist parent store
Save ParentDocumentStore to disk after indexing:
store.save("parent_store.pkl")
Load during search:
store = ParentDocumentStore.load("./cache/parent_store.pkl")

Scale parent storage
For large datasets, use a database instead of pickle:
# Store parents in PostgreSQL or Redis
# Implement custom ParentDocumentStore with DB backend

Monitor deduplication rate
Track how many children map to unique parents:
children_retrieved = 20
unique_parents = 8
dedup_rate = unique_parents / children_retrieved  # 0.4
# If dedup_rate < 0.5, increase over-fetch multiplier

Combine with RAG
Use parent documents for answer generation:
parent_docs = results["parent_documents"]
parent_texts = [p["text"] for p in parent_docs]
answer = llm.generate(query, context=parent_texts)
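The deduplication-rate check can be turned into a small tuning loop. The sketch below assumes you can collect the parent ID of every retrieved child; `dedup_rate`, `next_multiplier`, the step size of 1, and the cap of 8 are hypothetical choices, with only the 0.5 threshold taken from the guidance above.

```python
def dedup_rate(child_parent_ids):
    """Share of retrieved children that map to distinct parents."""
    if not child_parent_ids:
        return 0.0
    return len(set(child_parent_ids)) / len(child_parent_ids)


def next_multiplier(current, rate, threshold=0.5, max_multiplier=8):
    """Raise the over-fetch multiplier by one when the rate is below threshold."""
    if rate < threshold:
        return min(current + 1, max_multiplier)
    return current


# 20 retrieved children that map to 8 distinct parents
retrieved_parent_ids = [f"p{i % 8}" for i in range(20)]

rate = dedup_rate(retrieved_parent_ids)
multiplier = next_multiplier(current=2, rate=rate)

print(rate)        # 0.4
print(multiplier)  # 3
```

In practice you would probably average the rate over many queries before changing the multiplier, since a single query can be unrepresentative.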
See also