Configure chunk size, overlap, and other indexing settings
know provides several configuration options to optimize indexing for your content. Understanding these settings helps you balance search quality, performance, and storage.
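The --chunk-size parameter controls the maximum size of each chunk, measured in tokens. A token is typically a whole word or a piece of one, so token counts run somewhat higher than word counts: the default of 512 tokens is roughly 400 words.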
```bash
# Default: 512 tokens (~400 words)
know index

# Smaller chunks for more precise results
know index --chunk-size 256

# Larger chunks for more context
know index --chunk-size 1024
```
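Smaller chunks also mean more chunks to embed and store. The arithmetic below is a rough sketch that assumes no overlap and assumes every chunk except the last is filled to the full --chunk-size. A splitter that breaks on sentence or line boundaries will usually produce somewhat more chunks. `estimate_chunks` is a hypothetical helper, not part of know.

```python
import math


def estimate_chunks(total_tokens: int, chunk_size: int) -> int:
    """Rough chunk count for a document, assuming full chunks and no overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(total_tokens / chunk_size)


# A hypothetical 10,000-token document at each --chunk-size shown above
for size in (256, 512, 1024):
    print(size, estimate_chunks(10_000, size))
# 256 40
# 512 20
# 1024 10
```

Halving the chunk size roughly doubles the number of chunks, and with it the number of stored embeddings.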
The --overlap parameter controls how many tokens overlap between consecutive chunks.
```bash
# Default: 50 tokens
know index

# No overlap
know index --overlap 0

# More overlap for better context
know index --overlap 100

# Combine with chunk size
know index --chunk-size 512 --overlap 75
```
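Conceptually, chunk size and overlap together describe a sliding window: each chunk starts `chunk_size - overlap` tokens after the previous one. The sketch below illustrates the idea on an already-tokenized list. It is not know's splitter. The name `chunk_tokens` is hypothetical, and the sketch assumes the tokens have already been produced (for quick experiments, `str.split()` is a usable stand-in for a real tokenizer).

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of at most chunk_size tokens.

    Each window shares its first `overlap` tokens with the end of the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
# ['t0', 't1', 't2', 't3']
# ['t3', 't4', 't5', 't6']
# ['t6', 't7', 't8', 't9']
```

Requiring `overlap < chunk_size` keeps the step positive. Otherwise the window would never advance. Whether know enforces the same constraint is not stated on this page.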
Why Use Chunk Overlap?
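Overlap repeats the last few tokens of each chunk at the start of the next one. As a result, a passage that straddles a chunk boundary still appears intact in at least one chunk. Without overlap (--overlap 0):

```text
Chunk 1: "...the function returns a value."
Chunk 2: "The value is then processed by..."
```

A search for "return value processed" may miss this passage, because neither chunk contains all of those terms together. With overlap (--overlap 50):

```text
Chunk 1: "...the function returns a value."
Chunk 2: "...returns a value. The value is then processed by..."
```

Now the second chunk contains both sentences, so a query that spans the boundary can match it.

Recommended overlap: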
- 10-20% of the chunk size
- The default of 50 tokens works well with the default 512-token chunk size (~10%)
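You can check the rule of thumb for any chunk size. The sketch below uses hypothetical helpers that are not part of know. It computes two things:

- The 10-20% overlap range for a given chunk size.
- The approximate growth in chunk count caused by overlap. Each window advances only `chunk_size - overlap` tokens, so a long document produces roughly `chunk_size / (chunk_size - overlap)` times as many chunks as it would without overlap.

For the defaults, 50 / 512 ≈ 9.8% and 512 / 462 ≈ 1.11, or about 11% more chunks.

```python
def recommended_overlap(chunk_size: int) -> tuple[int, int]:
    """Return the (low, high) overlap in tokens for the 10-20% rule of thumb."""
    return round(chunk_size / 10), round(chunk_size / 5)


def overlap_overhead(chunk_size: int, overlap: int) -> float:
    """Approximate factor by which overlap increases the number of chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    return chunk_size / (chunk_size - overlap)


for size in (256, 512, 1024):
    print(size, recommended_overlap(size))
# 256 (26, 51)
# 512 (51, 102)
# 1024 (102, 205)

print(f"{overlap_overhead(512, 50):.2f}")  # 1.11
```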
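The know prune command removes indexed chunks whose source files no longer exist on disk:

```bash
# Show what would be pruned
know prune --dry

# Actually prune
know prune

# Show details
know prune --log
```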
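```python
# From src/db.py:528-582 (excerpt)
from pathlib import Path


def prune(dry_run: bool = False, log: bool = False) -> tuple[int, int]:
    # Fetch only the metadata for every chunk in the dense collection
    all_data = dense_collection.get(include=["metadatas"])
    orphan_ids: list[str] = []
    checked_paths: dict[str, bool] = {}  # cache path existence checks

    for chunk_id, meta in zip(all_data["ids"], all_data["metadatas"]):
        path = meta.get("path", "")
        if not path:
            # No source path recorded: treat the chunk as orphaned
            orphan_ids.append(chunk_id)
            continue
        if path not in checked_paths:
            checked_paths[path] = Path(path).exists()
        if not checked_paths[path]:
            # Source file was deleted or moved since it was indexed
            orphan_ids.append(chunk_id)
    # ... (remainder of the function is not shown in this excerpt)
```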
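To experiment with the orphan check outside know, you can lift the loop from the excerpt into a standalone function. This is a sketch, not know's code:

- `find_orphans` is a hypothetical name.
- It takes plain ID and metadata lists instead of querying `dense_collection`.
- The `exists` parameter is an assumption added so the check can be tested without touching the filesystem.

The excerpt does not show how `prune` deletes orphans or what its two return values count, so that part is not sketched.

```python
from pathlib import Path
from typing import Callable, Iterable


def find_orphans(
    ids: Iterable[str],
    metadatas: Iterable[dict],
    exists: Callable[[str], bool] = lambda p: Path(p).exists(),
) -> list[str]:
    """Return IDs of chunks whose source path is missing or no longer exists."""
    orphan_ids: list[str] = []
    checked: dict[str, bool] = {}  # each distinct path is checked only once
    for chunk_id, meta in zip(ids, metadatas):
        path = (meta or {}).get("path", "")  # tolerate None metadata entries
        if not path:
            orphan_ids.append(chunk_id)  # no path recorded: treat as orphan
            continue
        if path not in checked:
            checked[path] = exists(path)
        if not checked[path]:
            orphan_ids.append(chunk_id)
    return orphan_ids


on_disk = {"src/db.py"}
ids = ["a", "b", "c", "d"]
metas = [{"path": "src/db.py"}, {"path": "old.py"}, {}, {"path": "old.py"}]
print(find_orphans(ids, metas, exists=lambda p: p in on_disk))  # ['b', 'c', 'd']
```

A dry run corresponds to reporting this list without deleting anything, which is what `know prune --dry` is described as doing.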
Use --ext to restrict indexing to particular file extensions:

```bash
# Default extensions (see SUPPORTED_EXTENSIONS)
know index

# Index only specific extensions
know index --ext py --ext js
know index --ext .md --ext .txt   # Leading dot optional
know index --ext "py,js,ts"       # Comma-separated

# Combine with other filters
know index --ext py --glob "src/**" --since 7d
```
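The three accepted spellings of --ext all reduce to one set of suffixes: repeated flags, an optional leading dot, and comma-separated lists. The sketch below shows one way to do that, assuming files are matched on their final suffix. `normalize_extensions` and `matches` are hypothetical helpers, not know's implementation. Case sensitivity is not documented here, so the sketch compares suffixes exactly.

```python
from pathlib import Path


def normalize_extensions(values: list[str]) -> set[str]:
    """Turn --ext values such as 'py', '.md' or 'py,js,ts' into {'.py', ...}."""
    exts: set[str] = set()
    for value in values:               # one entry per --ext flag
        for part in value.split(","):  # allow comma-separated lists
            part = part.strip()
            if part:
                exts.add("." + part.lstrip("."))  # leading dot is optional
    return exts


def matches(path: str, exts: set[str]) -> bool:
    """True if the file's final suffix is one of the requested extensions."""
    return Path(path).suffix in exts


exts = normalize_extensions(["py", ".md", "js,ts"])
print(sorted(exts))                     # ['.js', '.md', '.py', '.ts']
print(matches("src/db.py", exts))       # True
print(matches("notes/todo.txt", exts))  # False
```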