General Questions
What is know and how does it work?
- Vector embeddings (via ChromaDB) for semantic similarity search
- BM25 lexical search for exact term matching
- Hybrid search using Reciprocal Rank Fusion (RRF) for best results
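Reciprocal Rank Fusion can be sketched in a few lines. The example below is a minimal illustration, not know's actual implementation: the function name, the constant `k=60` (a commonly used default), and the sample chunk IDs are all assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["c3", "c1", "c7"]  # hypothetical dense (embedding) ranking
bm25_hits = ["c1", "c9", "c3"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Chunks that appear in both rankings (here `c1` and `c3`) rise to the top, which is why fusion tends to give balanced results.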
Is my data sent to any external servers?
- All indexing happens locally using ChromaDB’s default embedding model (all-MiniLM-L6-v2)
- No data is sent to external APIs or cloud services
- Your documents and search queries remain private
- All indexes are stored in ./know_index in your working directory
How much disk space does the index require?
- Vector embeddings: ~1.5-2 KB per chunk (512 tokens)
- BM25 cache: ~10-20% of your document corpus size
- File cache: ~100 bytes per indexed file
- ChromaDB overhead: 1-5% of total index size
- 20,000-30,000 chunks
- 30-60 MB for vector embeddings
- 5-10 MB for BM25 cache
- Total: ~35-70 MB
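The vector-embedding range above follows from the per-chunk estimate. A quick check of the arithmetic, assuming 1 MB = 1000 KB for round numbers (the helper name is illustrative only):

```python
def estimate_vector_mb(chunks, kb_per_chunk):
    """Approximate vector-embedding size in MB (1 MB = 1000 KB assumed)."""
    return chunks * kb_per_chunk / 1000

low = estimate_vector_mb(20_000, 1.5)   # fewest chunks, smallest per-chunk size
high = estimate_vector_mb(30_000, 2.0)  # most chunks, largest per-chunk size
print(f"vector embeddings: ~{low:.0f}-{high:.0f} MB")
```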
du -sh ./know_index to check your index size.
Can I use know with multiple projects?
./know_index relative to your current working directory.
Option 1: Run know from different project directories
Indexing Questions
How do I update the index after changing files?
know index again. know automatically detects changes:
- Only new and modified files are processed
- Unchanged files are skipped based on modification time and size
- Existing chunks remain in the index
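Change detection based on modification time and size can be sketched as follows. This is an illustration under assumptions, not know's code: the cache layout (`path -> (mtime, size)`) and both function names are hypothetical.

```python
import os
import tempfile

def needs_reindex(path, cache):
    """Return True if the file is new or its (mtime, size) differs from the cache."""
    st = os.stat(path)
    return cache.get(path) != (st.st_mtime, st.st_size)

def record(path, cache):
    """Remember the file's current (mtime, size) after indexing it."""
    st = os.stat(path)
    cache[path] = (st.st_mtime, st.st_size)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.md")
    with open(path, "w") as f:
        f.write("first draft")

    cache = {}
    is_new = needs_reindex(path, cache)     # not in the cache yet
    record(path, cache)
    unchanged = needs_reindex(path, cache)  # same mtime and size

    with open(path, "a") as f:
        f.write(" plus an edit")
    modified = needs_reindex(path, cache)   # size changed

print(is_new, unchanged, modified)
```

Comparing metadata avoids reading file contents, which keeps repeated indexing runs cheap.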
What happens to deleted files in the index?
prune command:
- Scans all indexed chunks
- Checks if source files still exist
- Removes chunks from deleted files
- Cleans up the file cache
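The four steps above can be sketched like this. The data structures are assumptions made for illustration (a `chunk_id -> source path` mapping and a `path -> metadata` file cache); the real index is stored differently.

```python
import os
import tempfile

def prune(chunks, file_cache):
    """Drop chunks whose source file no longer exists; return the removed IDs.

    chunks: dict of chunk_id -> source path (hypothetical layout)
    file_cache: dict of path -> metadata (hypothetical layout)
    """
    # Scan all indexed chunks and check whether each source file still exists.
    missing = {p for p in set(chunks.values()) if not os.path.exists(p)}
    # Remove chunks that came from deleted files.
    removed = [cid for cid, p in chunks.items() if p in missing]
    for cid in removed:
        del chunks[cid]
    # Clean up the file cache.
    for p in missing:
        file_cache.pop(p, None)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    kept = os.path.join(tmp, "kept.md")
    gone = os.path.join(tmp, "gone.md")  # never created, so it counts as deleted
    open(kept, "w").close()

    chunks = {"c1": kept, "c2": gone, "c3": gone}
    file_cache = {kept: {}, gone: {}}
    removed = prune(chunks, file_cache)

print(removed, list(chunks))
```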
know prune periodically to keep your index clean and save disk space.
How do I index only specific file types?
--ext flag to filter by file extension:
Can I index recently modified files only?
--since flag to index files modified after a certain time:
- Quick updates after a work session
- Indexing new files without re-processing everything
- Testing on recent changes
What are chunk size and overlap, and should I change them?
Chunk size:
- Default: 512 tokens (~350-400 words)
- Larger chunks: More context, but less precise results
- Smaller chunks: More precise, but may lose context
Overlap:
- Default: 50 tokens
- Prevents information from being split across chunk boundaries
- Improves retrieval of content near chunk edges
When to change the defaults:
- Large chunk size (1024): For long-form documents where context is important
- Small chunk size (256): For code search where precision matters
- No overlap (0): To reduce index size (not recommended)
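Chunk size and overlap can be pictured as a sliding window over the token stream. The sketch below uses the default values quoted above; the function name is hypothetical, and plain integers stand in for tokens because the tokenizer know uses is not described here.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

tokens = list(range(1000))  # stand-in for a 1000-token document
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

With the defaults, the last 50 tokens of each chunk are repeated at the start of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk.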
Why are some files being skipped during indexing?
- Unchanged files: Already indexed and not modified (most common)
- Already indexed chunks: Same content already exists in the index
- Duplicate content: Identical chunks within the current batch
- Extension filter: File type not in allowed extensions
- Glob filter: File doesn’t match --glob patterns
- Time filter: File not modified since --since timestamp
- Number of chunks added vs. skipped
- Specific reason for each skipped chunk
- Path collisions for duplicate content
Search Questions
When should I use dense, BM25, or hybrid search?
Dense search:
- Best for: Conceptual queries, natural language questions
- Finds semantically similar content even with different wording
- Use when: Looking for concepts, explanations, or related topics
BM25 search:
- Best for: Exact terms, function names, error messages
- Ranks by term frequency and rarity
- Use when: Searching for specific identifiers or technical terms
Hybrid search:
- Best for: General purpose, balanced results
- Combines semantic understanding with keyword matching
- Use when: Unsure which mode to use, or want comprehensive results
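To see why lexical search favors exact identifiers, here is a minimal sketch of Okapi BM25 scoring, which ranks by term frequency and rarity. It is not know's implementation: the whitespace tokenizer, the parameters `k1=1.5` and `b=0.75`, and the sample documents are assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "call parse_config to load the settings file",
    "the settings file is loaded at startup",
    "the error message is shown when the file is missing",
]
scores = bm25_scores("parse_config file", docs)
print(scores)
```

The first document wins because it contains the rare term `parse_config`; the common term `file` contributes little to any document.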
How do I search within specific files or directories?
--glob flag:
Why am I getting too many or irrelevant results?
Can I export search results to use in other tools?
How do I search without typing 'search' every time?
add, remove, index, search, prune, dirs, reset, --help
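One plausible reading of the list above is that these words are reserved command names, and anything else on the command line is treated as a search query. The sketch below illustrates that dispatch rule only as an assumption; the function name and the fallback to `--help` for empty input are hypothetical and not taken from know itself.

```python
RESERVED = {"add", "remove", "index", "search", "prune", "dirs", "reset", "--help"}

def resolve_command(argv):
    """Return (command, args); an unrecognized first word means an implicit search."""
    if not argv:
        return "--help", []       # assumed fallback when nothing is typed
    if argv[0] in RESERVED:
        return argv[0], argv[1:]  # explicit command
    return "search", argv         # whole input becomes the query

explicit = resolve_command(["index", "."])
implicit = resolve_command(["how", "does", "auth", "work"])
print(explicit, implicit)
```

Under this rule, a query whose first word is itself a reserved word would need the explicit search command.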
Troubleshooting
Error: 'No index found'
know add.
Error: 'No documents found'
- No files match the extension filter
- No files match glob patterns
- No files modified since --since timestamp
- Directory is empty
Index is growing too large
- Indexing binary files or large PDFs
- Not pruning deleted files
- Duplicate content from multiple sources
Search is slow or BM25 index keeps rebuilding
- First search after indexing takes a long time
- BM25 search always shows “building index”
- ./know_index/bm25/ directory is empty or missing
- BM25 cache was deleted or corrupted
- Index is being modified between searches
- Very large index (>100k chunks)
meta.json, ids.json, and BM25 index files.
3. Avoid frequent re-indexing:
Index once, then use incremental updates:
How do I completely reset and start over?
./know_index and ~/.know_dirs list.
Getting import or dependency errors
Performance Tips
How can I make indexing faster?
node_modules, .git, or venv to watched directories.
4. Adjust chunk size:
Larger chunks = fewer chunks to process
How can I make search faster?
Related Documentation
- Commands Reference - Complete command documentation
- Architecture - System design and data flow
- Supported File Types - Indexable file extensions