
General Questions

know is a semantic search CLI tool that indexes your local documents and code files, allowing you to search them using natural language queries. It combines:
  • Vector embeddings (via ChromaDB) for semantic similarity search
  • BM25 lexical search for exact term matching
  • Hybrid search using Reciprocal Rank Fusion (RRF) for best results
Documents are split into chunks, embedded, and stored locally. When you search, know retrieves the most relevant chunks and displays them with context.
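To make the hybrid mode concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. The function name, the chunk IDs, and the constant k=60 are assumptions for illustration; know's actual fusion code and constant may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed constant, not know's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense_hits = ["c3", "c1", "c7"]  # hypothetical vector-search order
bm25_hits = ["c1", "c9", "c3"]   # hypothetical BM25 order
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, which is why hybrid search tends to be more robust than either mode alone.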
know sends nothing to external services. It runs entirely locally on your machine:
  • All indexing happens locally using ChromaDB’s default embedding model (all-MiniLM-L6-v2)
  • No data is sent to external APIs or cloud services
  • Your documents and search queries remain private
  • All indexes are stored in ./know_index in your working directory
This makes know safe to use with sensitive codebases and private documents.
Index size depends on the amount of content you’re indexing:
  • Vector embeddings: ~1.5-2 KB per chunk (512 tokens)
  • BM25 cache: ~10-20% of your document corpus size
  • File cache: ~100 bytes per indexed file
  • ChromaDB overhead: 1-5% of total index size
Example: Indexing 10,000 documents (~50 MB of text) typically results in:
  • 20,000-30,000 chunks
  • 30-60 MB for vector embeddings
  • 5-10 MB for BM25 cache
  • Total: ~35-70 MB
Use du -sh ./know_index to check your index size.
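The arithmetic behind the example can be reproduced with a short sketch. It only plugs the per-item figures quoted above into a formula; the function name and the use of decimal units (1 MB = 1,000 KB) are assumptions, and ChromaDB overhead is left out.

```python
def estimate_index_mb(corpus_mb, n_files, chunks_low, chunks_high):
    """Estimate index size in MB (decimal units) from the per-item figures above."""
    vectors = (chunks_low * 1.5 / 1000, chunks_high * 2.0 / 1000)  # 1.5-2 KB per chunk
    bm25 = (corpus_mb * 0.10, corpus_mb * 0.20)                    # 10-20% of corpus
    file_cache = n_files * 100 / 1_000_000                         # ~100 bytes per file
    return {
        "vectors_mb": vectors,
        "bm25_mb": bm25,
        "file_cache_mb": file_cache,
        "total_mb": (vectors[0] + bm25[0] + file_cache,
                     vectors[1] + bm25[1] + file_cache),
    }


estimate = estimate_index_mb(corpus_mb=50, n_files=10_000,
                             chunks_low=20_000, chunks_high=30_000)
print(estimate)
```

The total comes out about 1 MB above the 35-70 MB range quoted above because this sketch also counts the file cache.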
know works with multiple projects. Because the index lives in ./know_index relative to your current working directory, you can either keep one index per project or build a single shared index.
Option 1: Run know from different project directories
cd ~/project-a
know add .
know index
know search "query"

cd ~/project-b
know add .
know index
know search "query"
Option 2: Use a global index for all projects
mkdir ~/knowledge-base
cd ~/knowledge-base
know add ~/project-a
know add ~/project-b
know add ~/documents
know index
Option 1 keeps a separate index inside each project directory. Option 2 stores one combined index in ~/knowledge-base/know_index.

Indexing Questions

To pick up new or changed files, run know index again. know detects changes automatically:
know index
Incremental indexing:
  • Only new and modified files are processed
  • Unchanged files are skipped based on modification time and size
  • Existing chunks remain in the index
Force re-index everything:
know index --force
This clears the entire index and re-processes all files.
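The skip-if-unchanged check described above can be sketched as follows. This is an illustration of the general technique (comparing modification time and size against a cached signature), not know's actual file cache; the function name and the dict-based cache are hypothetical.

```python
import os


def needs_reindex(path, cache):
    """Return True if path is new or its (mtime, size) differs from the cached entry.

    `cache` is a plain dict standing in for the file cache (an assumption).
    """
    st = os.stat(path)
    signature = (st.st_mtime_ns, st.st_size)
    if cache.get(path) == signature:
        return False  # unchanged since the last run: skip
    cache[path] = signature  # remember the new signature for next time
    return True
```

Passing an empty cache makes every file count as changed, which is the effect a forced re-index needs.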
Deleted files remain in the index as “orphaned chunks” until you prune them:
# Check what would be removed
know prune --dry

# Remove orphaned chunks
know prune
The prune command:
  • Scans all indexed chunks
  • Checks if source files still exist
  • Removes chunks from deleted files
  • Cleans up the file cache
Tip: Run know prune periodically to keep your index clean and save disk space.
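The pruning steps listed above amount to an existence check per source file. A minimal sketch, assuming chunk metadata is available as a mapping from chunk ID to source path (the mapping and the function name are hypothetical, not know's internals):

```python
import os


def find_orphans(chunk_paths):
    """Return the IDs of chunks whose source file no longer exists.

    `chunk_paths` maps a chunk ID to the source path stored in its metadata.
    """
    return sorted(cid for cid, path in chunk_paths.items()
                  if not os.path.exists(path))
```

A dry run would stop after listing these IDs; a real prune would then delete them from the index and drop the matching file-cache entries.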
Use the --ext flag to filter by file extension:
# Index only Python files
know index --ext .py

# Index markdown and text files
know index --ext .md --ext .txt

# Comma-separated extensions
know index --ext .go,.rs,.zig
For more precise control, use glob patterns:
# Index only files in docs/ directory
know index --glob "docs/**"

# Index Python files in src/ only
know index --glob "src/**/*.py"
See Supported File Types for the full list of default extensions.
To index only recently modified files, use the --since flag with a relative duration or a date:
# Files modified in the last 7 days
know index --since 7d

# Last 12 hours
know index --since 12h

# Last 30 minutes
know index --since 30m

# Since specific date
know index --since 2024-01-15
This is useful for:
  • Quick updates after a work session
  • Indexing new files without re-processing everything
  • Testing on recent changes
Note: The file cache still applies, so unchanged files are skipped even if they match the time filter.
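The --since values shown above come in two shapes: a number with a unit suffix (d, h, m) or a YYYY-MM-DD date. Here is a sketch of how such a value could be turned into a cutoff timestamp; the function name is hypothetical and know's own parser may accept more formats.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_since(value, now=None):
    """Convert '7d', '12h', '30m', or 'YYYY-MM-DD' into a cutoff datetime."""
    now = now or datetime.now()
    match = re.fullmatch(r"(\d+)([dhm])", value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    # Anything else is treated as a calendar date; bad input raises ValueError.
    return datetime.strptime(value, "%Y-%m-%d")
```

A file passes the time filter when its modification time is at or after the returned cutoff.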
Chunk size determines how much text goes into each searchable unit:
  • Default: 512 tokens (~350-400 words)
  • Larger chunks: More context, but less precise results
  • Smaller chunks: More precise, but may lose context
Chunk overlap creates redundancy between adjacent chunks:
  • Default: 50 tokens
  • Prevents information from being split across chunk boundaries
  • Improves retrieval of content near chunk edges
Custom configuration:
know index --chunk-size 1024 --overlap 100
When to change:
  • Large chunk size (1024): For long-form documents where context is important
  • Small chunk size (256): For code search where precision matters
  • No overlap (0): To reduce index size (not recommended)
Changing chunk settings invalidates the file cache and requires re-indexing all files.
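To see how chunk size and overlap interact, here is a sliding-window sketch over a list of tokens. It is a simplified illustration, assuming the text is already tokenized; know's real splitter may respect sentence or code boundaries and behave differently.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


chunks = chunk_tokens(list(range(1000)))
print([len(c) for c in chunks])
```

With the defaults, each window starts 462 tokens after the previous one, so the last 50 tokens of one chunk reappear at the start of the next.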
Files can be skipped for several reasons:
  1. Unchanged files: Already indexed and not modified (most common)
  2. Already indexed chunks: Same content already exists in the index
  3. Duplicate content: Identical chunks within the current batch
  4. Extension filter: File type not in allowed extensions
  5. Glob filter: File doesn’t match --glob patterns
  6. Time filter: File not modified since --since timestamp
To see detailed skip reasons:
know index --report skip_report.json
This generates a JSON report with:
  • Number of chunks added vs. skipped
  • Specific reason for each skipped chunk
  • Path collisions for duplicate content

Search Questions

To restrict a search to certain files or directories, use glob patterns with the --glob flag:
# Search only markdown files
know search "query" --glob "**/*.md"

# Search in specific directory
know search "query" --glob "docs/**"

# Multiple patterns
know search "query" --glob "src/**/*.py" --glob "tests/**/*.py"

# Search by filename
know search "query" --glob "**/config.yaml"
Combine with time filters:
# Recent changes in Python files
know search "query" --glob "**/*.py" --since 7d
Try these strategies to improve result quality:
1. Reduce the number of results
know search "query" --limit 3
2. Use more specific queries
❌ "database"
✓ "PostgreSQL connection pooling"
3. Try different search modes
# If dense search is too broad, try BM25
know search "specific_function_name" --bm25

# If BM25 is too narrow, try hybrid
know search "query" --hybrid
4. Filter by file type or location
know search "query" --glob "docs/**/*.md"
5. Use benchmark mode to compare
know search "query" --benchmark
This shows results from both dense and BM25, helping you choose the best mode.
know supports JSON output, which makes it easy to use in scripts.
Output to file:
know search "query" --json-out results.json
Output to stdout:
know search "query" --json
Plain text output:
know search "query" --plain
Pipe to other tools:
know search "query" --json | jq '.items[].meta.path' | sort | uniq
JSON structure:
{
  "query": "search query",
  "mode": "dense",
  "items": [
    {
      "doc": "filename\n\ncontent",
      "meta": {
        "path": "/full/path/to/file",
        "filename": "file.py",
        "extension": ".py",
        "chunk_index": 0
      },
      "distance": 0.123
    }
  ]
}
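If you would rather post-process results in a script than with jq, the structure above can be read with the standard library. This sketch assumes only the fields shown in the JSON structure; the function name and the sample paths are made up for illustration.

```python
import json


def unique_paths(results_json):
    """Return the sorted, de-duplicated source paths from a know JSON result."""
    data = json.loads(results_json)
    return sorted({item["meta"]["path"] for item in data["items"]})


# Hypothetical sample shaped like the structure above.
sample = json.dumps({
    "query": "search query",
    "mode": "dense",
    "items": [
        {"doc": "b.py\n\ncontent",
         "meta": {"path": "/src/b.py", "filename": "b.py",
                  "extension": ".py", "chunk_index": 0},
         "distance": 0.12},
        {"doc": "a.py\n\ncontent",
         "meta": {"path": "/src/a.py", "filename": "a.py",
                  "extension": ".py", "chunk_index": 2},
         "distance": 0.20},
        {"doc": "b.py\n\nmore content",
         "meta": {"path": "/src/b.py", "filename": "b.py",
                  "extension": ".py", "chunk_index": 1},
         "distance": 0.31},
    ],
})
print(unique_paths(sample))
```

In practice you would pass in the contents of the file written by --json-out, or the output captured from --json.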
know automatically treats unknown commands as search queries:
# These are equivalent:
know search "error handling"
know "error handling"

# With flags:
know "authentication" --limit 3 --hybrid
This shortcut works unless the first word is a known command:
  • add, remove, index, search, prune, dirs, reset, --help
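The fallback rule can be pictured as a small rewrite of the argument list before normal command dispatch. This is a sketch of the idea only; the function name is hypothetical and know's own dispatch may be implemented differently.

```python
KNOWN_COMMANDS = {"add", "remove", "index", "search", "prune", "dirs", "reset", "--help"}


def normalize_argv(args):
    """Prepend 'search' when the first argument is not a known command."""
    if args and args[0] not in KNOWN_COMMANDS:
        return ["search", *args]
    return list(args)


print(normalize_argv(["authentication", "--limit", "3", "--hybrid"]))
```

One consequence is that a query whose first word is a command name, such as "index", must be run through know search explicitly.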

Troubleshooting

This means you haven’t indexed any documents yet.
Solution:
# Add a directory to watch
know add /path/to/documents

# Index the directory
know index
Check watched directories:
know dirs
If no directories are listed, add them with know add.
This occurs when:
  1. No files match the extension filter
  2. No files match glob patterns
  3. No files modified since --since timestamp
  4. Directory is empty
Diagnosis:
# Check what extensions are being indexed
know index --log

# Try without filters
know index  # No --ext or --glob flags

# Check if files exist in the directory
ls -la /path/to/watched/directory
Solution: Adjust your filters or ensure the directory contains supported file types.
Large indexes can be caused by:
  1. Indexing binary files or large PDFs
  2. Not pruning deleted files
  3. Duplicate content from multiple sources
Solutions:
1. Remove orphaned chunks:
know prune
2. Start fresh with specific file types:
know reset
know index --ext .md --ext .py --ext .txt
3. Use glob patterns to index only the file types you need, which leaves large files out:
know index --glob "**/*.md" --glob "**/*.py"
4. Check index size:
du -sh ./know_index
Symptoms:
  • First search after indexing takes a long time
  • BM25 search always shows “building index”
  • ./know_index/bm25/ directory is empty or missing
Causes:
  1. BM25 cache was deleted or corrupted
  2. Index is being modified between searches
  3. Very large index (>100k chunks)
Solutions:
1. Let BM25 cache build once:
# First BM25 search will be slow but builds cache
know search "test" --bm25

# Subsequent searches will be fast
know search "actual query" --bm25
2. Verify cache exists:
ls -la ./know_index/bm25/
It should contain meta.json, ids.json, and BM25 index files.
3. Avoid frequent re-indexing: Index once, then use incremental updates:
know index --since 1h  # Only recent changes
Option 1: Clear the index only
know reset
This removes all indexed chunks but keeps your watched directories.
Option 2: Clear everything
# Remove index and cache
rm -rf ./know_index

# Remove watched directories list
rm ~/.know_dirs

# Start fresh
know add /path/to/directory
know index
Option 3: Start a new project
# Use a different working directory
cd /path/to/new/project
know add .
know index
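Each working directory has its own ./know_index. The watched-directory list at ~/.know_dirs sits in your home directory, so it is not tied to a particular project.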
Ensure all dependencies are installed:
pip install chromadb llama-index-core bm25s PyStemmer typer rich
Common issues:
1. ChromaDB version conflicts:
pip install --upgrade chromadb
2. Missing PyStemmer:
pip install PyStemmer
3. llama-index namespace issues:
pip install llama-index-core
# Not: pip install llama-index
Check installed versions:
pip list | grep -E "chromadb|llama-index|bm25s|typer"

Performance Tips

1. Use incremental indexing:
know index --since 7d
2. Filter by file type:
know index --ext .md --ext .py
3. Exclude unnecessary directories: Don’t add node_modules, .git, or venv to watched directories.
4. Adjust chunk size: Larger chunks mean fewer chunks to process
know index --chunk-size 1024
5. Use dry run to test:
know index --dry
6. Monitor progress:
know index --log
1. Reduce candidate limit:
know search "query" --limit 5
2. Use dense search (fastest):
know search "query"  # Default, no extra flags
Dense search is faster than BM25 or hybrid because it uses ChromaDB’s optimized vector search.
3. Filter early with globs:
know search "query" --glob "docs/**"
4. Keep BM25 cache warm: The first BM25 or hybrid search after indexing is slow because it builds the cache. Subsequent searches are fast.
5. Prune regularly:
know prune
Smaller indexes = faster searches.
