Overview
know can index and search a wide range of document and code file types. The tool uses llama-index’s SimpleDirectoryReader to load files based on their extensions.
Supported Extensions
Document Files
know supports common document formats for knowledge management and documentation:
- .md - Markdown files
- .txt - Plain text files
- .pdf - PDF documents
- .docx - Microsoft Word documents
- .pptx - Microsoft PowerPoint presentations
- .html - HTML files
Code Files
know indexes source code files across multiple programming languages:
- .py - Python
- .js - JavaScript
- .ts - TypeScript
- .jsx - JavaScript React
- .tsx - TypeScript React
- .go - Go
- .rs - Rust
- .java - Java
- .c - C
- .cpp - C++
- .h - C/C++ header files
- .hpp - C++ header files
- .rb - Ruby
- .sh - Shell scripts
- .lua - Lua
- .swift - Swift
Custom Extensions
You can index files with custom extensions using the --ext flag.
Filtering by Extension
When indexing, you can limit which file types to process.
File Selection
Default Behavior
By default, know index processes all supported file types in watched directories.
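The default selection can be pictured as a recursive walk that keeps only files with a supported extension. The sketch below is illustrative and uses only the standard library; `SUPPORTED_EXTS` and `select_files` are hypothetical names, not part of know, and the exact-match comparison on the suffix is an assumption.

```python
from pathlib import Path

# Extensions listed on this page; the constant name is hypothetical.
SUPPORTED_EXTS = {
    ".md", ".txt", ".pdf", ".docx", ".pptx", ".html",
    ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs", ".java",
    ".c", ".cpp", ".h", ".hpp", ".rb", ".sh", ".lua", ".swift",
}


def select_files(root, exts=None):
    """Return files under root whose suffix is in exts.

    With exts=None, every supported extension is accepted, which mirrors
    the default behavior described above. Passing a smaller set narrows
    the selection to those file types.
    """
    allowed = set(exts) if exts else SUPPORTED_EXTS
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in allowed
    )
```

Calling `select_files("docs")` would return every supported file under a `docs` directory, while `select_files("docs", exts={".md"})` would keep only Markdown files.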
Using Glob Patterns
For more precise control, use glob patterns to include specific files.
Technical Details
File Loading
Files are loaded using llama-index’s SimpleDirectoryReader with the following configuration:
- Recursive scanning: Enabled by default (disable with --no-recursive)
- Filename as ID: File paths are used as document identifiers
- Extension filtering: Only files matching required_exts are processed
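As a rough illustration of this configuration, the sketch below passes the three settings to SimpleDirectoryReader. It assumes a llama-index release that exposes the reader as `llama_index.core.SimpleDirectoryReader`; the helper names `build_reader_kwargs` and `load_documents` are hypothetical, and know's actual loading code may differ.

```python
def build_reader_kwargs(input_dir, exts, recursive=True):
    """Collect the SimpleDirectoryReader arguments described above."""
    return {
        "input_dir": str(input_dir),
        "recursive": recursive,         # --no-recursive would pass False here
        "filename_as_id": True,         # file paths become document IDs
        "required_exts": sorted(exts),  # only these extensions are loaded
    }


def load_documents(input_dir, exts, recursive=True):
    """Load matching files as llama-index documents."""
    # Imported lazily so build_reader_kwargs works without llama-index installed.
    from llama_index.core import SimpleDirectoryReader

    reader = SimpleDirectoryReader(**build_reader_kwargs(input_dir, exts, recursive))
    return reader.load_data()
```

For example, `load_documents("docs", {".md", ".py"})` would recursively load only Markdown and Python files from a `docs` directory.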
File Cache
know tracks file modification times and sizes to avoid re-indexing unchanged files:
- Cache location: ./know_index/file_cache.json
- Cache invalidation: Files are re-indexed if mtime or size changes
- Chunk configuration: Cache is tied to chunk size and overlap settings
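The invalidation rules above can be sketched with a small JSON cache. This is a minimal illustration under stated assumptions: the JSON layout (`chunk_size`, `overlap`, `files`) and all function names are hypothetical, and the real file_cache.json format may differ.

```python
import json
import os


def file_signature(path):
    """Modification time and size, the two values the cache compares."""
    st = os.stat(path)
    return {"mtime": st.st_mtime, "size": st.st_size}


def load_cache(cache_path, chunk_size, overlap):
    """Return cached signatures, or an empty dict if the cache is unusable."""
    try:
        with open(cache_path, encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    if data.get("chunk_size") != chunk_size or data.get("overlap") != overlap:
        return {}  # chunk settings changed: treat every file as new
    return data.get("files", {})


def needs_reindex(path, cached_files):
    """True if the file is new or its mtime or size changed."""
    return cached_files.get(str(path)) != file_signature(path)


def save_cache(cache_path, paths, chunk_size, overlap):
    """Record the current signatures together with the chunk settings."""
    data = {
        "chunk_size": chunk_size,
        "overlap": overlap,
        "files": {str(p): file_signature(p) for p in paths},
    }
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
```

Storing the chunk settings next to the file signatures is one simple way to tie the cache to them: a mismatch discards every entry, so all files are re-indexed.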
Changing --chunk-size or --overlap settings will invalidate the file cache and require re-indexing all files.
Related Commands
- know index - Index files from watched directories
- know search - Search indexed documents with glob filters