Documentation Index
Fetch the complete documentation index at: https://mintlify.com/Effectful-Tech/clanka/llms.txt
Use this file to discover all available pages before exploring further.
The SemanticSearch service lets the agent find code by meaning rather than by pattern. It walks your codebase, splits files into AST-aware chunks using tree-sitter, generates embeddings for each chunk, and stores them in a local SQLite database. When the agent calls search("authentication middleware"), the query is embedded and the closest chunks are returned.
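The retrieval step at the end of that pipeline can be sketched as a plain nearest-neighbor lookup by cosine similarity. The data shapes below are illustrative only, not clanka's actual storage schema or API.

```typescript
// Illustrative retrieval: rank stored chunk vectors against an embedded
// query by cosine similarity and return the top k. Shapes are assumptions.

interface StoredChunk {
  file: string
  symbol: string
  vector: Float32Array
}

const cosine = (a: Float32Array, b: Float32Array): number => {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

const closestChunks = (
  query: Float32Array,
  index: StoredChunk[],
  k: number
): StoredChunk[] =>
  [...index]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k)
```

A real index avoids the full scan and sort for large codebases, but the ranking criterion is the same.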
How indexing works
Tree-sitter chunking
The CodeChunker service parses TypeScript and JavaScript files with tree-sitter. It splits each file at meaningful AST boundaries (functions, classes, methods) so that each chunk is a coherent unit of code. Chunks are annotated with their file path, symbol name, type, and parent context.
Embedding generation
Each chunk is formatted with a YAML-style header (file, name, type, parent) followed by line-numbered source content, then sent to the configured embedding model. Requests are batched (default 300 per batch) to stay within API rate limits.
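To make that format concrete, here is an illustrative formatter for the chunk text. The field names (file, name, type, parent) come from this page, but the exact layout, separators, and line-number style are assumptions.

```typescript
// Illustrative chunk formatter: a YAML-style header followed by
// line-numbered source. The exact layout clanka uses may differ.

interface ChunkMeta {
  file: string
  name: string
  type: string       // e.g. "function", "class", "method"
  parent?: string    // enclosing symbol, if any
  startLine: number  // 1-based line where the chunk starts in the file
}

const formatChunk = (meta: ChunkMeta, source: string): string => {
  const header = [
    `file: ${meta.file}`,
    `name: ${meta.name}`,
    `type: ${meta.type}`,
    ...(meta.parent ? [`parent: ${meta.parent}`] : [])
  ].join("\n")
  const body = source
    .split("\n")
    .map((line, i) => `${meta.startLine + i}: ${line}`)
    .join("\n")
  return `${header}\n---\n${body}`
}
```

Embedding the metadata alongside the source lets a query like "auth middleware class" match on symbol names and structure, not just on the raw code text.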
SQLite storage
Embeddings are stored as Float32Array vectors in a SQLite database (default path: .clanka/search.sqlite). A syncId is assigned to each indexing run so stale chunks from deleted files can be pruned automatically at the end of the run.
Layer configuration
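Storing a Float32Array in a SQLite column implies a byte-level round-trip: serialize the vector to a BLOB on write and rebuild the typed array on read. The helpers below sketch that encoding; clanka's actual schema and query code are not shown here.

```typescript
// Illustrative Float32Array <-> BLOB encoding. Assumes the blob bytes are
// a contiguous little-endian float32 sequence, as produced by the writer.

const vectorToBlob = (vec: Float32Array): Uint8Array =>
  // .slice() copies, so the blob owns its bytes independently of the vector
  new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength).slice()

const blobToVector = (blob: Uint8Array): Float32Array =>
  // third argument is the element count (4 bytes per float32)
  new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4)
```

Reading the BLOB back as a view rather than copying keeps query-time allocation low when scanning many vectors.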
| Option | Type | Default | Description |
|---|---|---|---|
| directory | string | — | Root directory to index. Required. |
| database | string | ".clanka/search.sqlite" | Path to the SQLite file that stores embeddings |
| embeddingBatchSize | number | 300 | Maximum number of embedding requests per API call |
| concurrency | number | 2000 | Maximum concurrent chunk-processing fibers |
| chunkMaxCharacters | number | 10_000 | Maximum character length of a single chunk |
The layer additionally requires the following services:

- EmbeddingModel.EmbeddingModel — the embedding model to use
- EmbeddingModel.Dimensions — the vector dimensionality (must match the model)
- Path.Path, FileSystem.FileSystem, ChildProcessSpawner.ChildProcessSpawner
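Putting the options table together, a configuration call might look like the sketch below. The constructor name follows this page and the option spellings follow the table above, but this is not a verified API; the import path is an assumption.

```typescript
// import { SemanticSearch } from "clanka" (module path is an assumption)

// Hypothetical configuration sketch based on the options table above.
const searchLayer = SemanticSearch.layer({
  directory: process.cwd(),            // required: root directory to index
  database: ".clanka/search.sqlite",   // default shown here for clarity
  embeddingBatchSize: 300,             // embeddings per API request
  concurrency: 2000,                   // concurrent chunk-processing fibers
  chunkMaxCharacters: 10_000           // cap on a single chunk's size
})
```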
Incremental updates
When the agent writes or removes a file, SemanticSearch keeps the index consistent automatically — the built-in writeFile, removeFile, renameFile, and applyPatch tool handlers call updateFile and removeFile on the search index after each operation.
You can also drive these methods directly:
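The method names updateFile and removeFile come from this page, but their exact signatures are assumptions; the real methods are presumably effectful rather than synchronous. The sketch below models the call shape with a minimal in-memory stand-in.

```typescript
// Minimal stand-in for the search index using the updateFile / removeFile
// names from this page. Signatures are assumptions, shown synchronously.

interface SearchIndexLike {
  updateFile(path: string): void  // re-chunk, re-embed, and upsert one file
  removeFile(path: string): void  // prune all chunks for one file
}

const makeInMemoryIndex = () => {
  const indexed = new Set<string>()
  const index: SearchIndexLike = {
    updateFile: (path) => { indexed.add(path) },
    removeFile: (path) => { indexed.delete(path) }
  }
  return { index, indexed }
}

// After a write the file is (re)indexed; after a delete it is pruned.
const { index } = makeInMemoryIndex()
index.updateFile("src/auth.ts")
index.removeFile("src/legacy.ts")
```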
Full setup example
The following is derived from examples/cli.ts and shows a complete setup with OpenAI embeddings:
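A rough sketch of the wiring that setup implies is shown below. The @effect/ai constructor names and module paths are assumptions and may not match the real API; consult examples/cli.ts in the repository for the authoritative version.

```typescript
import { Config, Layer } from "effect"
// import { OpenAiClient, OpenAiEmbeddingModel } from "@effect/ai-openai" (assumption)
// import { SemanticSearch } from "clanka" (assumption)

// OpenAI client, reading OPENAI_API_KEY from the environment (assumption).
const OpenAi = OpenAiClient.layerConfig({
  apiKey: Config.redacted("OPENAI_API_KEY")
})

// Embedding model matching the recommendation on this page (assumption).
const Embeddings = OpenAiEmbeddingModel.model("text-embedding-3-small", {
  dimensions: 1536
})

// The search layer itself, configured per the options table above.
const Search = SemanticSearch.layer({ directory: process.cwd() }).pipe(
  Layer.provide(Embeddings),
  Layer.provide(OpenAi)
)
```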
When SemanticSearch is provided to AgentExecutor, the search global becomes available inside every script the agent runs:
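Inside an agent script, a call to that global might look like the sketch below. The result shape is an assumption, and a local stub stands in for the injected global so the example is self-contained.

```typescript
// Stub standing in for the search global injected into agent scripts.
// The SearchHit shape is an assumption, not clanka's documented result type.

interface SearchHit {
  file: string
  name: string
  snippet: string
}

const search = (query: string): SearchHit[] =>
  query.toLowerCase().includes("auth")
    ? [{
        file: "src/middleware/auth.ts",
        name: "requireAuth",
        snippet: "export function requireAuth(req, res, next)"
      }]
    : []

// What an agent script might do with the results.
const hits = search("authentication middleware")
const locations = hits.map((h) => `${h.file}#${h.name}`)
```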
Searching directly
You can query the index outside of an agent turn.
Requirements
SemanticSearch.layer requires the OPENAI_API_KEY environment variable when using OpenAiClient. The recommended embedding model is text-embedding-3-small with dimensions: 1536, which balances quality and cost.
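A setup script may want to fail fast when OPENAI_API_KEY is unset rather than surfacing the error later. A small illustrative guard (not part of clanka) follows.

```typescript
// Illustrative fail-fast check for the required environment variable.
const requireApiKey = (env: Record<string, string | undefined>): string => {
  const key = env.OPENAI_API_KEY
  if (!key) {
    throw new Error("OPENAI_API_KEY is required for OpenAiClient embeddings")
  }
  return key
}

// e.g. requireApiKey(process.env) before building the layer stack
```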