The SemanticSearch service lets the agent find code by meaning rather than by pattern. It walks your codebase, splits files into AST-aware chunks using tree-sitter, generates embeddings for each chunk, and stores them in a local SQLite database. When the agent calls search("authentication middleware"), the query is embedded and the closest chunks are returned.
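The retrieval step at the core of this can be sketched as embedding the query and ranking stored chunk vectors by similarity. The sketch below uses a toy deterministic embed function as a stand-in for a real embedding model; the names and shapes here are illustrative, not clanka internals.

```typescript
// Toy stand-in embedding: a real model maps text to a high-dimensional vector.
const embed = (text: string): Array<number> => {
  const v = [0, 0, 0]
  for (let i = 0; i < text.length; i++) v[i % 3] += text.charCodeAt(i)
  const norm = Math.hypot(...v)
  return v.map((x) => x / norm)
}

// For unit vectors, the dot product is the cosine similarity.
const cosine = (a: Array<number>, b: Array<number>): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0)

// Rank stored chunks against a query and return the top `limit` matches.
const search = (
  index: Array<{ text: string; vector: Array<number> }>,
  query: string,
  limit: number
) =>
  index
    .map((entry) => ({ ...entry, score: cosine(entry.vector, embed(query)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
```

The real service embeds the query with the configured model and ranks vectors loaded from SQLite, but the shape of the computation is the same.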

How indexing works

1. Tree-sitter chunking

The CodeChunker service parses TypeScript and JavaScript files with tree-sitter. It splits each file at meaningful AST boundaries (functions, classes, methods) so that each chunk is a coherent unit of code. Chunks are annotated with their file path, symbol name, type, and parent context.
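The actual splitter walks the tree-sitter AST, but the shape of its output is easy to sketch. The Chunk fields below mirror the annotations described above; the type and the toy regex-based splitter are illustrative assumptions, not clanka's real internals.

```typescript
// Illustrative shape of a chunk produced by AST-aware splitting.
interface Chunk {
  readonly file: string
  readonly name: string      // symbol name, e.g. the function or class name
  readonly type: string      // "function" | "class" | "method" | ...
  readonly parent?: string   // enclosing class or module, if any
  readonly content: string   // source text of the node
}

// Toy splitter: treats each top-level `function` declaration as a chunk.
// Real chunking walks the tree-sitter AST instead of matching text.
const splitFunctions = (file: string, source: string): Array<Chunk> =>
  Array.from(source.matchAll(/^function (\w+)[\s\S]*?^}/gm)).map((m) => ({
    file,
    name: m[1],
    type: "function",
    content: m[0],
  }))
```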

2. Embedding generation

Each chunk is formatted with a YAML-style header (file, name, type, parent) followed by line-numbered source content, then sent to the configured embedding model. Requests are batched (default 300 per batch) to stay within API rate limits.
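The header-plus-numbered-lines format and the batching step can be sketched like this. The exact header layout is an assumption for illustration; clanka's real formatter may order or label fields differently.

```typescript
interface Chunk {
  file: string
  name: string
  type: string
  parent?: string
  content: string
}

// Format a chunk as a YAML-style header followed by line-numbered source.
const formatChunk = (chunk: Chunk): string => {
  const header = [
    `file: ${chunk.file}`,
    `name: ${chunk.name}`,
    `type: ${chunk.type}`,
    ...(chunk.parent ? [`parent: ${chunk.parent}`] : []),
  ].join("\n")
  const body = chunk.content
    .split("\n")
    .map((line, i) => `${i + 1}: ${line}`)
    .join("\n")
  return `${header}\n---\n${body}`
}

// Group items into batches so each embedding API call stays under the limit.
const batch = <A>(items: Array<A>, size: number): Array<Array<A>> => {
  const out: Array<Array<A>> = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}
```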

3. SQLite storage

Embeddings are stored as Float32Array vectors in a SQLite database (default path: .clanka/search.sqlite). A syncId is assigned to each indexing run so stale chunks from deleted files can be pruned automatically at the end of the run.
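SQLite stores vectors as raw bytes, so a Float32Array has to round-trip through a BLOB column, and stale-chunk pruning keys off the syncId. A minimal sketch of both ideas, with an in-memory Map standing in for the actual table (the real schema is a clanka internal not shown here):

```typescript
// Serialize an embedding vector to bytes for a SQLite BLOB column.
const toBlob = (vector: Float32Array): Buffer =>
  Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength)

// Deserialize bytes back into a Float32Array (4 bytes per element).
const fromBlob = (blob: Buffer): Float32Array =>
  new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4)

// Prune chunks left over from deleted files: any row not touched by the
// current indexing run still carries a stale syncId and can be dropped.
const prune = (rows: Map<string, { syncId: number }>, currentSyncId: number) => {
  for (const [key, row] of rows) {
    if (row.syncId !== currentSyncId) rows.delete(key)
  }
}
```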

4. Background re-indexing

After the initial index is complete, re-indexing runs every 3 minutes in the background via a FiberHandle. Each run checks whether a chunk’s hash has changed before re-embedding, so unchanged code is never re-sent to the API.
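The change check can be sketched as a content hash per chunk: only chunks whose hash differs from the stored one are re-embedded. The hash algorithm and the in-memory cache below are assumptions for illustration; clanka keeps its hashes in the SQLite index.

```typescript
import { createHash } from "node:crypto"

const hashChunk = (content: string): string =>
  createHash("sha256").update(content).digest("hex")

// In-memory stand-in for the per-chunk hash column in the index.
const stored = new Map<string, string>()

// Returns true when the chunk changed and needs a fresh embedding.
const needsReembedding = (id: string, content: string): boolean => {
  const next = hashChunk(content)
  if (stored.get(id) === next) return false
  stored.set(id, next)
  return true
}
```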

Layer configuration

import { SemanticSearch } from "clanka"
import { Layer } from "effect"

const Search = SemanticSearch.layer({
  directory: process.cwd(),         // root of the codebase to index
  database: ".clanka/search.sqlite", // SQLite database path (optional)
  embeddingBatchSize: 300,           // requests per batch (optional)
  concurrency: 2000,                 // concurrent embedding requests (optional)
  chunkMaxCharacters: 10_000,        // max chars per chunk (optional)
})

Option              Type    Default                  Description
directory           string  (required)               Root directory to index
database            string  ".clanka/search.sqlite"  Path to the SQLite file that stores embeddings
embeddingBatchSize  number  300                      Maximum number of embedding requests per API call
concurrency         number  2000                     Maximum concurrent chunk-processing fibers
chunkMaxCharacters  number  10_000                   Maximum character length of a single chunk
The layer requires the following services in context:
  • EmbeddingModel.EmbeddingModel — the embedding model to use
  • EmbeddingModel.Dimensions — the vector dimensionality (must match the model)
  • Path.Path, FileSystem.FileSystem, ChildProcessSpawner.ChildProcessSpawner

Incremental updates

When the agent writes or removes a file, SemanticSearch keeps the index consistent automatically — the built-in writeFile, removeFile, renameFile, and applyPatch tool handlers call updateFile and removeFile on the search index after each operation. You can also drive these methods directly:
import { SemanticSearch } from "clanka"
import { Effect } from "effect"

Effect.gen(function* () {
  const ss = yield* SemanticSearch.SemanticSearch

  // Re-embed a file after you modify it
  yield* ss.updateFile("src/auth/middleware.ts")

  // Remove a file's chunks when the file is deleted
  yield* ss.removeFile("src/legacy/oldModule.ts")
})
Both methods wait for the initial index to finish before running, so they are safe to call at any point after the layer is provided.

Full setup example

The following is derived from examples/cli.ts and shows a complete setup with OpenAI embeddings:
import { Config, Layer } from "effect"
import { Agent, SemanticSearch } from "clanka"
import { NodeHttpClient, NodeServices } from "@effect/platform-node"
import { OpenAiClient, OpenAiEmbeddingModel } from "@effect/ai-openai"

const Search = SemanticSearch.layer({
  directory: process.cwd(),
  database: ".clanka/search.sqlite",
}).pipe(
  Layer.provide(
    OpenAiEmbeddingModel.model("text-embedding-3-small", {
      dimensions: 1536,
    }),
  ),
  Layer.provide(
    OpenAiClient.layerConfig({
      apiKey: Config.redacted("OPENAI_API_KEY"),
    }),
  ),
  Layer.provide(NodeHttpClient.layerUndici),
  Layer.provide(NodeServices.layer),
)

const AgentLayer = Agent.layerLocal({
  directory: process.cwd(),
}).pipe(
  Layer.provide(NodeServices.layer),
  Layer.provide(NodeHttpClient.layerUndici),
  Layer.provide(Search), // providing Search makes `search()` available in the sandbox
)
Once Search is provided to AgentExecutor, the search global becomes available inside every script the agent runs:
// Inside a model-generated script
const results = await search("user authentication token validation")
console.log(results)

Searching directly

You can query the index outside of an agent turn:
import { SemanticSearch } from "clanka"
import { Effect } from "effect"
import { NodeRuntime } from "@effect/platform-node"

Effect.gen(function* () {
  const ss = yield* SemanticSearch.SemanticSearch

  const results = yield* ss.search({
    query: "database connection pooling",
    limit: 10,
  })

  console.log(results) // top-10 matching chunks joined by newlines
}).pipe(Effect.provide(Search), NodeRuntime.runMain) // Search is the layer built in the full setup example

Requirements

When using OpenAiClient, the OPENAI_API_KEY environment variable must be set; OpenAiClient.layerConfig reads it via Config.redacted("OPENAI_API_KEY"). The recommended embedding model is text-embedding-3-small with dimensions: 1536, which balances quality and cost.
Index only the source files your agent needs. If your repository is large, set a tighter chunkMaxCharacters value (e.g., 3000) to keep individual chunks focused and retrieval precise.
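As a sketch, that tighter setting plugs into the same layer options shown in the configuration section:

```typescript
import { SemanticSearch } from "clanka"

// Tighter chunks keep each embedding focused on a single symbol,
// which tends to sharpen retrieval in large repositories.
const Search = SemanticSearch.layer({
  directory: process.cwd(),
  chunkMaxCharacters: 3_000, // down from the default of 10_000
})
```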
