The SemanticSearch service lets the agent find code by meaning rather than by pattern. It walks your codebase, splits files into AST-aware chunks using tree-sitter, generates embeddings for each chunk, and stores them in a local SQLite database. When the agent calls search("authentication middleware"), the query is embedded and the closest chunks are returned.
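The retrieval step at the core of this can be sketched as embedding the query and ranking stored chunk vectors by similarity. The sketch below uses a toy deterministic embed function as a stand-in for a real embedding model; the names and shapes here are illustrative, not clanka internals.

```typescript
// Toy stand-in embedding: a real model maps text to a high-dimensional vector.
const embed = (text: string): Array<number> => {
  const v = [0, 0, 0]
  for (let i = 0; i < text.length; i++) v[i % 3] += text.charCodeAt(i)
  const norm = Math.hypot(...v)
  return v.map((x) => x / norm)
}

// For unit vectors, the dot product is the cosine similarity.
const cosine = (a: Array<number>, b: Array<number>): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0)

// Rank stored chunks against a query and return the top `limit` matches.
const search = (
  index: Array<{ text: string; vector: Array<number> }>,
  query: string,
  limit: number
) =>
  index
    .map((entry) => ({ ...entry, score: cosine(entry.vector, embed(query)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
```

The real service embeds the query with the configured model and ranks vectors loaded from SQLite, but the shape of the computation is the same.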

How indexing works

1. Tree-sitter chunking

The CodeChunker service parses TypeScript and JavaScript files with tree-sitter. It splits each file at meaningful AST boundaries (functions, classes, methods) so that each chunk is a coherent unit of code. Chunks are annotated with their file path, symbol name, type, and parent context.
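The actual splitter walks the tree-sitter AST, but the shape of its output is easy to sketch. The Chunk fields below mirror the annotations described above; the type and the toy regex-based splitter are illustrative assumptions, not clanka's real internals.

```typescript
// Illustrative shape of a chunk produced by AST-aware splitting.
interface Chunk {
  readonly file: string
  readonly name: string      // symbol name, e.g. the function or class name
  readonly type: string      // "function" | "class" | "method" | ...
  readonly parent?: string   // enclosing class or module, if any
  readonly content: string   // source text of the node
}

// Toy splitter: treats each top-level `function` declaration as a chunk.
// Real chunking walks the tree-sitter AST instead of matching text.
const splitFunctions = (file: string, source: string): Array<Chunk> =>
  Array.from(source.matchAll(/^function (\w+)[\s\S]*?^}/gm)).map((m) => ({
    file,
    name: m[1],
    type: "function",
    content: m[0],
  }))
```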

2. Embedding generation

Each chunk is formatted with a YAML-style header (file, name, type, parent) followed by line-numbered source content, then sent to the configured embedding model. Requests are batched (default 300 per batch) to stay within API rate limits.
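The header-plus-numbered-lines format and the batching step can be sketched like this. The exact header layout is an assumption for illustration; clanka's real formatter may order or label fields differently.

```typescript
interface Chunk {
  file: string
  name: string
  type: string
  parent?: string
  content: string
}

// Format a chunk as a YAML-style header followed by line-numbered source.
const formatChunk = (chunk: Chunk): string => {
  const header = [
    `file: ${chunk.file}`,
    `name: ${chunk.name}`,
    `type: ${chunk.type}`,
    ...(chunk.parent ? [`parent: ${chunk.parent}`] : []),
  ].join("\n")
  const body = chunk.content
    .split("\n")
    .map((line, i) => `${i + 1}: ${line}`)
    .join("\n")
  return `${header}\n---\n${body}`
}

// Group items into batches so each embedding API call stays under the limit.
const batch = <A>(items: Array<A>, size: number): Array<Array<A>> => {
  const out: Array<Array<A>> = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}
```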

3. SQLite storage

Embeddings are stored as Float32Array vectors in a SQLite database (default path: .clanka/search.sqlite). A syncId is assigned to each indexing run so stale chunks from deleted files can be pruned automatically at the end of the run.
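SQLite stores vectors as raw bytes, so a Float32Array has to round-trip through a BLOB column, and stale-chunk pruning keys off the syncId. A minimal sketch of both ideas, with an in-memory Map standing in for the actual table (the real schema is a clanka internal not shown here):

```typescript
// Serialize an embedding vector to bytes for a SQLite BLOB column.
const toBlob = (vector: Float32Array): Buffer =>
  Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength)

// Deserialize bytes back into a Float32Array (4 bytes per element).
const fromBlob = (blob: Buffer): Float32Array =>
  new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4)

// Prune chunks left over from deleted files: any row not touched by the
// current indexing run still carries a stale syncId and can be dropped.
const prune = (rows: Map<string, { syncId: number }>, currentSyncId: number) => {
  for (const [key, row] of rows) {
    if (row.syncId !== currentSyncId) rows.delete(key)
  }
}
```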

4. Background re-indexing

After the initial index is complete, re-indexing runs every 3 minutes in the background via a FiberHandle. Each run checks whether a chunk’s hash has changed before re-embedding, so unchanged code is never re-sent to the API.
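The change check can be sketched as a content hash per chunk: only chunks whose hash differs from the stored one are re-embedded. The hash algorithm and the in-memory cache below are assumptions for illustration; clanka keeps its hashes in the SQLite index.

```typescript
import { createHash } from "node:crypto"

const hashChunk = (content: string): string =>
  createHash("sha256").update(content).digest("hex")

// In-memory stand-in for the per-chunk hash column in the index.
const stored = new Map<string, string>()

// Returns true when the chunk changed and needs a fresh embedding.
const needsReembedding = (id: string, content: string): boolean => {
  const next = hashChunk(content)
  if (stored.get(id) === next) return false
  stored.set(id, next)
  return true
}
```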

Layer configuration

import { SemanticSearch } from "clanka"
import { Layer } from "effect"

const Search = SemanticSearch.layer({
  directory: process.cwd(),         // root of the codebase to index
  database: ".clanka/search.sqlite", // SQLite database path (optional)
  embeddingBatchSize: 300,           // requests per batch (optional)
  concurrency: 2000,                 // concurrent embedding requests (optional)
  chunkMaxCharacters: 10_000,        // max chars per chunk (optional)
})

Option              Type    Default                  Description
directory           string  (required)               Root directory to index
database            string  ".clanka/search.sqlite"  Path to the SQLite file that stores embeddings
embeddingBatchSize  number  300                      Maximum number of embedding requests per API call
concurrency         number  2000                     Maximum concurrent chunk-processing fibers
chunkMaxCharacters  number  10_000                   Maximum character length of a single chunk
The layer requires the following services in context:
  • EmbeddingModel.EmbeddingModel — the embedding model to use
  • EmbeddingModel.Dimensions — the vector dimensionality (must match the model)
  • Path.Path, FileSystem.FileSystem, ChildProcessSpawner.ChildProcessSpawner

Incremental updates

When the agent writes or removes a file, SemanticSearch keeps the index consistent automatically — the built-in writeFile, removeFile, renameFile, and applyPatch tool handlers call updateFile and removeFile on the search index after each operation. You can also drive these methods directly:
import { SemanticSearch } from "clanka"
import { Effect } from "effect"

Effect.gen(function* () {
  const ss = yield* SemanticSearch.SemanticSearch

  // Re-embed a file after you modify it
  yield* ss.updateFile("src/auth/middleware.ts")

  // Remove a file's chunks when the file is deleted
  yield* ss.removeFile("src/legacy/oldModule.ts")
})
Both methods wait for the initial index to finish before running, so they are safe to call at any point after the layer is provided.

Full setup example

The following is derived from examples/cli.ts and shows a complete setup with OpenAI embeddings:
import { Config, Layer } from "effect"
import { Agent, SemanticSearch } from "clanka"
import { NodeHttpClient, NodeServices } from "@effect/platform-node"
import { OpenAiClient, OpenAiEmbeddingModel } from "@effect/ai-openai"

const Search = SemanticSearch.layer({
  directory: process.cwd(),
  database: ".clanka/search.sqlite",
}).pipe(
  Layer.provide(
    OpenAiEmbeddingModel.model("text-embedding-3-small", {
      dimensions: 1536,
    }),
  ),
  Layer.provide(
    OpenAiClient.layerConfig({
      apiKey: Config.redacted("OPENAI_API_KEY"),
    }),
  ),
  Layer.provide(NodeHttpClient.layerUndici),
  Layer.provide(NodeServices.layer),
)

const AgentLayer = Agent.layerLocal({
  directory: process.cwd(),
}).pipe(
  Layer.provide(NodeServices.layer),
  Layer.provide(NodeHttpClient.layerUndici),
  Layer.provide(Search), // providing Search makes `search()` available in the sandbox
)
Once Search is provided to AgentExecutor, the search global becomes available inside every script the agent runs:
// Inside a model-generated script
const results = await search("user authentication token validation")
console.log(results)

Searching directly

You can query the index outside of an agent turn:
import { SemanticSearch } from "clanka"
import { Effect } from "effect"
import { NodeRuntime } from "@effect/platform-node"

Effect.gen(function* () {
  const ss = yield* SemanticSearch.SemanticSearch

  const results = yield* ss.search({
    query: "database connection pooling",
    limit: 10,
  })

  console.log(results) // top-10 matching chunks joined by newlines
}).pipe(Effect.provide(Search), NodeRuntime.runMain) // Search is the layer built in the full setup example

Requirements

When using OpenAiClient, the OPENAI_API_KEY environment variable must be set; OpenAiClient.layerConfig reads it via Config.redacted("OPENAI_API_KEY"). The recommended embedding model is text-embedding-3-small with dimensions: 1536, which balances quality and cost.
Index only the source files your agent needs. If your repository is large, set a tighter chunkMaxCharacters value (e.g., 3000) to keep individual chunks focused and retrieval precise.
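As a sketch, that tighter setting plugs into the same layer options shown in the configuration section:

```typescript
import { SemanticSearch } from "clanka"

// Tighter chunks keep each embedding focused on a single symbol,
// which tends to sharpen retrieval in large repositories.
const Search = SemanticSearch.layer({
  directory: process.cwd(),
  chunkMaxCharacters: 3_000, // down from the default of 10_000
})
```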
