The CodeChunker module splits source files into semantically meaningful segments for embedding and search. For TypeScript and JavaScript files it uses tree-sitter to parse the AST and emit one chunk per top-level declaration (function, class, interface, etc.), plus child chunks for class methods and namespace members, each with structured metadata. All other supported file types fall back to sliding line-window chunking.

CodeChunk

export interface CodeChunk {
  readonly path: string
  readonly startLine: number
  readonly endLine: number
  readonly name: string | undefined
  readonly type: ChunkType | undefined
  readonly parent: string | undefined
  readonly content: string
}
A single chunk of source code extracted from a file.
  • path (string, required): Normalised, forward-slash path of the source file relative to the index root.
  • startLine (number, required): 1-based line number of the first line of the chunk.
  • endLine (number, required): 1-based line number of the last line of the chunk (inclusive).
  • content (string, required): The raw source text of the chunk.
  • name (string | undefined): Identifier extracted from the AST node, typically the function, class, method, or variable name. undefined for non-AST (line-window) chunks and for anonymous declarations.
  • type (ChunkType | undefined): Semantic category of the chunk as determined by the AST node type. undefined for non-AST chunks.
  • parent (string | undefined): The containing declaration formatted as "<type> <name>" (e.g. "class MyService"). Set for class methods and declarations inside namespaces. undefined for top-level chunks.
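To make the field semantics concrete, here is a minimal sketch with two hand-written chunk values: one AST chunk for a class method and one line-window chunk. The types are local copies of the definitions on this page so the snippet compiles on its own. The file paths, chunk contents, and the describeChunk helper are hypothetical and are not part of the library.

```typescript
// Local copies of the documented types so the sketch is self-contained.
type ChunkType =
  | "function" | "method" | "class" | "namespace"
  | "interface" | "type-alias" | "enum" | "variable"

interface CodeChunk {
  readonly path: string
  readonly startLine: number
  readonly endLine: number
  readonly name: string | undefined
  readonly type: ChunkType | undefined
  readonly parent: string | undefined
  readonly content: string
}

// Hypothetical AST chunk: a method declared inside `class MyService`.
const methodChunk: CodeChunk = {
  path: "src/services/MyService.ts",
  startLine: 12,
  endLine: 14, // inclusive, so this chunk covers three lines
  name: "start",
  type: "method",
  parent: "class MyService",
  content: "start() {\n  return this.run()\n}",
}

// Hypothetical line-window chunk: no AST metadata is available.
const windowChunk: CodeChunk = {
  path: "docs/intro.md",
  startLine: 1,
  endLine: 3,
  name: undefined,
  type: undefined,
  parent: undefined,
  content: "# Intro\n\nWelcome.",
}

// Hypothetical helper: build a display label for a search result.
function describeChunk(chunk: CodeChunk): string {
  const range = `${chunk.path}:${chunk.startLine}-${chunk.endLine}`
  if (chunk.type === undefined) return range
  const label = `${chunk.type} ${chunk.name ?? "<anonymous>"}`
  return chunk.parent === undefined
    ? `${label} (${range})`
    : `${label} in ${chunk.parent} (${range})`
}
```

Checking type for undefined is enough to tell AST chunks from line-window chunks, since the three metadata fields are only populated on the AST path.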

ChunkType

export type ChunkType =
  | "function"
  | "method"
  | "class"
  | "namespace"
  | "interface"
  | "type-alias"
  | "enum"
  | "variable"
Describes the AST category of a chunk. The values map directly to tree-sitter node types:
ChunkType | Tree-sitter node types
"function" | function_declaration, generator_function_declaration, or a variable declarator whose value is a function expression
"method" | method_definition, generator_method_definition
"class" | class_declaration
"namespace" | internal_module, module
"interface" | interface_declaration
"type-alias" | type_alias_declaration
"enum" | enum_declaration
"variable" | lexical_declaration, variable_declaration

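The mapping in the table can be sketched as a plain lookup. This is an illustration only, not the library's implementation: chunkTypeForNode and its valueIsFunction flag are hypothetical, and the flag stands in for inspecting the declarator's value in the real syntax tree.

```typescript
type ChunkType =
  | "function" | "method" | "class" | "namespace"
  | "interface" | "type-alias" | "enum" | "variable"

// Tree-sitter node type name -> ChunkType, transcribed from the table.
const NODE_TYPE_TO_CHUNK_TYPE: ReadonlyMap<string, ChunkType> = new Map<string, ChunkType>([
  ["function_declaration", "function"],
  ["generator_function_declaration", "function"],
  ["method_definition", "method"],
  ["generator_method_definition", "method"],
  ["class_declaration", "class"],
  ["internal_module", "namespace"],
  ["module", "namespace"],
  ["interface_declaration", "interface"],
  ["type_alias_declaration", "type-alias"],
  ["enum_declaration", "enum"],
  ["lexical_declaration", "variable"],
  ["variable_declaration", "variable"],
])

// Hypothetical helper. `valueIsFunction` is an assumed flag meaning
// "the declarator's value is a function expression"; in that case the
// table classifies the declaration as "function" instead of "variable".
function chunkTypeForNode(
  nodeType: string,
  valueIsFunction = false,
): ChunkType | undefined {
  const mapped = NODE_TYPE_TO_CHUNK_TYPE.get(nodeType)
  if (mapped === "variable" && valueIsFunction) return "function"
  return mapped // undefined for node types that do not produce chunks
}
```

A Map is used rather than an object literal so that lookups for unrelated strings such as "constructor" cannot accidentally hit inherited object properties.
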
CodeChunker service

export class CodeChunker extends Context.Service<
  CodeChunker,
  {
    listFiles(options: {
      readonly root: string
      readonly maxFileSize?: string | undefined
    }): Effect.Effect<ReadonlyArray<string>>

    chunkFile(options: {
      readonly root: string
      readonly path: string
      readonly chunkSize: number
      readonly chunkOverlap: number
      readonly chunkMaxCharacters?: number | undefined
    }): Effect.Effect<ReadonlyArray<CodeChunk>>

    chunkFiles(options: {
      readonly root: string
      readonly paths: ReadonlyArray<string>
      readonly chunkSize: number
      readonly chunkOverlap: number
      readonly chunkMaxCharacters?: number | undefined
    }): Stream.Stream<CodeChunk>

    chunkCodebase(options: {
      readonly root: string
      readonly maxFileSize?: string | undefined
      readonly chunkSize: number
      readonly chunkOverlap: number
      readonly chunkMaxCharacters?: number | undefined
    }): Stream.Stream<CodeChunk>
  }
>()("clanka/CodeChunker") {}

listFiles

Enumerate all indexable files under a root directory.
  • root (string, required): Absolute path to the root directory.
  • maxFileSize (string): Maximum file size passed to rg --max-filesize. Defaults to "1M".
Returns: Effect<ReadonlyArray<string>> — paths relative to root, sorted lexicographically. Files inside ignored directories (.git, node_modules, dist, etc.) and minified bundles are excluded automatically.

chunkFile

Chunk a single source file into an array of CodeChunk values.
  • root (string, required): Absolute path to the root directory. Used to resolve and normalise path.
  • path (string, required): Path to the file, relative to root.
  • chunkSize (number, required): Maximum number of lines per chunk for the sliding-window fallback.
  • chunkOverlap (number, required): Number of lines to overlap between adjacent sliding-window chunks.
  • chunkMaxCharacters (number): Maximum character count per chunk. Chunks that exceed this limit are split further.
Returns: Effect<ReadonlyArray<CodeChunk>>. Returns an empty array if the file does not exist, has no meaningful content, or appears to be minified.
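The page does not say how chunks over chunkMaxCharacters are split. One plausible strategy, shown here purely as an assumption, is to cut at line boundaries so that each piece keeps accurate startLine and endLine values. The LineChunk type and splitByMaxCharacters function are hypothetical names, and the library may split differently.

```typescript
// Minimal shape needed for the sketch (a subset of CodeChunk).
interface LineChunk {
  readonly startLine: number
  readonly endLine: number
  readonly content: string
}

// Hypothetical: split an oversized chunk at line boundaries so every
// piece stays within maxCharacters where possible. A single line longer
// than the limit is kept whole rather than cut mid-line.
function splitByMaxCharacters(
  chunk: LineChunk,
  maxCharacters: number,
): ReadonlyArray<LineChunk> {
  if (chunk.content.length <= maxCharacters) return [chunk]

  const pieces: Array<LineChunk> = []
  let buffer: Array<string> = []
  let bufferStart = chunk.startLine
  let bufferLength = 0

  const flush = (): void => {
    if (buffer.length === 0) return
    pieces.push({
      startLine: bufferStart,
      endLine: bufferStart + buffer.length - 1,
      content: buffer.join("\n"),
    })
  }

  chunk.content.split("\n").forEach((line, index) => {
    // A joined line costs its own length plus one newline separator.
    const cost = buffer.length === 0 ? line.length : line.length + 1
    if (buffer.length > 0 && bufferLength + cost > maxCharacters) {
      flush()
      buffer = [line]
      bufferStart = chunk.startLine + index
      bufferLength = line.length
    } else {
      buffer.push(line)
      bufferLength += cost
    }
  })
  flush()
  return pieces
}
```

Splitting on lines rather than raw character offsets keeps the 1-based, inclusive line ranges meaningful for each resulting piece.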

chunkFiles

Chunk a list of files and stream results as they become available. Files are processed with a concurrency of 5.
  • root (string, required): Absolute path to the root directory.
  • paths (ReadonlyArray<string>, required): Paths of files to chunk, relative to root.
  • chunkSize (number, required): Maximum lines per sliding-window chunk.
  • chunkOverlap (number, required): Line overlap between adjacent sliding-window chunks.
  • chunkMaxCharacters (number): Maximum character count per chunk.
Returns: Stream<CodeChunk>.

chunkCodebase

Enumerate and chunk an entire codebase in one operation. Combines listFiles and chunkFiles.
  • root (string, required): Absolute path to the root directory.
  • maxFileSize (string): Maximum file size filter for enumeration. Defaults to "1M".
  • chunkSize (number, required): Maximum lines per sliding-window chunk.
  • chunkOverlap (number, required): Line overlap between adjacent sliding-window chunks.
  • chunkMaxCharacters (number): Maximum character count per chunk.
Returns: Stream<CodeChunk> — a continuous stream of chunks from all indexable files.

CodeChunker.layer

export const layer: Layer.Layer<
  CodeChunker,
  never,
  | ChildProcessSpawner.ChildProcessSpawner
  | FileSystem.FileSystem
  | Path.Path
>
Provides a CodeChunker implementation. This layer has no failure channel; errors from the underlying file system or spawner are surfaced through the service methods themselves.
Requirements:
  • ChildProcessSpawner.ChildProcessSpawner — spawns the rg process used by listFiles.
  • FileSystem.FileSystem — reads file content in chunkFile.
  • Path.Path — resolves and normalises file paths.
SemanticSearch.layer provisions CodeChunker.layer automatically. You only need to provide CodeChunker.layer directly when using the chunker outside of SemanticSearch.

AST chunking

For files with a .ts, .tsx, .js, or .jsx extension, CodeChunker parses the file with tree-sitter and extracts top-level declarations as chunk boundaries.
Chunking rules:
  • Each exported or top-level declaration becomes its own chunk.
  • Leading JSDoc/block comments immediately preceding a declaration are included in that chunk.
  • Class methods are emitted as separate child chunks with parent set to "class ClassName".
  • Declarations inside TypeScript namespaces/modules are emitted with parent set to "namespace Name".
  • A class chunk that has method child chunks and spans more than chunkSize lines is truncated to its first sliding-window segment only — the methods carry the rest of the content.
  • Minified files (detected heuristically by line count and line length) are skipped entirely.
Fallback: All other supported file types (Python, Go, Rust, Markdown, etc.) use a sliding line-window with the configured chunkSize and chunkOverlap.
Set chunkSize to 30 and chunkOverlap to 0 to match the defaults used by SemanticSearch.layer. Larger chunkSize values produce fewer but longer chunks, which can reduce search precision.
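The sliding line-window fallback can be sketched as follows. This is a minimal illustration of how chunkSize and chunkOverlap interact, not the library's code: slidingWindowChunks and LineWindow are hypothetical names, and details such as how the final short window is handled are assumptions.

```typescript
interface LineWindow {
  readonly startLine: number // 1-based
  readonly endLine: number   // 1-based, inclusive
  readonly content: string
}

// Hypothetical sketch: each window holds up to chunkSize lines and the
// next window starts (chunkSize - chunkOverlap) lines later.
function slidingWindowChunks(
  content: string,
  chunkSize: number,
  chunkOverlap: number,
): ReadonlyArray<LineWindow> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive")
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in [0, chunkSize)")
  }

  const lines = content.split("\n")
  const step = chunkSize - chunkOverlap
  const windows: Array<LineWindow> = []

  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkSize, lines.length)
    windows.push({
      startLine: start + 1,
      endLine: end,
      content: lines.slice(start, end).join("\n"),
    })
    // Stop once a window reaches the last line, so no redundant tail
    // window made up only of overlap lines is emitted.
    if (end === lines.length) break
  }
  return windows
}
```

With seven lines, chunkSize 3 and chunkOverlap 1, this yields windows covering lines 1-3, 3-5, and 5-7; with chunkOverlap 0 the windows are 1-3, 4-6, and 7-7.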

Supported file types

The chunker indexes files with the following extensions:
Source code: c, cc, cpp, cs, css, cts, cxx, go, gql, graphql, h, hpp, html, ini, java, js, jsx, kt, kts, less, lua, mjs, mts, php, py, rb, rs, sass, scala, scss, sh, sql, svelte, swift, ts, tsx, vue, xml, zsh
Documentation: adoc, asciidoc, md, mdx, rst, txt
Files inside .git, .next, .nuxt, .svelte-kit, .turbo, build, coverage, dist, node_modules, and target directories are always excluded.

Utility functions

chunkFileContent

export const chunkFileContent: (
  path: string,
  content: string,
  options: {
    readonly chunkSize: number
    readonly chunkOverlap: number
    readonly chunkMaxCharacters?: number | undefined
  },
) => ReadonlyArray<CodeChunk>
A pure function that chunks an in-memory string without accessing the filesystem. Useful for testing or for pre-loaded file content. Returns an empty array if content is blank or heuristically detected as minified.

isProbablyMinified

export const isProbablyMinified: (content: string) => boolean
Returns true when the content is likely a minified bundle. Content is considered minified when all of the following hold: it contains fewer than 20 newlines, at least 80% of its lines are 300 or more characters long, and its total length is at least 2,000 characters.
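The heuristic described above can be approximated like this. The function name isProbablyMinifiedSketch is hypothetical and this is one reading of the description, not the library's source. In particular, ignoring empty lines when computing the 80% ratio is an assumption made so that a trailing newline does not skew the result.

```typescript
// Hypothetical re-implementation of the documented thresholds.
function isProbablyMinifiedSketch(content: string): boolean {
  // Total content must be at least 2,000 characters.
  if (content.length < 2000) return false

  const lines = content.split("\n")
  const newlineCount = lines.length - 1
  // Must contain fewer than 20 newlines.
  if (newlineCount >= 20) return false

  // Assumption: blank lines are ignored when computing the ratio.
  const nonEmpty = lines.filter((line) => line.length > 0)
  if (nonEmpty.length === 0) return false

  // At least 80% of the lines must be 300 or more characters long.
  const longLines = nonEmpty.filter((line) => line.length >= 300).length
  return longLines / nonEmpty.length >= 0.8
}
```

A single 5,000-character line is flagged, while ordinary source with many short lines is not, because it either fails the newline limit or the long-line ratio.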

isMeaningfulFile

export const isMeaningfulFile: (path: string) => boolean
Returns true when the file at path has a supported extension and does not reside inside an ignored directory. The path is matched in a case-insensitive, normalised (forward-slash) form.
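A rough sketch of this check, using the extension and directory lists from the "Supported file types" section. isMeaningfulFileSketch is a hypothetical name. The exact normalisation steps and the treatment of files without an extension are assumptions.

```typescript
// Extensions listed under "Supported file types".
const SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
  "c", "cc", "cpp", "cs", "css", "cts", "cxx", "go", "gql", "graphql",
  "h", "hpp", "html", "ini", "java", "js", "jsx", "kt", "kts", "less",
  "lua", "mjs", "mts", "php", "py", "rb", "rs", "sass", "scala", "scss",
  "sh", "sql", "svelte", "swift", "ts", "tsx", "vue", "xml", "zsh",
  "adoc", "asciidoc", "md", "mdx", "rst", "txt",
])

// Directories that are always excluded.
const IGNORED_DIRECTORIES: ReadonlySet<string> = new Set([
  ".git", ".next", ".nuxt", ".svelte-kit", ".turbo",
  "build", "coverage", "dist", "node_modules", "target",
])

// Hypothetical sketch of the documented behaviour.
function isMeaningfulFileSketch(path: string): boolean {
  // Normalise to forward slashes and lower case before matching.
  const segments = path
    .replace(/\\/g, "/")
    .toLowerCase()
    .split("/")
    .filter((segment) => segment.length > 0)

  const fileName = segments.pop()
  if (fileName === undefined) return false

  // Reject the file if any parent directory is in the ignore list.
  if (segments.some((segment) => IGNORED_DIRECTORIES.has(segment))) return false

  const dot = fileName.lastIndexOf(".")
  if (dot < 0) return false // assumption: no extension means unsupported
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1))
}
```

Only directory segments are tested against the ignore list, so a file that happens to be named build.ts is still accepted.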
