File Identification - HAI Build Code Generator

File Identification is HAI Build’s intelligent file discovery system. It combines fast indexing, fuzzy search, and context tracking to help the AI find exactly the files it needs for code generation.

How It Works

File Identification uses a multi-layer approach to file discovery:

Workspace Indexing

Ripgrep scans your workspace to build a comprehensive file index.

Active File Tracking

Monitors currently open tabs and recently edited files.

Fuzzy Search

Uses the fzf fuzzy-matching library to match queries against file names and paths.

Context Prioritization

Ranks files based on relevance to the current task.

Core Features

Lightning-Fast Indexing

File Identification uses ripgrep to list workspace files quickly:

// From src/services/search/file-search.ts:17 (signature only; body omitted)
export async function executeRipgrepForFiles(
  workspacePath: string,
  limit: number = 5000,
): Promise<{ path: string; type: "file" | "folder"; label?: string }[]>

Performance characteristics:

Indexes up to 5,000 files per scan by default (set by the limit parameter)
Follows symlinks automatically
Respects .gitignore patterns
Excludes common build directories (node_modules, dist, .git)

Ripgrep is typically much faster than traditional tools such as find or grep -r, which makes it well suited to large codebases.

Intelligent File Filtering

Automatic exclusion of irrelevant directories:

// Excluded patterns (src/services/search/file-search.ts:30)
const excludePatterns = [
  "node_modules",
  ".git",
  ".github",
  "out",
  "dist",
  "__pycache__",
  ".venv",
  ".env",
  "venv",
  "env",
  ".cache",
  "tmp",
  "temp"
]
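The exclusion list and the behaviors listed above map naturally onto ripgrep command-line arguments. The sketch below is a hypothetical helper (buildRipgrepArgs does not come from the source) that assumes the scan uses ripgrep's --files, --follow, and --hidden flags plus one negated -g glob per excluded directory. It only builds the argument array; it does not launch rg.

```typescript
// Hypothetical helper: builds the argv for a ripgrep file listing.
// --files  lists files that would be searched instead of searching contents
// --follow follows symbolic links
// --hidden includes dotfiles, so dot directories must be excluded explicitly
// -g       adds a glob; a leading "!" turns it into an exclusion
export function buildRipgrepArgs(
  workspacePath: string,
  excludePatterns: readonly string[],
): string[] {
  const args = ["--files", "--follow", "--hidden"]
  for (const name of excludePatterns) {
    // "!**/name/**" skips a directory with this name at any depth
    args.push("-g", `!**/${name}/**`)
  }
  // ripgrep honors .gitignore by default, so no extra flag is needed for that
  args.push(workspacePath)
  return args
}

console.log(buildRipgrepArgs("/repo", ["node_modules", ".git"]).join(" "))
// --files --follow --hidden -g !**/node_modules/** -g !**/.git/** /repo
```

In practice the array would be passed to a process launcher such as Node's child_process.spawn("rg", args), with stdout read line by line until limit paths have been collected.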

Active File Prioritization

Files you’re actively working on appear first:

// From src/services/search/file-search.ts:101
async function getActiveFiles(): Promise<Set<string>> {
  const request = GetOpenTabsRequest.create({})
  const response = await HostProvider.window.getOpenTabs(request)
  return new Set(response.paths)
}

Active files are:

Currently open in tabs
Recently edited
Visible in the editor
Part of the current diff view

The AI sees your open files first, making it context-aware of what you’re working on.
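One way to put open files first is a stable partition of the candidate list. This is a minimal sketch, assuming the candidates and the Set returned by getActiveFiles() use the same path form (for example, both absolute); prioritizeActive is a hypothetical name, not part of the source.

```typescript
// Hypothetical helper: stable partition that moves active (open-tab) files
// to the front while keeping the original order inside each group.
export function prioritizeActive(
  paths: readonly string[],
  activePaths: ReadonlySet<string>,
): string[] {
  const active: string[] = []
  const rest: string[] = []
  for (const p of paths) {
    if (activePaths.has(p)) {
      active.push(p)
    } else {
      rest.push(p)
    }
  }
  return active.concat(rest)
}

console.log(prioritizeActive(["a.ts", "b.ts", "c.ts"], new Set(["c.ts"])))
// [ 'c.ts', 'a.ts', 'b.ts' ]
```

Keeping the partition stable means the ranking produced by the search step still applies within the active and inactive groups.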

Fuzzy Search

File Identification uses the fzf library, which implements the fzf fuzzy-finder algorithm in JavaScript, for fuzzy matching:

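Match Scoring

Files are ranked by several factors:

// From src/services/search/file-search.ts:159
// combinedItems: the candidate files and folders; limit: the requested result count
const fzf = new fzfModule.Fzf(combinedItems, {
  // The label (file or folder name) appears twice, so name matches outweigh path-only matches
  selector: (item) => `${item.label} ${item.label} ${item.path}`,
  // Results with equal fzf scores are ordered by fewest gaps, then by shorter length
  tiebreakers: [OrderbyMatchScore, fzfModule.byLengthAsc],
  // Fetch twice the requested count to leave headroom for later filtering
  limit: limit * 2,
})

Scoring criteria:

Filename matches (weighted 2x)
Full path matches
Fewer gaps between matched characters
Shorter paths preferred

Gap Scoring

A custom tiebreaker prioritizes contiguous matches: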

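// From src/services/search/file-search.ts:192
// Tiebreaker: the result whose matched characters have fewer gaps sorts first.
export const OrderbyMatchScore = (
  a: { positions: Iterable<number> },
  b: { positions: Iterable<number> },
): number => {
  // Count breaks between consecutive matched positions
  // (assumes positions are iterated in ascending order)
  const countGaps = (positions: Iterable<number>): number => {
    let gaps = 0
    let prev = -Infinity
    for (const pos of positions) {
      if (prev !== -Infinity && pos - prev > 1) {
        gaps++
      }
      prev = pos
    }
    return gaps
  }
  // Negative result: a has fewer gaps and sorts before b
  return countGaps(a.positions) - countGaps(b.positions)
}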

Example:

Given the query UserCont, UserController.ts ranks above UsersComponent.tsx: the query matches UserController.ts as one contiguous run, while the match in UsersComponent.tsx is broken by gaps.
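The example can be checked by hand with the sketch below. It uses a simplified greedy, case-sensitive subsequence matcher as a stand-in for fzf (fzf's real algorithm scores alternative alignments and may pick different positions), then counts gaps with the same rule as countGaps. Both function names are local to this illustration.

```typescript
// Simplified greedy matcher: records the earliest index of each query
// character in `text`, or returns null if the query is not a subsequence.
function greedyPositions(query: string, text: string): number[] | null {
  const positions: number[] = []
  let from = 0
  for (const ch of query) {
    const idx = text.indexOf(ch, from)
    if (idx === -1) return null
    positions.push(idx)
    from = idx + 1
  }
  return positions
}

// Same gap rule as OrderbyMatchScore: any jump larger than one index is a gap.
function countGaps(positions: readonly number[]): number {
  let gaps = 0
  for (let i = 1; i < positions.length; i++) {
    if (positions[i] - positions[i - 1] > 1) gaps++
  }
  return gaps
}

for (const fileName of ["UserController.ts", "UsersComponent.tsx"]) {
  const pos = greedyPositions("UserCont", fileName)
  console.log(fileName, pos ? countGaps(pos) : "no match")
}
// UserController.ts: positions 0-7, so 0 gaps
// UsersComponent.tsx: U,s,e,r at 0-3, then C=5, o=6, n=10, t=13, so 3 gaps
```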

Multi-Root Workspace Support

File Identification seamlessly handles multi-root workspaces:

// From src/services/search/file-search.ts:213 (signature only; body omitted)
export async function searchWorkspaceFilesMultiroot(
  query: string,
  workspaceManager: WorkspaceRootManager,
  limit: number = 20,
  selectedType?: "file" | "folder",
  workspaceHint?: string,
): Promise<Result[]>

Workspace Hints

Search specific workspace roots: @frontend:/components searches only the frontend workspace.

Parallel Search

Searches all roots simultaneously:

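// One search per workspace root; all of them run concurrently
const searchPromises = workspacesToSearch.map(
  async (workspace) => {
    return await searchWorkspaceFiles(
      query,
      workspace.path,
      limit,
      selectedType,
      workspace.name
    )
  }
)
// The per-root result lists are then awaited together (for example with Promise.all)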

Deduplication

Handles duplicate filenames:

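// pathCounts: how many workspace roots returned each relative path
if (pathCounts.get(result.path)! > 1) {
  // Prefix duplicates with their workspace name, e.g. frontend:/src/index.ts
  return {
    ...result,
    label: `${result.workspaceName}:/${result.path}`
  }
}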

Workspace Labels

Each result includes workspace name:

{
  path: "src/components/Button.tsx",
  workspaceName: "frontend",
  type: "file"
}

See: src/services/search/file-search.ts:209
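The four pieces above can be combined in one sketch, under stated assumptions: a hint is written as @name:/query, roots are searched concurrently with Promise.all, and a result gets a workspace:/path label only when more than one root returned the same relative path. The names Workspace, Result, SearchFn, parseWorkspaceHint, and searchAllRoots are hypothetical; searchOne stands in for a per-root search such as searchWorkspaceFiles.

```typescript
interface Workspace { name: string; path: string }
interface Result { path: string; type: "file" | "folder"; workspaceName: string; label?: string }
// Stand-in for a per-root search such as searchWorkspaceFiles
type SearchFn = (query: string, ws: Workspace) => Promise<Result[]>

// Parses "@frontend:/components" into { hint: "frontend", query: "components" }.
// Input without the "@name:/" prefix is treated as a plain query for all roots.
export function parseWorkspaceHint(input: string): { hint?: string; query: string } {
  const m = /^@([^:\/]+):\/(.*)$/.exec(input)
  return m ? { hint: m[1], query: m[2] } : { query: input }
}

export async function searchAllRoots(
  input: string,
  workspaces: readonly Workspace[],
  searchOne: SearchFn,
): Promise<Result[]> {
  const { hint, query } = parseWorkspaceHint(input)
  // A hint restricts the search to the named root
  const targets = hint ? workspaces.filter((w) => w.name === hint) : workspaces
  // Search the selected roots concurrently, then merge the lists
  const perRoot = await Promise.all(targets.map((w) => searchOne(query, w)))
  const results = ([] as Result[]).concat(...perRoot)
  // Count how many roots returned each relative path
  const pathCounts = new Map<string, number>()
  for (const r of results) {
    pathCounts.set(r.path, (pathCounts.get(r.path) ?? 0) + 1)
  }
  // Label only the ambiguous paths with their workspace name
  return results.map((r) =>
    pathCounts.get(r.path)! > 1 ? { ...r, label: `${r.workspaceName}:/${r.path}` } : r,
  )
}
```

Because the hint filter runs before the fan-out, a hinted query never touches the other roots, and unique paths keep their short form.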

File Context Tracking

HAI Build tracks which files are in the AI’s context:

Context Status Indicators

Added to Context

Files explicitly added via mentions or tool use.

In Active Use

Files currently being read or edited by the AI.

Removed from Context

Files removed during context compaction.

File Mention Tracking

Every file operation is tracked:

// From src/core/context/context-tracking/FileContextTracker.ts
// Simplified outline; method bodies omitted
export class FileContextTracker {
  async trackFileAccess(filePath: string, operation: 'read' | 'write' | 'list')
  async getAccessedFiles(): Promise<Set<string>>
  async clearAccessHistory(): Promise<void>
}

Tracked operations:

read_file: File content read
write_to_file: File created or modified
replace_in_file: File edited
list_files: Directory listing
search_files: File search results

Context Warnings

The AI is warned when files are removed from its context:

// From src/core/task/index.ts:1248
const pendingContextWarning = 
  await this.fileContextTracker.retrieveAndClearPendingFileContextWarning()

if (pendingContextWarning && pendingContextWarning.length > 0) {
  const fileContextWarning = formatResponse.fileContextWarning(pendingContextWarning)
  newUserContent.push({
    type: "text",
    text: fileContextWarning,
  })
}

When context is compacted, the AI receives a warning about removed files to maintain awareness.
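The retrieve-and-clear call above implies that a pending warning is delivered once and then discarded. A minimal in-memory sketch of that pattern follows; only the method name retrieveAndClearPendingFileContextWarning comes from the source, while PendingWarningStore and storePendingFileContextWarning are hypothetical, and a real tracker would likely persist this state with the task.

```typescript
// Hypothetical in-memory store illustrating "retrieve and clear" semantics:
// queued paths are handed out on the next request, then the queue resets.
export class PendingWarningStore {
  private pending: string[] = []

  // Queue file paths to report to the AI on its next turn (deduplicated)
  storePendingFileContextWarning(filePaths: readonly string[]): void {
    for (const p of filePaths) {
      if (this.pending.indexOf(p) === -1) this.pending.push(p)
    }
  }

  // Return the queued paths, or undefined if there are none, and reset the
  // queue so the same warning is never sent twice
  async retrieveAndClearPendingFileContextWarning(): Promise<string[] | undefined> {
    if (this.pending.length === 0) return undefined
    const files = this.pending
    this.pending = []
    return files
  }
}
```

Returning undefined when nothing is queued matches the guard in the snippet above, which checks both for a value and for a non-zero length.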

File Type Detection

Automatic file type detection and categorization:

Type Verification

Verifies file types using fs.lstat():

// From src/services/search/file-search.ts:168
// fs and path are Node's built-in modules; each filteredResults entry wraps an item
const verifiedResultsPromises = filteredResults.map(
  async ({ item }) => {
    const fullPath = path.join(workspacePath, item.path)
    let type = item.type

    try {
      // lstat does not follow symlinks, so a symlink is classified by the link itself
      const stats = await fs.promises.lstat(fullPath)
      type = stats.isDirectory() ? "folder" : "file"
    } catch {
      // Keep original type if path doesn't exist
    }

    return { ...item, type }
  },
)

Directory Discovery

Automatically builds directory tree:

// From src/services/search/file-search.ts:61
let dirPath = path.dirname(relativePath)
while (dirPath && dirPath !== "." && dirPath !== "/") {
  dirSet.add(dirPath)
  dirPath = path.dirname(dirPath)
}

Enables folder-level operations and navigation.
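A self-contained version of the directory loop is sketched below, assuming relative paths with forward slashes; it uses path.posix so the result is the same on every operating system. collectDirectories is a hypothetical name.

```typescript
import * as path from "path"

// Derive every ancestor directory of each relative file path.
export function collectDirectories(relativePaths: readonly string[]): Set<string> {
  const dirSet = new Set<string>()
  for (const relativePath of relativePaths) {
    let dirPath = path.posix.dirname(relativePath)
    // dirname of a top-level entry is ".", which ends the walk
    while (dirPath && dirPath !== "." && dirPath !== "/") {
      dirSet.add(dirPath)
      dirPath = path.posix.dirname(dirPath)
    }
  }
  return dirSet
}

console.log(Array.from(collectDirectories(["src/a/b.ts", "src/c.ts", "README.md"])))
// [ 'src/a', 'src' ]
```

A possible refinement is to stop walking as soon as dirPath is already in the set, since all of its ancestors were added when it was first inserted.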

Search Strategies

Different search approaches for different scenarios:

Exact Match

When you know the filename: UserService.ts returns exact matches first.

Partial Match

When you remember part of the name: user serv fuzzy-matches across path segments.

Path-Based

When you know the directory: components/auth searches full paths.

Extension Filter

When you need specific file types: *.test.ts filters by extension pattern.
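The following is a hypothetical illustration of two of these strategies, not HAI Build's actual query handling: entries whose file name equals the query are moved ahead of the rest, and a query of the form *.ext is treated as a suffix filter. Both function names are invented for this example.

```typescript
import * as path from "path"

// Move entries whose file name equals the query to the front (stable order).
export function exactMatchesFirst(paths: readonly string[], query: string): string[] {
  const exact = paths.filter((p) => path.posix.basename(p) === query)
  const others = paths.filter((p) => path.posix.basename(p) !== query)
  return exact.concat(others)
}

// Treat "*.test.ts"-style queries as a suffix filter; other queries pass through.
export function filterByExtensionPattern(paths: readonly string[], pattern: string): string[] {
  if (!pattern.startsWith("*.")) return paths.slice()
  const suffix = pattern.slice(1) // "*.test.ts" becomes ".test.ts"
  return paths.filter((p) => p.endsWith(suffix))
}

console.log(exactMatchesFirst(["src/UserService.tsx", "lib/UserService.ts"], "UserService.ts"))
// [ 'lib/UserService.ts', 'src/UserService.tsx' ]
```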

Integration with AI

Automatic File Suggestions

The AI uses File Identification to:

Find relevant files for a task
Suggest imports based on available modules
Locate configuration files automatically
Discover test files for testing tasks
Navigate project structure efficiently

Context-Aware Mentions

File Identification powers the @ mention feature:

// Type @ in chat to trigger file search
@src/components/Button
// AI adds file to context

Use @ mentions to explicitly add files to context before describing your task.

Performance Optimization

Lazy Loading

Results are loaded incrementally:

Initial 20 results displayed immediately
More results loaded on scroll
Prevents UI blocking on large workspaces
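Incremental loading can be modeled as fixed-size pages over an already ranked list. A minimal sketch, assuming a page size of 20 to match the initial batch described above; getPage is a hypothetical name, and the UI wiring that requests the next page on scroll is left out.

```typescript
// Return one page of ranked results plus whether more pages remain.
export function getPage<T>(
  items: readonly T[],
  pageIndex: number,
  pageSize = 20,
): { page: T[]; hasMore: boolean } {
  if (pageSize <= 0 || pageIndex < 0) throw new RangeError("invalid page arguments")
  const start = pageIndex * pageSize
  const page = items.slice(start, start + pageSize)
  return { page, hasMore: start + pageSize < items.length }
}
```

For 45 ranked results, page 0 holds results 0 to 19 with hasMore set to true, and page 2 holds the last 5 with hasMore set to false.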

Caching Strategy

File index is cached per workspace:

Cache invalidated on file system changes
Active files cache updated every 500ms
Search results cached for 5 seconds
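A time-based cache for search results could look like the sketch below. It assumes a 5-second TTL to match the figure above and takes the clock as a constructor parameter so expiry can be tested; TtlCache is a hypothetical class, not HAI Build's implementation. Invalidation on file system changes would call clear().

```typescript
// Hypothetical TTL cache: each entry expires ttlMs after it was stored.
export class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()

  constructor(
    private readonly ttlMs = 5000,
    private readonly now: () => number = Date.now,
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) {
      // Expired: drop it so the caller re-runs the search
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  // Call when the file system changes so stale results are not served
  clear(): void {
    this.entries.clear()
  }
}
```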

Limits and Throttling

// Default limits (src/services/search/file-search.ts:18)
const DEFAULT_FILE_LIMIT = 5000
const DEFAULT_SEARCH_RESULTS = 20
const EXTENDED_RESULTS = 40  // For multi-root

These caps prevent memory issues on very large codebases.

Configuration

Custom Exclusions

Add workspace-specific exclusions:

// .vscode/settings.json
{
  "hai.fileSearch.exclude": [
    "**/build/**",
    "**/coverage/**",
    "**/*.log"
  ]
}

Search Behavior

Customize search parameters:

// .vscode/settings.json
{
  "hai.fileSearch.limit": 10000,
  "hai.fileSearch.fuzzyThreshold": 0.3,
  "hai.fileSearch.followSymlinks": true
}

Best Practices

Use Descriptive Filenames

Clear names improve search accuracy
Include feature/component in filename
Use consistent naming conventions
Avoid generic names like utils.ts

Organize with Directories

Group related files together
Use feature-based folder structure
Keep directory depth reasonable (3-5 levels)
Mirror logical application structure

Leverage Active File Priority

Open relevant files before starting AI tasks
Keep related files in tabs
Use split editor for context files
Close unrelated files to reduce noise

Optimize Large Workspaces

Use workspace hints in multi-root setups
Add irrelevant directories to exclusions
Increase file limit if needed
Consider breaking into multiple workspaces

Troubleshooting

Files not appearing in search

Check exclusion patterns
Verify file isn’t in .gitignore
Confirm file exists on disk
Refresh file index

Slow file search

Reduce file limit
Add more exclusion patterns
Check for symlink loops
Disable deep directory scanning

Incorrect file rankings

Use more specific search terms
Include path segments in query
Adjust fuzzy search threshold
Try exact filename match

Next Steps

Inline Editing

Make quick edits to discovered files

AI-Powered Coding

See how the AI uses file context

Focus Chain

Track which files are being worked on

Multi-Root Workspaces

Work with multiple project roots
