Overview
th0th implements hybrid semantic search that combines vector similarity with keyword matching. Merging the two result sets with Reciprocal Rank Fusion (RRF) achieves higher accuracy than either method alone.
Returning only the most relevant code chunks instead of entire files achieves a 98% token reduction, dramatically reducing context size for AI assistants.
How It Works
Hybrid Retrieval Pipeline
Cache Lookup
Check the L1 (memory) and L2 (SQLite) caches first. Typical workloads see a cache hit rate of 50% or more.
Vector Search
Embedding Generation
Each code chunk is converted to a high-dimensional vector (embedding) that captures semantic meaning.
Supported Embedding Models
| Provider | Model | Dimensions | Quality | Speed |
|---|---|---|---|---|
| Ollama | nomic-embed-text | 768 | Good | Very Fast |
| Ollama | bge-m3 | 1024 | Great | Fast |
| Mistral | mistral-embed | 1024 | Great | Medium |
| Mistral | codestral-embed | 1024 | Excellent | Medium |
| OpenAI | text-embedding-3-small | 1536 | Excellent | Medium |
Similarity Calculation
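As a rough illustration of this step, the sketch below ranks chunks by cosine similarity, the vector metric this page names when comparing scoring systems. The function names `cosine_similarity` and `top_k` are hypothetical and not part of th0th's API, and a real index would likely use an optimized vector store rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors; an assumption of this sketch
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=10):
    """Rank (chunk_id, embedding) pairs by similarity to the query, best first."""
    scored = [(cid, cosine_similarity(query_vec, emb)) for cid, emb in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, only the direction of the embeddings affects the ranking.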
Vector search finds chunks whose embeddings are geometrically close to the query embedding.
Keyword Search
BM25 Scoring (FTS5)
th0th uses SQLite's FTS5 (Full-Text Search 5) with BM25 ranking, which weighs:
- Term frequency: How often the term appears in the document
- Document length: Shorter documents with matches rank higher
- Inverse document frequency: Rare terms are more valuable
Keyword search is essential for finding exact matches like function names, class names, or specific identifiers that embeddings might miss.
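To make the three BM25 factors concrete, here is a minimal pure-Python sketch of the classic BM25 formula. It is illustrative only: it does not call FTS5, the function name `bm25_scores` is hypothetical, and the `k1` and `b` values are commonly used defaults rather than settings confirmed for th0th. Higher scores are better in this sketch, whereas FTS5's built-in `bm25()` function reports better matches as numerically lower values.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with classic BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With equal term counts, the shorter of two matching documents receives the higher score, which mirrors the document-length factor listed above.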
Reciprocal Rank Fusion (RRF)
The Algorithm
RRF combines rankings from multiple sources without needing to normalize scores.
Why RRF Works
Score-Independent
Works with incompatible scoring systems (cosine similarity vs BM25)
Rank-Based
Focuses on relative ranking, not absolute scores
Empirically Proven
k=60 follows the original RRF paper (Cormack et al., 2009), which found it to work well across TREC datasets
No Tuning Needed
Parameter-free for end users
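The fusion step described in this section can be sketched as follows, assuming the standard RRF formula in which each document scores the sum of `1 / (k + rank)` over every list it appears in. The function name `reciprocal_rank_fusion` is hypothetical and the code is not taken from th0th.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids (best first) into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

vector_hits = ["a", "b", "c"]    # ranked by cosine similarity
keyword_hits = ["c", "a", "d"]   # ranked by BM25
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Only the positions in each list matter here, which is why raw cosine and BM25 scores never need to be put on a common scale.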
Intelligent Boosting
th0th applies context-aware boosting for code-specific queries.
Example: Code Query Boosting
Query: `cn() utility function`
Without boosting:
- Vector: Documentation about utility functions (0.85)
- Vector: Similar helper code (0.82)
- Keyword: Exact `cn()` definition (0.75)
With boosting:
- Keyword: Exact `cn()` definition (1.88) ✨
- Vector: Documentation about utility functions (0.85)
- Vector: Similar helper code (0.82)
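One way the reordering in this example could come about is a multiplicative boost on keyword hits that define an identifier named in the query. The sketch below is an assumption, not th0th's documented logic: the 2.5 multiplier is inferred from the 0.75 to 1.88 change in the example, and `extract_identifiers` and `apply_boost` are hypothetical names.

```python
import re

# Hypothetical multiplier, chosen because 0.75 * 2.5 = 1.875, close to 1.88.
EXACT_MATCH_BOOST = 2.5

def extract_identifiers(query):
    """Pull code-like tokens such as `cn()` out of a query, without the parens."""
    return re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\(\)", query)

def apply_boost(query, results):
    """Boost keyword hits whose text calls or defines a queried identifier.

    `results` is a list of dicts with the keys: source, text, score.
    """
    identifiers = extract_identifiers(query)
    boosted = []
    for r in results:
        score = r["score"]
        is_exact = r["source"] == "keyword" and any(
            re.search(rf"\b{re.escape(name)}\s*\(", r["text"])
            for name in identifiers
        )
        if is_exact:
            score *= EXACT_MATCH_BOOST
        boosted.append({**r, "score": score})
    # Re-rank after boosting, highest score first.
    return sorted(boosted, key=lambda r: r["score"], reverse=True)
```

Queries without code-like tokens pass through unchanged, so natural-language searches keep their original ordering.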
Smart Chunking
Language-Aware Splitting
th0th uses different chunking strategies based on file type:
- Markdown
- JSON
- Code
- YAML
Markdown: split by headings with hierarchy context. Example chunks:
- Installation > Prerequisites (with heading context)
- Installation > Quick Start (with heading context)
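The Markdown strategy above can be sketched as a heading-aware splitter that records the heading path for each chunk. This is a simplified illustration with a hypothetical function name, `chunk_markdown`. It only recognizes `#`-style headings and does not skip `#` lines inside fenced code blocks, which a real chunker would need to handle.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_markdown(text):
    """Split Markdown at headings; each chunk records its heading path."""
    chunks = []
    path = []   # current heading hierarchy, e.g. ["Installation", "Quick Start"]
    body = []   # lines collected for the current section

    def flush():
        content = "\n".join(body).strip()
        if content:
            chunks.append({"context": " > ".join(path), "content": content})
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then descend.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

Carrying the heading path with each chunk means a short section such as "Quick Start" still states which parent section it belongs to.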
Chunk Configuration
Multi-Level Caching
Two-Level Architecture
L1 Cache (Memory)
- < 5ms lookup time
- 100 most recent queries
- LRU eviction
L2 Cache (SQLite)
- < 20ms lookup time
- 10,000 queries max
- LRU eviction with indexes
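A two-level lookup along these lines can be sketched with an in-memory `OrderedDict` in front of a SQLite table. The class name `TwoLevelCache`, the table schema, and the logical clock are assumptions made for this example. The sizes mirror the limits listed above, and a persistent store would likely use timestamps instead of a counter.

```python
import itertools
import json
import sqlite3
from collections import OrderedDict

class TwoLevelCache:
    """L1: in-memory LRU. L2: SQLite table with LRU eviction by last access."""

    def __init__(self, l1_size=100, l2_size=10_000, db_path=":memory:"):
        self.l1 = OrderedDict()
        self.l1_size = l1_size
        self.l2_size = l2_size
        self._tick = itertools.count()  # logical clock, keeps ordering deterministic
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, last_used INTEGER NOT NULL)"
        )
        # Index so that finding the least recently used rows stays cheap.
        self.db.execute("CREATE INDEX IF NOT EXISTS idx_last_used ON cache(last_used)")

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_size:
            self.l1.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key]
        row = self.db.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        if row is None:
            return None  # miss: the caller runs the full search
        self.db.execute(
            "UPDATE cache SET last_used = ? WHERE key = ?", (next(self._tick), key)
        )
        self.db.commit()
        value = json.loads(row[0])
        self._put_l1(key, value)  # promote to L1
        return value

    def set(self, key, value):
        self._put_l1(key, value)
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, last_used) VALUES (?, ?, ?)",
            (key, json.dumps(value), next(self._tick)),
        )
        # Trim L2 to its maximum size, dropping the oldest entries first.
        self.db.execute(
            "DELETE FROM cache WHERE key IN ("
            "SELECT key FROM cache ORDER BY last_used DESC LIMIT -1 OFFSET ?)",
            (self.l2_size,),
        )
        self.db.commit()
```

An L2 hit is promoted into L1, so a query repeated during a session is served from memory on the next lookup.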
Cache Key Generation
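A content-addressed key can be derived by hashing a canonical serialization of the search inputs. The sketch below assumes the key covers a project identifier, the query text, and the search options. Those fields and the function name `cache_key` are illustrative guesses, not th0th's documented key layout.

```python
import hashlib
import json

def cache_key(project_id, query, options=None):
    """Derive a deterministic SHA-256 key from the search inputs."""
    payload = {
        "project": project_id,
        "query": query.strip(),  # whitespace trimming is an assumption
        "options": options or {},
    }
    # sort_keys and fixed separators give a canonical serialization,
    # so logically equal inputs always hash to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting the keys before hashing means that two option objects with the same fields in a different order produce the same cache entry.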
Cache keys are content-addressed using SHA-256.
Cache Invalidation
Full Project Invalidation
Triggered after complete reindexing.
File-Based Invalidation
More granular: only invalidate queries affected by changed files.
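Both invalidation modes can be sketched with a reverse index from file paths to the cache keys whose results came from those files. The class `InvalidationIndex` and its methods are hypothetical. Note that this simple version cannot detect a cached query that a changed file would newly match, so a real implementation may need a broader rule.

```python
from collections import defaultdict

class InvalidationIndex:
    """Tracks which cached queries depend on which files."""

    def __init__(self):
        self.cache = {}                        # cache key -> cached results
        self.keys_by_file = defaultdict(set)   # file path -> dependent cache keys

    def store(self, key, results):
        """Cache results; each result is a dict with a 'file' path."""
        self.cache[key] = results
        for r in results:
            self.keys_by_file[r["file"]].add(key)

    def invalidate_files(self, changed_files):
        """Drop only the entries whose results came from a changed file."""
        removed = set()
        for path in changed_files:
            removed |= self.keys_by_file.pop(path, set())
        for key in removed:
            self.cache.pop(key, None)
        return removed

    def invalidate_all(self):
        """Full project invalidation, e.g. after a complete reindex."""
        self.cache.clear()
        self.keys_by_file.clear()
```

Queries whose results never touched a changed file stay cached, which is what keeps the hit rate up during incremental edits.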
Performance Metrics
Search Latency
- Cache Hit (L1): 3-5ms, memory lookup + JSON deserialization, fastest path
- Cache Hit (L2)
- Cache Miss
Cache Hit Rate
Typical workloads achieve 50-70% cache hit rate due to repeated queries during development sessions.
Advanced Features
File Pattern Filters
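Pattern filtering of this kind can be approximated with Python's standard `fnmatch` module. The function `filter_by_patterns` and its `include`/`exclude` parameters are hypothetical names for this sketch. Also note that `fnmatch` lets `*` match `/`, so its behavior differs from strict glob semantics and may not match th0th's own matcher.

```python
from fnmatch import fnmatchcase

def filter_by_patterns(paths, include=None, exclude=None):
    """Keep paths matching any include pattern and no exclude pattern."""
    kept = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        if include and not any(fnmatchcase(path, p) for p in include):
            continue
        if exclude and any(fnmatchcase(path, p) for p in exclude):
            continue
        kept.append(path)
    return kept

paths = ["src/lib/utils.ts", "src/lib/utils.test.ts", "docs/readme.md"]
sources_only = filter_by_patterns(paths, include=["src/*.ts"], exclude=["*.test.ts"])
```

Exclude patterns are applied after include patterns, so a test file under `src/` is dropped even though it matches the include pattern.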
Include/exclude results by glob patterns.
Score Explanations
Debug ranking with detailed score breakdowns.
Warmup Queries
Pre-populate cache after indexing.
Best Practices
Query Writing
- Be specific: Use function names, class names, or technical terms
- Natural language works: "how to hash passwords" finds relevant code
- Avoid overly broad queries: "utils" returns too many results
Index Maintenance
- Regular reindexing: Run after major code changes
- Incremental updates: th0th auto-detects stale indexes
- Cache cleanup: Runs automatically (1-hour TTL)
Performance Tuning
- Adjust maxResults: Lower = faster (default: 10)
- Use file filters: Narrow search scope for speed
- Monitor cache stats: Aim for 60%+ hit rate
Related Topics
Architecture
Overall system design and component interaction
Compression
Reduce token usage with intelligent compression
Memory
Long-term memory and pattern recognition
API Reference
Complete search API documentation