GitNexus uses hybrid search to find relevant code: it combines BM25 (keyword matching) and semantic search (embedding similarity), then merges results using Reciprocal Rank Fusion (RRF). This is the same approach used by production search systems like Elasticsearch, Pinecone, and Weaviate.
Why Hybrid Search?
Neither keyword search nor semantic search is perfect alone:

| Method | Strengths | Weaknesses |
|---|---|---|
| BM25 | Fast, exact matches, works for rare terms | Misses synonyms, semantic similarity |
| Semantic | Understands meaning, finds related concepts | Slower, may miss exact matches |
| Hybrid | Best of both — fast keyword + semantic understanding | ✅ |
Example: Searching for “authentication middleware” should find both:
- Files containing “auth” (keyword match)
- Files with similar concepts like “validateUser”, “checkToken” (semantic match)
Architecture
BM25 Search
BM25 (Best Matching 25) is a probabilistic ranking algorithm for keyword-based search.
Implementation
GitNexus uses KuzuDB’s built-in FTS (Full-Text Search) indexes (see bm25-index.ts:60).
Always fresh: KuzuDB FTS reads from the database on every query — no stale cached indexes.
BM25 Scoring
BM25 ranks documents using term frequency (TF) and inverse document frequency (IDF):
- High scores: Documents with rare query terms that appear frequently
- Low scores: Documents with common terms that appear rarely
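The TF/IDF interplay can be made concrete with a minimal sketch of the textbook Okapi BM25 formula. This is not KuzuDB's FTS implementation: the function name `bm25Score`, the defaults `k1 = 1.2` and `b = 0.75`, and the toy corpus are all assumptions chosen for illustration.

```typescript
// Minimal sketch of textbook Okapi BM25 scoring (NOT KuzuDB's internal code).
// k1 and b are conventional defaults; every name here is illustrative.
type Doc = string[]; // a document as a list of lowercase tokens

function bm25Score(query: string[], doc: Doc, corpus: Doc[], k1 = 1.2, b = 0.75): number {
  const avgLen = corpus.reduce((sum, d) => sum + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of query) {
    // Term frequency: how often the term occurs in this document.
    const tf = doc.filter((t) => t === term).length;
    if (tf === 0) continue;
    // Document frequency: how many documents contain the term.
    const df = corpus.filter((d) => d.includes(term)).length;
    // Rare terms (low df) get a high IDF; common terms get a low one.
    const idf = Math.log(1 + (corpus.length - df + 0.5) / (df + 0.5));
    // Saturating term frequency with document-length normalization.
    const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
    score += idf * ((tf * (k1 + 1)) / norm);
  }
  return score;
}

// Toy corpus (hypothetical): "auth" is rare, "token" appears everywhere.
const authDoc: Doc = ["validate", "token", "auth"];
const pageDoc: Doc = ["render", "page", "token"];
const configDoc: Doc = ["parse", "config", "token"];
const toyCorpus: Doc[] = [authDoc, pageDoc, configDoc];

const rareTermScore = bm25Score(["auth"], authDoc, toyCorpus);
const commonTermScore = bm25Score(["token"], authDoc, toyCorpus);
console.log({ rareTermScore, commonTermScore });
```

In this toy corpus the rare term "auth" contributes a higher score than the ubiquitous term "token", even though both occur once in the document.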
Semantic Search
Semantic search uses embedding vectors to find code with similar meaning, even if keywords don’t match exactly.
Embedding Model
GitNexus uses snowflake-arctic-embed-xs by default:
- 22M parameters
- 384 dimensions
- ~90MB model size
- GPU acceleration via DirectML (Windows) or CUDA (Linux)
embedder.ts:113
How embeddings work
Each symbol (function, class, method) is converted to a 384-dimensional vector.
Similar code produces similar vectors (measured by cosine similarity).
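As a rough illustration of the similarity measure, here is a minimal cosine similarity sketch. The function name `cosineSimilarity` and the 3-dimensional toy vectors are assumptions; real embeddings from the default model have 384 dimensions, and this is not taken from GitNexus's source.

```typescript
// Cosine similarity between two equal-length vectors.
// Illustrative helper only; not GitNexus's actual code.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Toy 3-dimensional vectors (hypothetical values).
const sameDirection = cosineSimilarity([1, 2, 3], [2, 4, 6]); // parallel vectors
const orthogonal = cosineSimilarity([1, 0, 0], [0, 1, 0]);    // unrelated vectors
console.log({ sameDirection, orthogonal });
```

Vectors pointing in the same direction score close to 1 regardless of magnitude, while orthogonal vectors score 0.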
Vector Index
Embeddings are stored in KuzuDB as vector properties.
Embedding generation is optional: Run gitnexus analyze --skip-embeddings to index without embeddings (faster, BM25-only search).
Reciprocal Rank Fusion (RRF)
RRF merges rankings from multiple sources without needing to normalize scores.
Algorithm
For each result at rank r in a result set, compute score = 1 / (k + r), where k = 60 (standard constant). If a document appears in both BM25 and semantic results, sum its RRF scores.
hybrid-search.ts:46
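The merge step can be sketched as follows. This is a minimal illustration of standard RRF rather than the code in hybrid-search.ts: the function name `rrfMerge`, the 1-based rank convention, and every file path except src/auth/index.ts are assumptions.

```typescript
// Sketch of Reciprocal Rank Fusion over ranked lists of IDs (best result first).
// Illustrative only; names and sample data are hypothetical.
const RRF_K = 60;

interface FusedResult {
  id: string;
  score: number;
}

function rrfMerge(rankings: string[][], k: number = RRF_K): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // assumption: ranks are 1-based
      // A document found in several rankings accumulates one term per ranking.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (x, y) => y.score - x.score,
  );
}

// Hypothetical result lists for a query like "auth".
const bm25Ranking = ["src/auth/index.ts", "src/auth/login.ts", "src/utils/token.ts"];
const semanticRanking = ["src/middleware/validate.ts", "src/auth/index.ts"];

const merged = rrfMerge([bm25Ranking, semanticRanking]);
console.log(merged);
```

With these sample lists, src/auth/index.ts scores 1/61 + 1/62 because it appears in both rankings, which places it above any file that appears in only one.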
Why RRF?
No score normalization needed
BM25 scores (0-∞) and cosine similarity (-1 to 1) are on different scales. RRF uses rank position instead of raw scores, avoiding normalization issues.
Robust to outliers
A single high BM25 score won’t dominate the results. Rank position is more stable.
Simple and effective
RRF is a one-line formula with a single parameter (k = 60). It’s used in production by Elasticsearch, Pinecone, and others.
Process-Grouped Search
GitNexus doesn’t just return a flat list of files. Results are grouped by process (execution flow) to provide architectural context.
Example Output
Grouping Logic
- Run hybrid search to get relevant symbols
- Find processes that contain those symbols (via STEP_IN_PROCESS edges)
- Rank processes by relevance:
  - Sum of RRF scores for symbols in the process
  - Normalized by process step count
- Group results by process
Process-grouped search helps agents understand how features work, not just where they’re defined.
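The grouping logic can be sketched roughly as below. The types `ScoredSymbol`, `Process`, and `RankedProcess`, the function `groupByProcess`, and the sample data are hypothetical. The sketch also assumes that "normalized by process step count" means dividing the summed score by the number of steps, which the text does not spell out.

```typescript
// Hypothetical shapes; GitNexus's real types and graph queries may differ.
interface ScoredSymbol {
  id: string;
  rrfScore: number;
}

interface Process {
  id: string;
  stepSymbolIds: string[]; // symbols linked to the process via STEP_IN_PROCESS edges
}

interface RankedProcess {
  processId: string;
  relevance: number;
  symbols: ScoredSymbol[];
}

function groupByProcess(hits: ScoredSymbol[], processes: Process[]): RankedProcess[] {
  const ranked: RankedProcess[] = [];
  for (const proc of processes) {
    const steps = new Set(proc.stepSymbolIds);
    const symbols = hits.filter((hit) => steps.has(hit.id));
    if (symbols.length === 0) continue; // process contains none of the hits
    const total = symbols.reduce((sum, s) => sum + s.rrfScore, 0);
    // Assumed normalization: divide by step count so long flows are not favored for size alone.
    ranked.push({
      processId: proc.id,
      relevance: total / proc.stepSymbolIds.length,
      symbols,
    });
  }
  return ranked.sort((a, b) => b.relevance - a.relevance);
}

// Hypothetical search hits and processes.
const hits: ScoredSymbol[] = [
  { id: "validateUser", rrfScore: 0.03 },
  { id: "checkToken", rrfScore: 0.02 },
  { id: "renderPage", rrfScore: 0.01 },
];
const processes: Process[] = [
  { id: "PageRender", stepSymbolIds: ["renderPage", "loadData", "applyTheme", "sendHtml"] },
  { id: "LoginFlow", stepSymbolIds: ["validateUser", "checkToken"] },
  { id: "Unrelated", stepSymbolIds: ["exportCsv"] },
];

const grouped = groupByProcess(hits, processes);
console.log(grouped.map((p) => `${p.processId}: ${p.relevance}`));
```

Here LoginFlow ranks first because both of its steps matched, while PageRender has a single match spread over four steps and Unrelated is dropped.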
MCP Query Tool
The MCP query tool uses hybrid search under the hood.
Parameters:
- query (required) - Search query string
- limit (optional) - Max results (default: 10)
- repo (optional) - Repository name (required if multiple repos indexed)
Returns:
- processes - Execution flows related to the query
- process_symbols - Symbols grouped by process
- definitions - Other relevant symbols not in processes
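For client code, the parameters and response fields listed above could be modeled with types like the following. The interface names and the `withDefaults` helper are hypothetical, and the shape of each response field is left as `unknown` because this page does not specify it.

```typescript
// Hypothetical TypeScript shapes inferred from the parameter and result lists;
// these are not GitNexus's published types.
interface QueryArgs {
  query: string;  // required: search query string
  limit?: number; // optional: max results (documented default: 10)
  repo?: string;  // optional: repository name; required if multiple repos are indexed
}

interface QueryResult {
  processes: unknown;       // execution flows related to the query
  process_symbols: unknown; // symbols grouped by process
  definitions: unknown;     // other relevant symbols not in processes
}

interface NormalizedQueryArgs extends QueryArgs {
  limit: number;
}

// Fill in the documented default for `limit` when the caller omits it.
function withDefaults(args: QueryArgs): NormalizedQueryArgs {
  return { ...args, limit: args.limit ?? 10 };
}

const defaulted = withDefaults({ query: "authentication middleware" });
const explicit = withDefaults({ query: "auth", limit: 5, repo: "my-repo" });
console.log(defaulted, explicit);
```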
Performance
| Method | Latency | Memory |
|---|---|---|
| BM25 only | ~10ms | Minimal |
| Semantic only | ~50ms | ~200MB (model loaded) |
| Hybrid (RRF) | ~60ms | ~200MB |
GPU acceleration: Semantic search is 5-10x faster on GPU (DirectML/CUDA) compared to CPU.
Example: Searching for “auth”
BM25 Results
Semantic Results
RRF Merged Results
src/auth/index.ts gets the highest score because it appears in both result sets, showing it’s highly relevant by both keyword and semantic criteria.
Customization
You can customize the embedding model during indexing.
Next Steps
Knowledge Graph
Understand the graph schema
Processes & Flows
Learn how process-grouped search works