Codebase Memory MCP is a high-performance code intelligence engine that transforms any codebase into a persistent, queryable knowledge graph, giving AI coding agents structured answers about your code instead of forcing them to read every file from scratch. Built as a single static binary in pure C with zero runtime dependencies, it ships with 158 vendored tree-sitter grammars, Hybrid LSP semantic type resolution for 11 languages, and 14 MCP tools covering everything from call-chain tracing to dead code detection. Install it once, and every agent you use (Claude Code, Codex CLI, Gemini CLI, Zed, and seven others) gets a persistent, sub-millisecond index of your entire codebase.
Why Codebase Memory MCP
When an AI agent needs to answer “what calls ProcessOrder?”, the naive approach is to grep through files one by one, reading thousands of lines of source code and burning through your context window. Codebase Memory MCP replaces that file-by-file trawl with a single structured graph query that returns the exact answer in under a millisecond.
The difference in token consumption is stark:
| Exploration method | Token usage (5 structural queries) |
|---|---|
| File-by-file grep/read | ~412,000 tokens |
| Codebase Memory MCP graph queries | ~3,400 tokens |
| Reduction | 99.2% fewer tokens |
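The reduction figure follows directly from the two token counts in the table. A minimal worked calculation, using only those two numbers (the function name is illustrative):

```python
# Token counts taken from the comparison table above.
GREP_TOKENS = 412_000
GRAPH_TOKENS = 3_400


def reduction_percent(baseline: int, improved: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return (baseline - improved) / baseline * 100


saving = reduction_percent(GREP_TOKENS, GRAPH_TOKENS)
ratio = GREP_TOKENS / GRAPH_TOKENS
print(f"{saving:.1f}% fewer tokens, about {ratio:.0f}x smaller")
# prints "99.2% fewer tokens, about 121x smaller"
```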
Key Capabilities
158 Languages
All 158 tree-sitter grammars are vendored and compiled directly into the binary. Nothing to install, nothing that breaks. Excellent tier coverage for C, C++, Python, TypeScript, Go, Rust, Kotlin, Lua, and more.
Hybrid LSP
A lightweight C implementation of type-resolution algorithms, structurally compatible with tsserver, pyright, gopls, Roslyn, Eclipse JDT, and rust-analyzer, runs alongside tree-sitter to produce accurate cross-package CALLS edges with no language server process.
14 MCP Tools
search_graph, trace_path, get_architecture, detect_changes, query_graph, semantic_query, get_code_snippet, manage_adr, and more, covering search, traversal, impact analysis, and Cypher-like queries.
Zero Dependencies
Single static binary for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64). No Docker, no runtime, no API keys. Download → install → restart agent → done.
99% Fewer Tokens
RAM-first pipeline with LZ4 compression indexes the Linux kernel (28M LOC, 75K files) in 3 minutes. Cypher queries return in under 1 ms. Five structural queries cost ~3,400 tokens vs ~412,000 via grep.
11 Agent Support
install auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro, configuring MCP entries, instruction files, and pre-tool hooks for each automatically.
How It Works
Codebase Memory MCP is a structural analysis backend: it builds and queries the knowledge graph. It does not include an LLM. Instead, it relies on your MCP client (Claude Code, or any MCP-compatible agent) to act as the intelligence layer that translates your natural language into tool calls.
Example: Tracing a call path
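As a rough illustration of what "translating natural language into tool calls" means on the wire, the sketch below builds the JSON-RPC message an MCP client could send for a question like "what calls ProcessOrder?". The `tools/call` method and the `name`/`arguments` envelope come from the MCP specification, and `trace_path` is one of the tools listed above. The argument names `function_name` and `direction` are assumptions made for this example; the actual input schema of `trace_path` may differ.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the real trace_path schema is not shown here.
request = build_tool_call(
    1,
    "trace_path",
    {"function_name": "ProcessOrder", "direction": "inbound"},
)
print(request)
```

In practice the agent composes this message itself; the user only asks the question in plain language.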
The indexing pipeline
Indexing runs in two passes per file:
- Tree-sitter pass: fast, syntactic, covers all 158 languages. Extracts definitions, calls, and imports into an in-memory SQLite graph using LZ4 compression.
- Hybrid LSP pass: type-aware, for Python, TypeScript/JavaScript/JSX/TSX, PHP, C#, Go, C/C++, Java, Kotlin, and Rust. Refines call edges using the import graph and a per-file or cross-file definition registry, mirroring what an IDE “Go to Definition” would resolve.
The graph is persisted to ~/.cache/codebase-memory-mcp/ in WAL-mode SQLite. A background watcher detects file changes via git polling and re-indexes automatically, so the graph stays fresh without any manual intervention.
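To make the idea of a call graph stored in WAL-mode SQLite concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `nodes`/`edges` schema, the file names, and the sample functions are all hypothetical and chosen for illustration; the project's real tables are not described here and will differ. Only the `CALLS` edge kind and the WAL journal mode are taken from the text above.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL, file TEXT NOT NULL);
CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL, kind TEXT NOT NULL);
CREATE INDEX edges_by_dst ON edges (dst, kind);
"""


def open_graph(path: str) -> sqlite3.Connection:
    """Create a fresh graph database in WAL journal mode."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep querying while a writer re-indexes.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def callers_of(conn: sqlite3.Connection, name: str) -> list:
    """Return the names of functions with a CALLS edge into `name`."""
    rows = conn.execute(
        """
        SELECT caller.name FROM edges
        JOIN nodes AS caller ON caller.id = edges.src
        JOIN nodes AS callee ON callee.id = edges.dst
        WHERE edges.kind = 'CALLS' AND callee.name = ?
        ORDER BY caller.name
        """,
        (name,),
    )
    return [row[0] for row in rows]


db_path = os.path.join(tempfile.mkdtemp(), "graph.db")
conn = open_graph(db_path)
conn.executemany(
    "INSERT INTO nodes (id, name, file) VALUES (?, ?, ?)",
    [
        (1, "ProcessOrder", "orders.go"),
        (2, "HandleCheckout", "checkout.go"),
        (3, "RetryFailedOrders", "jobs.go"),
    ],
)
conn.executemany(
    "INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
    [(2, 1, "CALLS"), (3, 1, "CALLS")],
)
conn.commit()
print(callers_of(conn, "ProcessOrder"))
# prints ['HandleCheckout', 'RetryFailedOrders']
```

The point of the sketch is the shape of the lookup: once call edges are indexed, "what calls ProcessOrder?" becomes one indexed join instead of a scan over source files.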
Research Background
The design, architecture, and benchmark results behind Codebase Memory MCP are described in a peer-reviewed preprint:
Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP (arXiv:2603.27277). Evaluated across 31 real-world repositories: 83% answer quality, 10× fewer tokens, 2.1× fewer tool calls vs. file-by-file exploration.
Security & Privacy
Codebase Memory MCP processes your code entirely on your machine. Your source code, queries, environment, and usage never leave your machine: there is no telemetry, no analytics endpoint, and no remote service. Every release binary passes a multi-layer verification pipeline before publication:
- 100% local: all graph construction and querying happens on-device.
- SLSA Level 3: cryptographic build provenance generated by GitHub Actions; verify with gh attestation verify <file> --repo DeusData/codebase-memory-mcp.
- VirusTotal: all binaries scanned by 70+ antivirus engines (zero detections required to publish) on every release.
- Sigstore cosign: keyless signatures on all artifacts, with bundles included in every release.
- SHA-256 checksums: checksums.txt published with every release and verified by both install scripts before extraction.
- CodeQL SAST: blocks the release pipeline if any open alerts remain.
- Zero runtime dependencies: no transitive supply chain; all libraries are vendored at compile time.
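For readers who want to repeat the SHA-256 check by hand, the following sketch shows one way to verify a downloaded file against a checksums file. It assumes checksums.txt uses the common `sha256sum` layout of `<hex digest>  <filename>` per line, which the source does not confirm, and the artifact name in the demo is a stand-in rather than a real release file.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large binaries do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict:
    """Parse `<hex digest>  <filename>` lines into {filename: digest}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary-mode entries with a leading '*'.
            entries[name.lstrip("*")] = digest.lower()
    return entries


def verify(path: str, checksums_text: str) -> bool:
    """True only if the file is listed and its digest matches."""
    expected = parse_checksums(checksums_text).get(os.path.basename(path))
    return expected is not None and expected == sha256_of(path)


# Demo with a stand-in artifact (not a real release file).
workdir = tempfile.mkdtemp()
artifact = os.path.join(workdir, "example-artifact.tar.gz")
with open(artifact, "wb") as fh:
    fh.write(b"placeholder release contents")

good = f"{sha256_of(artifact)}  example-artifact.tar.gz\n"
bad = "0" * 64 + "  example-artifact.tar.gz\n"
print(verify(artifact, good))  # prints True
print(verify(artifact, bad))   # prints False
```

A checksum match only shows the file is the one that was published; the provenance attestation and cosign signatures listed above address who built it.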