Language Support Matrix
| Language | Grammar | Symbol Extraction | Call Graph | Status |
|---|---|---|---|---|
| Rust | tree-sitter-rust | Functions, structs, traits, enums, methods | Full | ✅ Full |
| Python | tree-sitter-python | Functions, classes, methods | Full | ✅ Full |
| JavaScript | tree-sitter-javascript | Functions, classes, arrow functions, methods | Full | ✅ Full |
| TypeScript | tree-sitter-typescript | Functions, classes, methods, interfaces | Full | ✅ Full |
| Go | tree-sitter-go | Functions, methods, structs, interfaces | Full | ✅ Full |
| Java | tree-sitter-java | Classes, methods, constructors, interfaces | Full | ✅ Full |
| Ruby | regex fallback | Methods, classes, modules | Basic | ⚠️ Basic |
| PHP | regex fallback | Functions, classes | Basic | ⚠️ Basic |
| C | regex fallback | Functions, structs, typedefs, macros | Basic | ⚠️ Basic |
| C++ | regex fallback | Classes, functions, methods, namespaces | Basic | ⚠️ Basic |
| C# | regex fallback | Classes, methods, properties, interfaces | Basic | ⚠️ Basic |
| Swift | regex fallback | Functions, classes, structs, protocols | Basic | ⚠️ Basic |
| Kotlin | regex fallback | Functions, classes, objects | Basic | ⚠️ Basic |
| Scala | regex fallback | Functions, classes, traits, objects | Basic | ⚠️ Basic |
| Shell/Bash | regex fallback | Functions, aliases, exports | Basic | ⚠️ Basic |
- ✅ Full: Tree-sitter AST parsing, complete symbol table, accurate call graphs
- ⚠️ Basic: Regex-based heuristics, best-effort symbol extraction, no call graph
Tree-Sitter Grammars
Tree-sitter provides robust, incremental parsers for supported languages. Heimdall uses the following grammars:
Rust
Grammar: tree-sitter-rust
Version: Latest
Extraction: src/index/symbols.rs:189-293
What’s extracted:
- Functions (fn main(), pub async fn handler())
- Structs (pub struct Config)
- Traits (pub trait Provider)
- Enums (pub enum Status)
- Impl blocks and methods
- Visibility modifiers (pub, private)
- Entry points (main(), route handlers in routes/ files)
Example extracted symbols:
- User (struct, public)
- new (method, public)
Python
Grammar: tree-sitter-python
Version: Latest
Extraction: src/index/symbols.rs:299-350
What’s extracted:
- Functions (def hello():)
- Classes (class MyClass:)
- Methods (functions inside class bodies)
- Async functions (async def handler():)
- Public/private based on naming (a leading underscore, as in _private_method, marks a symbol private)
Example extracted symbols:
- UserService (class, public)
- create_user (method, public)
- _validate_email (method, private)
JavaScript / TypeScript
Grammar: tree-sitter-javascript, tree-sitter-typescript
Extraction: src/index/symbols.rs:356-460
What’s extracted:
- Function declarations (function foo() {})
- Class declarations (class Bar {})
- Arrow functions (const handler = () => {})
- Methods (method() {})
- Exported symbols (export function ..., export class ...)
Example extracted symbols:
- AuthService (class, exported)
- login (method, public)
- #generateToken (method, private)
- validateToken (function, exported)
Go
Grammar: tree-sitter-go
Extraction: src/index/symbols.rs:466-542
What’s extracted:
- Functions (func Process())
- Methods (func (s *Service) Handle())
- Structs (type Config struct)
- Interfaces (type Reader interface)
- Public/private based on capitalization (Public vs private)
Example extracted symbols:
- UserRepository (struct, public)
- FindByID (method, public, entry point)
- validate (method, private)
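
The capitalization rule is simple enough to sketch. The helper names below are hypothetical and are not taken from src/index/symbols.rs; the second helper mirrors the underscore convention described for Python, reduced to a plain prefix check.

```rust
/// Go convention: an identifier is exported when its first character is an
/// uppercase letter. (Hypothetical helper, not Heimdall's actual code.)
fn go_is_public(name: &str) -> bool {
    name.chars().next().map_or(false, |c| c.is_uppercase())
}

/// Python convention: a leading underscore marks a name as private.
/// Simplification: dunder names such as `__init__` are also treated as private.
fn python_is_public(name: &str) -> bool {
    !name.starts_with('_')
}
```

With these rules, FindByID is public and validate is private, which matches the example symbols above.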
Java
Grammar: tree-sitter-java
Extraction: src/index/symbols.rs:548-626
What’s extracted:
- Classes (public class User)
- Interfaces (public interface Service)
- Methods (public void save())
- Constructors (public User())
- Public/private/protected modifiers
Example extracted symbols:
- UserController (class, public)
- createUser (method, public, entry point)
- validate (method, private)
What’s Extracted for Each Language
Symbol Types
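Each extracted symbol includes its name, kind (such as function, method, class, or struct), and visibility, along with an entry-point flag and, for tree-sitter languages, a calls list.
Entry Point Detection
Heimdall marks symbols as entry points using heuristics:
| Language | Entry Point Criteria |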
|---|---|
| Rust | fn main(), public functions in routes/ files, functions starting with handle_ |
| Python | def main(), functions in views/ or routes/ files |
| JavaScript/TypeScript | Functions in files containing route, handler, or api in path |
| Go | func main(), public functions in handler/ or api/ files |
| Java | public static void main(), public methods in *Controller or *Handler files |
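
The criteria in the table can be read as one predicate per language. The following is a minimal sketch of that reading, not Heimdall's implementation: the `Lang` enum, the helper names, and the exact path matching (directory components for Rust, Python, and Go; substring match for JavaScript/TypeScript; file-name suffix for Java) are all assumptions.

```rust
/// Hypothetical language tag for the five tree-sitter languages in the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    Rust,
    Python,
    JsTs,
    Go,
    Java,
}

/// True if any directory component of `path` equals one of `dirs`.
fn in_dir(path: &str, dirs: &[&str]) -> bool {
    let mut parts: Vec<&str> = path.split('/').collect();
    parts.pop(); // drop the file name, keep only directories
    parts.iter().any(|p| dirs.contains(p))
}

/// File name without directories or extension, e.g. "UserController".
fn file_stem(path: &str) -> &str {
    let file = path.rsplit('/').next().unwrap_or(path);
    file.split('.').next().unwrap_or(file)
}

/// Assumed reading of the table: one heuristic per language.
fn is_entry_point(lang: Lang, name: &str, is_public: bool, path: &str) -> bool {
    match lang {
        Lang::Rust => {
            name == "main"
                || (is_public && in_dir(path, &["routes"]))
                || name.starts_with("handle_")
        }
        Lang::Python => name == "main" || in_dir(path, &["views", "routes"]),
        Lang::JsTs => ["route", "handler", "api"].iter().any(|k| path.contains(k)),
        Lang::Go => name == "main" || (is_public && in_dir(path, &["handler", "api"])),
        Lang::Java => {
            // Simplification: any method named `main` counts, without checking
            // for the full `public static void` signature.
            let stem = file_stem(path);
            name == "main"
                || (is_public && (stem.ends_with("Controller") || stem.ends_with("Handler")))
        }
    }
}
```

Because these are heuristics, a helper that merely lives in a routes/ directory can be flagged as an entry point; the sketch keeps that behavior rather than trying to refine it.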
Call Graph Construction
For tree-sitter-supported languages, Heimdall extracts call relationships:
- Find all call expressions in the AST (call_expression, method_invocation, etc.)
- Extract the callee identifier
- Match against known function/method symbols
- Store in the Symbol.calls vector
For example, if process_user calls validate_id and store_user, the process_user symbol will have calls = ["validate_id", "store_user"].
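
The matching and storing steps can be sketched without a parser. The example below assumes the callee identifiers have already been pulled out of the AST and only shows how they might be filtered against known symbols. The `Symbol` struct here is a hypothetical stand-in (only the `calls` field name comes from the text above), and de-duplicating repeated calls is a choice made for this sketch.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for Heimdall's symbol record.
#[derive(Debug)]
struct Symbol {
    name: String,
    calls: Vec<String>,
}

/// `raw_calls` pairs each function name with the callee identifiers found in
/// its body. Callees that are not known symbols (for example library calls)
/// are dropped, and repeated callees are recorded once, in first-seen order.
fn build_symbols(raw_calls: &[(&str, Vec<&str>)]) -> Vec<Symbol> {
    let known: HashSet<&str> = raw_calls.iter().map(|(name, _)| *name).collect();
    raw_calls
        .iter()
        .map(|(name, callees)| {
            let mut seen = HashSet::new();
            let calls = callees
                .iter()
                .filter(|c| known.contains(**c))
                .filter(|c| seen.insert(**c))
                .map(|c| c.to_string())
                .collect();
            Symbol { name: name.to_string(), calls }
        })
        .collect()
}
```

Matching by bare identifier keeps the sketch short, but it cannot tell apart two methods with the same name on different types.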
Regex Fallback Languages
For languages without tree-sitter support, Heimdall uses regex-based extraction.
Ruby
Extraction: src/index/symbols.rs:869-915
Patterns:
Example extracted symbols:
- UserService (class)
- create (method)
- valid_email? (method)
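
To give a feel for line-oriented fallback extraction, here is a dependency-free sketch. It is not Heimdall's implementation and does not reproduce its patterns: it uses plain prefix matching from the standard library as a stand-in for regular expressions, and the function names are hypothetical.

```rust
/// Reads the identifier at the start of `rest`, stopping at whitespace,
/// '(', '<' or ';'.
fn leading_name(rest: &str) -> Option<String> {
    let name: String = rest
        .trim_start()
        .chars()
        .take_while(|c| !c.is_whitespace() && !matches!(*c, '(' | '<' | ';'))
        .collect();
    if name.is_empty() { None } else { Some(name) }
}

/// Scans Ruby source line by line and returns (name, kind) pairs for
/// `class`, `module`, and `def` declarations.
fn extract_ruby_symbols(source: &str) -> Vec<(String, &'static str)> {
    let keywords = [("class ", "class"), ("module ", "module"), ("def ", "method")];
    let mut symbols = Vec::new();
    for line in source.lines() {
        let line = line.trim_start();
        for (prefix, kind) in keywords {
            if let Some(rest) = line.strip_prefix(prefix) {
                if let Some(name) = leading_name(rest) {
                    symbols.push((name, kind));
                }
            }
        }
    }
    symbols
}
```

A scanner like this is best-effort by design: it has no notion of nesting, strings, or comments, which is the same trade-off the Basic status in the support matrix describes.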
C/C++
Extraction: src/index/symbols.rs:952-1064
Patterns:
Limitations:
- Function pointer types may cause false positives
- Template specializations are not fully supported
- Preprocessor macros are parsed separately
C#
Extraction: src/index/symbols.rs:1066-1127
Patterns:
Adding New Language Support
Option 1: Tree-Sitter Grammar
For full AST support:
Step 1: Add the tree-sitter dependency to Cargo.toml. Then update src/index/symbols.rs and extend extract_with_tree_sitter to cover the new language.
Option 2: Regex Fallback
For simpler support:
Step 1: Define regex patterns in src/index/symbols.rs. Then extend extract_symbols_regex to apply them for the new language.
Language Detection
Heimdall infers language from file extensions.
Static Analysis Rule Coverage
Static analysis rules in src/pipeline/static_analysis/mod.rs use language filters. When adding a new language, update the relevant rules' languages filters to include it.
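
To illustrate how extension-based detection and per-rule language filters might fit together, here is a minimal sketch. The `Language` enum, the extension mapping, the `Rule` struct, and the convention that an empty filter means "all languages" are assumptions made for the example and are not taken from Heimdall's source.

```rust
use std::path::Path;

/// Hypothetical language tag; only a subset of the support matrix is listed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Rust,
    Python,
    JavaScript,
    TypeScript,
    Go,
    Java,
    Ruby,
}

/// Maps a file path to a language by its extension (assumed mapping).
fn detect_language(path: &str) -> Option<Language> {
    let ext = Path::new(path).extension()?.to_str()?;
    match ext {
        "rs" => Some(Language::Rust),
        "py" => Some(Language::Python),
        "js" => Some(Language::JavaScript),
        "ts" => Some(Language::TypeScript),
        "go" => Some(Language::Go),
        "java" => Some(Language::Java),
        "rb" => Some(Language::Ruby),
        _ => None,
    }
}

/// Hypothetical rule shape with a `languages` filter.
struct Rule {
    id: &'static str,
    languages: Vec<Language>,
}

impl Rule {
    /// A rule applies when its filter is empty or contains the language.
    fn applies_to(&self, lang: Language) -> bool {
        self.languages.is_empty() || self.languages.contains(&lang)
    }
}
```

Under this shape, supporting a new language means adding one enum variant, one extension arm, and the variant to each rule's filter where the rule is meaningful.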
Performance Considerations
- Tree-sitter parsing: ~10-50ms per file (depends on file size)
- Regex fallback: ~1-5ms per file
- Memory: Symbol index is held in memory during scans (~1-5MB for typical repos)
Ways to reduce indexing cost:
- Index only changed files in incremental scans
- Use a sampling strategy (index entry points + changed files)
- Index in parallel (Heimdall uses rayon for this)
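
Parallel indexing works because each file can be indexed independently. The sketch below shows that idea with `std::thread::scope` from the standard library as a dependency-free stand-in for rayon. The `index_file` body is a placeholder that only counts lines starting with `fn `, and both function names are hypothetical.

```rust
use std::thread;

/// Placeholder for per-file indexing: counts lines that start with "fn ".
fn index_file(source: &str) -> usize {
    source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count()
}

/// Splits `sources` into roughly equal chunks, indexes each chunk on its own
/// thread, and returns the per-file results in the original order.
fn index_parallel(sources: &[String], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_size = ((sources.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = sources
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|s| index_file(s)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("indexing worker panicked"))
            .collect()
    })
}
```

Fixed-size chunks balance poorly when file sizes vary widely; a work-stealing pool such as the one rayon provides handles that case better.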
Testing
Run language extraction tests:
Related Files
- src/index/symbols.rs: Symbol extraction for all languages
- src/index/callgraph.rs: Call graph construction
- src/pipeline/static_analysis/mod.rs: Static analysis rules with language filters