Custom Static Analysis Rules

Heimdall’s static analysis stage uses regex-based rules to detect common security vulnerabilities before the AI agent runs. You can extend this with custom rules tailored to your codebase.

How Static Analysis Rules Work

Static analysis in Heimdall (src/pipeline/static_analysis/mod.rs) runs deterministic pattern matching across your codebase:

Pattern matching: Each rule is a compiled regex that scans source files line-by-line
Language filtering: Rules can target specific languages (e.g., Python-only, JavaScript-only)
Finding creation: Matches create findings with severity, CWE classification, and code snippets
Deduplication: Each finding gets a unique fingerprint (SHA-256 hash of rule name + file path + line number)

Rule Execution Flow

Rule Format and Structure

Rules are defined as a const array in src/pipeline/static_analysis/mod.rs:

struct Rule {
    name: &'static str,           // Unique identifier
    pattern: &'static str,        // Regex pattern
    severity: &'static str,       // critical | high | medium | low
    cwe: &'static str,            // CWE-XXX identifier
    description: &'static str,    // Human-readable explanation
    languages: &'static [&'static str], // Empty = all languages
}

Field Descriptions

Field	Type	Required	Description
`name`	`&str`	Yes	Unique rule identifier (kebab-case)
`pattern`	`&str`	Yes	Rust regex pattern (supports full regex syntax)
`severity`	`&str`	Yes	One of: `critical`, `high`, `medium`, `low`
`cwe`	`&str`	Yes	CWE identifier (e.g., `CWE-89` for SQL injection)
`description`	`&str`	Yes	Short explanation shown to users
`languages`	`&[&str]`	Yes	Language filter; empty array = all files

Adding Custom Rules

Step 1: Define Your Rule

Add a new entry to the RULES array in src/pipeline/static_analysis/mod.rs:

const RULES: &[Rule] = &[
    // ... existing rules ...
    
    // Your custom rule:
    Rule {
        name: "custom-unsafe-eval",
        pattern: r#"(?i)eval\s*\("#,
        severity: "high",
        cwe: "CWE-95",
        description: "Use of eval() allows arbitrary code execution",
        languages: &["javascript", "typescript", "python"],
    },
];

Step 2: Test the Pattern

Verify your regex matches the code you want to detect:

#[cfg(test)]
mod tests {
    use super::*;
    use regex::Regex;

    #[test]
    fn test_custom_unsafe_eval() {
        let rule = RULES.iter()
            .find(|r| r.name == "custom-unsafe-eval")
            .unwrap();
        let re = Regex::new(rule.pattern).unwrap();
        
        // Positive case: should match
        assert!(re.is_match("result = eval(user_input)"));
        
        // Negative case: should not match
        assert!(!re.is_match("result = evaluate_expression()"));
    }
}

Run your test:

cargo test --lib pipeline::static_analysis::test_custom_unsafe_eval

Step 3: Rebuild and Run

cargo build --release
cargo run --bin heimdall

Your rule now runs on every scan.

Example Rules

SQL Injection Detection

Rule {
    name: "sql-injection-string-concat",
    pattern: r#"(?i)(?:execute|query|raw)\s*\(.*(?:format!|%s|\+\s*\w+|\$\{)"#,
    severity: "high",
    cwe: "CWE-89",
    description: "Potential SQL injection via string concatenation/interpolation",
    languages: &["rust", "python", "javascript", "typescript", "go", "java"],
}

Matches:

query(f"SELECT * FROM users WHERE id = {user_id}")
cursor.execute("SELECT * FROM orders WHERE status = " + status)

Does not match:

query("SELECT * FROM users WHERE id = ?", [user_id])  # Parameterized

Hardcoded Secrets

Rule {
    name: "hardcoded-api-key",
    pattern: r#"(?i)(?:api_?key|secret_?key|password|token)\s*[:=]\s*["'][A-Za-z0-9+/=_-]{16,}["']"#,
    severity: "high",
    cwe: "CWE-798",
    description: "Potential hardcoded secret or API key",
    languages: &[],  // All languages
}

Matches:

const API_KEY = "sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
let password = "MySecretPassword123";

Command Injection

Rule {
    name: "command-injection",
    pattern: r#"(?i)(?:system|exec|popen|subprocess\.(?:call|run|Popen)|child_process\.exec)\s*\(.*(?:format!|\+\s*\w+|\$\{|%s)"#,
    severity: "critical",
    cwe: "CWE-78",
    description: "Potential command injection via string interpolation in shell command",
    languages: &["rust", "python", "javascript", "typescript", "go", "java"],
}

Matches:

child_process.exec(`ls ${userInput}`);
subprocess.call("rm " + filename, shell=True);

XSS via innerHTML

Rule {
    name: "xss-innerhtml",
    pattern: r#"\.innerHTML\s*=",
    severity: "medium",
    cwe: "CWE-79",
    description: "Potential XSS via innerHTML assignment",
    languages: &["javascript", "typescript"],
}

Matches:

element.innerHTML = userInput;
document.body.innerHTML = htmlContent;

Advanced Patterns

Multiline Matching

Rust regex doesn’t match across newlines by default. Use (?m) for multiline mode:

Rule {
    name: "toctou-race",
    pattern: r#"(?i)(?:os\.path\.exists|File\.exists|access)\s*\([^)]+\).*\n.*(?:open|read|write|unlink|remove)\s*\("#,
    severity: "medium",
    cwe: "CWE-367",
    description: "Time-of-check to time-of-use (TOCTOU) race condition on file operation",
    languages: &["python", "java", "c", "cpp"],
}

Matches:

if os.path.exists(filename):
    with open(filename) as f:  # Race window here
        data = f.read()

Case-Insensitive Matching

Use (?i) flag:

pattern: r#"(?i)eval\s*\("#  // Matches eval, Eval, EVAL

Negative Lookahead

Exclude false positives:

// Match eval() but NOT safe_eval()
pattern: r#"(?<!safe_)eval\s*\("#

Testing Rules

Unit Test Template

#[test]
fn test_my_custom_rule() {
    let rule = RULES.iter()
        .find(|r| r.name == "my-custom-rule")
        .unwrap();
    let re = Regex::new(rule.pattern).unwrap();
    
    // True positives
    assert!(re.is_match("dangerous_code_here"));
    
    // False positives (should NOT match)
    assert!(!re.is_match("safe_code_here"));
}

Test All Rules Compile

Heimdall includes a test to ensure all patterns are valid regex:

#[test]
fn test_all_rule_patterns_compile() {
    for rule in RULES {
        let result = Regex::new(rule.pattern);
        assert!(
            result.is_ok(),
            "Rule '{}' has invalid regex pattern: {}",
            rule.name,
            rule.pattern,
        );
    }
}

Run all static analysis tests:

cargo test --lib pipeline::static_analysis

Rule Performance

Catastrophic Backtracking

Avoid regex patterns that can cause exponential time complexity (ReDoS): Bad:

pattern: r#"(.*)*"#  // Catastrophic backtracking

Good:

pattern: r#".*"#     // Linear time

Benchmarking

For large codebases, test rule performance:

#[test]
fn benchmark_rule_performance() {
    let rule = RULES.iter()
        .find(|r| r.name == "sql-injection-string-concat")
        .unwrap();
    let re = Regex::new(rule.pattern).unwrap();
    let large_file = "...".repeat(10_000);
    
    let start = std::time::Instant::now();
    for line in large_file.lines() {
        re.is_match(line);
    }
    let elapsed = start.elapsed();
    
    assert!(elapsed.as_millis() < 100, "Rule too slow: {:?}", elapsed);
}

Severity Guidelines

Severity	Use When	Examples
Critical	Remote code execution, authentication bypass	Command injection, hardcoded admin credentials
High	Data exposure, SQL injection	Unparameterized queries, secrets in code
Medium	XSS, CSRF, weak crypto	innerHTML assignment, MD5 usage
Low	Information disclosure, debug mode	Stack traces in responses, verbose errors

CWE Reference

Common CWEs for rule classification:

CWE-78: Command Injection
CWE-79: Cross-site Scripting (XSS)
CWE-89: SQL Injection
CWE-798: Hardcoded Credentials
CWE-327: Weak Crypto Algorithm
CWE-502: Deserialization of Untrusted Data
CWE-918: Server-Side Request Forgery (SSRF)

Full list: CWE Top 25

Integration with Semgrep

For more complex rules, Heimdall also integrates with Semgrep:

# Install semgrep
pip install semgrep

# Heimdall automatically runs it if available
cargo run --bin heimdall

Semgrep findings are deduplicated against built-in rules and stored with source = "static".

Rule Deduplication

Each finding gets a unique fingerprint:

fn make_fingerprint(rule: &str, file: &str, line: i32) -> String {
    let input = format!("{rule}:{file}:{line}");
    let mut hasher = Sha256::new();
    hasher.update(input.as_bytes());
    hex::encode(hasher.finalize())
}

If the same rule matches the same file/line in a later scan, the fingerprint matches and no duplicate finding is created.

Best Practices

Start with high severity: Only add rules for real vulnerabilities, not code style issues
Test on real code: Run against your actual codebase to tune false positive rate

Document exceptions: Use comments to explain why a pattern is safe:

# nosec: eval() is safe here because input is validated
result = eval(expression)

Use language filters: Avoid scanning irrelevant files (e.g., Python rules on JavaScript)
Keep patterns simple: Complex regex is hard to maintain and can be slow

src/pipeline/static_analysis/mod.rs — Core static analysis engine and rule definitions
src/pipeline/static_analysis/semgrep.rs — Semgrep integration
src/pipeline/hunt/mod.rs — AI-powered Hunt agent (runs after static analysis)

Overview

Getting Started

Core Features

Deployment

Integrations

Advanced

Custom Static Analysis Rules

How Static Analysis Rules Work

Rule Execution Flow

Rule Format and Structure

Field Descriptions

Adding Custom Rules

Step 1: Define Your Rule

Step 2: Test the Pattern

Step 3: Rebuild and Run

Example Rules

SQL Injection Detection

Hardcoded Secrets

Command Injection

XSS via innerHTML

Advanced Patterns

Multiline Matching

Case-Insensitive Matching

Negative Lookahead

Testing Rules

Unit Test Template

Test All Rules Compile

Rule Performance

Catastrophic Backtracking

Benchmarking

Severity Guidelines

CWE Reference

Integration with Semgrep

Rule Deduplication

Best Practices

Build docs developers (and LLMs) love

Overview

Getting Started

Core Features

Deployment

Integrations

Advanced

​How Static Analysis Rules Work

​Rule Execution Flow

​Rule Format and Structure

​Field Descriptions

​Adding Custom Rules

​Step 1: Define Your Rule

​Step 2: Test the Pattern

​Step 3: Rebuild and Run

​Example Rules

​SQL Injection Detection

​Hardcoded Secrets

​Command Injection

​XSS via innerHTML

​Advanced Patterns

​Multiline Matching

​Case-Insensitive Matching

​Negative Lookahead

​Testing Rules

​Unit Test Template

​Test All Rules Compile

​Rule Performance

​Catastrophic Backtracking

​Benchmarking

​Severity Guidelines

​CWE Reference

​Integration with Semgrep

​Rule Deduplication

​Best Practices

​Related Files

Build docs developers (and LLMs) love

How Static Analysis Rules Work

Rule Execution Flow

Rule Format and Structure

Field Descriptions

Adding Custom Rules

Step 1: Define Your Rule

Step 2: Test the Pattern

Step 3: Rebuild and Run

Example Rules

SQL Injection Detection

Hardcoded Secrets

Command Injection

XSS via innerHTML

Advanced Patterns

Multiline Matching

Case-Insensitive Matching

Negative Lookahead

Testing Rules

Unit Test Template

Test All Rules Compile

Rule Performance

Catastrophic Backtracking

Benchmarking

Severity Guidelines

CWE Reference

Integration with Semgrep

Rule Deduplication

Best Practices

Related Files