Heimdall is an agentic, context-aware security scanner that goes beyond traditional pattern matching. It builds a threat model of your application, deploys AI agents that reason about your codebase, validates findings in sandboxed environments, and produces ranked, actionable results with patches and proof-of-concept exploits.

Overview

Every scan follows a deterministic 7-stage pipeline that progressively narrows from broad reconnaissance to precise vulnerability confirmation:
1. Ingest: Code Acquisition & Indexing

Heimdall clones your repository and builds a comprehensive code index using tree-sitter for AST parsing.

What happens:

  • Source acquisition: Clones from GitHub, GitLab, public git URLs, or extracts uploaded zip archives
  • Commit detection: Resolves the current commit SHA for traceability
  • File enumeration: Walks the repository, filtering out node_modules, .git, build artifacts, etc.
  • AST parsing: Extracts symbols, functions, classes, and call graphs using tree-sitter
  • Database snapshots: Stores file content hashes and metadata for each scanned file

Supported languages:

  • Full symbol extraction: Rust, Python, JavaScript, TypeScript, Go, Java
  • Basic coverage: Ruby, PHP (regex-based fallback)

Example output:

[scan_id] Indexed 342 files, 1,847 symbols
The CodeIndex provides:
  • Symbol lookup by name, type, or file
  • Call graph traversal (“who calls this function?”)
  • Dependency resolution (imports/requires)
  • Full-text regex search across all files
Files larger than 1MB are skipped to avoid performance issues during parsing.
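To make these lookups concrete, here is an illustrative Python sketch of the queries the CodeIndex supports. (Heimdall's real index is built in Rust on tree-sitter ASTs; the class, fields, and sample data below are hypothetical.)

```python
import re
from collections import defaultdict

class CodeIndex:
    """Toy model of the index: symbol lookup, caller lookup, regex search."""

    def __init__(self):
        self.symbols = defaultdict(list)   # name -> [(file, line, kind)]
        self.files = {}                    # path -> source text
        self.calls = defaultdict(set)      # callee name -> {caller names}

    def add_symbol(self, name, file, line, kind):
        self.symbols[name].append((file, line, kind))

    def add_call(self, caller, callee):
        self.calls[callee].add(caller)

    def get_callers(self, symbol):
        """Answers: 'who calls this function?'"""
        return sorted(self.calls[symbol])

    def search(self, pattern):
        """Full-text regex search across all indexed files."""
        rx = re.compile(pattern)
        return [(path, i + 1, line)
                for path, text in self.files.items()
                for i, line in enumerate(text.splitlines())
                if rx.search(line)]

index = CodeIndex()
index.files["src/db.py"] = "def execute_query(sql):\n    pass\n"
index.add_symbol("execute_query", "src/db.py", 1, "function")
index.add_call("login", "execute_query")
index.add_call("search_users", "execute_query")

print(index.get_callers("execute_query"))   # ['login', 'search_users']
print(index.search(r"execute_\w+"))
```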
2. Tyr: Threat Model Generation

Named after the Norse god of justice, Tyr builds a structured threat model using STRIDE methodology.

Phase 1: Reconnaissance

Tyr performs static reconnaissance without any LLM calls:
  • Tech stack detection: Identifies frameworks from file extensions and imports (Actix-web, Flask, Express, etc.)
  • Entry point discovery: Finds main() functions, route handlers, public exports
  • Route mapping: Extracts API endpoints from decorators, macros, and framework patterns
  • Security pattern detection: Locates authentication, session management, encryption, SQL queries, file uploads
  • Database access: Identifies ORM usage and raw SQL execution
  • Configuration analysis: Finds environment variable usage and config files
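Route mapping, for example, comes down to framework-specific pattern extraction. A minimal sketch for Flask-style decorators (the regex and helper are illustrative; Heimdall's recon also covers Actix-web macros, Express calls, and other frameworks):

```python
import re

# Hypothetical recon helper: extract API endpoints from decorator patterns.
ROUTE_RE = re.compile(
    r"""@\w+\.route\(\s*['"](?P<path>[^'"]+)['"]"""
    r"""(?:.*methods\s*=\s*\[(?P<methods>[^\]]*)\])?""")

def extract_routes(source: str):
    routes = []
    for m in ROUTE_RE.finditer(source):
        methods = m.group("methods") or "'GET'"   # Flask defaults to GET
        routes.append((m.group("path"),
                       [s.strip(" '\"") for s in methods.split(",")]))
    return routes

sample = '''
@app.route('/api/users', methods=['GET', 'POST'])
def users(): ...

@app.route('/api/upload', methods=['POST'])
def upload(): ...
'''
print(extract_routes(sample))
```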

Phase 2: LLM Analysis

Tyr sends reconnaissance data to the AI model with a prompt based on STRIDE:
  • Spoofing — can an attacker impersonate another user?
  • Tampering — can data be modified without authorization?
  • Repudiation — are actions auditable?
  • Information Disclosure — can sensitive data leak?
  • Denial of Service — can the system be made unavailable?
  • Elevation of Privilege — can an attacker gain higher permissions?

Output structure:

{
  "summary": "2-3 paragraph overview of the application architecture and security posture",
  "boundaries": [
    {
      "name": "User Browser → API Server",
      "description": "Untrusted HTTP requests cross into the application layer",
      "from_zone": "Public Internet",
      "to_zone": "Application Server",
    }
  ],
  "surfaces": [
    {
      "name": "File Upload Endpoint",
      "description": "POST /api/upload accepts user files without validation, vulnerable to path traversal and malicious content",
      "endpoint": "/api/upload",
      "file": "src/routes/upload.rs",
      "risk_level": "high"
    }
  ],
  "data_flows": [
    {
      "name": "Password Reset Flow",
      "source": "User email input",
      "sink": "SMTP server",
      "sensitive_data": "Email addresses, reset tokens"
    }
  ]
}
The threat model is persisted to the database and drives the Hunt stage.
Tyr's system prompt (excerpt):

You are Tyr, the threat model engine of Heimdall security scanner.
Named after the Norse god of justice and law, you produce rigorous, structured threat models.

Your analysis methodology follows STRIDE...
- Reference actual files and endpoints from the codebase
- Be concrete and specific
- Every surface must have a risk_level reflecting real exploitability
- Aim for completeness — missing a real attack surface is worse than including a low-risk one
3. Static Analysis: Pattern Matching & Secrets

Fast, deterministic checks using regex patterns and tree-sitter queries.

Detection categories:

  • Injection flaws: SQL injection, command injection, XSS, LDAP injection, template injection
  • Hardcoded secrets: API keys, passwords, tokens, private keys, AWS credentials
  • Crypto issues: Weak algorithms (MD5, SHA1), hardcoded keys, insecure random
  • Path traversal: ../ patterns in file operations
  • Unsafe deserialization: pickle.loads(), eval(), unserialize()
  • CSRF: Missing CSRF tokens on state-changing endpoints
  • Open redirects: Unvalidated redirect destinations
  • Dependency vulnerabilities: Known CVEs from OSV database

Example rule (SQL injection):

let pattern = r#"execute\(.*\+.*\)"#; // String concatenation in SQL
let matches = search_code_index(pattern);
for m in matches {
    create_finding(
        "SQL Injection via String Concatenation",
        "high",
        m.file,
        m.line,
        "Use parameterized queries instead of string concatenation",
    );
}
Static analysis is fast (completes in seconds) and produces high-confidence findings for known patterns.
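The hardcoded-secrets rules follow the same shape. A minimal Python sketch, using a small illustrative subset of patterns (Heimdall's actual rule set is larger and tuned for fewer false positives):

```python
import re

# Illustrative secret patterns -- not Heimdall's actual rules.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key": re.compile(
        r"""(?i)(api[_-]?key|secret|token)\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]"""),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_for_secrets(path: str, source: str):
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, rx in SECRET_PATTERNS.items():
            if rx.search(line):
                findings.append({"title": f"Hardcoded secret: {name}",
                                 "severity": "high",
                                 "file": path, "line": lineno})
    return findings

src = 'API_KEY = "sk_live_abcdefghij1234567890"\n'
print(scan_for_secrets("config.py", src))
```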
4. Hunt: Agentic Vulnerability Discovery

The Hunt stage deploys autonomous AI agents that reason about code like a security researcher.

How it works:

For each attack surface from Tyr’s threat model, Hunt spawns a parallel investigation:
for surface in threat_model.surfaces {
    tokio::spawn(async move {
        let agent = HuntAgent::new(scan_id, db, ai, model);
        agent.investigate(&surface, &code_index, &static_context).await
    });
}

Agent state machine:

Available tools:

| Tool | Purpose | Example |
| --- | --- | --- |
| read_file | Read file contents (15KB truncation) | {"path": "src/auth.rs"} |
| search_code | Regex search across codebase (30 results max) | {"pattern": "password.*hash"} |
| get_callers | Find all call sites of a symbol | {"symbol": "execute_query"} |
| get_dependencies | Get dependency graph for a file | {"file": "routes/api.rs"} |
| report_finding | Report a discovered vulnerability | {"title": "SQL Injection in /api/users", "severity": "critical", ...} |

Example investigation:

{
  "name": "File Upload Endpoint",
  "endpoint": "/api/repos/upload",
  "file": "src/routes/repos.rs",
  "risk_level": "high",
  "description": "Processes zip file uploads from users"
}

Iteration limit:

Each agent has a maximum of 25 iterations to prevent infinite loops. Most investigations complete in 5-10 iterations.
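The loop behind each investigation can be sketched in a few lines: ask the model for the next action, execute the requested tool, feed the result back, and stop on `report_finding` or at the iteration cap. (The `call_llm` callable and scripted model below are stand-ins; tool names match the table above.)

```python
# Minimal sketch of the Hunt agent loop (illustrative, not Heimdall's code).
MAX_ITERATIONS = 25

def investigate(surface, tools, call_llm):
    messages = [{"role": "user",
                 "content": f"Investigate attack surface: {surface['name']}"}]
    findings = []
    for _ in range(MAX_ITERATIONS):
        action = call_llm(messages)          # -> {"tool": ..., "args": {...}}
        if action["tool"] == "report_finding":
            findings.append(action["args"])
            break
        # Execute the requested tool and feed the result back to the model.
        result = tools[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": str(result)})
    return findings

# Example run with a scripted stand-in for the model:
script = iter([
    {"tool": "read_file", "args": {"path": "src/auth.rs"}},
    {"tool": "report_finding",
     "args": {"title": "SQL Injection in /api/users", "severity": "critical"}},
])
tools = {"read_file": lambda path: f"(contents of {path})"}
found = investigate({"name": "File Upload Endpoint"}, tools, lambda _msgs: next(script))
print(found)
```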
Hunt agents investigate security vulnerabilities and logic flaws — not code style or performance issues.
5. Víðarr: Adversarial Verification

Named after the silent Norse god of vengeance, Víðarr acts as a skeptical judge that tries to disprove every finding.

Why adversarial verification?

AI-discovered findings can include false positives. Víðarr filters them by:
  1. Looking for input validation that prevents exploitation
  2. Checking for authentication guards that restrict access
  3. Identifying framework protections (ORM parameterization, auto-escaping)
  4. Verifying code reachability from user input
  5. Catching context errors (misreading of code)

For each finding:

let verdict = challenge_finding(finding, code_index).await;
match verdict.outcome {
    Confirmed => {
        update_confidence(finding.id, "high");
        if verdict.adjusted_severity != finding.severity {
            update_severity(finding.id, verdict.adjusted_severity);
        }
    }
    Plausible => { /* Keep as-is */ }
    FalsePositive => {
        update_status(finding.id, "false_positive");
        update_confidence(finding.id, "low");
    }
}

Example verdict:

{
  "verdict": "false_positive",
  "reasoning": "The SQL query uses sqlx::query! macro which provides compile-time verification and automatic parameterization. The user input is bound as a parameter, not concatenated into the query string. Framework protection prevents SQL injection.",
  "adjusted_severity": null
}
Víðarr typically confirms 60-70% of findings, marks 20-30% as plausible, and dismisses 5-10% as false positives.
6. Garmr: Sandbox Validation

Named after the hound guarding the gates of Hel, Garmr executes proof-of-concept exploits in isolated Docker containers.

Sandbox constraints:

  • No network access (network mode: none)
  • 1 CPU, 512MB RAM
  • 30-second timeout
  • Repository mounted read-only at /repo
  • Runs as nobody user (non-root)

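Assembling a `docker run` invocation from these constraints looks roughly like the sketch below. The flag names are standard Docker CLI; the base image and exact invocation Garmr uses are assumptions here.

```python
# Sketch: build the sandbox command from the constraints listed above.
def sandbox_command(repo_path: str, poc_script: str) -> list[str]:
    return [
        "docker", "run", "--rm",
        "--network", "none",              # no network access
        "--cpus", "1",                    # 1 CPU
        "--memory", "512m",               # 512MB RAM
        "--user", "nobody",               # non-root
        "-v", f"{repo_path}:/repo:ro",    # repository read-only at /repo
        "python:3.12-slim",               # assumed base image
        "timeout", "30",                  # 30-second timeout
        "python", "-c", poc_script,
    ]

print(sandbox_command("/tmp/scan-abc123", "print('poc')"))
```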
Workflow:

Example PoC script:

# Generated for: SQL Injection in /api/users/search
import sqlite3
import sys

# Test if the query is vulnerable to UNION injection
conn = sqlite3.connect('/repo/data.db')
cursor = conn.cursor()

payload = "' UNION SELECT password FROM users--"
try:
    cursor.execute(f"SELECT * FROM users WHERE name = '{payload}'")
    results = cursor.fetchall()
except sqlite3.Error:
    sys.exit(1)  # Not exploitable

if results:
    print("[VULN] SQL injection successful")
    sys.exit(0)  # Vulnerability confirmed
sys.exit(1)  # Not exploitable

Graceful degradation:

If Docker is not available, Garmr is skipped automatically and findings are still reported (without sandbox confirmation).
PoC scripts are designed to demonstrate vulnerabilities, not cause damage. They run in isolated containers with no network access.
7. Report: Ranking & Patch Generation

The final stage enriches findings with suggested patches as unified diffs.

Patch generation:

For each finding, the LLM is asked to generate a minimal, correct diff:
--- a/src/routes/auth.rs
+++ b/src/routes/auth.rs
@@ -42,5 +42,5 @@
 pub async fn login(req: LoginRequest) -> Result<LoginResponse> {
-    let query = format!("SELECT * FROM users WHERE email = '{}'", req.email);
-    let user = sqlx::query(&query).fetch_one(&pool).await?;
+    let user = sqlx::query!("SELECT * FROM users WHERE email = $1", req.email)
+        .fetch_one(&pool).await?;
     Ok(LoginResponse { token: generate_token(&user) })
 }

Patch validation:

Heimdall validates generated patches to prevent hallucinations:
  1. Targets correct file: Patch ---/+++ headers match the finding’s file path
  2. Hunks match source: Every context/removed line exists verbatim in the scanned file
  3. Touches finding lines: Patch overlaps the reported vulnerability location
If validation fails, the patch is discarded and logged.
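The three checks can be sketched as follows, assuming a simple single-hunk unified diff (real diff handling has more edge cases: multiple hunks, renames, "\ No newline" markers; the function below is illustrative, not Heimdall's implementation):

```python
def validate_patch(patch: str, finding_file: str, finding_line: int,
                   source: str) -> bool:
    lines = patch.splitlines()
    # 1. Targets correct file: the +++ header must match the finding's path.
    if not any(l.startswith(f"+++ b/{finding_file}") for l in lines):
        return False
    # 2. Hunks match source: every context/removed line must exist verbatim.
    source_lines = set(source.splitlines())
    for l in lines:
        if l.startswith(("---", "+++", "@@")) or l.startswith("+"):
            continue
        if l[1:] not in source_lines:   # ' ' context or '-' removed line
            return False
    # 3. Touches finding lines: a hunk's old-file range must cover the line.
    for l in lines:
        if l.startswith("@@"):
            start, _, count = l.split()[1].lstrip("-").partition(",")
            lo = int(start)
            hi = lo + int(count or 1) - 1
            if lo <= finding_line <= hi:
                return True
    return False
```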

Finding counts:

The scan metadata is updated with final counts:
{
  "finding_count": 23,
  "critical_count": 3,
  "high_count": 8,
  "medium_count": 9,
  "low_count": 3
}

Real-Time Progress via SSE

Clients can subscribe to Server-Sent Events (SSE) for live scan updates:
curl -N http://localhost:8080/api/scans/{scan_id}/progress/stream

Event types:

| Event | When | Payload |
| --- | --- | --- |
| status_change | Scan status updates | {"status": "ingesting"} |
| stage_update | Stage starts/completes | {"stage": "hunt", "status": "running"} |
| finding_added | New vulnerability discovered | {"finding_id": "...", "title": "SQL Injection", "severity": "critical"} |
| scan_complete | Scan finishes | {"finding_count": 23, "critical": 3, "high": 8, ...} |
| error | Stage fails | {"error": "Docker not available"} |

Example SSE stream:

event: status_change
data: {"scan_id": "...", "status": "ingesting"}

event: stage_update
data: {"stage": "ingest", "status": "running"}

event: stage_update
data: {"stage": "ingest", "status": "completed"}

event: finding_added
data: {"finding_id": "...", "title": "Hardcoded API Key", "severity": "high"}

event: scan_complete
data: {"finding_count": 23, "critical": 3, "high": 8, "medium": 9, "low": 3}

How Findings Are Generated

Findings come from three sources:

Static Rules

Pattern-based detection (regex, tree-sitter queries)
  • Confidence: High
  • Speed: Seconds
  • Coverage: Known vulnerability classes

AI Agents

Agentic code reasoning (Hunt stage)
  • Confidence: Medium (before Víðarr)
  • Speed: Minutes
  • Coverage: Security + logic flaws

Dependency Audit

OSV database lookups for known CVEs
  • Confidence: High
  • Speed: Seconds
  • Coverage: Third-party libraries

Finding metadata:

Every finding includes:
  • Severity: critical, high, medium, low
  • Confidence: high, medium, low
  • Status: open, false_positive, resolved, ignored
  • Source badge: static, ai, dependencies
  • CWE/CVE classification (when applicable)
  • File + line number with code snippet
  • Plain English explanation
  • Suggested patch (unified diff)
  • PoC exploit (if sandbox-validated)
  • Fingerprint (for deduplication across scans)
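A deduplication fingerprint typically hashes the finding's identity (title/rule, file, normalized snippet) but not the line number, so the same issue matches across scans even when surrounding code shifts. Heimdall's exact fingerprint inputs aren't documented here; this sketch shows the common shape:

```python
import hashlib

def fingerprint(title: str, file: str, snippet: str) -> str:
    # Collapse whitespace so formatting-only changes don't break dedup.
    normalized = " ".join(snippet.split())
    raw = f"{title}|{file}|{normalized}".encode()
    return hashlib.sha256(raw).hexdigest()[:16]

a = fingerprint("SQL Injection", "src/db.rs", 'let q = format!("...", x);')
b = fingerprint("SQL Injection", "src/db.rs", 'let q =  format!("...", x);')
print(a == b)   # whitespace-only change -> same fingerprint
```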

Performance Characteristics

| Stage | Typical Duration | Bottleneck |
| --- | --- | --- |
| Ingest | 10-30s | Repository size, network speed |
| Tyr | 20-40s | LLM latency |
| Static Analysis | 5-15s | File count, regex complexity |
| Hunt | 3-10 min | Attack surface count, LLM latency |
| Víðarr | 1-3 min | Finding count, LLM latency |
| Garmr | 1-5 min | Finding count, Docker overhead |
| Report | 30s-2 min | Finding count, patch generation |
Total scan time: 5-20 minutes for a typical application (depending on size and attack surface complexity).
Hunt investigations run in parallel via tokio::spawn, so 10 attack surfaces can be investigated concurrently.

What Makes Heimdall Different?

Traditional scanners use pattern matching. Heimdall builds a threat model first, then investigates specific attack surfaces with full code context (call graphs, data flows, authentication checks).
Hunt agents don’t just match patterns — they reason about code like a human security researcher. They can follow call chains, read authentication middleware, and understand framework-level protections.
Víðarr acts as a red team reviewer that tries to disprove findings before they reach you. This dramatically reduces false positives compared to other AI-based scanners.
Garmr doesn’t just report theoretical vulnerabilities — it proves exploitability by running PoC scripts in isolated Docker containers.
Every finding includes a suggested patch as a unified diff. You can review and apply fixes with a single click.
