Overview
Every scan follows a deterministic 7-stage pipeline that progressively narrows from broad reconnaissance to precise vulnerability confirmation.
Ingest: Code Acquisition & Indexing
Heimdall clones your repository and builds a comprehensive code index using tree-sitter for AST parsing.
What happens:
- Source acquisition: Clones from GitHub, GitLab, public git URLs, or extracts uploaded zip archives
- Commit detection: Resolves the current commit SHA for traceability
- File enumeration: Walks the repository, filtering out node_modules, .git, build artifacts, etc.
- AST parsing: Extracts symbols, functions, classes, and call graphs using tree-sitter
- Database snapshots: Stores file content hashes and metadata for each scanned file
Supported languages:
- Full symbol extraction: Rust, Python, JavaScript, TypeScript, Go, Java
- Basic coverage: Ruby, PHP (regex-based fallback)
Example output:
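The exact shape of the index output isn't reproduced in this page; a hypothetical summary (all field names illustrative) might look like:

```json
{
  "commit": "9f3c2ab",
  "files_indexed": 412,
  "symbols_extracted": 5831,
  "languages": { "rust": 310, "typescript": 84, "python": 18 }
}
```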
CodeIndex provides:
- Symbol lookup by name, type, or file
- Call graph traversal (“who calls this function?”)
- Dependency resolution (imports/requires)
- Full-text regex search across all files
Files larger than 1MB are skipped to avoid performance issues during parsing.
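The enumeration and size-cap behavior described above can be sketched in a few lines. This is a minimal illustration, not Heimdall's actual implementation; the skip list here is hypothetical:

```python
import os

SKIP_DIRS = {"node_modules", ".git", "target", "dist"}  # hypothetical skip list
MAX_FILE_SIZE = 1024 * 1024  # files over 1MB are skipped before parsing

def enumerate_files(root: str) -> list[str]:
    """Walk the repository, pruning ignored directories and oversized files."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) <= MAX_FILE_SIZE:
                results.append(path)
    return results
```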
Tyr: Threat Model Generation
Named after the Norse god of justice, Tyr builds a structured threat model using STRIDE methodology. The threat model is persisted to the database and drives the Hunt stage.
Phase 1: Reconnaissance
Tyr performs static reconnaissance without any LLM calls:
- Tech stack detection: Identifies frameworks from file extensions and imports (Actix-web, Flask, Express, etc.)
- Entry point discovery: Finds main() functions, route handlers, public exports
- Route mapping: Extracts API endpoints from decorators, macros, and framework patterns
- Security pattern detection: Locates authentication, session management, encryption, SQL queries, file uploads
- Database access: Identifies ORM usage and raw SQL execution
- Configuration analysis: Finds environment variable usage and config files
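Route mapping of the kind described above is typically driven by per-framework patterns. A toy sketch, assuming two hypothetical patterns (the real detector covers many more frameworks and uses richer matching):

```python
import re

# Hypothetical per-framework route patterns, for illustration only
ROUTE_PATTERNS = {
    "flask": re.compile(r'@app\.route\(\s*["\']([^"\']+)["\']'),
    "actix-web": re.compile(r'#\[(?:get|post|put|delete)\(\s*"([^"]+)"'),
}

def extract_routes(source: str) -> list[str]:
    """Collect route paths matched by any known framework pattern."""
    routes = []
    for pattern in ROUTE_PATTERNS.values():
        routes.extend(pattern.findall(source))
    return routes
```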
Phase 2: LLM Analysis
Tyr sends reconnaissance data to the AI model with a prompt based on STRIDE:
- Spoofing — can an attacker impersonate another user?
- Tampering — can data be modified without authorization?
- Repudiation — are actions auditable?
- Information Disclosure — can sensitive data leak?
- Denial of Service — can the system be made unavailable?
- Elevation of Privilege — can an attacker gain higher permissions?
Output structure:
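The structured output isn't reproduced here; a plausible shape (all field names hypothetical) is a list of attack surfaces tagged with STRIDE categories, which the Hunt stage then consumes:

```json
{
  "attack_surfaces": [
    {
      "name": "Login endpoint",
      "entry_point": "src/auth.rs:login",
      "stride": ["Spoofing", "Information Disclosure"],
      "priority": "high"
    }
  ],
  "trust_boundaries": ["browser -> API", "API -> database"]
}
```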
View Tyr System Prompt
Static Analysis: Pattern Matching & Secrets
Fast, deterministic checks using regex patterns and tree-sitter queries. Static analysis completes in seconds and produces high-confidence findings for known patterns.
Detection categories:
- Injection flaws: SQL injection, command injection, XSS, LDAP injection, template injection
- Hardcoded secrets: API keys, passwords, tokens, private keys, AWS credentials
- Crypto issues: Weak algorithms (MD5, SHA1), hardcoded keys, insecure random
- Path traversal: ../ patterns in file operations
- Unsafe deserialization: pickle.loads(), eval(), unserialize()
- CSRF: Missing CSRF tokens on state-changing endpoints
- Open redirects: Unvalidated redirect destinations
- Dependency vulnerabilities: Known CVEs from OSV database
Example rule (SQL injection):
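The rule itself isn't shown in this page. As a toy stand-in (the real rules use tree-sitter queries and are far more precise), a regex rule might flag SQL built by interpolation or concatenation while letting parameterized queries pass:

```python
import re

# Hypothetical rule: flags string-formatted SQL passed to an execute() call.
SQL_INJECTION = re.compile(
    r'execute\(\s*(?:f["\']|["\'].*%s.*["\']\s*%|.*\+\s*)'
)

def check_sql_injection(line: str) -> bool:
    """Return True when a line builds SQL via f-string, %-format, or concatenation."""
    return bool(SQL_INJECTION.search(line))
```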
Hunt: Agentic Vulnerability Discovery
The Hunt stage deploys autonomous AI agents that reason about code like a security researcher.
How it works:
For each attack surface from Tyr’s threat model, Hunt spawns a parallel investigation.
Agent state machine:
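The state machine diagram isn't reproduced here; a plausible sketch of the transitions (actual state names may differ):

```
PLAN ──▶ INVESTIGATE ──▶ tool call ──▶ observe result
  ▲           │                             │
  └───────────┴──── loop (max 25 iterations) ┘
INVESTIGATE ──▶ REPORT_FINDING (vulnerability discovered)
INVESTIGATE ──▶ CONCLUDE (no issue found, or iteration budget exhausted)
```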
Available tools:
| Tool | Purpose | Example |
|---|---|---|
| read_file | Read file contents (15KB truncation) | {"path": "src/auth.rs"} |
| search_code | Regex search across codebase (30 results max) | {"pattern": "password.*hash"} |
| get_callers | Find all call sites of a symbol | {"symbol": "execute_query"} |
| get_dependencies | Get dependency graph for a file | {"file": "routes/api.rs"} |
| report_finding | Report a discovered vulnerability | {"title": "SQL Injection in /api/users", "severity": "critical", ...} |
Example investigation:
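The original investigation transcript isn't shown here. A simplified, hypothetical simulation of the loop, using the tool names from the table above with stub implementations (the dispatch mechanics and results are illustrative, not Heimdall's API):

```python
MAX_ITERATIONS = 25  # hard cap, as described below

def run_investigation(tools: dict, plan: list) -> list:
    """Execute a scripted plan of tool calls, collecting reported findings."""
    findings = []
    for iteration, (tool, args) in enumerate(plan):
        if iteration >= MAX_ITERATIONS:
            break  # budget exhausted: stop the agent
        result = tools[tool](**args)
        if tool == "report_finding":
            findings.append(result)
    return findings

# Stub tools standing in for the real implementations
tools = {
    "search_code": lambda pattern: [("src/db.rs", 42)],
    "read_file": lambda path: 'query(format!("SELECT ... {}", user_id))',
    "report_finding": lambda **finding: finding,
}

# A typical short investigation: search, read, confirm, report
plan = [
    ("search_code", {"pattern": r"format!\(.*SELECT"}),
    ("read_file", {"path": "src/db.rs"}),
    ("report_finding", {"title": "SQL Injection in src/db.rs", "severity": "critical"}),
]
```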
Iteration limit:
Each agent has a maximum of 25 iterations to prevent infinite loops. Most investigations complete in 5-10 iterations.
Víðarr: Adversarial Verification
Named after the silent Norse god of vengeance, Víðarr acts as a skeptical judge that tries to disprove every finding. Víðarr typically confirms 60-70% of findings, marks 20-30% as plausible, and dismisses 5-10% as false positives.
Why adversarial verification?
AI-discovered findings can include false positives. Víðarr filters them by:
- Looking for input validation that prevents exploitation
- Checking for authentication guards that restrict access
- Identifying framework protections (ORM parameterization, auto-escaping)
- Verifying code reachability from user input
- Catching context errors (misreading of code)
For each finding:
Example verdict:
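The verdict payload isn't reproduced in this page; a hypothetical example (field names illustrative), covering the checks listed above:

```json
{
  "verdict": "confirmed",
  "confidence": "high",
  "reasoning": "User input reaches the query builder without parameterization; no authentication guard restricts the route.",
  "checks": {
    "input_validation": false,
    "framework_protection": false,
    "reachable_from_user_input": true
  }
}
```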
Garmr: Sandbox Validation
Named after the hound guarding the gates of Hel, Garmr executes proof-of-concept exploits in isolated Docker containers.
Sandbox constraints:
- No network access (network mode: none)
- 1 CPU, 512MB RAM
- 30-second timeout
- Repository mounted read-only at /repo
- Runs as the nobody user (non-root)
Workflow:
Example PoC script:
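The actual PoC script isn't shown here. A hypothetical example in the same spirit: it demonstrates a SQL injection finding by showing that attacker input alters the query's structure, without touching a real database (build_query mirrors the flawed code under test; it is not Heimdall's API):

```python
#!/usr/bin/env python3
"""Hypothetical PoC: show that the vulnerable query builder lets an
attacker change the SQL's structure. Runs offline; no network needed."""

def build_query(user_id: str) -> str:
    # The vulnerable pattern under test: direct string interpolation
    return f"SELECT * FROM users WHERE id = {user_id}"

def poc() -> bool:
    payload = "1 OR 1=1"
    query = build_query(payload)
    # Exploit succeeds if the injected predicate survives into the SQL
    return "OR 1=1" in query

if __name__ == "__main__":
    print("VULNERABLE" if poc() else "SAFE")
```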
Graceful degradation:
If Docker is not available, Garmr is skipped automatically and findings are still reported (without sandbox confirmation). PoC scripts are designed to demonstrate vulnerabilities, not cause damage. They run in isolated containers with no network access.
Report: Ranking & Patch Generation
The final stage enriches findings with suggested patches as unified diffs.
Patch generation:
For each finding, the LLM is asked to generate a minimal, correct diff.
Patch validation:
Heimdall validates generated patches to prevent hallucinations:
- Targets correct file: Patch ---/+++ headers match the finding’s file path
- Hunks match source: Every context/removed line exists verbatim in the scanned file
- Touches finding lines: Patch overlaps the reported vulnerability location
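The first two checks can be sketched as follows. This is a minimal illustration with hypothetical function and parameter names, not Heimdall's actual validator:

```python
def validate_patch(diff: str, file_path: str, source: str) -> bool:
    """Reject patches whose headers or hunks don't match the scanned file."""
    lines = diff.splitlines()
    # Check 1: --- / +++ headers must reference the finding's file
    headers = [l for l in lines if l.startswith(("--- ", "+++ "))]
    if len(headers) != 2 or not all(file_path in h for h in headers):
        return False
    # Check 2: every removed/context line must exist verbatim in the source
    source_lines = set(source.splitlines())
    for line in lines:
        if line.startswith("-") and not line.startswith("--- "):
            if line[1:] not in source_lines:
                return False  # removed line never existed in the source
        elif line.startswith(" "):
            if line[1:] not in source_lines:
                return False  # context line doesn't match the source
    return True
```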
Finding counts:
The scan metadata is updated with final counts.
Real-Time Progress via SSE
Clients can subscribe to Server-Sent Events (SSE) for live scan updates.
Event types:
| Event | When | Payload |
|---|---|---|
| status_change | Scan status updates | {"status": "ingesting"} |
| stage_update | Stage starts/completes | {"stage": "hunt", "status": "running"} |
| finding_added | New vulnerability discovered | {"finding_id": "...", "title": "SQL Injection", "severity": "critical"} |
| scan_complete | Scan finishes | {"finding_count": 23, "critical": 3, "high": 8, ...} |
| error | Stage fails | {"error": "Docker not available"} |
Example SSE stream:
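The original stream example isn't reproduced here. Using the event names from the table above with illustrative payload values, a stream would follow the standard SSE event/data framing:

```
event: status_change
data: {"status": "ingesting"}

event: stage_update
data: {"stage": "hunt", "status": "running"}

event: finding_added
data: {"finding_id": "f_8ac1", "title": "SQL Injection in /api/users", "severity": "critical"}

event: scan_complete
data: {"finding_count": 23, "critical": 3, "high": 8}
```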
How Findings Are Generated
Findings come from three sources:
| Source | Method | Confidence | Speed | Coverage |
|---|---|---|---|---|
| Static Rules | Pattern-based detection (regex, tree-sitter queries) | High | Seconds | Known vulnerability classes |
| AI Agents | Agentic code reasoning (Hunt stage) | Medium (before Víðarr) | Minutes | Security + logic flaws |
| Dependency Audit | OSV database lookups for known CVEs | High | Seconds | Third-party libraries |
Finding metadata:
Every finding includes:
- Severity: critical, high, medium, low
- Confidence: high, medium, low
- Status: open, false_positive, resolved, ignored
- Source badge: static, ai, dependencies
- CWE/CVE classification (when applicable)
- File + line number with code snippet
- Plain English explanation
- Suggested patch (unified diff)
- PoC exploit (if sandbox-validated)
- Fingerprint (for deduplication across scans)
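The fingerprint scheme isn't specified in this page. A common approach, sketched here with hypothetical inputs, is to hash stable attributes (title, file path, normalized snippet) so that whitespace changes or line shifts between scans don't break deduplication:

```python
import hashlib

def fingerprint(title: str, file_path: str, snippet: str) -> str:
    """Hypothetical dedup key: hash of stable finding attributes."""
    normalized = " ".join(snippet.split())  # ignore whitespace-only changes
    material = f"{title}|{file_path}|{normalized}"
    return hashlib.sha256(material.encode()).hexdigest()[:16]
```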
Performance Characteristics
| Stage | Typical Duration | Bottleneck |
|---|---|---|
| Ingest | 10-30s | Repository size, network speed |
| Tyr | 20-40s | LLM latency |
| Static Analysis | 5-15s | File count, regex complexity |
| Hunt | 3-10 min | Attack surface count, LLM latency |
| Víðarr | 1-3 min | Finding count, LLM latency |
| Garmr | 1-5 min | Finding count, Docker overhead |
| Report | 30s-2 min | Finding count, patch generation |
Hunt investigations run in parallel via tokio::spawn, so 10 attack surfaces can be investigated concurrently.
What Makes Heimdall Different?
Context-Aware Analysis
Traditional scanners use pattern matching. Heimdall builds a threat model first, then investigates specific attack surfaces with full code context (call graphs, data flows, authentication checks).
Agentic Reasoning
Hunt agents don’t just match patterns — they reason about code like a human security researcher. They can follow call chains, read authentication middleware, and understand framework-level protections.
Adversarial Filtering
Víðarr acts as a red team reviewer that tries to disprove findings before they reach you. This dramatically reduces false positives compared to other AI-based scanners.
Sandbox Validation
Garmr doesn’t just report theoretical vulnerabilities — it proves exploitability by running PoC scripts in isolated Docker containers.
Automatic Remediation
Every finding includes a suggested patch as a unified diff. You can review and apply fixes with a single click.