How Garmr validates vulnerabilities with proof-of-concept exploits in isolated containers
Garmr is Heimdall’s sandbox validation engine, named after the watchdog of Hel in Norse mythology. It takes findings from the Hunt stage and proves exploitability by generating and executing proof-of-concept (PoC) scripts in isolated Docker containers.
Garmr turns suspected vulnerabilities into confirmed findings, which reduces false positives and gives you a firmer basis for prioritizing remediation.
Static analysis and AI agents can identify suspicious code patterns, but they can’t always prove a vulnerability is exploitable in the real deployment context. Garmr bridges this gap:
Proof of Concept
Generate exploit scripts that demonstrate the vulnerability
Safe Execution
Run exploits in isolated containers with network disabled
Automated Analysis
AI interprets execution results to confirm exploitability
Risk Confidence
Upgrade finding confidence from “medium” to “validated”
let request = CompletionRequest {
    model: self.default_model.clone(),
    messages: vec![
        Message {
            role: "system".to_string(),
            content: "You are a security researcher generating proof-of-concept exploit scripts. \
                      Generate minimal, safe PoC scripts that demonstrate vulnerabilities. \
                      The script will run in a sandboxed Docker container with the repo mounted read-only at /repo. \
                      Output ONLY a JSON object: {\"language\": \"python|bash|node\", \"script\": \"...\"}."
                .to_string(),
        },
        Message {
            role: "user".to_string(),
            content: format!(
                "Generate a PoC script for this vulnerability:\n\
                 Title: {}\n\
                 Severity: {}\n\
                 File: {}:{}\n\
                 Description: {}\n\
                 Code snippet:\n{}\n\
                 \n\
                 The repo is mounted at /repo. Generate a script that demonstrates \
                 the vulnerability exists. Use exit code 0 for success (vulnerability confirmed) \
                 and non-zero for failure.",
                finding.title,
                finding.severity,
                finding.file_path,
                finding.line_start,
                finding.description.as_deref().unwrap_or("N/A"),
                finding.code_snippet.as_deref().unwrap_or("N/A"),
            ),
        },
    ],
    temperature: Some(0.2), // Low temperature for reliability
    // ...
};
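The system prompt limits the reply's `language` field to `python`, `bash`, or `node`. The following is a minimal sketch of how that field might be validated before anything is executed. `PocLanguage`, `interpreter`, and `script_name` are hypothetical names used for illustration, and the interpreter binaries and file names are assumptions rather than Garmr's actual choices.

```rust
use std::str::FromStr;

/// Hypothetical type: the three script languages the system prompt allows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PocLanguage {
    Python,
    Bash,
    Node,
}

impl FromStr for PocLanguage {
    type Err = String;

    /// Accepts only the values named in the prompt (case-insensitive,
    /// surrounding whitespace ignored). Anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "python" => Ok(PocLanguage::Python),
            "bash" => Ok(PocLanguage::Bash),
            "node" => Ok(PocLanguage::Node),
            other => Err(format!("unsupported PoC language: {}", other)),
        }
    }
}

impl PocLanguage {
    /// Assumed interpreter binary inside the sandbox image.
    pub fn interpreter(self) -> &'static str {
        match self {
            PocLanguage::Python => "python3",
            PocLanguage::Bash => "bash",
            PocLanguage::Node => "node",
        }
    }

    /// Assumed file name for the generated script.
    pub fn script_name(self) -> &'static str {
        match self {
            PocLanguage::Python => "poc.py",
            PocLanguage::Bash => "poc.sh",
            PocLanguage::Node => "poc.js",
        }
    }
}
```

Rejecting unknown values up front means a malformed model reply fails early instead of reaching the container with an unexpected interpreter.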
Example Generated PoC:
# PoC for SQL injection in user search
import sys

# Read the vulnerable source file
with open('/repo/src/api/search.rs', 'r') as f:
    source = f.read()

# Check if the query construction is vulnerable
if 'format!("SELECT' in source and 'WHERE username LIKE' in source:
    # Pattern indicates string interpolation in SQL
    if '$1' not in source and 'bind(' not in source:
        # No parameterization detected
        print("CONFIRMED: SQL injection vulnerability detected")
        print("Evidence: Query uses format! without parameterization")
        sys.exit(0)  # Success = vulnerability confirmed
    else:
        print("False positive: Query is parameterized")
        sys.exit(1)
else:
    print("Could not locate vulnerable pattern")
    sys.exit(1)
PoC scripts are static analysis proofs, not live exploits. They read source code to verify vulnerability patterns exist without requiring a running application.
Exit Code: 0
Stdout:
CONFIRMED: SQL injection vulnerability detected
Evidence: Query uses format! without parameterization
Found in line 67: let query = format!("SELECT * FROM users WHERE username LIKE '%{}%'", input);
Stderr:
(empty)
let request = CompletionRequest {
    messages: vec![
        Message {
            role: "system".to_string(),
            content: "You are analyzing proof-of-concept exploit execution results. \
                      Determine if the PoC successfully demonstrated the vulnerability. \
                      Output ONLY a JSON object: {\"verdict\": \"confirmed|unconfirmed|inconclusive\", \"analysis\": \"string\"}."
                .to_string(),
        },
        Message {
            role: "user".to_string(),
            content: format!(
                "Vulnerability: {}\nSeverity: {}\n\n\
                 PoC Script ({}):\n{}\n\n\
                 Execution Results:\n\
                 Exit code: {}\n\
                 Stdout:\n{}\n\
                 Stderr:\n{}\n\n\
                 Did the PoC successfully demonstrate the vulnerability?",
                finding.title,
                finding.severity,
                poc.language,
                poc.script,
                exec.exit_code,
                exec.stdout,
                exec.stderr,
            ),
        },
    ],
    // ...
};
Example Analysis Response:
{
  "verdict": "confirmed",
  "analysis": "The PoC successfully identified SQL injection vulnerability. Evidence: (1) Exit code 0 indicates positive detection, (2) Output shows vulnerable code pattern at line 67 using format! macro without parameterization, (3) Exploit confirmed the absence of bind() calls for user input sanitization. This is a valid SQL injection vulnerability with high confidence."
}
Other Verdict Types
unconfirmed:
{
  "verdict": "unconfirmed",
  "analysis": "The PoC reported that path canonicalization is present at line 134, which mitigates path traversal attacks. Exit code 1 indicates the vulnerability does not exist in the current code. This is likely a false positive from static analysis."
}
inconclusive:
{
  "verdict": "inconclusive",
  "analysis": "The PoC timed out after 30 seconds without producing output. This could indicate: (1) the script entered an infinite loop, (2) the vulnerability requires live application state, or (3) the script has bugs. Manual review recommended."
}
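The three verdict strings map naturally onto an enum. The sketch below is an assumption about how a caller might consume the `verdict` field once the JSON has been decoded. `Verdict`, `is_validated`, and `verdict_or_inconclusive` are hypothetical names, and falling back to `inconclusive` for unrecognized values is a design choice made here, not documented Garmr behavior.

```rust
use std::str::FromStr;

/// Hypothetical type mirroring the three verdict values in the analysis prompt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Confirmed,
    Unconfirmed,
    Inconclusive,
}

impl FromStr for Verdict {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "confirmed" => Ok(Verdict::Confirmed),
            "unconfirmed" => Ok(Verdict::Unconfirmed),
            "inconclusive" => Ok(Verdict::Inconclusive),
            other => Err(format!("unknown verdict: {}", other)),
        }
    }
}

impl Verdict {
    /// Only a confirmed verdict should mark a finding as validated.
    pub fn is_validated(self) -> bool {
        matches!(self, Verdict::Confirmed)
    }
}

/// Treats any unrecognized verdict string as inconclusive, so a malformed
/// model reply can never promote a finding to validated.
pub fn verdict_or_inconclusive(raw: &str) -> Verdict {
    raw.parse().unwrap_or(Verdict::Inconclusive)
}
```

With this shape, a finding is validated only when the model explicitly answers `confirmed`. Every other outcome, including a garbled reply, leaves it for manual review.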
Garmr applies defense in depth so that a faulty or malicious PoC script cannot affect the host:
Network Isolation
Containers have network_mode: none, blocking:
DNS lookups
HTTP/HTTPS requests
Port scanning
Data exfiltration
Filesystem Protection
Repository mounted read-only (:ro flag)
Container filesystem is ephemeral (deleted after execution)
No access to host filesystem
Resource Constraints
CPU: Limited to 1 core via nano_cpus
Memory: Hard limit of 512 MB
Timeout: Forceful termination after 30 seconds
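To illustrate what forceful termination after a deadline involves, here is a minimal sketch that uses a local child process instead of a container. It is a stand-in technique, not Garmr's code: `run_with_timeout` is a hypothetical helper, and the 20 ms polling interval is an arbitrary assumption.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `cmd` and returns `Ok(Some(status))` if it exits within `timeout`.
/// On timeout the child is killed and reaped, and `Ok(None)` is returned.
pub fn run_with_timeout(
    cmd: &mut Command,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;

    loop {
        // Non-blocking check: has the process already exited?
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Forceful termination, then reap the process.
            // A kill error is ignored because the child may have just exited.
            let _ = child.kill();
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

A `None` result corresponds to the timed-out case, which the analysis step would most likely report as `inconclusive`.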
Privilege Dropping
Runs as nobody user (UID 65534)
No --privileged flag
No volume mounts except read-only repo
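The restrictions listed so far can be expressed as `docker run` flags. The sketch below builds such an argument list as a rough CLI equivalent of the Docker API settings named above (`network_mode: none`, `nano_cpus`, the `:ro` mount). It is an illustration under assumptions, not Garmr's implementation: `sandbox_run_args` is a hypothetical helper, and the image and command are supplied by the caller. For reference, one core corresponds to a `nano_cpus` value of 1,000,000,000. The 30-second timeout has no `docker run` flag and must be enforced by the caller.

```rust
/// Hypothetical helper: builds `docker run` arguments that mirror the
/// sandbox restrictions described in the text.
pub fn sandbox_run_args(repo_dir: &str, image: &str, command: &[&str]) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "run".to_string(),
        // Remove the container when it exits (ephemeral filesystem).
        "--rm".to_string(),
        // Network isolation: no DNS, HTTP, or any other traffic.
        "--network".to_string(),
        "none".to_string(),
        // Resource constraints: 1 CPU core and a 512 MB memory limit.
        "--cpus".to_string(),
        "1".to_string(),
        "--memory".to_string(),
        "512m".to_string(),
        // Privilege dropping: run as nobody (UID/GID 65534).
        "--user".to_string(),
        "65534:65534".to_string(),
        // The only mount: the repository, read-only, at /repo.
        "--volume".to_string(),
        format!("{}:/repo:ro", repo_dir),
    ];
    args.push(image.to_string());
    args.extend(command.iter().map(|part| part.to_string()));
    args
}
```

A caller could pass the result to `std::process::Command::new("docker").args(...)`. Note that `--privileged` never appears in the list, matching the text.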
Container Cleanup
src/pipeline/garmr/mod.rs
let _ = docker
    .remove_container(
        &container_name,
        Some(RemoveContainerOptions {
            force: true, // Kill if still running
            ..Default::default()
        }),
    )
    .await;
Containers are always removed, even if execution fails.
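One simple way to get this guarantee is to capture the execution result first, run the cleanup unconditionally, and only then return the result. The sketch below shows that pattern in synchronous form. `run_then_cleanup` is a hypothetical helper, and whether Garmr structures its code this way is an assumption.

```rust
/// Runs `run`, then always runs `cleanup`, and only then returns the result.
pub fn run_then_cleanup<T, E>(
    run: impl FnOnce() -> Result<T, E>,
    cleanup: impl FnOnce(),
) -> Result<T, E> {
    // Capture the result instead of using `?`, so an error cannot skip cleanup.
    let result = run();
    cleanup();
    result
}
```

This covers errors returned by `run` but not panics. Handling those as well would need a guard type whose `Drop` implementation performs the cleanup.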
While Garmr is designed to be safe, you should never run Heimdall on untrusted repositories or in production environments. Always use dedicated scan infrastructure.
SELECT
  id,
  title,
  severity,
  file_path,
  poc_validated,
  poc_exploit_json->>'verdict' as verdict
FROM findings
WHERE scan_id = 'YOUR_SCAN_ID'
  AND poc_validated = true
ORDER BY
  CASE severity
    WHEN 'critical' THEN 0
    WHEN 'high' THEN 1
    WHEN 'medium' THEN 2
    WHEN 'low' THEN 3
  END;