Heimdall is built as a monolithic Rust application with an embedded web server, background worker, and modular pipeline architecture. All components share a single PostgreSQL database and communicate via in-memory channels for real-time updates.
High-Level Architecture
Core Components
Web Server (Actix-web)
The HTTP layer is built on Actix-web 4, a high-performance async web framework for Rust.
Key modules:
src/main.rs : Server bootstrap, database connection, schema application
src/routes/ : HTTP handlers for pages, API endpoints, webhooks
src/middleware/ : Authentication (session cookies), CSRF protection
src/templates.rs : minijinja template engine for server-rendered HTML
src/sse.rs : Server-Sent Events broadcaster for real-time scan progress
Request flow:
// src/main.rs
HttpServer::new(move || {
    App::new()
        .app_data(app_state.clone())
        .wrap(Cors::default())
        .wrap(CsrfProtection)
        .wrap(actix_middleware::Logger::default())
        .service(actix_files::Files::new("/static", "static"))
        .configure(routes::init)
})
.bind(("0.0.0.0", 8080))?
.run()
.await
Heimdall uses HTMX for dynamic UI updates without JavaScript frameworks. The server returns HTML fragments that HTMX swaps into the page.
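As a hedged illustration of this fragment-over-the-wire pattern, a handler might build a table-row fragment like the one below. The `ScanRow` type, the `render_scan_row` name, and the exact HTMX attributes are hypothetical, not Heimdall's actual markup:

```rust
// Hypothetical fragment builder: HTMX swaps the returned <tr> into the page
// wherever the matching id appears (names and attributes are illustrative).
pub struct ScanRow {
    pub id: String,
    pub status: String,
    pub finding_count: u32,
}

/// Renders a row fragment; hx-swap="outerHTML" replaces the whole <tr>.
pub fn render_scan_row(scan: &ScanRow) -> String {
    format!(
        r#"<tr id="scan-{id}" hx-get="/scans/{id}/row" hx-swap="outerHTML"><td>{status}</td><td>{count}</td></tr>"#,
        id = scan.id,
        status = scan.status,
        count = scan.finding_count
    )
}
```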
Pipeline Orchestrator
The ScanPipeline struct orchestrates the 7-stage scan lifecycle with proper error handling, status tracking, and SSE event emission.
Located in: src/pipeline/mod.rs
pub struct ScanPipeline {
    pub scan_id: uuid::Uuid,
    pub db: Arc<DatabaseOperations>,
    pub ai: Arc<dyn ModelProvider>,
    pub default_model: String,
    pub sse: Arc<ScanBroadcaster>,
    pub encryption_key: Option<[u8; 32]>,
    pub data_dir: String,
}

impl ScanPipeline {
    pub async fn run(&self, repo: &Repo) -> HeimdallResult<()> {
        // Stage 1: Ingest
        let ingest_output = self.run_stage("ingest", "ingesting", "ingested", async {
            let stage = ingest::IngestStage::new(...);
            stage.run(repo).await
        }).await?;

        // Stage 2: Tyr (Threat Model)
        let threat_model = self.run_stage("tyr", "modeling", "modeled", async {
            let stage = tyr::TyrStage::new(...);
            stage.run(&code_index).await
        }).await?;

        // ... continues through all 7 stages
    }
}
Stage execution wrapper:
Every stage runs through run_stage(), which:
Checks for scan cancellation
Creates a scan_stages record in the database
Updates scan status (e.g., "running" → "ingesting")
Emits SSE events (stage_update, status_change)
Executes the stage logic
Handles errors and updates status to "failed" or "completed"
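The steps above can be sketched in miniature. This is a synchronous simplification for illustration only: the real run_stage() is async and writes to the database and SSE broadcaster, whereas here a plain `Vec<String>` stands in for the recorded status transitions:

```rust
// Hedged sketch of the stage-execution wrapper (sync, simplified).
// `statuses` is a stand-in for the scan's status column in the database.
pub fn run_stage<T, F>(
    stage_name: &str,
    running_status: &str,
    done_status: &str,
    statuses: &mut Vec<String>,
    body: F,
) -> Result<T, String>
where
    F: FnOnce() -> Result<T, String>,
{
    // Update scan status, e.g. "running" -> "ingesting"
    statuses.push(running_status.to_string());
    match body() {
        Ok(out) => {
            statuses.push(done_status.to_string()); // e.g. "ingested"
            Ok(out)
        }
        Err(e) => {
            statuses.push("failed".to_string());
            Err(format!("{stage_name} failed: {e}"))
        }
    }
}
```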
AI Backend (Model-Agnostic)
Heimdall abstracts AI providers behind the ModelProvider trait:
// src/ai/mod.rs
#[async_trait]
pub trait ModelProvider: Send + Sync {
    async fn complete(&self, request: CompletionRequest) -> HeimdallResult<CompletionResponse>;
    fn provider_name(&self) -> &str;
}
Supported providers:
Anthropic Claude: native tool_use format. Default model: claude-sonnet-4-20250514. Recommended for security analysis.
OpenAI GPT: function-calling format. Models: gpt-4o, o1, o3. Compatible with Azure OpenAI.
Ollama (local): local inference, no API key required. Models: Llama 3.3, Mistral, DeepSeek, Qwen. Runs on http://localhost:11434.
Automatic fallback:
The FallbackProvider chains multiple providers together:
// src/ai/fallback.rs
pub struct FallbackProvider {
    providers: Vec<(Box<dyn ModelProvider>, String)>, // (provider, model)
}

impl FallbackProvider {
    pub async fn complete(&self, mut request: CompletionRequest) -> HeimdallResult<CompletionResponse> {
        let mut last_error = None;
        for (provider, model) in &self.providers {
            request.model = model.clone();
            match provider.complete(request.clone()).await {
                Ok(response) => return Ok(response),
                Err(e) if is_retryable(&e) => {
                    log::warn!("Provider {} failed (retryable), trying next: {}", provider.provider_name(), e);
                    last_error = Some(e);
                    continue;
                }
                Err(e) => return Err(e), // Non-retryable errors propagate immediately
            }
        }
        Err(last_error.unwrap_or_else(|| anyhow::anyhow!("No providers configured")))
    }
}
Retryable errors:
HTTP 429 (rate limit)
HTTP 500, 502, 503, 529 (server errors)
Billing/quota exhaustion
Connection failures
Priority order: Anthropic → OpenAI → Ollama
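An is_retryable() classifier matching the error list above can be sketched as follows. The `ProviderError` enum is a hypothetical simplification of Heimdall's actual error type:

```rust
// Hypothetical simplification of the provider error type.
pub enum ProviderError {
    Http(u16),       // HTTP status from the provider API
    QuotaExhausted,  // Billing/quota exhaustion
    Connection,      // Network-level failure
    InvalidRequest,  // Malformed request; retrying cannot help
}

// Sketch: classify errors worth retrying on the next provider in the chain.
pub fn is_retryable(e: &ProviderError) -> bool {
    match e {
        ProviderError::Http(status) => matches!(*status, 429 | 500 | 502 | 503 | 529),
        ProviderError::QuotaExhausted | ProviderError::Connection => true,
        ProviderError::InvalidRequest => false,
    }
}
```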
Every LLM call is logged to the agent_tool_calls table with the actual provider and model used, so you always know which provider served each request.
Code Index (tree-sitter)
The CodeIndex is the foundation for all code analysis stages. It provides:
// src/index/symbols.rs
pub fn extract_symbols(content: &str, language: &str, file: &str) -> Vec<Symbol> {
    let mut parser = tree_sitter::Parser::new();
    parser.set_language(match language {
        "rust" => tree_sitter_rust::language(),
        "python" => tree_sitter_python::language(),
        "javascript" => tree_sitter_javascript::language(),
        "typescript" => tree_sitter_typescript::language_typescript(),
        "go" => tree_sitter_go::language(),
        "java" => tree_sitter_java::language(),
        _ => return vec![],
    }).unwrap();

    let tree = parser.parse(content, None).unwrap();
    let root = tree.root_node();
    let mut symbols = Vec::new();
    walk_ast(&root, content, file, &mut symbols);
    symbols
}
Data structures:
pub struct CodeIndex {
    pub root: PathBuf,
    pub files: HashMap<String, IndexedFile>,
    pub symbols: SymbolIndex,    // Function/class lookup
    pub callgraph: CallGraph,    // "Who calls this?"
    pub deps: DependencyGraph,   // Import/require resolution
    pub search: SearchIndex,     // Full-text regex search
}

pub struct Symbol {
    pub name: String,
    pub kind: String,            // "function", "class", "method", etc.
    pub file: String,
    pub line: usize,
    pub is_public: bool,
    pub is_entry_point: bool,    // main(), route handlers
    pub calls: Vec<String>,      // Functions called by this symbol
}
Call graph queries:
let callers = code_index.callgraph.get_callers("execute_query");
// Returns: [("src/routes/api.rs", 42), ("src/services/user.rs", 103)]
This powers the Hunt agent’s get_callers tool and Víðarr’s reachability analysis.
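The reverse-lookup shape behind get_callers can be sketched with a plain `HashMap`. This is a minimal stand-in, not Heimdall's actual `CallGraph` implementation, which is built from tree-sitter symbols:

```rust
use std::collections::HashMap;

// Minimal sketch of a reverse call-graph index: callee name -> call sites.
pub struct CallGraph {
    callers: HashMap<String, Vec<(String, usize)>>, // (file, line)
}

impl CallGraph {
    pub fn new() -> Self {
        Self { callers: HashMap::new() }
    }

    // Record that `callee` is called at file:line.
    pub fn add_call(&mut self, callee: &str, file: &str, line: usize) {
        self.callers
            .entry(callee.to_string())
            .or_default()
            .push((file.to_string(), line));
    }

    // "Who calls this?" lookup; empty slice if nobody does.
    pub fn get_callers(&self, callee: &str) -> &[(String, usize)] {
        self.callers.get(callee).map(Vec::as_slice).unwrap_or(&[])
    }
}
```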
Database Layer (PostgreSQL)
All scan metadata, findings, and artifacts are stored in PostgreSQL.
Schema highlights:
users : Authentication, session management
repos : Repository metadata, OAuth connections
scans : Scan lifecycle (status, timestamps, finding counts)
scan_stages : Per-stage tracking (ingest, tyr, hunt, etc.)
scan_events : Granular event log for UI timeline
findings : Vulnerabilities with severity, confidence, patches
patches : Suggested remediation diffs (unified diff format)
threat_models : Tyr’s structured output (boundaries, surfaces, data flows)
file_snapshots : Content hashes and metadata for scanned files
agent_tool_calls : LLM request/response logging with token usage
api_keys : Encrypted user-provided AI provider keys
oauth_connections : GitHub/GitLab access tokens
Entity relationships:
Schema DSL:
Heimdall uses a custom schema definition DSL that generates idempotent DDL:
// src/db/schema/definition.rs
Schema :: new ()
. extension ( "pgcrypto" )
. table ( "scans" , | t | {
t . uuid_pk ( "id" );
t . uuid ( "repo_id" ) . not_null () . references ( "repos" , "id" );
t . text ( "status" ) . not_null () . default_str ( "'queued'" );
t . text ( "commit_sha" );
t . integer ( "finding_count" ) . not_null () . default_int ( 0 );
t . integer ( "critical_count" ) . not_null () . default_int ( 0 );
t . timestamps ();
})
. index ( "idx_scans_repo_id" , "scans" , & [ "repo_id" ])
. build ()
On startup, Heimdall runs:
CREATE TABLE IF NOT EXISTS scans (...);
CREATE INDEX IF NOT EXISTS idx_scans_repo_id ON scans (repo_id);
No migration files or tracking tables needed — safe to run repeatedly.
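The builder-to-DDL step can be illustrated with a toy version. The `TableDef` type and its methods are illustrative only, not the actual schema DSL, but they show why the emitted DDL is safe to run on every startup:

```rust
// Toy sketch of a builder that emits idempotent DDL (illustrative names).
pub struct TableDef {
    name: String,
    columns: Vec<String>,
}

impl TableDef {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), columns: Vec::new() }
    }

    // Add a TEXT column, returning self for chaining.
    pub fn text(mut self, col: &str) -> Self {
        self.columns.push(format!("{col} TEXT"));
        self
    }

    // IF NOT EXISTS makes the statement a no-op on subsequent runs.
    pub fn build(&self) -> String {
        format!(
            "CREATE TABLE IF NOT EXISTS {} ({});",
            self.name,
            self.columns.join(", ")
        )
    }
}
```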
Background Worker
The ScanWorker polls the database every 5 seconds for queued scans and executes them in the background.
// src/worker.rs
pub struct ScanWorker {
    app_state: Arc<AppState>,
    poll_interval: Duration,
    stale_timeout_mins: i32,
}

impl ScanWorker {
    pub async fn run(self: Arc<Self>) {
        let mut interval = tokio::time::interval(self.poll_interval);
        loop {
            interval.tick().await;

            // Mark stale scans as failed
            let _ = self.app_state.db.mark_stale_scans_as_failed(self.stale_timeout_mins).await;

            // Find next queued scan
            let Some(scan) = self.app_state.db.get_next_queued_scan().await.ok().flatten() else {
                continue;
            };

            // Execute pipeline
            let state = Arc::clone(&self.app_state);
            tokio::spawn(async move {
                let result = execute_scan(&state, scan.id).await;
                if let Err(e) = result {
                    log::error!("Scan {} failed: {}", scan.id, e);
                }
            });
        }
    }
}
Concurrency:
One worker per server instance
Scans run sequentially (next scan starts after previous completes)
Within a scan, Hunt agents run in parallel via tokio::spawn
Set WORKER_ENABLED=false to disable the background worker (useful for horizontal scaling with dedicated worker nodes).
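The toggle itself is simple to sketch. The WORKER_ENABLED name comes from the docs above; the exact parsing rules (case-insensitive, enabled by default) are an assumption:

```rust
// Sketch: decide whether to spawn the background worker from the
// WORKER_ENABLED variable. Parsing details are an assumption.
pub fn worker_enabled(raw: Option<&str>) -> bool {
    match raw {
        Some(v) => !v.eq_ignore_ascii_case("false"),
        None => true, // enabled by default when unset
    }
}

// Call site would pass: std::env::var("WORKER_ENABLED").ok().as_deref()
```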
Tech Stack
Backend
Language: Rust (2024 edition). Memory safety, performance, fearless concurrency.
Web framework: Actix-web 4. High-performance async HTTP server.
Database: PostgreSQL 14+. Schema applied via custom DSL (no migrations).
Async runtime: Tokio. Multi-threaded async executor.
Code Analysis
AST parsing: tree-sitter. Incremental parsing for 6 languages (Rust, Python, JS, TS, Go, Java).
Pattern matching: regex + tree-sitter queries. Fast static analysis rules.
AI & Orchestration
AI providers: Claude, OpenAI, Ollama. Automatic fallback between providers.
Sandbox: Docker (bollard SDK). Isolated PoC execution in containers.
Frontend
HTML rendering: minijinja. Server-side templates (Jinja2 syntax).
Dynamic UI: HTMX + Tailwind CSS. HTML-over-the-wire, no JavaScript build step.
Security
Authentication: Argon2id + session cookies, with CSRF double-submit cookie protection.
Encryption: AES-256-GCM. Encrypts stored API keys at rest.
Data Flow: Complete Scan Lifecycle
Deployment Architecture
Single-Server Deployment (Recommended for getting started)
┌─────────────────────────────────────┐
│ Heimdall Server │
│ ┌──────────────────────────────┐ │
│ │ Actix-web (Port 8080) │ │
│ └──────────────────────────────┘ │
│ ┌──────────────────────────────┐ │
│ │ Background Worker │ │
│ └──────────────────────────────┘ │
│ ┌──────────────────────────────┐ │
│ │ PostgreSQL Client │ │
│ └──────────────────────────────┘ │
│ ┌──────────────────────────────┐ │
│ │ Docker Client (Garmr) │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────┘
│ │
├───────────────┤
▼ ▼
PostgreSQL Docker Engine
(External) (Local/Remote)
Horizontal Scaling (For high load)
┌──────────────┐
│ Nginx / │
│ Load Balancer│
└──────┬───────┘
│
┌──────────┼──────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Web 1 │ │ Web 2 │ │ Web N │
│ (API) │ │ (API) │ │ (API) │
│ WORKER │ │ WORKER │ │ WORKER │
│ DISABLED│ │ DISABLED│ │ DISABLED│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└───────────┼───────────┘
▼
┌──────────────┐
│ PostgreSQL │
│ (Shared) │
└──────────────┘
▲
│
┌───────┴────────┐
▼ ▼
┌─────────┐ ┌─────────┐
│Worker 1 │ │Worker N │
│(Scans) │ │(Scans) │
└─────────┘ └─────────┘
Configuration :
Web nodes: WORKER_ENABLED=false
Worker nodes: WORKER_ENABLED=true
Shared PostgreSQL for coordination
Security Considerations
User-provided AI keys are encrypted with AES-256-GCM using a server-side encryption key:
ENCRYPTION_KEY=$(openssl rand -hex 32)
Keys are never logged or exposed in API responses.
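Loading the key means turning those 64 hex characters into the 32-byte AES-256 key. A self-contained sketch (hand-rolled hex decoding for illustration; a real build would more likely use a crate such as `hex`):

```rust
// Sketch: parse ENCRYPTION_KEY (64 hex chars) into a [u8; 32] AES-256 key.
// Returns None on wrong length or non-hex input.
pub fn parse_encryption_key(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 {
        return None; // 32 bytes = 64 hex characters
    }
    let mut key = [0u8; 32];
    for (i, chunk) in hex.as_bytes().chunks(2).enumerate() {
        let byte_str = std::str::from_utf8(chunk).ok()?;
        key[i] = u8::from_str_radix(byte_str, 16).ok()?;
    }
    Some(key)
}
```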
GitHub/GitLab access tokens are:
Encrypted with ENCRYPTION_KEY before storage
Decrypted only during repository cloning
Embedded in clone URLs as https://oauth2:{token}@github.com/...
Never persisted in cloned repositories (stripped after clone)
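The URL handling above can be sketched as a pair of string transforms. The function names are illustrative, not Heimdall's actual API:

```rust
// Sketch: embed the decrypted token into a clone URL (illustrative names).
pub fn authenticated_clone_url(https_url: &str, token: &str) -> String {
    // "https://github.com/org/repo.git" -> "https://oauth2:{token}@github.com/..."
    https_url.replacen("https://", &format!("https://oauth2:{token}@"), 1)
}

// Inverse transform: drop any "user:pass@" portion after the scheme,
// the shape of the post-clone credential stripping described above.
pub fn strip_credentials(url: &str) -> String {
    match url.split_once("://") {
        Some((scheme, rest)) => match rest.split_once('@') {
            Some((_creds, host)) => format!("{scheme}://{host}"),
            None => url.to_string(),
        },
        None => url.to_string(),
    }
}
```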
Sandbox Isolation (Garmr)
PoC scripts run in isolated Docker containers with:
No network access (--network none)
Read-only repository mount
Non-root user (nobody)
CPU and memory limits
30-second timeout
Containers are destroyed immediately after execution.
All state-changing requests require a valid CSRF token via the double-submit cookie pattern. The middleware validates that:
1. The CSRF cookie is present
2. The X-CSRF-Token header matches the cookie
3. The token carries a valid signature
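The double-submit check boils down to a small predicate. This sketch stubs out the signature step, which in a real middleware would be an HMAC verification:

```rust
// Sketch of the double-submit check: cookie and header must both be present,
// non-empty, and equal, and the token must verify.
pub fn csrf_ok(cookie: Option<&str>, header: Option<&str>) -> bool {
    match (cookie, header) {
        (Some(c), Some(h)) => !c.is_empty() && c == h && has_valid_signature(c),
        _ => false,
    }
}

// Stand-in for real HMAC verification of the token (assumption, not shown
// in the docs).
fn has_valid_signature(_token: &str) -> bool {
    true
}
```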
Argon2id password hashing (work factor: 19, memory: 64MB)
Session tokens stored as SHA-256 hashes
Automatic expiration after 7 days
Background cleanup job runs hourly
Environment Variables Reference
Server
APP_HOST=0.0.0.0                      # Bind address
APP_PORT=8080                         # Listen port
CORS_ALLOWED_ORIGIN=http://localhost:8080
Database
DATABASE_URL=postgres://heimdall:heimdall@localhost:5432/heimdall
AI Providers (at least one required)
ANTHROPIC_API_KEY=sk-ant-...          # Claude (recommended)
OPENAI_API_KEY=sk-...                 # GPT-4o
OLLAMA_URL=http://localhost:11434     # Local models
DEFAULT_AI_MODEL=claude-sonnet-4-20250514
Security
ENCRYPTION_KEY=$(openssl rand -hex 32)   # AES-256-GCM key (64 hex chars)
WEBHOOK_SECRET=$(openssl rand -hex 20)   # GitHub/GitLab webhook verification
Worker
WORKER_ENABLED=true                   # Enable background scan worker
WORKER_POLL_INTERVAL_SECS=5           # Poll frequency
WORKER_STALE_TIMEOUT_MINS=10          # Mark stuck scans as failed
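The staleness rule behind WORKER_STALE_TIMEOUT_MINS is plain time arithmetic. In Heimdall the check runs as a SQL update; this sketch only mirrors the comparison:

```rust
use std::time::{Duration, Instant};

// Sketch: a scan still running past the timeout is considered stuck and
// will be marked failed (the real check runs in SQL against timestamps).
pub fn is_stale(started: Instant, now: Instant, stale_timeout_mins: u64) -> bool {
    now.duration_since(started) > Duration::from_secs(stale_timeout_mins * 60)
}
```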
Logging
RUST_LOG=info,heimdall=debug          # Log level (env_logger syntax)
Performance Tuning
Database connection pool:
PgPoolOptions::new()
    .max_connections(5) // Default
    .connect(&database_url)
Increase for high concurrency.
Hunt agent parallelism:
for surface in threat_model.surfaces {
    tokio::spawn(async move {
        agent.investigate(&surface, ...)
    });
}
Agents run in parallel automatically.
SSE buffer size:
broadcast::channel(100) // Default
Increase if clients lag behind.
Docker resource limits:
nano_cpus: 1_000_000_000,    // 1 CPU
memory: 512 * 1024 * 1024,   // 512MB
timeout: 30s
Tune for PoC complexity.
Directory Structure
heimdall/
├── src/
│ ├── main.rs # Entry point, server startup
│ ├── lib.rs # Public module exports
│ ├── config.rs # Environment config
│ ├── state.rs # AppState (shared across handlers)
│ ├── auth/ # Password hashing, session tokens
│ ├── crypto.rs # AES-256-GCM encrypt/decrypt
│ ├── sse.rs # Server-Sent Events broadcaster
│ ├── db/
│ │ ├── mod.rs # DatabaseOperations (all queries)
│ │ └── schema/ # Schema DSL + DDL generators
│ ├── models/ # Domain types, API wrappers
│ ├── routes/
│ │ ├── mod.rs # Route registration
│ │ ├── pages.rs # HTML page handlers
│ │ ├── auth.rs # Login, register, OAuth
│ │ ├── repos.rs # Repository CRUD + scan trigger
│ │ ├── scans.rs # Scan queries + SSE stream
│ │ ├── findings.rs # Finding CRUD + events
│ │ ├── settings.rs # User settings + API keys
│ │ └── webhooks.rs # GitHub/GitLab webhooks
│ ├── middleware/
│ │ ├── auth.rs # Session auth middleware
│ │ └── csrf.rs # CSRF double-submit cookie
│ ├── pipeline/
│ │ ├── mod.rs # ScanPipeline orchestrator
│ │ ├── ingest/ # Stage 1: Clone + index
│ │ ├── tyr/ # Stage 2: Threat modeling
│ │ ├── static_analysis/ # Stage 3: Pattern rules
│ │ ├── hunt/ # Stage 4: Agentic discovery
│ │ ├── vidarr/ # Stage 5: Adversarial verification
│ │ ├── garmr/ # Stage 6: Sandbox validation
│ │ └── report/ # Stage 7: Patches + ranking
│ ├── worker.rs # Background scan worker
│ ├── integrations/
│ │ └── issues.rs # GitHub/GitLab issue creation
│ ├── ai/
│ │ ├── mod.rs # ModelProvider trait + builder
│ │ ├── types.rs # Request/response types
│ │ ├── claude.rs # Anthropic provider
│ │ ├── openai.rs # OpenAI provider
│ │ ├── ollama.rs # Ollama provider
│ │ └── fallback.rs # FallbackProvider (auto-retry)
│ └── index/
│ ├── mod.rs # CodeIndex (unified)
│ ├── symbols.rs # tree-sitter symbol extraction
│ ├── callgraph.rs # Call graph
│ ├── deps.rs # Dependency graph
│ └── search.rs # Full-text search
├── templates/
│ ├── base.html # Master layout
│ ├── pages/ # Full page templates
│ └── partials/ # Reusable components
├── static/ # CSS, JS, images
├── migrations/active/ # Generated DDL (optional)
├── tests/ # Integration tests
├── Cargo.toml
├── Dockerfile
├── docker-compose.yml # Production stack
└── docker-compose.dev.yml # Dev database only
What’s Next?
Quick Start Get Heimdall running locally in 5 minutes
Configuration Configure AI providers, OAuth, and security settings
API Reference REST API endpoints for programmatic access
Deployment Production deployment with Docker, Nginx, and horizontal scaling