
Overview

IronClaw is built on a modular architecture that separates concerns while maintaining tight security boundaries. The system orchestrates agent reasoning, tool execution, multi-channel communication, and persistent memory through a set of core components.

Architecture Diagram

Core Components

Agent Loop

The central orchestrator that coordinates all system activity.
pub struct Agent {
    config: AgentConfig,
    deps: AgentDeps,
    channels: Arc<ChannelManager>,
    context_manager: Arc<ContextManager>,
    scheduler: Arc<Scheduler>,
    router: Router,
    session_manager: Arc<SessionManager>,
}
Responsibilities:
  • Route incoming messages from channels
  • Classify user intent (command vs query vs task)
  • Delegate to appropriate handlers
  • Coordinate session and thread management
  • Trigger background systems (heartbeat, routines, self-repair)

Router

Classifies incoming messages to determine handling strategy.
pub enum MessageIntent {
    Command,      // System commands (/help, /status)
    Query,        // Information requests
    Task,         // Work requiring tools/planning
    Conversation, // Chat, general interaction
}
Intent Classification:
  • Commands: Direct system operations (/quit, /undo, /status)
  • Queries: Information retrieval from memory or knowledge
  • Tasks: Complex work requiring planning and tool execution
  • Conversation: General chat and interaction
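A cheap first-pass classifier can sit in front of the LLM for the unambiguous cases. The sketch below is illustrative, not the actual Router: the enum mirrors `MessageIntent` above, but `classify` and its keyword heuristics are assumptions (the real system can defer ambiguous messages to the LLM).

```rust
// Illustrative stand-in for the Router's first pass; heuristics are assumptions.
pub enum MessageIntent {
    Command,      // System commands (/help, /status)
    Query,        // Information requests
    Task,         // Work requiring tools/planning
    Conversation, // Chat, general interaction
}

pub fn classify(message: &str) -> MessageIntent {
    let t = message.trim();
    let lower = t.to_lowercase();
    if t.starts_with('/') {
        // Slash prefix is unambiguous: direct system operation.
        MessageIntent::Command
    } else if t.ends_with('?') {
        MessageIntent::Query
    } else if ["build", "deploy", "fix", "create"]
        .iter()
        .any(|k| lower.starts_with(k))
    {
        // Imperative work verbs suggest a task needing tools/planning.
        MessageIntent::Task
    } else {
        MessageIntent::Conversation
    }
}
```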

Scheduler

Manages parallel job execution with priorities and resource limits.
pub struct Scheduler {
    config: AgentConfig,
    context_manager: Arc<ContextManager>,
    jobs: Arc<RwLock<HashMap<Uuid, ScheduledJob>>>,
    subtasks: Arc<RwLock<HashMap<Uuid, ScheduledSubtask>>>,
}
Key Features:
  • Parallel job execution (configurable limit)
  • Per-job worker isolation
  • Subtask spawning for parallel tool execution
  • Automatic cleanup on completion
  • Job state tracking (pending, in_progress, completed, failed, stuck)
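The parallelism cap and cleanup behavior can be sketched as plain bookkeeping. This is a minimal sketch, not the real `Scheduler`: it uses `u64` ids instead of `Uuid`, drops the locking and subtask maps, and the method names (`submit`, `try_start`, `complete`) are assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobState { Pending, InProgress, Completed, Failed, Stuck }

/// Minimal sketch of job bookkeeping with a configurable parallelism cap.
pub struct Scheduler {
    max_parallel_jobs: usize,
    jobs: HashMap<u64, JobState>,
}

impl Scheduler {
    pub fn new(max_parallel_jobs: usize) -> Self {
        Self { max_parallel_jobs, jobs: HashMap::new() }
    }

    pub fn submit(&mut self, id: u64) {
        self.jobs.insert(id, JobState::Pending);
    }

    /// Promote a pending job to InProgress only if a worker slot is free.
    pub fn try_start(&mut self, id: u64) -> bool {
        let running = self
            .jobs
            .values()
            .filter(|s| **s == JobState::InProgress)
            .count();
        if running < self.max_parallel_jobs {
            if let Some(s) = self.jobs.get_mut(&id) {
                if *s == JobState::Pending {
                    *s = JobState::InProgress;
                    return true;
                }
            }
        }
        false
    }

    /// Remove the finished job, mirroring "automatic cleanup on completion".
    pub fn complete(&mut self, id: u64) {
        self.jobs.remove(&id);
    }
}
```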

Worker

Executes individual jobs with LLM reasoning and tool calls.
pub struct Worker {
    job_id: Uuid,
    deps: WorkerDeps,
}
Execution Flow:
  1. Planning (optional): Generate action plan with LLM
  2. Tool Selection: Choose tools based on context
  3. Parallel Execution: Run independent tools concurrently
  4. Result Processing: Sanitize output, update context
  5. Iteration: Loop until job complete or max iterations
  6. Completion: Mark job as completed/failed/stuck
Workers support both planning mode (generate upfront plan) and direct selection (iterative tool selection). Planning mode is more efficient for complex multi-step tasks.
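The iterate-until-done shape of the flow above can be sketched as a bounded loop. `step` stands in for one round of tool selection, execution, and result processing; the names and the closure-based shape are assumptions, not the actual `Worker` API.

```rust
/// Outcome of a bounded worker loop (sketch; real states also include Failed).
#[derive(Debug, PartialEq)]
pub enum JobOutcome { Completed, Stuck }

/// Run `step` (one iteration of select/execute/process) until it reports
/// completion or the iteration budget is exhausted.
pub fn run_job<F>(max_iterations: u32, mut step: F) -> JobOutcome
where
    F: FnMut(u32) -> bool, // returns true when the job is finished
{
    for i in 0..max_iterations {
        if step(i) {
            return JobOutcome::Completed;
        }
    }
    // Hitting the cap without finishing marks the job stuck.
    JobOutcome::Stuck
}
```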

Routines Engine

Background automation for scheduled and reactive tasks.
pub enum Trigger {
    Cron(String),              // Schedule: "0 9 * * MON"
    Event { pattern: String }, // Regex: "deploy.*failed"
    Webhook { path: String },  // HTTP: /webhook/deploy
}
Routine Types:
  • Cron: Time-based schedules (daily reports, periodic checks)
  • Event: Message pattern matching (alert on errors)
  • Webhook: HTTP endpoint triggers (CI/CD integration)
Use Cases:
  • Daily standup summaries
  • Alert monitoring and triage
  • Periodic health checks
  • Automated reporting
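Dispatching a message against event triggers can be sketched as follows. Note the stand-in uses plain substring matching to stay dependency-free, whereas the documented triggers use regex patterns and cron expressions; `matches_event` is an illustrative name.

```rust
// Mirrors the Trigger enum above; matching logic here is a simplification.
pub enum Trigger {
    Cron(String),              // Schedule: "0 9 * * MON"
    Event { pattern: String }, // Real impl: regex; here: substring
    Webhook { path: String },  // HTTP: /webhook/deploy
}

/// Does an incoming message fire this trigger? Cron and webhook triggers
/// fire on their own channels (timer, HTTP), never on messages.
pub fn matches_event(trigger: &Trigger, message: &str) -> bool {
    match trigger {
        Trigger::Event { pattern } => message.contains(pattern.as_str()),
        _ => false,
    }
}
```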

Orchestrator

Manages Docker sandbox containers for isolated code execution.
pub struct ContainerJobManager {
    // Container lifecycle management
    // Per-job authentication tokens
    // LLM proxy for worker containers
    // Credential injection boundary
}
Security Model:
  • Per-job bearer tokens (ephemeral, in-memory only)
  • Network-isolated containers
  • Resource limits (CPU, memory, timeout)
  • Credential injection at orchestrator boundary
  • No direct database access from containers
Worker/Orchestrator Pattern:
┌─────────────────────────────────────────────────────────┐
│                     Orchestrator                        │
│  - HTTP API (:50051)                                    │
│  - Token validation                                     │
│  - LLM proxy                                            │
│  - Credential injection                                 │
└────────────────────┬────────────────────────────────────┘
                     │ HTTP + Bearer Token

┌─────────────────────────────────────────────────────────┐
│              Docker Container (Worker)                  │
│  - Isolated filesystem                                  │
│  - Limited tools (shell, file ops)                      │
│  - No direct secrets access                             │
└─────────────────────────────────────────────────────────┘
Worker containers have no direct access to secrets. All credentials are injected by the orchestrator at request time after token validation.
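The token boundary can be sketched as an in-memory map that is consulted before any credential injection. This is a sketch under assumptions: `u64` job ids instead of `Uuid`, and the method names (`issue`, `authorize`, `revoke`) are illustrative, not the orchestrator's real API.

```rust
use std::collections::HashMap;

/// Sketch of the orchestrator's per-job token boundary.
/// Tokens live only in this in-memory map, mirroring "ephemeral, in-memory only".
pub struct TokenStore {
    tokens: HashMap<String, u64>, // bearer token -> job id
}

impl TokenStore {
    pub fn new() -> Self {
        Self { tokens: HashMap::new() }
    }

    /// Issue an ephemeral token when a job's container is created.
    pub fn issue(&mut self, token: String, job_id: u64) {
        self.tokens.insert(token, job_id);
    }

    /// A container request is honored only if its bearer token maps to the
    /// job it claims; credentials are injected only after this check passes.
    pub fn authorize(&self, token: &str, job_id: u64) -> bool {
        self.tokens.get(token) == Some(&job_id)
    }

    /// Revoke on job completion so tokens cannot outlive their job.
    pub fn revoke(&mut self, token: &str) {
        self.tokens.remove(token);
    }
}
```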

Data Flow

Message Processing

Job Lifecycle

Self-Repair System

Automatic detection and recovery of stuck operations.
Detection:
  • Jobs stuck in InProgress beyond threshold
  • Tools with high failure rates
  • Unresponsive worker processes
Recovery Strategies:
Stuck jobs:
  1. Detect job stuck longer than threshold (default 5min)
  2. Analyze context and last action
  3. Attempt recovery:
    • Retry failed tool
    • Restart worker with fresh context
    • Escalate to manual intervention
  4. Notify user of outcome
Failing tools:
  1. Track tool failure rates
  2. Identify consistently failing tools
  3. Recovery options:
    • Clear tool cache
    • Rebuild WASM tool
    • Disable tool temporarily
    • Suggest alternative tools
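The core stuck-job check is a simple elapsed-time comparison. A minimal sketch, assuming a `RunningJob` record with a start timestamp (the field and function names are illustrative):

```rust
use std::time::{Duration, Instant};

/// Sketch of stuck-job detection: a job InProgress longer than the
/// threshold (default 5 min) is flagged for recovery.
pub struct RunningJob {
    pub started: Instant,
}

pub fn is_stuck(job: &RunningJob, threshold: Duration, now: Instant) -> bool {
    now.duration_since(job.started) > threshold
}
```

Passing `now` explicitly keeps the check deterministic and easy to test; a real detector would poll with the current time.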

Context Management

Each job maintains isolated context for safe parallel execution.
pub struct JobContext {
    pub id: Uuid,
    pub title: String,
    pub description: String,
    pub user_id: String,
    pub state: JobState,
    pub created_at: DateTime<Utc>,
    pub metadata: serde_json::Value,
}
Context Isolation:
  • Each job has independent memory
  • No shared mutable state between jobs
  • Tool execution scoped to job context
  • LLM history isolated per job
Context Compaction: When conversation history grows large:
  1. Detect: Monitor token count per thread
  2. Summarize: Use LLM to summarize old turns
  3. Preserve: Keep recent turns intact
  4. Replace: Swap old turns with summary
  5. Continue: Resume conversation with the freed token budget
Compaction triggers automatically at 75% of max context window. Recent turns (last 10) are always preserved.
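The summarize/preserve/replace steps above can be sketched as a list transformation. Turns are simplified to `String`s, and the placeholder summary stands in for the LLM summarization call; `compact` and `keep_recent` are illustrative names.

```rust
/// Sketch of context compaction: old turns collapse into a single summary
/// entry while the most recent `keep_recent` turns stay verbatim.
pub fn compact(turns: Vec<String>, keep_recent: usize) -> Vec<String> {
    if turns.len() <= keep_recent {
        return turns; // nothing old enough to summarize
    }
    let split = turns.len() - keep_recent;
    let (old, recent) = turns.split_at(split);
    // Placeholder; the real system asks the LLM to summarize these turns.
    let summary = format!("[summary of {} earlier turns]", old.len());
    let mut out = vec![summary];
    out.extend_from_slice(recent);
    out
}
```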

Session Management

Multi-threaded conversations with undo/redo support.
Features:
  • Multiple concurrent threads per user
  • Turn-based checkpointing
  • Undo/redo with state restoration
  • Session persistence to database
  • Automatic pruning of stale sessions
Turn Structure:
pub struct Turn {
    pub user_input: String,
    pub assistant_response: String,
    pub tool_calls: Vec<ToolCall>,
    pub state: TurnState,
    pub created_at: DateTime<Utc>,
}
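Turn-based undo/redo can be sketched as two stacks. This simplifies `Turn` to a `String` and omits persistence and state restoration; the `Thread` type and its methods are illustrative, not the real `SessionManager` API.

```rust
/// Sketch of turn-based undo/redo: undone turns move to a redo stack
/// and can be restored until new input arrives.
pub struct Thread {
    turns: Vec<String>,
    redo: Vec<String>,
}

impl Thread {
    pub fn new() -> Self {
        Self { turns: Vec::new(), redo: Vec::new() }
    }

    pub fn push(&mut self, turn: String) {
        self.turns.push(turn);
        self.redo.clear(); // new input invalidates the redo history
    }

    pub fn undo(&mut self) -> bool {
        match self.turns.pop() {
            Some(t) => { self.redo.push(t); true }
            None => false,
        }
    }

    pub fn redo(&mut self) -> bool {
        match self.redo.pop() {
            Some(t) => { self.turns.push(t); true }
            None => false,
        }
    }

    pub fn len(&self) -> usize {
        self.turns.len()
    }
}
```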

Performance Characteristics

Parallel Tool Execution

Tools with no dependencies execute concurrently:
// Sequential (slow)
let result1 = execute_tool("api_call_1").await;
let result2 = execute_tool("api_call_2").await;
let result3 = execute_tool("api_call_3").await;
// Total: ~600ms

// Parallel (fast)
let results = execute_tools_parallel([
    "api_call_1",
    "api_call_2", 
    "api_call_3"
]).await;
// Total: ~200ms
The worker automatically detects independent tool calls and executes them in parallel using a JoinSet.

Resource Limits

| Resource             | Default Limit | Configurable                 |
|----------------------|---------------|------------------------------|
| Max parallel jobs    | 10            | Yes (`max_parallel_jobs`)    |
| Job timeout          | 30 minutes    | Yes (`job_timeout`)          |
| Max iterations       | 50            | Yes (per-job metadata)       |
| Stuck threshold      | 5 minutes     | Yes (`stuck_threshold`)      |
| Session idle timeout | 1 hour        | Yes (`session_idle_timeout`) |
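The defaults above can be expressed as a config struct. A sketch only: the struct and field names mirror the table but are assumptions about how `AgentConfig` groups these limits.

```rust
use std::time::Duration;

/// Sketch of the documented resource limits with their default values.
pub struct Limits {
    pub max_parallel_jobs: usize,
    pub job_timeout: Duration,
    pub max_iterations: u32,
    pub stuck_threshold: Duration,
    pub session_idle_timeout: Duration,
}

impl Default for Limits {
    fn default() -> Self {
        Self {
            max_parallel_jobs: 10,
            job_timeout: Duration::from_secs(30 * 60),      // 30 minutes
            max_iterations: 50,
            stuck_threshold: Duration::from_secs(5 * 60),   // 5 minutes
            session_idle_timeout: Duration::from_secs(3600), // 1 hour
        }
    }
}
```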

Next Steps

Security Model

Learn about defense-in-depth security layers

Channel System

Multi-channel communication architecture

Tool System

Extensible tool system and WASM sandbox

Workspace & Memory

Persistent memory and hybrid search
