Overview

IronClaw’s container orchestration extends the sandbox system to support persistent Docker containers running full agent worker processes. Unlike ephemeral command containers, orchestrated containers maintain state and communicate with the main agent via an internal HTTP API.

Architecture

┌─────────────────────────────────────────────────┐
│                  Orchestrator                   │
│                                                 │
│  Internal API (default :50051, configurable)    │
│    POST /worker/{id}/llm/complete               │
│    POST /worker/{id}/llm/complete_with_tools    │
│    GET  /worker/{id}/job                        │
│    GET  /worker/{id}/credentials                │
│    POST /worker/{id}/status                     │
│    POST /worker/{id}/complete                   │
│                                                 │
│  ContainerJobManager                            │
│    create_job() -> container + token            │
│    stop_job()                                   │
│    list_jobs()                                  │
│                                                 │
│  TokenStore                                     │
│    per-job bearer tokens (in-memory only)       │
│    per-job credential grants (in-memory only)   │
└─────────────────────────────────────────────────┘
             │                          │
             ▼                          ▼
  ┌──────────────────────┐   ┌──────────────────────┐
  │  Worker Container    │   │  Claude Container    │
  │                      │   │                      │
  │  ironclaw worker     │   │  claude-bridge       │
  │  + full tools        │   │  + claude CLI        │
  │  + LLM via proxy     │   │  + native API        │
  └──────────────────────┘   └──────────────────────┘

Job Modes

Worker Mode

Standard IronClaw worker with proxied LLM calls:
pub enum JobMode {
    Worker,        // Full IronClaw agent loop
    ClaudeCode,    // Claude CLI bridge
}
Worker Container:
  • Runs ironclaw worker command
  • LLM requests proxied to orchestrator
  • Full tool access (Bash, Read, Write, etc.)
  • Multi-turn agent loop
  • Custom tools via WASM/MCP
Use Cases:
  • Long-running background jobs
  • Isolated project work
  • Multi-step workflows
  • Testing in clean environment

Claude Code Mode

Bridge to the official Claude CLI:
# Container runs:
ironclaw claude-bridge \
  --job-id <uuid> \
  --orchestrator-url http://host:50051 \
  --max-turns 50 \
  --model sonnet
Claude Container:
  • Spawns claude CLI directly
  • Native Claude Code tool use
  • Anthropic API or OAuth authentication
  • Tool allowlist for security
  • Automatic session management
Use Cases:
  • Use Claude’s native computer use
  • Access Claude-specific features
  • Compare IronClaw vs Claude behavior
  • Development and testing
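Creating a Claude Code job goes through the same Container Job Manager API as a worker job (see Creating Jobs below); a minimal sketch, assuming the create_job signature shown in that section:
let job_id = Uuid::new_v4();
let token = manager.create_job(
    job_id,
    "Refactor the auth module",  // task description
    Some(project_dir),
    JobMode::ClaudeCode,         // spawns claude-bridge instead of ironclaw worker
    vec![],                      // credential grants
).await?;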

Container Job Manager

Creating Jobs

use ironclaw::orchestrator::{
    ContainerJobManager, JobMode, ContainerJobConfig,
    CredentialGrant,
};
use uuid::Uuid;

let config = ContainerJobConfig {
    image: "ironclaw-worker:latest".to_string(),
    memory_limit_mb: 2048,
    cpu_shares: 1024,
    orchestrator_port: 50051,
    ..Default::default()
};

// token_store is the TokenStore shared with the orchestrator API
let manager = ContainerJobManager::new(config, token_store);

let job_id = Uuid::new_v4();
let token = manager.create_job(
    job_id,
    "Build and test the project",
    Some(project_dir),
    JobMode::Worker,
    vec![],  // credential grants
).await?;

println!("Job {} started with token: {}", job_id, token);

Stopping Jobs

manager.stop_job(job_id).await?;
Stopping a job (sketched in code below):
  1. Stops the container (10-second grace period)
  2. Removes the container
  3. Revokes the auth token
  4. Updates job state to Stopped
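A sketch of those four steps, assuming the bollard Docker client (consistent with the HostConfig in the Security section) and hypothetical container_id/set_state helpers:
use bollard::container::StopContainerOptions;

pub async fn stop_job(&self, job_id: Uuid) -> Result<()> {
    let container_id = self.container_id(job_id)?;

    // 1. Stop with a 10-second grace period, then 2. remove
    self.docker
        .stop_container(&container_id, Some(StopContainerOptions { t: 10 }))
        .await?;
    self.docker.remove_container(&container_id, None).await?;

    // 3. Revoke the auth token
    self.token_store.revoke(job_id).await;

    // 4. Mark the job as stopped
    self.set_state(job_id, ContainerState::Stopped).await;
    Ok(())
}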

Listing Jobs

let jobs = manager.list_jobs().await;
for job in jobs {
    println!(
        "{}: {} ({})",
        job.job_id,
        job.task_description,
        job.state
    );
}

Authentication

Bearer Token System

Each job gets a unique bearer token:
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;
use uuid::Uuid;

pub struct TokenStore {
    tokens: Arc<RwLock<HashMap<Uuid, String>>>,
    grants: Arc<RwLock<HashMap<Uuid, Vec<CredentialGrant>>>>,
}

impl TokenStore {
    pub async fn create_token(&self, job_id: Uuid) -> String {
        let token = format!("irc_{}", Uuid::new_v4());
        self.tokens.write().await.insert(job_id, token.clone());
        token
    }
    
    pub async fn validate(&self, token: &str) -> Option<Uuid> {
        let tokens = self.tokens.read().await;
        tokens.iter()
            .find(|(_, t)| *t == token)
            .map(|(id, _)| *id)
    }
}
Security Properties:
  • Tokens are never logged or serialized
  • Stored in-memory only (lost on restart)
  • Automatically revoked when the job completes (see the revoke sketch below)
  • Single token per job
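The revoke method is not shown in the excerpt above; a minimal sketch of what it could look like, given the two in-memory maps:
impl TokenStore {
    pub async fn revoke(&self, job_id: Uuid) {
        // Dropping the entries is sufficient: tokens and grants
        // exist only in these in-memory maps.
        self.tokens.write().await.remove(&job_id);
        self.grants.write().await.remove(&job_id);
    }
}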

Credential Grants

Jobs can be granted access to specific credentials:
pub struct CredentialGrant {
    pub name: String,           // e.g., "github"
    pub credential_type: String, // e.g., "api_token"
    pub value: String,          // Actual secret
}
Workers request credentials via the orchestrator API:
GET /worker/{job_id}/credentials
Authorization: Bearer irc_...
Response:
{
  "github": "ghp_...",
  "npm": "npm_..."
}
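From inside the container, a worker might fetch its grants like this (a sketch assuming the reqwest crate and the environment variables listed under Container Configuration):
use std::collections::HashMap;

async fn fetch_credentials() -> anyhow::Result<HashMap<String, String>> {
    let base = std::env::var("IRONCLAW_ORCHESTRATOR_URL")?;
    let job_id = std::env::var("IRONCLAW_JOB_ID")?;
    let token = std::env::var("IRONCLAW_WORKER_TOKEN")?;

    let creds = reqwest::Client::new()
        .get(format!("{base}/worker/{job_id}/credentials"))
        .bearer_auth(&token)
        .send()
        .await?
        .error_for_status()?
        .json::<HashMap<String, String>>()
        .await?;
    Ok(creds)
}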

Orchestrator API

Endpoints

POST /worker/{job_id}/llm/complete

Proxy LLM completion request:
POST /worker/{job_id}/llm/complete
Authorization: Bearer irc_...
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ],
  "max_tokens": 4096
}
Response:
{
  "content": "2 + 2 = 4",
  "finish_reason": "stop",
  "input_tokens": 15,
  "output_tokens": 8
}
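The request and response bodies map directly onto serde types; a worker-side sketch derived from the JSON above:
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct Message {
    role: String,
    content: String,
}

#[derive(Serialize)]
struct CompleteRequest {
    messages: Vec<Message>,
    max_tokens: u32,
}

#[derive(Deserialize)]
struct CompleteResponse {
    content: String,
    finish_reason: String,
    input_tokens: u32,
    output_tokens: u32,
}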

GET /worker/{job_id}/job

Get job metadata:
GET /worker/{job_id}/job
Authorization: Bearer irc_...
Response:
{
  "job_id": "...",
  "task_description": "Build and test",
  "project_dir": "/workspace/project"
}

POST /worker/{job_id}/status

Update worker status:
POST /worker/{job_id}/status
Authorization: Bearer irc_...
Content-Type: application/json

{
  "message": "Running tests (iteration 5)",
  "iteration": 5
}

POST /worker/{job_id}/complete

Mark job complete:
POST /worker/{job_id}/complete
Authorization: Bearer irc_...
Content-Type: application/json

{
  "success": true,
  "message": "All tests passed. Deployed to staging."
}

Container Configuration

Worker Container

FROM debian:bookworm-slim

# Install dev tools
RUN apt-get update && apt-get install -y \
    curl ca-certificates git build-essential \
    nodejs npm python3 python3-pip gh

# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

# Copy IronClaw binary
COPY ironclaw /usr/local/bin/

# Non-root user
USER 1000
WORKDIR /workspace

ENTRYPOINT ["ironclaw"]
Environment Variables (injected by orchestrator):
IRONCLAW_WORKER_TOKEN=irc_...
IRONCLAW_JOB_ID=...
IRONCLAW_ORCHESTRATOR_URL=http://172.17.0.1:50051
IRONCLAW_WORKSPACE=/workspace
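A sketch of how a worker might read these at startup (the actual startup code may differ):
struct WorkerEnv {
    token: String,
    job_id: String,
    orchestrator_url: String,
    workspace: String,
}

fn worker_env() -> anyhow::Result<WorkerEnv> {
    Ok(WorkerEnv {
        token: std::env::var("IRONCLAW_WORKER_TOKEN")?,
        job_id: std::env::var("IRONCLAW_JOB_ID")?,
        orchestrator_url: std::env::var("IRONCLAW_ORCHESTRATOR_URL")?,
        workspace: std::env::var("IRONCLAW_WORKSPACE")?,
    })
}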

Claude Code Container

Same base image plus:
# Install Claude CLI
RUN npm install -g @anthropic-ai/claude-code@latest

# Create Claude config directory
RUN mkdir -p /home/sandbox/.claude
Environment Variables:
# Worker vars
IRONCLAW_WORKER_TOKEN=irc_...
IRONCLAW_JOB_ID=...
IRONCLAW_ORCHESTRATOR_URL=http://172.17.0.1:50051

# Claude auth (one of):
ANTHROPIC_API_KEY=sk-ant-...  # Direct API key
CLAUDE_CODE_OAUTH_TOKEN=...  # OAuth token

# Claude config
CLAUDE_CODE_ALLOWED_TOOLS=bash,read,write,glob

Volume Mounts

Project directories are bind-mounted:
# Host path validation
~/.ironclaw/projects/my-project  # OK
/tmp/untrusted                   # REJECTED
../escape                        # REJECTED (canonicalized first)
Mount Configuration:
let canonical = validate_bind_mount_path(&project_dir, job_id)?;
binds.push(format!("{}:/workspace:rw", canonical.display()));
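A sketch of what validate_bind_mount_path might do, based on the rules above (canonicalize first, then require the result to live under ~/.ironclaw/projects; assumes the dirs crate, and the real implementation may check more):
use std::path::{Path, PathBuf};
use uuid::Uuid;

fn validate_bind_mount_path(project_dir: &Path, job_id: Uuid) -> anyhow::Result<PathBuf> {
    // Canonicalization resolves symlinks and `..` segments,
    // so `../escape` cannot slip past the prefix check below.
    let canonical = project_dir.canonicalize()?;

    let allowed_base = dirs::home_dir()
        .ok_or_else(|| anyhow::anyhow!("no home directory"))?
        .join(".ironclaw/projects");

    if !canonical.starts_with(&allowed_base) {
        anyhow::bail!("job {job_id}: project directory outside allowed base");
    }
    Ok(canonical)
}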

Resource Limits

pub struct ContainerJobConfig {
    pub memory_limit_mb: u64,     // Default: 2048 (worker)
    pub cpu_shares: u32,           // Default: 1024
    pub claude_code_memory_limit_mb: u64,  // Default: 4096
}
Claude containers get a higher default memory limit because the Claude CLI's node_modules footprint is heavier.
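These limits feed into the Docker host configuration; a sketch of the mapping, assuming bollard's HostConfig as in the Security section below (Docker expects memory in bytes):
use bollard::models::HostConfig;

fn resource_limits(config: &ContainerJobConfig, mode: &JobMode) -> HostConfig {
    let memory_mb = match mode {
        JobMode::ClaudeCode => config.claude_code_memory_limit_mb,
        JobMode::Worker => config.memory_limit_mb,
    };
    HostConfig {
        memory: Some((memory_mb * 1024 * 1024) as i64),
        cpu_shares: Some(config.cpu_shares as i64),
        ..Default::default()
    }
}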

Security

let host_config = HostConfig {
    // Capability dropping
    cap_drop: Some(vec!["ALL".to_string()]),
    cap_add: Some(vec!["CHOWN".to_string()]),
    
    // Security options
    security_opt: Some(vec![
        "no-new-privileges:true".to_string()
    ]),
    
    // Tmpfs for ephemeral storage
    tmpfs: Some([
        ("/tmp".to_string(), "size=512M".to_string())
    ].into_iter().collect()),
    
    // Network bridge
    network_mode: Some("bridge".to_string()),
    
    // Host access via host.docker.internal
    extra_hosts: Some(vec![
        "host.docker.internal:host-gateway".to_string()
    ]),
    
    ..Default::default()
};

Lifecycle Management

Container States

pub enum ContainerState {
    Creating,   // Container creation in progress
    Running,    // Worker executing
    Stopped,    // Completed or manually stopped
    Failed,     // Error occurred
}

State Transitions

         create_job()
              │
              ▼
           Creating
              │ (container starts)
              ▼
           Running
              ├────────────> Failed  (error)
              ├────────────> Stopped (stop_job())
              │ (complete_job())
              ▼
           Stopped
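The legal transitions are easy to express as a match; a sketch (not necessarily how the manager enforces them):
fn can_transition(from: &ContainerState, to: &ContainerState) -> bool {
    use ContainerState::*;
    matches!(
        (from, to),
        (Creating, Running)         // container starts
            | (Running, Failed)     // error
            | (Running, Stopped)    // stop_job() or complete_job()
    )
}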

Cleanup

Automatic cleanup on completion:
pub async fn complete_job(
    &self,
    job_id: Uuid,
    result: CompletionResult,
) -> Result<()> {
    // Store result
    if let Some(handle) = self.containers.write().await.get_mut(&job_id) {
        handle.completion_result = Some(result);
    }

    // Stop the container (grace period), then remove it
    docker
        .stop_container(&container_id, Some(StopContainerOptions { t: 10 }))
        .await?;
    docker.remove_container(&container_id, None).await?;
    
    // Revoke token
    self.token_store.revoke(job_id).await;
    
    Ok(())
}

Manual Cleanup

Remove completed job from memory:
manager.cleanup_job(job_id).await;

Configuration

# Orchestrator API port
ORCHESTRATOR_PORT=50051

# Worker container image
WORKER_IMAGE=ironclaw-worker:latest

# Resource limits
WORKER_MEMORY_MB=2048
WORKER_CPU_SHARES=1024

# Claude Code config
CLAUDE_CODE_MEMORY_MB=4096
CLAUDE_CODE_MAX_TURNS=50
CLAUDE_CODE_MODEL=sonnet
CLAUDE_CODE_ALLOWED_TOOLS=bash,read,write

# Authentication (one of):
ANTHROPIC_API_KEY=sk-ant-...  # Direct API
CLAUDE_CODE_OAUTH_TOKEN=...  # OAuth
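A sketch of loading these into ContainerJobConfig at startup (field types follow the struct under Resource Limits; orchestrator_port as u16 is an assumption):
fn env_u64(key: &str, default: u64) -> u64 {
    std::env::var(key)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

let config = ContainerJobConfig {
    image: std::env::var("WORKER_IMAGE")
        .unwrap_or_else(|_| "ironclaw-worker:latest".to_string()),
    memory_limit_mb: env_u64("WORKER_MEMORY_MB", 2048),
    cpu_shares: env_u64("WORKER_CPU_SHARES", 1024) as u32,
    claude_code_memory_limit_mb: env_u64("CLAUDE_CODE_MEMORY_MB", 4096),
    orchestrator_port: env_u64("ORCHESTRATOR_PORT", 50051) as u16,
    ..Default::default()
};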

Building Worker Images

Standard Worker

docker build -f Dockerfile.worker -t ironclaw-worker:latest .

Custom Worker

Add project-specific tools:
FROM ironclaw-worker:latest

# Add custom tools
RUN apt-get update && apt-get install -y \
    terraform kubectl helm

# Add custom scripts
COPY scripts/ /usr/local/bin/
Build:
docker build -t ironclaw-worker:custom .
Configure:
WORKER_IMAGE=ironclaw-worker:custom

Troubleshooting

Container Creation Fails

Error: Failed to create container: image not found
Solution:
# Build worker image
docker build -f Dockerfile.worker -t ironclaw-worker:latest .

# Or pull from registry
docker pull ghcr.io/your-org/ironclaw-worker:latest
docker tag ghcr.io/your-org/ironclaw-worker:latest ironclaw-worker:latest

Orchestrator Connection Failed

Error: Failed to connect to orchestrator
Solutions:
  1. Check orchestrator is running: lsof -i :50051
  2. Verify firewall allows port 50051
  3. For Linux, ensure 172.17.0.1 is accessible from containers
  4. For macOS/Windows, ensure host.docker.internal resolves

Token Validation Failed

Error: Invalid bearer token
Causes:
  1. Orchestrator restarted (tokens are in-memory only)
  2. Job completed and token was revoked
  3. Token expired or corrupted
Solution: Recreate the job.

Volume Mount Rejected

Error: Project directory outside allowed base
Cause: Security validation prevents mounting paths outside ~/.ironclaw/projects/.
Solution:
# Move project to allowed location
mv /tmp/project ~/.ironclaw/projects/my-project

# Then create job
ironclaw job create --project ~/.ironclaw/projects/my-project

Source Code

Key files:
  • src/orchestrator/mod.rs - Module overview
  • src/orchestrator/job_manager.rs - Container lifecycle management
  • src/orchestrator/api.rs - HTTP API implementation
  • src/orchestrator/auth.rs - Token store and credential grants
  • Dockerfile.worker - Worker container definition
