Orchestration Loop

The Dispatch Daemon is the heart of Stoneforge’s orchestration system. It runs continuous polling loops to assign tasks, deliver messages, and trigger steward workflows.

Overview

The daemon coordinates all agent activity without manual intervention. It runs as a background service on the smithy server. Start it with sf daemon start.

Polling Loops

The daemon executes five main polling loops every 5 seconds (configurable):

1. Worker Availability Polling

Purpose: Assign unassigned tasks to available ephemeral workers

Find Available Workers

Query for ephemeral workers with:

No active session
Not rate-limited
Below pool capacity

const availableWorkers = agents.filter(a => 
  a.metadata.agentRole === 'worker' &&
  a.metadata.workerMode === 'ephemeral' &&
  a.metadata.sessionStatus === 'idle'
);
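The filter above checks only the idle status, while the list also names rate limiting and pool capacity. A minimal sketch that covers all three criteria might look like the following. The `Agent` shape, the `'running'` status value, the `rateLimitedIds` set, and the interpretation of "pool capacity" as a count of free session slots are assumptions made for illustration, not the actual Stoneforge types.

```typescript
// Hypothetical shapes for illustration; not the actual Stoneforge types.
interface AgentMetadata {
  agentRole: 'worker' | 'director' | 'steward';
  workerMode?: 'ephemeral' | 'persistent';
  sessionStatus: 'idle' | 'running'; // 'running' is an assumed value
}

interface Agent {
  id: string;
  metadata: AgentMetadata;
}

function findAvailableWorkers(
  agents: Agent[],
  rateLimitedIds: Set<string>, // assumed: ids of workers currently rate-limited
  activeSessions: number,      // assumed: sessions currently running in the pool
  poolCapacity: number,        // assumed: maximum concurrent sessions
): Agent[] {
  // Never hand out more workers than the pool has free slots.
  const freeSlots = Math.max(0, poolCapacity - activeSessions);
  return agents
    .filter(a =>
      a.metadata.agentRole === 'worker' &&
      a.metadata.workerMode === 'ephemeral' &&
      a.metadata.sessionStatus === 'idle' &&
      !rateLimitedIds.has(a.id))
    .slice(0, freeSlots);
}
```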

Query Ready Tasks

Get highest priority tasks that are:

Status: OPEN
No assignee
Not blocked by dependencies

const readyTasks = await api.ready();
const sortedTasks = sortByEffectivePriority(readyTasks);

Assign and Dispatch

For each available worker:

Assign highest priority task
Create or reuse worktree
Send dispatch message to worker’s inbox
Spawn worker session in worktree

await api.update(taskId, { 
  assignee: workerId,
  status: 'in_progress',
});

await dispatchService.dispatchTask(taskId, workerId);

Workers are spawned inside their worktree directory for full isolation.
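The pairing of available workers with ready tasks can be sketched as a pure function. This is an illustration only: `ReadyTask`, `planAssignments`, and the rule that a lower `priority` number means higher priority are assumptions, and the real `sortByEffectivePriority` may weigh other factors.

```typescript
// Hypothetical task shape; assumes a lower number means a higher priority.
interface ReadyTask {
  id: string;
  priority: number;
}

interface Assignment {
  taskId: string;
  workerId: string;
}

// Gives each available worker the next highest-priority task until
// either workers or tasks run out.
function planAssignments(workerIds: string[], tasks: ReadyTask[]): Assignment[] {
  const queue = [...tasks].sort((a, b) => a.priority - b.priority);
  const plan: Assignment[] = [];
  for (const workerId of workerIds) {
    const task = queue.shift();
    if (task === undefined) break; // no more ready tasks
    plan.push({ taskId: task.id, workerId });
  }
  return plan;
}
```

Each entry in the returned plan would then go through the update and dispatch calls shown above.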

2. Inbox Polling

Purpose: Route messages and trigger agent sessions when needed

Routing behavior depends on the agent type: ephemeral workers, persistent workers, directors, and stewards.

Message Routing Logic:

Poll for unread messages in worker inbox
For each message:
- Dispatch message? Mark as read (spawn handled by worker polling)
- Has active session? Leave unread (session will handle it)
- Idle with non-dispatch messages? Leave unread to accumulate
If accumulated messages exist, spawn triage session

Triage Session:

Groups messages by channel
Spawns temporary session to process batch
Uses message-triage prompt template
Agent responds to messages then exits
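The routing rules and the channel grouping above can be expressed as one planning step. The sketch below is a hypothetical model: `InboxMessage`, its `isDispatch` flag, and `planInboxPoll` are illustrative names, not part of the documented API.

```typescript
interface InboxMessage {
  id: string;
  channel: string;
  isDispatch: boolean; // assumed flag; the real message type may differ
}

interface InboxPlan {
  markRead: string[];                         // message ids to mark as read
  triageBatches: Map<string, InboxMessage[]>; // accumulated messages grouped by channel
}

function planInboxPoll(unread: InboxMessage[], hasActiveSession: boolean): InboxPlan {
  const plan: InboxPlan = { markRead: [], triageBatches: new Map() };
  for (const msg of unread) {
    if (msg.isDispatch) {
      // Spawning is handled by worker availability polling.
      plan.markRead.push(msg.id);
      continue;
    }
    if (hasActiveSession) continue; // left unread for the running session
    // Idle agent: accumulate for a triage session, one batch per channel.
    const batch = plan.triageBatches.get(msg.channel) ?? [];
    batch.push(msg);
    plan.triageBatches.set(msg.channel, batch);
  }
  return plan;
}
```

A non-empty `triageBatches` map corresponds to the case where a triage session would be spawned.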

Message Forwarding:

Poll for unread messages
For each message:
- Has active session? Forward to PTY as user input
- No session? Leave unread until session starts
Mark forwarded messages as read

// Forward message to worker's PTY
await session.write(
  `[MESSAGE FROM ${senderId}]: ${content}\n`
);
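The forwarding steps can be sketched as a function that decides what to write to the PTY and which messages to mark as read. `PendingMessage` and `planForwarding` are hypothetical names; only the message format is taken from the snippet above.

```typescript
interface PendingMessage {
  id: string;
  senderId: string;
  content: string;
}

interface ForwardPlan {
  writes: string[];   // lines to write to the session's PTY, in order
  markRead: string[]; // ids of messages that were forwarded
}

function planForwarding(unread: PendingMessage[], hasActiveSession: boolean): ForwardPlan {
  // Without a session, everything stays unread until one starts.
  if (!hasActiveSession) return { writes: [], markRead: [] };
  return {
    writes: unread.map(m => `[MESSAGE FROM ${m.senderId}]: ${m.content}\n`),
    markRead: unread.map(m => m.id),
  };
}
```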

Directors are skipped by inbox polling so that strategic planning is not interrupted. They check their inbox explicitly:

sf inbox <director-id>

3. Steward Trigger Polling

Purpose: Activate steward workflows based on triggers

// Event trigger fires when matching event occurs
const trigger: EventTrigger = {
  type: 'event',
  event: 'task_review_ready',
  condition: 'task.priority <= 2', // Optional filter
};

// Daemon checks for triggered events
if (eventMatches(trigger, event)) {
  await spawnStewardWorkflow(stewardId, event);
}
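The source does not show how `eventMatches` evaluates a trigger. One possible shape is sketched below, with the caveat that it models the optional condition as a predicate function rather than the string expression used above; `TaskEvent` and `EventTriggerSketch` are hypothetical types.

```typescript
interface TaskEvent {
  type: string;
  task: { id: string; priority: number };
}

// Assumption: the condition is a predicate here, not a string expression.
interface EventTriggerSketch {
  type: 'event';
  event: string;
  condition?: (event: TaskEvent) => boolean;
}

function eventMatches(trigger: EventTriggerSketch, event: TaskEvent): boolean {
  if (trigger.event !== event.type) return false;
  // A trigger without a condition fires on every matching event type.
  return trigger.condition === undefined ? true : trigger.condition(event);
}
```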

Steward Workflow Execution

Trigger fires (event or cron)
Create workflow from playbook template
Workflow picked up by Workflow Task Polling
Steward session spawned to execute

4. Workflow Task Polling

Purpose: Assign workflow tasks to available stewards

Find Incomplete Workflows

Query for workflows with:

Status: RUNNING or PENDING
No assigned steward
Current step not completed

Find Available Stewards

Get stewards matching workflow requirements:

No active session
Correct steward focus
Not rate-limited

Assign and Execute

For each available steward:

Assign workflow task
Send dispatch message
Spawn steward session with workflow context
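Matching workflows to stewards by focus could be sketched as follows. The types, the `requiredFocus` field, and the rule that each steward takes at most one workflow per cycle are assumptions for illustration.

```typescript
interface StewardInfo {
  id: string;
  focus: string;
  hasActiveSession: boolean;
  rateLimited: boolean;
}

interface WorkflowInfo {
  id: string;
  requiredFocus: string; // assumed field name
}

interface WorkflowAssignment {
  workflowId: string;
  stewardId: string;
}

function matchStewards(workflows: WorkflowInfo[], stewards: StewardInfo[]): WorkflowAssignment[] {
  const free = stewards.filter(s => !s.hasActiveSession && !s.rateLimited);
  const used = new Set<string>();
  const assignments: WorkflowAssignment[] = [];
  for (const wf of workflows) {
    const steward = free.find(s => s.focus === wf.requiredFocus && !used.has(s.id));
    if (steward === undefined) continue; // stays unassigned until a later cycle
    used.add(steward.id);
    assignments.push({ workflowId: wf.id, stewardId: steward.id });
  }
  return assignments;
}
```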

5. Orphan Recovery Polling

Purpose: Recover workers with assigned tasks but no active session after restart

When the orchestrator server restarts, agent sessions terminate but task assignments persist. Orphan recovery re-spawns workers to continue their work.

Detect Orphaned Assignments

Find ephemeral workers with:

No active session (sessionStatus: 'idle')
Assigned tasks (OPEN or IN_PROGRESS)
Session terminated by restart

const orphans = workers.filter(w => 
  w.metadata.sessionStatus === 'idle' &&
  hasAssignedTasks(w.id)
);

Attempt Resume

Try to resume previous session:

Check for a previous session ID in task metadata (lastSessionId in the example below)
If found, attempt provider session resume
Reuse existing worktree
Inject resume context explaining restart

await spawner.spawn({
  agentId: workerId,
  resumeSessionId: task.metadata.orchestrator.lastSessionId,
  workingDirectory: task.metadata.orchestrator.handoffWorktree,
});

Fallback to Fresh Spawn

If resume fails or no session ID:

Spawn fresh session
Send full task prompt
Continue from existing worktree
Preserve handoff history
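The resume-then-fallback decision can be sketched as a small planning function. The metadata field names follow the spawn example above; `SpawnPlan`, `planOrphanSpawn`, and the `resumeAlreadyFailed` flag are hypothetical.

```typescript
interface OrchestratorMeta {
  lastSessionId?: string;
  handoffWorktree?: string;
}

type SpawnPlan =
  | { mode: 'resume'; resumeSessionId: string; workingDirectory: string | undefined }
  | { mode: 'fresh'; workingDirectory: string | undefined };

function planOrphanSpawn(meta: OrchestratorMeta, resumeAlreadyFailed: boolean): SpawnPlan {
  // Both paths continue in the existing worktree when one is recorded.
  const workingDirectory = meta.handoffWorktree;
  if (meta.lastSessionId !== undefined && !resumeAlreadyFailed) {
    return { mode: 'resume', resumeSessionId: meta.lastSessionId, workingDirectory };
  }
  // No session id, or the resume attempt failed: start a fresh session.
  return { mode: 'fresh', workingDirectory };
}
```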

Orphan recovery runs at daemon startup and at the start of each poll cycle.

End-to-End Flow

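A task typically flows through the system as follows:

Worker Availability Polling picks up a ready task and assigns it to an idle ephemeral worker
The daemon creates or reuses a worktree, sends a dispatch message, and spawns the worker session
Inbox Polling handles messages addressed to the agent
Steward Trigger Polling reacts to matching events (for example task_review_ready) by creating a workflow
Workflow Task Polling assigns the workflow to an available steward
If the server restarts along the way, Orphan Recovery Polling resumes the interrupted work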

Configuration

Customize daemon behavior via configuration:

const daemon = new DispatchDaemon({
  // Poll interval
  pollIntervalMs: 5000, // 5 seconds (default)
  
  // Enable/disable specific loops
  workerAvailabilityPollEnabled: true,
  inboxPollEnabled: true,
  stewardTriggerPollEnabled: true,
  workflowTaskPollEnabled: true,
  orphanRecoveryEnabled: true,
  
  // Recovery settings
  maxResumeAttemptsBeforeRecovery: 3,
  maxSessionDurationMs: 0, // Disabled by default
  maxStewardSessionDurationMs: 1800000, // 30 minutes
  
  // Reconciliation
  closedUnmergedReconciliationEnabled: true,
  closedUnmergedGracePeriodMs: 120000, // 2 minutes
  stuckMergeRecoveryEnabled: true,
  stuckMergeRecoveryGracePeriodMs: 600000, // 10 minutes
});

Configuration Options

Option	Default	Description
`pollIntervalMs`	5000	Time between poll cycles
`workerAvailabilityPollEnabled`	true	Enable task assignment
`inboxPollEnabled`	true	Enable message routing
`stewardTriggerPollEnabled`	true	Enable steward triggers
`workflowTaskPollEnabled`	true	Enable workflow dispatch
`orphanRecoveryEnabled`	true	Enable restart recovery
`maxResumeAttemptsBeforeRecovery`	3	Resume limit before steward
`maxSessionDurationMs`	0	Worker timeout (0=disabled)
`maxStewardSessionDurationMs`	1800000	Steward timeout (30 min)
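To illustrate how the enable flags might gate a poll cycle, here is a minimal sketch that merges overrides with the documented defaults and returns the loops that would run. The loop names and `enabledLoops` are assumptions. Orphan recovery is listed first because it is described as running at the start of each cycle; the order of the remaining loops is a guess.

```typescript
interface LoopFlags {
  workerAvailabilityPollEnabled: boolean;
  inboxPollEnabled: boolean;
  stewardTriggerPollEnabled: boolean;
  workflowTaskPollEnabled: boolean;
  orphanRecoveryEnabled: boolean;
}

// Defaults as listed in the options table.
const DEFAULT_FLAGS: LoopFlags = {
  workerAvailabilityPollEnabled: true,
  inboxPollEnabled: true,
  stewardTriggerPollEnabled: true,
  workflowTaskPollEnabled: true,
  orphanRecoveryEnabled: true,
};

function enabledLoops(overrides: Partial<LoopFlags> = {}): string[] {
  const flags: LoopFlags = { ...DEFAULT_FLAGS, ...overrides };
  // Hypothetical loop names and ordering.
  const order: Array<[keyof LoopFlags, string]> = [
    ['orphanRecoveryEnabled', 'orphan-recovery'],
    ['workerAvailabilityPollEnabled', 'worker-availability'],
    ['inboxPollEnabled', 'inbox'],
    ['stewardTriggerPollEnabled', 'steward-trigger'],
    ['workflowTaskPollEnabled', 'workflow-task'],
  ];
  return order.filter(([flag]) => flags[flag]).map(([, name]) => name);
}
```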

Starting and Stopping

CLI

# Start daemon
sf daemon start

# Check status
sf daemon status

# Stop daemon
sf daemon stop

Programmatically

import { DispatchDaemon } from '@stoneforge/smithy';

const daemon = new DispatchDaemon({
  api: quarryAPI,
  registry: agentRegistry,
  sessionManager,
  dispatchService,
  worktreeManager,
  // ... other dependencies
});

// Start daemon
await daemon.start();

// Stop daemon
await daemon.stop();

// Check if running
const isRunning = daemon.isRunning();

Monitoring

Poll Results

The daemon emits poll results after each cycle:

daemon.on('poll', (result: PollResult) => {
  console.log(`${result.pollType}: processed ${result.processed} in ${result.durationMs}ms`);
  
  if (result.errors > 0) {
    console.error(`Errors: ${result.errorMessages.join(', ')}`);
  }
});

Health Checks

// Get daemon health status
const health = await daemonService.getHealth();

// Example health response (status is 'running' or 'stopped')
{
  status: 'running',
  uptime: 123456, // milliseconds
  pollCycleCount: 42,
  lastPollResults: {
    'worker-availability': { processed: 3, errors: 0 },
    'inbox': { processed: 5, errors: 0 },
    'steward-trigger': { processed: 1, errors: 0 },
    'workflow-task': { processed: 0, errors: 0 },
  },
}
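A monitoring script might reduce `lastPollResults` to a single summary. The sketch below assumes the per-loop shape shown in the health response; `summarizePollResults` is a hypothetical helper, not part of the documented API.

```typescript
interface LoopResult {
  processed: number;
  errors: number;
}

interface PollSummary {
  processed: number; // total items processed across loops
  errors: number;    // total errors across loops
  failing: string[]; // names of loops that reported at least one error
}

function summarizePollResults(results: Record<string, LoopResult>): PollSummary {
  const summary: PollSummary = { processed: 0, errors: 0, failing: [] };
  for (const [name, result] of Object.entries(results)) {
    summary.processed += result.processed;
    summary.errors += result.errors;
    if (result.errors > 0) summary.failing.push(name);
  }
  return summary;
}
```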

Advanced Features

Rate Limit Handling

The daemon tracks rate-limited workers:

// Rate limit detected from session error
if (isRateLimitMessage(error)) {
  const resetTime = parseRateLimitResetTime(error);
  rateLimitTracker.set(workerId, resetTime);
}

// Skip rate-limited workers in polling
const availableWorkers = workers.filter(w => 
  !rateLimitTracker.isRateLimited(w.id)
);
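The tracker used above could be as simple as a map from worker ID to reset time. This is a sketch under the assumption that reset times are epoch milliseconds; the optional `nowMs` parameter is added only to make the behavior easy to test.

```typescript
class RateLimitTracker {
  private readonly resetTimes = new Map<string, number>();

  // Records when the worker's rate limit is expected to reset (epoch ms, assumed).
  set(workerId: string, resetAtMs: number): void {
    this.resetTimes.set(workerId, resetAtMs);
  }

  isRateLimited(workerId: string, nowMs: number = Date.now()): boolean {
    const resetAt = this.resetTimes.get(workerId);
    if (resetAt === undefined) return false;
    if (nowMs >= resetAt) {
      // The limit has expired, so forget the entry.
      this.resetTimes.delete(workerId);
      return false;
    }
    return true;
  }
}
```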

Session Timeout

Automatically terminate sessions that run too long:

// Configure timeouts
const daemon = new DispatchDaemon({
  maxSessionDurationMs: 3600000, // 1 hour for workers
  maxStewardSessionDurationMs: 1800000, // 30 minutes for stewards
});

// Daemon checks session duration each poll cycle
// (a max duration of 0 disables the check)
if (maxDuration > 0 && sessionDuration > maxDuration) {
  await sessionManager.terminate(sessionId);
}
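Because `maxSessionDurationMs` defaults to 0 (disabled), the duration check needs to treat 0 as "no limit". A minimal sketch of that rule, with `shouldTerminate` as a hypothetical helper name:

```typescript
// Returns true when a session has exceeded its allowed duration.
// A maxDurationMs of 0 (or less) means the timeout is disabled.
function shouldTerminate(startedAtMs: number, nowMs: number, maxDurationMs: number): boolean {
  if (maxDurationMs <= 0) return false;
  return nowMs - startedAtMs > maxDurationMs;
}
```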

Plan Auto-Completion

Automatically close plans when all tasks are done:

// Enabled by default
const daemon = new DispatchDaemon({
  planAutoCompleteEnabled: true,
});

// Daemon detects completed plans
const activePlans = await api.list({ 
  type: 'plan',
  status: 'active',
});

for (const plan of activePlans) {
  if (allTasksCompleted(plan)) {
    await api.updateStatus(plan.id, {
      status: 'completed',
    });
  }
}
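The completion check itself is not shown in the source. One way it might work is sketched below, operating on a plan's task list; `PlanTask`, `allTasksDone`, and the caller-supplied set of done statuses are assumptions, since the exact status names are not given here.

```typescript
interface PlanTask {
  id: string;
  status: string;
}

// True only when the plan has at least one task and every task is in a done status.
function allTasksDone(tasks: PlanTask[], doneStatuses: ReadonlySet<string>): boolean {
  // Array.prototype.every returns true for an empty array, so guard
  // against closing a plan that has no tasks yet.
  if (tasks.length === 0) return false;
  return tasks.every(t => doneStatuses.has(t.status));
}
```

Treating an empty plan as incomplete is a design choice of this sketch; it avoids closing a plan before any tasks have been added to it.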

Stuck Merge Recovery

Recover tasks stuck in merge states:

const daemon = new DispatchDaemon({
  stuckMergeRecoveryEnabled: true,
  stuckMergeRecoveryGracePeriodMs: 600000, // 10 minutes
});

// Daemon finds tasks stuck in 'merging' or 'testing'
// for longer than grace period and resets to 'pending'
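Detection of stuck merges could be sketched as a filter over tasks. The `mergeStatus` and `mergeStatusSinceMs` fields are hypothetical; the state names and the grace-period rule come from the comment above.

```typescript
interface MergeTask {
  id: string;
  mergeStatus: string;        // assumed field name
  mergeStatusSinceMs: number; // assumed: epoch ms when the status was entered
}

const STUCK_CANDIDATE_STATES: ReadonlySet<string> = new Set(['merging', 'testing']);

// Returns the ids of tasks that have sat in 'merging' or 'testing'
// for longer than the grace period.
function findStuckMerges(tasks: MergeTask[], nowMs: number, gracePeriodMs: number): string[] {
  return tasks
    .filter(t =>
      STUCK_CANDIDATE_STATES.has(t.mergeStatus) &&
      nowMs - t.mergeStatusSinceMs > gracePeriodMs)
    .map(t => t.id);
}
```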

Next Steps

Agent Roles

Learn about the agents being orchestrated

Task Management

Understand task lifecycle and statuses

Dependencies

See how blocking affects dispatch

Workflows

Build multi-step steward workflows
