Suspend and Resume: Human Gates and External Callbacks

Some workflow steps cannot complete autonomously. A deployment pipeline might require human approval before promoting to production. An order-processing workflow might need to wait for a payment callback from an external gateway. A long-running interactive job might need to receive incremental configuration updates from an operator. Aether models all of these patterns through a first-class suspend/resume primitive that lets any executor pause itself and wait for an external signal — without polling, without a separate “wait” task, and without losing the accumulated state from previous rounds.

How Suspension Works

Suspension begins when an executor returns ExecCodeSuspended (integer value 1) from its Execute method. The engine intercepts this exit code and transitions the task to PhaseSuspended rather than any terminal phase:

ExecCodeSucceeded (0) → PhaseSucceeded
ExecCodeSuspended (1) → PhaseSuspended   ← not terminal; task waits for Resume()
ExecCodeFailed    (2) → PhaseFailed
ExecCodeError     (3) → PhaseError
ExecCodeTimeout   (4) → PhaseTimeout

While the task is PhaseSuspended, the engine parks it. The parent DAG scope does not advance — downstream tasks that depend on the suspended task remain in PhaseCreated. The workflow stays alive but makes no progress on that branch until a Resume() call arrives.
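The exit-code mapping above can be sketched as a small lookup. This is an illustration only: the constant names and integer values come from the list above, but the `ExecCode` and `Phase` types, the string values of the phases, and the helper names `phaseFor` and `isTerminal` are assumptions, not Aether's actual declarations.

```go
package main

import "fmt"

// ExecCode is a local stand-in for the executor exit code type.
type ExecCode int

const (
	ExecCodeSucceeded ExecCode = 0
	ExecCodeSuspended ExecCode = 1
	ExecCodeFailed    ExecCode = 2
	ExecCodeError     ExecCode = 3
	ExecCodeTimeout   ExecCode = 4
)

// Phase is a local stand-in; the string values are assumed.
type Phase string

const (
	PhaseSucceeded Phase = "Succeeded"
	PhaseSuspended Phase = "Suspended"
	PhaseFailed    Phase = "Failed"
	PhaseError     Phase = "Error"
	PhaseTimeout   Phase = "Timeout"
)

// phaseFor maps an exit code to the task phase it produces.
// The boolean is false for codes outside the documented range.
func phaseFor(code ExecCode) (Phase, bool) {
	switch code {
	case ExecCodeSucceeded:
		return PhaseSucceeded, true
	case ExecCodeSuspended:
		return PhaseSuspended, true
	case ExecCodeFailed:
		return PhaseFailed, true
	case ExecCodeError:
		return PhaseError, true
	case ExecCodeTimeout:
		return PhaseTimeout, true
	}
	return "", false
}

// isTerminal reports whether a phase ends the task.
// PhaseSuspended is deliberately excluded: the task is parked, not finished.
func isTerminal(p Phase) bool {
	switch p {
	case PhaseSucceeded, PhaseFailed, PhaseError, PhaseTimeout:
		return true
	}
	return false
}

func main() {
	for code := ExecCodeSucceeded; code <= ExecCodeTimeout; code++ {
		p, _ := phaseFor(code)
		fmt.Printf("%d -> %s (terminal=%t)\n", code, p, isTerminal(p))
	}
}
```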

Partial Outputs Are Accumulated

Each time the executor returns ExecCodeSuspended, any output parameters it includes are merged into the task’s accumulated outputs using last-writer-wins semantics. New keys are added; existing keys are overwritten by the latest value. This allows the executor to incrementally build up state across multiple suspend-resume cycles.
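As a minimal sketch of last-writer-wins merging, the helper below folds each round's outputs into an accumulator. The function name `mergeLastWriterWins` and the sample keys `stage` and `attempts` are hypothetical; the engine's real merge code may differ.

```go
package main

import "fmt"

// mergeLastWriterWins copies every key from update into acc.
// Keys already in acc are overwritten; keys that update does not
// mention are left untouched.
func mergeLastWriterWins(acc, update map[string]any) map[string]any {
	if acc == nil {
		acc = make(map[string]any, len(update))
	}
	for k, v := range update {
		acc[k] = v
	}
	return acc
}

func main() {
	var outputs map[string]any

	// Round 1: the executor suspends with two output parameters.
	outputs = mergeLastWriterWins(outputs, map[string]any{"stage": "notified", "attempts": 1})

	// Round 2: only "attempts" is reported; "stage" survives from round 1.
	outputs = mergeLastWriterWins(outputs, map[string]any{"attempts": 2})

	fmt.Println(outputs["stage"], outputs["attempts"])
}
```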

Triggering Suspension from the Executor

In a real executor, you suspend by returning ExecCodeSuspended from Execute. The playground echo executor does this whenever its input parameter suspend is true, as the await-approval task in the following workflow shows:

{
  "dag": {
    "name": "pipeline",
    "tasks": [
      { "name": "prepare", "executor": { "type": "echo" } },
      {
        "name": "await-approval",
        "dependencies": ["prepare"],
        "inputs": {
          "parameters": [
            { "name": "suspend", "value": true },
            {
              "name": "outputs",
              "value": [
                { "name": "approved", "type": "bool", "value": true }
              ]
            }
          ]
        },
        "executor": { "type": "echo" }
      },
      {
        "name": "finalize",
        "dependencies": ["await-approval"],
        "executor": { "type": "echo" }
      }
    ]
  }
}

When await-approval suspends, finalize stays in PhaseCreated. The workflow remains alive — prepare has already succeeded, but finalize cannot run until await-approval is resolved.
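The gating behaviour described above can be sketched as a readiness check over the three tasks in the example. The `Task` struct, the `runnable` helper, and the rule "every dependency must be in PhaseSucceeded" are simplifying assumptions for illustration; they ignore options such as continueOn and are not taken from the engine's scheduler.

```go
package main

import "fmt"

// Phase is a local stand-in; the string values are assumed.
type Phase string

const (
	PhaseCreated   Phase = "Created"
	PhaseSucceeded Phase = "Succeeded"
	PhaseSuspended Phase = "Suspended"
)

// Task is a hypothetical in-memory view of one DAG node.
type Task struct {
	Name         string
	Dependencies []string
	Phase        Phase
}

// runnable returns the names of tasks that are still in PhaseCreated
// and whose dependencies have all succeeded.
func runnable(tasks []Task) []string {
	phases := make(map[string]Phase, len(tasks))
	for _, t := range tasks {
		phases[t.Name] = t.Phase
	}
	var ready []string
	for _, t := range tasks {
		if t.Phase != PhaseCreated {
			continue
		}
		ok := true
		for _, dep := range t.Dependencies {
			if phases[dep] != PhaseSucceeded {
				ok = false
				break
			}
		}
		if ok {
			ready = append(ready, t.Name)
		}
	}
	return ready
}

func main() {
	tasks := []Task{
		{Name: "prepare", Phase: PhaseSucceeded},
		{Name: "await-approval", Dependencies: []string{"prepare"}, Phase: PhaseSuspended},
		{Name: "finalize", Dependencies: []string{"await-approval"}, Phase: PhaseCreated},
	}
	// While await-approval is suspended, nothing is runnable.
	fmt.Println(runnable(tasks))

	// Once the gate resolves successfully, finalize becomes runnable.
	tasks[1].Phase = PhaseSucceeded
	fmt.Println(runnable(tasks))
}
```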

Resuming a Suspended Task

Call Engine.Resume() from your application to re-dispatch a suspended task with new payload data:

err := engine.Resume(ctx, workflowID, taskRunID, map[string]any{
    "approved": true,
    "reviewer": "alice",
})

Resume Signature

func (e *Engine) Resume(
    ctx context.Context,
    workflowID string,
    taskID string,
    payload map[string]any,
) error

Parameter	Description
`workflowID`	The workflow run ID returned by `Engine.Submit()`.
`taskID`	The task run ID of the suspended task (from `Engine.Get()`).
`payload`	Key-value map merged into the task’s accumulated inputs.

Payload Merging

Resume() merges the payload map into the task’s accumulated Inputs using last-writer-wins: keys present in payload overwrite the same keys in the stored inputs; keys absent from payload are left unchanged. This means:

The executor always receives the full merged history of all resume payloads on top of the original resolved inputs.
Multiple resume rounds can build up complex state incrementally.
A resume payload with only a subset of keys does not erase previously accumulated keys.

// First resume: adds "step": "validate".
if err := engine.Resume(ctx, wfID, taskID, map[string]any{"step": "validate"}); err != nil {
    return err
}

// Second resume: overwrites "step" and adds "approved": true.
if err := engine.Resume(ctx, wfID, taskID, map[string]any{"step": "finalize", "approved": true}); err != nil {
    return err
}

// The executor now receives: original inputs + step="finalize" + approved=true.

What the Executor Sees on Re-dispatch

After Resume() is called, the engine re-dispatches the task to the broker. The executor receives a TaskAssignment with:

Inputs: the fully merged set (original inputs + all resume payloads)
The same deadline: the original task deadline is not reset by Resume()

The executor can then make a decision:

Return ExecCodeSucceeded to finalize the task and unblock downstream nodes.
Return ExecCodeSuspended again to wait for another round of resume data.
Return ExecCodeFailed or ExecCodeError to signal that the task should not proceed.

This decision logic lives entirely inside the executor — the engine imposes no opinion on how many rounds of suspension are needed.
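One way an executor might structure this decision is a pure function over the merged inputs. The sketch below is hypothetical: the `decide` function and its rule (wait until an `approved` key arrives, then succeed or fail on its value) are assumptions chosen to match the approval examples on this page, and the real Execute method signature is not shown here.

```go
package main

import "fmt"

// ExecCode is a local stand-in for the executor exit code type.
type ExecCode int

const (
	ExecCodeSucceeded ExecCode = 0
	ExecCodeSuspended ExecCode = 1
	ExecCodeFailed    ExecCode = 2
	ExecCodeError     ExecCode = 3
)

// decide inspects the fully merged inputs and picks an exit code.
//   - no "approved" key yet: suspend and wait for a resume payload
//   - "approved" is not a bool: report an error
//   - approved == true: succeed and unblock downstream tasks
//   - approved == false: fail the task
func decide(inputs map[string]any) ExecCode {
	v, present := inputs["approved"]
	if !present {
		return ExecCodeSuspended
	}
	approved, ok := v.(bool)
	if !ok {
		return ExecCodeError
	}
	if approved {
		return ExecCodeSucceeded
	}
	return ExecCodeFailed
}

func main() {
	inputs := map[string]any{"release": "v1.4.0"}
	fmt.Println("first dispatch:", decide(inputs))

	// A resume payload without a verdict keeps the task suspended.
	inputs["reviewer"] = "alice"
	fmt.Println("after first resume:", decide(inputs))

	// A later resume carries the verdict.
	inputs["approved"] = true
	fmt.Println("after second resume:", decide(inputs))
}
```

Keeping the decision in a function of the inputs alone makes each dispatch repeatable: the same merged inputs always yield the same exit code.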

Hook Integration

Suspension and resumption each fire task-level hooks if configured on the DAG task node:

{
  "name": "approval",
  "template": "suspend-task",
  "hooks": {
    "onStart":   { "template": "hook-task" },
    "onSuspend": { "template": "hook-task" },
    "onResume":  { "template": "hook-task" },
    "onSuccess": { "template": "hook-task" },
    "onExit":    { "template": "hook-task" }
  }
}

Hook firing sequence for a task that suspends once then succeeds:

1. Task dispatched: onStart fires.
2. Executor returns ExecCodeSuspended: the task transitions to PhaseSuspended and onSuspend fires.
3. Engine.Resume() called: the task is re-dispatched and onResume fires.
4. Executor returns ExecCodeSucceeded: the task transitions to PhaseSucceeded; onSuccess fires, then onExit fires.

Hooks are fire-and-forget — a hook failure does not affect the task’s phase or the workflow’s progression.

Idempotency and Race Conditions

Resume() is safe to call on a task that is no longer suspended. If the task has already reached a terminal state (completed normally, timed out, or been cancelled), Resume() returns nil without taking any action. This makes it safe for callers to send resume signals without strict coordination — a race between a timeout and a concurrent Resume() is resolved by the store’s token-based optimistic lock: only one writer succeeds.
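To illustrate how a token-based optimistic lock lets exactly one writer win, here is a small in-memory sketch. Everything in it is an assumption made for illustration: the `Store` type, its `CompareAndSwap` method, the `phaseDispatched` placeholder for whatever phase a resumed task enters, and the choice to return nil when the resume loses the race. The real store's API is not documented on this page.

```go
package main

import (
	"fmt"
	"sync"
)

type Phase string

const (
	PhaseSuspended  Phase = "Suspended"
	PhaseTimeout    Phase = "Timeout"
	phaseDispatched Phase = "Dispatched" // hypothetical post-resume phase
)

// record pairs a phase with a version token that changes on every write.
type record struct {
	phase Phase
	token int64
}

// Store is a hypothetical single-task store guarded by a mutex.
type Store struct {
	mu  sync.Mutex
	rec record
}

// Load returns a snapshot of the current record.
func (s *Store) Load() record {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.rec
}

// CompareAndSwap writes next only if the caller's token is still current.
// A stale token means another writer got there first.
func (s *Store) CompareAndSwap(token int64, next Phase) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rec.token != token {
		return false
	}
	s.rec = record{phase: next, token: token + 1}
	return true
}

// resume is a no-op returning nil when the task is no longer suspended
// or when a concurrent writer wins the race.
func resume(s *Store) error {
	cur := s.Load()
	if cur.phase != PhaseSuspended {
		return nil
	}
	s.CompareAndSwap(cur.token, phaseDispatched)
	return nil
}

func main() {
	s := &Store{rec: record{phase: PhaseSuspended}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // timeout path
		defer wg.Done()
		if snap := s.Load(); snap.phase == PhaseSuspended {
			s.CompareAndSwap(snap.token, PhaseTimeout)
		}
	}()
	go func() { // concurrent resume
		defer wg.Done()
		_ = resume(s)
	}()
	wg.Wait()
	// Either path may win, but only one write is applied.
	fmt.Println("final phase:", s.Load().phase, "writes:", s.Load().token)
}
```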

Design Patterns

Human approval gates

Suspend a task after sending a notification (email, Slack, etc.). Resume it when the approver clicks “Approve” in your UI. The workflow holds until the gate is cleared.

External callback integration

Suspend after making an async API call. Register the task run ID as the callback token. Resume from the callback handler when the external system responds.
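A callback handler for this pattern might look like the following sketch. The `Resumer` interface mirrors the Resume signature shown earlier; the `/callback` route, the `workflow` and `task` query parameters, the `fakeEngine` stand-in, and the `deliver` helper are all hypothetical. A production handler would also need to authenticate the caller before resuming anything.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Resumer is the subset of the engine this handler needs.
type Resumer interface {
	Resume(ctx context.Context, workflowID, taskID string, payload map[string]any) error
}

// callbackHandler resumes the task named in the query string, using the
// JSON request body as the resume payload.
func callbackHandler(r Resumer) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		q := req.URL.Query()
		wf, task := q.Get("workflow"), q.Get("task")
		if wf == "" || task == "" {
			http.Error(w, "workflow and task are required", http.StatusBadRequest)
			return
		}
		var payload map[string]any
		if err := json.NewDecoder(req.Body).Decode(&payload); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		if err := r.Resume(req.Context(), wf, task, payload); err != nil {
			http.Error(w, "resume failed", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})
}

// fakeEngine records the last Resume call so the sketch runs without Aether.
type fakeEngine struct {
	workflowID, taskID string
	payload            map[string]any
}

func (f *fakeEngine) Resume(_ context.Context, workflowID, taskID string, payload map[string]any) error {
	f.workflowID, f.taskID, f.payload = workflowID, taskID, payload
	return nil
}

// deliver simulates the external system posting its callback and
// returns the HTTP status code the handler produced.
func deliver(r Resumer, target, body string) int {
	req := httptest.NewRequest(http.MethodPost, target, strings.NewReader(body))
	rec := httptest.NewRecorder()
	callbackHandler(r).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	eng := &fakeEngine{}
	code := deliver(eng, "/callback?workflow=wf-1&task=task-7", `{"status":"paid"}`)
	fmt.Println(code, eng.workflowID, eng.taskID, eng.payload["status"])
}
```

Because Resume() is described as a no-op on tasks that are no longer suspended, a handler like this can tolerate duplicate deliveries from the external system.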

Multi-round interactive tasks

Keep returning ExecCodeSuspended with partial outputs to accumulate incremental state. Resume multiple times before committing. The task receives the full accumulated history each time.

Timeout-bounded approval

Combine suspend with timeout and continueOn.timeout to auto-approve (or auto-reject) if no human response arrives within a deadline. The downstream task runs either way.
