
Lifecycle Overview

A complete Symphony workflow follows this lifecycle:

Phase 1: Issue Polling

Poll Tick Trigger

The orchestrator schedules recurring ticks at polling.interval_ms (default: 30 seconds).
# Initial tick scheduled at startup
def init(_opts) do
  state = %State{
    poll_interval_ms: Config.poll_interval_ms(),
    # ...
  }
  
  run_terminal_workspace_cleanup()
  :ok = schedule_tick(0)  # Immediate first poll
  {:ok, state}
end

# Subsequent ticks
def handle_info(:tick, state) do
  state = refresh_runtime_config(state)  # Hot-reload WORKFLOW.md
  state = %{state | poll_check_in_progress: true}
  schedule_poll_cycle_start()  # Short delay for dashboard render
  {:noreply, state}
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:50-76
The orchestrator hot-reloads configuration on every tick, allowing changes to WORKFLOW.md to take effect without restart.
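
Schematically, the recurring tick is just Process.send_after re-armed on every pass. A minimal standalone sketch (the Tick module name is illustrative, not the real API):

```elixir
defmodule Tick do
  # Recurring polling sketch: each tick is scheduled with send_after,
  # and the handler re-arms itself after processing.
  def schedule(pid, interval_ms) do
    Process.send_after(pid, :tick, interval_ms)
  end
end
```

Scheduling with a delay of 0 reproduces the "immediate first poll" at startup.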

Startup Terminal Cleanup

Before the first poll, Symphony cleans up workspaces for issues already in terminal states:
defp run_terminal_workspace_cleanup do
  case Tracker.fetch_issues_by_states(Config.linear_terminal_states()) do
    {:ok, issues} ->
      Enum.each(issues, fn
        %Issue{identifier: identifier} when is_binary(identifier) ->
          cleanup_issue_workspace(identifier)

        _other ->
          :ok
      end)
    
    {:error, reason} ->
      Logger.warning("Skipping startup cleanup; failed to fetch terminal issues: #{inspect(reason)}")
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:776-791

Phase 2: Reconciliation

Before dispatching new work, the orchestrator reconciles all running issues.

Step 2a: Stall Detection

Stall detection prevents zombie sessions that stop emitting events but don’t exit.
defp reconcile_stalled_running_issues(%State{} = state) do
  timeout_ms = Config.codex_stall_timeout_ms()  # Default: 300000 (5 min)
  
  cond do
    timeout_ms <= 0 -> state  # Disabled
    map_size(state.running) == 0 -> state
    true ->
      now = DateTime.utc_now()
      Enum.reduce(state.running, state, fn {issue_id, running_entry}, state_acc ->
        elapsed_ms = stall_elapsed_ms(running_entry, now)
        
        if is_integer(elapsed_ms) and elapsed_ms > timeout_ms do
          Logger.warning("Issue stalled: #{issue_id} elapsed_ms=#{elapsed_ms}; restarting")
          
          next_attempt = next_retry_attempt_from_running(running_entry)

          state_acc
          |> terminate_running_issue(issue_id, false)
          |> schedule_issue_retry(issue_id, next_attempt, %{
            identifier: running_entry.identifier,
            error: "stalled for #{elapsed_ms}ms without codex activity"
          })
        else
          state_acc
        end
      end)
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:367-406

Elapsed time calculation:
elapsed_ms = now - (last_codex_timestamp || started_at)
If last_codex_timestamp exists (any event received), use it. Otherwise use started_at (worker launch time).
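
The rule above can be sketched as a small helper. This is a hypothetical Stall module, assuming both timestamps are DateTime structs on the running entry:

```elixir
defmodule Stall do
  # Prefer the last Codex event timestamp; fall back to worker launch time.
  def elapsed_ms(%{last_codex_timestamp: last, started_at: started}, now) do
    case last || started do
      %DateTime{} = reference -> DateTime.diff(now, reference, :millisecond)
      _ -> nil
    end
  end
end
```

Returning nil when neither timestamp is usable lets the caller's is_integer/1 guard skip the entry instead of crashing.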

Step 2b: Tracker State Refresh

Active-run reconciliation ensures running sessions stay aligned with current tracker state.
defp reconcile_running_issues(%State{} = state) do
  running_ids = Map.keys(state.running)
  
  case Tracker.fetch_issue_states_by_ids(running_ids) do
    {:ok, issues} ->
      Enum.reduce(issues, state, fn issue, state_acc ->
        cond do
          terminal_issue_state?(issue.state) ->
            Logger.info("Issue #{issue.identifier} moved to terminal state=#{issue.state}; stopping")
            terminate_running_issue(state_acc, issue.id, true)  # cleanup_workspace=true
          
          !issue_routable_to_worker?(issue) ->
            Logger.info("Issue #{issue.identifier} no longer routed; stopping")
            terminate_running_issue(state_acc, issue.id, false)
          
          active_issue_state?(issue.state) ->
            refresh_running_issue_state(state_acc, issue)  # Update in-memory snapshot
          
          true ->
            Logger.info("Issue #{issue.identifier} moved to non-active state; stopping")
            terminate_running_issue(state_acc, issue.id, false)
        end
      end)
    
    {:error, reason} ->
      Logger.debug("Failed to refresh states: #{inspect(reason)}; keeping workers running")
      state
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:236-324

Reconciliation outcomes:
  • Terminal State → terminate worker + clean workspace
  • Still Active → update issue snapshot in memory
  • Non-Active → terminate worker (no cleanup)

Phase 3: Validation

Before fetching candidates, the orchestrator validates runtime configuration:
defp maybe_dispatch(%State{} = state) do
  state = reconcile_running_issues(state)
  
  with :ok <- Config.validate!(),
       {:ok, issues} <- Tracker.fetch_candidate_issues(),
       true <- available_slots(state) > 0 do
    choose_issues(issues, state)
  else
    {:error, :missing_linear_api_token} ->
      Logger.error("Linear API token missing in WORKFLOW.md")
      state
    
    {:error, :missing_codex_command} ->
      Logger.error("Codex command missing in WORKFLOW.md")
      state
    
    # ... other validation errors
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:173-234

Validation checks:
  • tracker.kind present and supported
  • tracker.api_key present after $ resolution
  • tracker.project_slug present (for Linear)
  • codex.command present and non-empty
If validation fails, dispatch is skipped for this tick, but reconciliation continues and the next tick will retry.

Phase 4: Candidate Fetch

Tracker Query

The orchestrator fetches candidate issues in active states:
query CandidateIssues($projectSlug: String!, $states: [String!]!) {
  issues(
    filter: {
      project: { slugId: { eq: $projectSlug } }
      state: { name: { in: $states } }
    }
    first: 50
  ) {
    nodes {
      id
      identifier
      title
      description
      priority
      state { name }
      # ...
    }
  }
}
Variables:
  • projectSlug: tracker.project_slug
  • states: tracker.active_states

Issue Normalization

Raw tracker responses are normalized:
  • labels → lowercase strings
  • blocked_by → derived from inverse blocks relations
  • priority → integer only (non-integers become nil)
  • created_at, updated_at → parsed ISO-8601 timestamps
  • state → compared after trim + lowercase
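
These rules can be sketched as plain functions. The Normalize module below is hypothetical; only the behavior (lowercase labels, integer-only priority, trim + lowercase state comparison) is taken from the list above:

```elixir
defmodule Normalize do
  # Labels: keep only binaries, lowercase them.
  def labels(raw) when is_list(raw),
    do: raw |> Enum.filter(&is_binary/1) |> Enum.map(&String.downcase/1)

  # Priority: integers pass through; anything else becomes nil.
  def priority(p) when is_integer(p), do: p
  def priority(_), do: nil

  # State comparison: trim + lowercase both sides before comparing.
  def same_state?(a, b) when is_binary(a) and is_binary(b),
    do: String.downcase(String.trim(a)) == String.downcase(String.trim(b))
end
```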

Phase 5: Dispatch

Issue Sorting

Candidates are sorted by dispatch priority:
Enum.sort_by(issues, fn issue ->
  {
    priority_rank(issue.priority),      # 1..4 → 1..4, nil → 5
    issue_created_at_sort_key(issue),   # Unix microseconds (oldest first)
    issue.identifier || issue.id        # Lexicographic tie-breaker
  }
end)
Reference: elixir/lib/symphony_elixir/orchestrator.ex:453-461
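
A plausible sketch of priority_rank/1 matching the comment in the sort key (hypothetical module name, not the exact implementation):

```elixir
defmodule Dispatch do
  # Priorities 1..4 keep their value; a missing or out-of-range
  # priority sorts after all of them.
  def priority_rank(p) when is_integer(p) and p in 1..4, do: p
  def priority_rank(_), do: 5
end
```

With this ranking, two issues of equal priority fall through to the created_at key (oldest first), and unprioritized issues always dispatch last.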

Eligibility Checks

For each issue in sorted order:
defp should_dispatch_issue?(issue, state, active_states, terminal_states) do
  candidate_issue?(issue, active_states, terminal_states) and
    !todo_issue_blocked_by_non_terminal?(issue, terminal_states) and
    !MapSet.member?(state.claimed, issue.id) and
    !Map.has_key?(state.running, issue.id) and
    available_slots(state) > 0 and
    state_slots_available?(issue, state.running)
end
Candidate checks:
  1. Has required fields (id, identifier, title, state)
  2. State in active_states and not in terminal_states
  3. Routable to worker (assignee check)
Blocker rule:
  • If issue state is “Todo”, reject if any blocker is non-terminal
Concurrency checks:
  1. Global: max_concurrent_agents - running_count > 0
  2. Per-state: max_concurrent_agents_by_state[state] - running_count_for_state > 0
Reference: elixir/lib/symphony_elixir/orchestrator.ex:473-507
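
The per-state concurrency check could look roughly like this. The Slots module is a hypothetical sketch; in the real code the caps come from max_concurrent_agents_by_state in config:

```elixir
defmodule Slots do
  # caps maps a state name to its per-state worker cap.
  # A state with no configured cap is unlimited (subject to the global cap).
  def state_slots_available?(issue, running, caps) do
    case Map.get(caps, issue.state) do
      nil ->
        true

      cap ->
        in_state =
          Enum.count(running, fn {_id, entry} -> entry.issue.state == issue.state end)

        cap - in_state > 0
    end
  end
end
```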

Issue Revalidation

Before spawning a worker, the orchestrator refreshes the issue from the tracker to avoid acting on stale data:
defp dispatch_issue(state, issue, attempt) do
  case revalidate_issue_for_dispatch(issue, &Tracker.fetch_issue_states_by_ids/1, terminal_states) do
    {:ok, refreshed_issue} ->
      do_dispatch_issue(state, refreshed_issue, attempt)
    
    {:skip, :missing} ->
      Logger.info("Skipping; issue no longer visible: #{issue.identifier}")
      state
    
    {:skip, refreshed_issue} ->
      Logger.info("Skipping stale dispatch: #{refreshed_issue.identifier} state=#{refreshed_issue.state}")
      state
    
    {:error, reason} ->
      Logger.warning("Skipping; refresh failed: #{inspect(reason)}")
      state
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:578-596

Worker Spawn

defp do_dispatch_issue(state, issue, attempt) do
  recipient = self()
  
  case Task.Supervisor.start_child(SymphonyElixir.TaskSupervisor, fn ->
    AgentRunner.run(issue, recipient, attempt: attempt)
  end) do
    {:ok, pid} ->
      ref = Process.monitor(pid)
      Logger.info("Dispatching #{issue.identifier} to agent pid=#{inspect(pid)} attempt=#{inspect(attempt)}")
      
      running_entry = %{
        pid: pid,
        ref: ref,
        identifier: issue.identifier,
        issue: issue,
        started_at: DateTime.utc_now(),
        retry_attempt: normalize_retry_attempt(attempt),
        # ... session tracking fields
      }
      
      %{
        state |
        running: Map.put(state.running, issue.id, running_entry),
        claimed: MapSet.put(state.claimed, issue.id),
        retry_attempts: Map.delete(state.retry_attempts, issue.id)
      }
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:598-647

Phase 6: Workspace Creation

The Agent Runner creates an isolated workspace:
def run(issue, recipient, opts) do
  case Workspace.create_for_issue(issue) do
    {:ok, workspace} ->
      try do
        with :ok <- Workspace.run_before_run_hook(workspace, issue),
             :ok <- run_codex_turns(workspace, issue, recipient, opts) do
          :ok
        end
      after
        Workspace.run_after_run_hook(workspace, issue)
      end
  end
end
Reference: elixir/lib/symphony_elixir/agent_runner.ex:11-33

Workspace Path Construction

1. Sanitize identifier: "ABC-123" → "ABC-123" (already safe)
                        "MT/649"  → "MT_649" (slash replaced)
                        
2. Join with root: workspace_root + "/" + sanitized_identifier
   Example: "/tmp/symphony_workspaces/ABC-123"

3. Validate path safety:
   - Must be inside workspace_root (prefix check)
   - Must not equal workspace_root
   - Must not contain symlink escapes
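
Steps 1-3 can be sketched as follows. The WorkspacePath module, its root path, and the sanitizing regex are illustrative assumptions; only the replace-unsafe-characters and prefix-check behavior comes from the steps above:

```elixir
defmodule WorkspacePath do
  @root "/tmp/symphony_workspaces"  # assumption: example workspace_root

  # Step 1: replace path-hostile characters in the identifier.
  def sanitize(identifier), do: String.replace(identifier, ~r/[^A-Za-z0-9_-]/, "_")

  # Steps 2-3: join with the root and enforce the prefix invariant.
  def build(identifier) do
    path = Path.join(@root, sanitize(identifier))

    if String.starts_with?(path, @root <> "/") and path != @root do
      {:ok, path}
    else
      {:error, :unsafe_path}
    end
  end
end
```

The symlink-escape check would additionally resolve the path on disk, which is omitted here.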

Directory Creation

defp ensure_workspace(workspace) do
  cond do
    File.dir?(workspace) ->
      clean_tmp_artifacts(workspace)  # Remove .elixir_ls, tmp/
      {:ok, false}  # Reused
    
    File.exists?(workspace) ->
      File.rm_rf!(workspace)
      create_workspace(workspace)
    
    true ->
      create_workspace(workspace)
  end
end

defp create_workspace(workspace) do
  File.rm_rf!(workspace)
  File.mkdir_p!(workspace)
  {:ok, true}  # Newly created
end
Reference: elixir/lib/symphony_elixir/workspace.ex:32-51

after_create Hook

If the workspace was newly created (not reused), run hooks.after_create:
defp maybe_run_after_create_hook(workspace, issue_context, created?) do
  case created? do
    true ->
      case Config.workspace_hooks()[:after_create] do
        nil -> :ok
        command -> run_hook(command, workspace, issue_context, "after_create")
      end
    false ->
      :ok
  end
end
Reference: elixir/lib/symphony_elixir/workspace.ex:125-139
after_create hook failure is fatal to workspace creation. The workspace will not be used.

Phase 7: Agent Execution

Codex Session Startup

defp run_codex_turns(workspace, issue, recipient, opts) do
  # fetcher and max_turns are derived from opts (elided in this excerpt)
  with {:ok, session} <- AppServer.start_session(workspace) do
    try do
      do_run_codex_turns(session, workspace, issue, recipient, opts, fetcher, 1, max_turns)
    after
      AppServer.stop_session(session)
    end
  end
end
Reference: elixir/lib/symphony_elixir/agent_runner.ex:49-60

Protocol handshake:
{"id":1,"method":"initialize","params":{"clientInfo":{"name":"symphony","version":"1.0"},"capabilities":{}}}
{"method":"initialized","params":{}}
{"id":2,"method":"thread/start","params":{"approvalPolicy":"...","sandbox":"workspace-write","cwd":"/abs/workspace"}}
Reference: SPEC.md:928-936

First Turn

Prompt rendering:
prompt = PromptBuilder.build_prompt(issue, opts)
# Uses WORKFLOW.md template + issue data
Turn start:
{"id":3,"method":"turn/start","params":{
  "threadId":"<thread-id>",
  "input":[{"type":"text","text":"<rendered-prompt>"}],
  "cwd":"/abs/workspace",
  "title":"ABC-123: Example Issue",
  "approvalPolicy":"...",
  "sandboxPolicy":{"type":"..."}
}}
Reference: SPEC.md:954-963

Event streaming: Codex emits line-delimited JSON on stdout:
  • turn/completed → success
  • turn/failed → failure
  • turn/cancelled → failure
  • Tool calls → handled by dynamic tool executor
  • Approval requests → auto-approved or failed (depending on policy)
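
Routing decoded events amounts to a dispatch on the method field. This is a hedged sketch; only the three turn outcomes come from the list above, and everything else falls through to a catch-all:

```elixir
# Classify a decoded event map by its "method" field.
classify = fn
  %{"method" => "turn/completed"} -> :success
  %{"method" => "turn/failed"} -> :failure
  %{"method" => "turn/cancelled"} -> :failure
  _other -> :other  # tool calls, approval requests, notifications
end
```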

Continuation Turns

If the turn completes successfully and the issue is still active:
case continue_with_issue?(issue, fetcher) do
  {:continue, refreshed_issue} when turn_number < max_turns ->
    Logger.info("Continuing after normal turn completion turn=#{turn_number}/#{max_turns}")
    do_run_codex_turns(session, workspace, refreshed_issue, recipient, opts, fetcher, turn_number + 1, max_turns)
  
  {:continue, _} ->
    Logger.info("Reached max_turns with issue still active; returning to orchestrator")
    :ok
  
  {:done, _} ->
    :ok
end
Reference: elixir/lib/symphony_elixir/agent_runner.ex:74-99

Continuation prompt:
Continuation guidance:

- The previous Codex turn completed normally, but the Linear issue is still active.
- This is continuation turn #2 of 20.
- Resume from current workspace state instead of restarting.
- The original task instructions are already in this thread.
- Focus on remaining ticket work.
Reference: elixir/lib/symphony_elixir/agent_runner.ex:105-115
Continuation turns reuse the same Codex thread to preserve context and workspace state across multiple turns.

Phase 8: Completion

The worker task exits and reports to the orchestrator.

Normal Exit

def handle_info({:DOWN, ref, :process, _pid, :normal}, state) do
  case find_issue_id_for_ref(state.running, ref) do
    issue_id ->
      {running_entry, state} = pop_running_entry(state, issue_id)
      state = record_session_completion_totals(state, running_entry)
      
      Logger.info("Agent task completed for #{issue_id}; scheduling continuation check")
      
      state =
        state
        |> complete_issue(issue_id)
        |> schedule_issue_retry(issue_id, 1, %{
          identifier: running_entry.identifier,
          delay_type: :continuation  # 1-second retry
        })

      {:noreply, state}
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:91-131
Even on normal exit, the orchestrator schedules a continuation retry with a 1-second delay to re-check if the issue is still active.

Abnormal Exit

def handle_info({:DOWN, ref, :process, _pid, reason}, state) when reason != :normal do
  # issue_id and running_entry are looked up from state.running by ref (elided)
  Logger.warning("Agent task exited for #{issue_id} reason=#{inspect(reason)}; scheduling retry")

  next_attempt = next_retry_attempt_from_running(running_entry)

  state =
    schedule_issue_retry(state, issue_id, next_attempt, %{
      identifier: running_entry.identifier,
      error: "agent exited: #{inspect(reason)}"
    })

  {:noreply, state}
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:116-125

Exponential backoff:
attempt 1: delay = min(10000 * 2^0, 300000) = 10s
attempt 2: delay = min(10000 * 2^1, 300000) = 20s
attempt 3: delay = min(10000 * 2^2, 300000) = 40s
...
attempt 6: delay = min(10000 * 2^5, 300000) = 300s (capped)
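
The table follows from a single formula. A sketch with the documented base and cap (the Backoff module name is illustrative):

```elixir
defmodule Backoff do
  @base_ms 10_000
  @cap_ms 300_000

  # delay = min(base * 2^(attempt - 1), cap)
  def retry_delay(attempt) when is_integer(attempt) and attempt >= 1 do
    min(@base_ms * Integer.pow(2, attempt - 1), @cap_ms)
  end
end
```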

Phase 9: Retry Handling

Retry Timer

defp schedule_issue_retry(state, issue_id, attempt, metadata) do
  delay_ms = retry_delay(attempt, metadata)
  timer_ref = Process.send_after(self(), {:retry_issue, issue_id}, delay_ms)
  
  Logger.warning("Retrying #{issue_id} in #{delay_ms}ms (attempt #{attempt})")
  
  %{
    state |
    retry_attempts: Map.put(state.retry_attempts, issue_id, %{
      attempt: attempt,
      timer_ref: timer_ref,
      due_at_ms: System.monotonic_time(:millisecond) + delay_ms,
      identifier: metadata[:identifier],
      error: metadata[:error]
    })
  }
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:677-708

Retry Execution

def handle_info({:retry_issue, issue_id}, state) do
  # attempt is read from the state.retry_attempts entry for this issue (elided)
  case Tracker.fetch_candidate_issues() do
    {:ok, issues} ->
      case find_issue_by_id(issues, issue_id) do
        %Issue{} = issue ->
          cond do
            terminal_issue_state?(issue.state) ->
              cleanup_issue_workspace(issue.identifier)
              release_issue_claim(state, issue_id)
            
            retry_candidate_issue?(issue) and slots_available?(issue, state) ->
              dispatch_issue(state, issue, attempt)
            
            retry_candidate_issue?(issue) ->
              schedule_issue_retry(state, issue_id, attempt + 1, %{
                error: "no available orchestrator slots"
              })
            
            true ->
              release_issue_claim(state, issue_id)
          end
        
        nil ->
          release_issue_claim(state, issue_id)
      end
  end
end
Reference: elixir/lib/symphony_elixir/orchestrator.ex:157-768

Phase 10: Cleanup

Terminal State Cleanup

When an issue moves to a terminal state during reconciliation:
terminal_issue_state?(issue.state) ->
  Logger.info("Issue moved to terminal state=#{issue.state}; stopping active agent")
  terminate_running_issue(state, issue.id, true)  # cleanup_workspace=true
Reference: elixir/lib/symphony_elixir/orchestrator.ex:303-306

Cleanup steps:
  1. before_remove hook (if workspace exists)
    case Config.workspace_hooks()[:before_remove] do
      nil -> :ok
      command -> run_hook(command, workspace, issue_context, "before_remove")
    end
    
    Failure is logged and ignored.
  2. Workspace deletion
    File.rm_rf(workspace)
    
  3. State cleanup
    %{
      state |
      running: Map.delete(state.running, issue_id),
      claimed: MapSet.delete(state.claimed, issue_id),
      retry_attempts: Map.delete(state.retry_attempts, issue_id)
    }
    
Reference: elixir/lib/symphony_elixir/orchestrator.ex:335-365

Workspace Preservation

Successful runs do not auto-delete workspaces. Workspaces are reused across runs for the same issue until the issue reaches a terminal state.
This allows:
  • Incremental progress across multiple agent sessions
  • Manual inspection of workspace state between runs
  • Operator-driven workspace cleanup via before_remove hook

Lifecycle Summary

  • Polling: 30s ticks, hot-reload config, reconcile before dispatch
  • Reconciliation: stall detection + tracker state refresh for running issues
  • Dispatch: sort by priority, validate eligibility, spawn worker task
  • Workspace: create/reuse directory, run hooks, enforce safety invariants
  • Execution: multi-turn Codex sessions, continuation guidance, event streaming
  • Retry: exponential backoff for failures, 1s delay for continuations
  • Cleanup: terminal state → run before_remove hook → delete workspace

Next Steps

  • Component Reference: implementation details for each component
  • Workspace Isolation: safety mechanisms and lifecycle hooks
