Vision
The BEAM was built for running millions of lightweight, isolated, communicating processes. That’s exactly what an AI agent swarm is. The patterns emerging in tools like Claude Code’s teams feature—where a lead agent spawns specialized workers, coordinates via message passing, tracks tasks with dependencies, and gracefully shuts down completed agents—are just OTP.

Why the BEAM Is Perfect for Agent Swarms
Concurrency Without Complexity
An AI agent that reads files, searches code, runs shell commands, and calls LLMs is inherently concurrent. On the BEAM, each tool execution is a lightweight process. Parallel tool calls aren’t a threading nightmare—they’re just `Task.async_stream`.
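As a sketch of what that looks like, the module below runs independent tool calls concurrently, one lightweight process per call. The `Swarm.Tools` name and `run_tool/1` dispatch are assumptions for illustration; the streaming pattern is the point.

```elixir
defmodule Swarm.Tools do
  # Hypothetical tool dispatch — real agents would have richer error handling.
  def run_tool({:file_read, path}), do: File.read(path)
  def run_tool({:shell, cmd}), do: {:ok, System.cmd("sh", ["-c", cmd])}

  # Execute independent tool calls concurrently, one BEAM process each.
  def run_parallel(calls) do
    calls
    |> Task.async_stream(&run_tool/1, max_concurrency: 8, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Swarm.Tools.run_parallel([{:file_read, "mix.exs"}, {:shell, "git status"}])
```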
Fault Tolerance Is Built In
When a shell command hangs or an LLM provider times out, OTP supervisors handle it. A crashed tool doesn’t take down the session. A crashed session doesn’t take down the application. This isn’t defensive coding—it’s how the BEAM works.

Process Discovery
`Registry` provides process discovery. Agents find each other by name, not by PID.
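A minimal sketch of name-based discovery with `Registry` (the `Swarm.Registry` and `Swarm.Agent` names are assumptions, not existing code):

```elixir
# In the application supervision tree:
#   {Registry, keys: :unique, name: Swarm.Registry}

defmodule Swarm.Agent do
  use GenServer

  # Register under a role atom instead of a raw PID.
  def start_link(role) do
    GenServer.start_link(__MODULE__, role, name: via(role))
  end

  def via(role), do: {:via, Registry, {Swarm.Registry, role}}

  @impl true
  def init(role), do: {:ok, %{role: role}}
end

# Any agent can reach another by name, with no PID bookkeeping:
# GenServer.call(Swarm.Agent.via(:architect), :get_plan)
```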
Native Message Passing
GenServer message passing is the native communication primitive. No Redis pub/sub, no HTTP polling, no message broker.

Monitors and Links
Monitors and links solve the “what if an agent crashes?” problem that every other framework handles with retry loops and health checks.

Proposed Architecture
Agent Roles
Lead Agent
The existing `Loom.Session` GenServer becomes the lead agent:
- Receives user input
- Decomposes requests into tasks
- Spawns specialist agents under `DynamicSupervisor`
- Tracks task dependencies in the decision graph
- Aggregates results and responds to the user
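Spawning a specialist from the lead agent can be sketched as follows; `Swarm.AgentSupervisor` and `Swarm.Agents.Researcher` are assumed names standing in for Loom’s actual modules:

```elixir
defmodule Swarm.Lead do
  # Supervision tree entry (e.g. in Application.start/2):
  #   {DynamicSupervisor, name: Swarm.AgentSupervisor, strategy: :one_for_one}

  # Each specialist is started as a supervised child, so a crash is
  # isolated from the lead agent and from its sibling specialists.
  def spawn_researcher(task) do
    DynamicSupervisor.start_child(
      Swarm.AgentSupervisor,
      {Swarm.Agents.Researcher, task}
    )
  end
end
```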
Researcher Agent
Read-only agent for codebase exploration:
- Tools: `file_read`, `file_search`, `content_search`, `directory_list`
- Weak model (Claude Haiku) for cost efficiency
- Spawned in parallel for independent research tasks
- Example: “Find all usages of `Session.send_message`”
Architect Agent
Planning agent using a strong model:
- Tools: `file_read`, `file_search`, `decision_log`, `decision_query`
- Strong model (Claude Opus) for complex reasoning
- Generates implementation plans
- Logs decisions to the shared decision graph
- Example: “Design a new authentication system”
Implementer Agent
Code execution agent:
- Tools: `file_read`, `file_write`, `file_edit`, `shell`, `git`
- Fast model (Claude Sonnet) for execution
- Follows plans from the architect
- Commits changes with explanatory messages
- Example: “Implement the plan for adding email auth”
Tester Agent
Verification agent:
- Tools: `shell`, `file_read`, `content_search`
- Weak model for cost efficiency
- Runs tests, analyzes failures, suggests fixes
- Example: “Run mix test and report any failures”
Example Workflow
User request: “Refactor the session module”

Step 1: Lead Agent Decomposes Task
Step 2: Researcher Agents Explore in Parallel
Step 3: Architect Creates Plan
Step 4: Implementer Executes Plan
Step 5: Tester Verifies
Step 6: Lead Agent Responds
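The six steps above can be condensed into one sketch. All module and function names here are assumptions, not Loom’s actual API; only the `Task.async_stream` fan-out is a concrete BEAM primitive:

```elixir
def handle_request(prompt) do
  tasks = decompose(prompt)                       # Step 1: lead decomposes

  research =                                      # Step 2: parallel research
    tasks
    |> Task.async_stream(&Swarm.Agents.Researcher.run/1, timeout: 60_000)
    |> Enum.map(fn {:ok, findings} -> findings end)

  plan = Swarm.Agents.Architect.plan(research)    # Step 3: architect plans
  diff = Swarm.Agents.Implementer.apply(plan)     # Step 4: implementer executes
  report = Swarm.Agents.Tester.verify(diff)       # Step 5: tester verifies
  respond(prompt, report)                         # Step 6: lead responds
end
```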
Shared State: The Decision Graph
All agents read and write to the same decision graph in SQLite. This provides:
- Shared memory — All agents see the same goals, decisions, and outcomes
- Coordination — Agents can check what others have decided
- Persistence — The plan survives agent crashes
- Visualization — LiveView renders the entire swarm’s reasoning in real-time
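A hypothetical API sketch of how agents might share the graph — the `Loom.DecisionGraph` module and its functions are assumptions, not the project’s real interface:

```elixir
# The lead agent records the goal; every node persists to SQLite.
{:ok, goal_id} =
  Loom.DecisionGraph.add_node(:goal, "Refactor the session module")

# The architect attaches a decision beneath it, tagged with its author.
{:ok, _decision_id} =
  Loom.DecisionGraph.add_node(:decision, "Extract tool dispatch into a module",
    parent: goal_id,
    agent: :architect
  )

# Any other agent can query what has already been decided:
Loom.DecisionGraph.children(goal_id)
```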
Implementation Plan
Phase 1: Multi-Agent Infrastructure (Current)
- Session GenServer as lead agent
- DynamicSupervisor for spawning sessions
- Registry for process discovery
- Shared decision graph
- Sub-agent tool (read-only researcher)
Phase 2: Specialized Agent Modules
- `Loom.Agents.Researcher` — Parallel codebase exploration
- `Loom.Agents.Architect` — Plan generation with strong model
- `Loom.Agents.Implementer` — Code execution with fast model
- `Loom.Agents.Tester` — Test execution and analysis
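A specialized agent module might look like the skeleton below. Everything here is an assumption about Phase 2 — a sketch, not existing Loom code:

```elixir
defmodule Loom.Agents.Researcher do
  use GenServer

  @model :haiku    # weak model for read-only work (per the role table above)
  @tools [:file_read, :file_search, :content_search, :directory_list]

  def start_link(task), do: GenServer.start_link(__MODULE__, task)

  @impl true
  def init(task), do: {:ok, %{task: task, model: @model, tools: @tools}}

  @impl true
  def handle_call(:run, _from, state) do
    # Would call the LLM here, restricted to the read-only tool set.
    {:reply, {:ok, :findings_placeholder}, state}
  end
end
```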
Phase 3: Coordination Protocol
- Task decomposition in lead agent
- Dependency tracking in decision graph
- Agent-to-agent message passing
- Result aggregation
Phase 4: Swarm UI
- LiveView component showing active agents
- Agent status indicators (thinking, executing, idle)
- Real-time decision graph with agent annotations
- Cost breakdown per agent
Benefits of BEAM-Native Swarms
No External Dependencies
- No message broker (Redis, RabbitMQ)
- No task queue (Celery, Sidekiq)
- No orchestration layer (Kubernetes, Docker Swarm)
Fault Tolerance
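The supervision guarantee can be shown in a few lines. With `:one_for_one`, a crashed agent is restarted in isolation, and `max_restarts` caps crash loops; the module names are assumptions:

```elixir
children = [
  {Registry, keys: :unique, name: Swarm.Registry},
  {DynamicSupervisor,
   name: Swarm.AgentSupervisor,
   strategy: :one_for_one,   # one agent's crash never touches its siblings
   max_restarts: 3,          # give up after 3 crashes...
   max_seconds: 5}           # ...within any 5-second window
]

Supervisor.start_link(children, strategy: :one_for_one, name: Swarm.Supervisor)
```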
Backpressure
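`Task.async_stream` gives backpressure for free: it never runs more than `max_concurrency` agents at once, and because the enumerable is consumed lazily, demand propagates upstream. A sketch, assuming a `run_researcher/1` helper:

```elixir
research_tasks
|> Task.async_stream(&run_researcher/1,
  max_concurrency: System.schedulers_online(),
  timeout: 60_000,
  on_timeout: :kill_task
)
|> Enum.reduce([], fn
  {:ok, findings}, acc -> [findings | acc]
  {:exit, :timeout}, acc -> acc   # a timed-out researcher is dropped, not fatal
end)
```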
Live Introspection
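Every one of these calls is standard OTP, usable from an IEx shell attached to the running node (only the registry name is an assumption):

```elixir
pid = GenServer.whereis({:via, Registry, {Swarm.Registry, :architect}})

Process.info(pid, [:message_queue_len, :memory])  # mailbox depth, heap size
:sys.get_state(pid)                               # the agent's current state
:observer.start()                                 # full GUI process tree
```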
Hot Code Reloading
Update agent behavior without killing sessions.

Challenges
Cost Management
Multiple agents = multiple LLM calls. Mitigation:
- Use weak models (Haiku) for read-only tasks
- Cache research results in ETS
- Reuse researcher agents across requests
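The ETS cache point can be sketched in one small module (table and module names are assumptions). A cache hit skips the LLM call entirely:

```elixir
defmodule Swarm.ResearchCache do
  @table :swarm_research_cache

  # Call once at startup; :public lets any agent process read and write.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  def fetch(query, fun) do
    case :ets.lookup(@table, query) do
      [{^query, result}] ->
        result                                  # cache hit: no LLM call
      [] ->
        result = fun.()                         # cache miss: run the researcher
        :ets.insert(@table, {query, result})
        result
    end
  end
end
```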
Coordination Overhead
Message passing adds latency. Mitigation:
- Run independent tasks in parallel
- Use `Task.async_stream` with backpressure
- Batch related research into single agent calls
Debugging
Multiple concurrent agents are harder to debug. Mitigation:
- Emit structured Telemetry events per agent
- LiveView shows real-time agent activity
- Decision graph records all agent reasoning
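Per-agent Telemetry events might look like this sketch; the event names and measurement keys are assumptions, but `:telemetry.execute/3` and `:telemetry.attach/4` are the library’s real API:

```elixir
# Emit one structured event per agent step:
:telemetry.execute(
  [:swarm, :agent, :step],
  %{duration_ms: 420, tokens: 1_850, cost_usd: 0.003},
  %{agent: :researcher, session_id: session_id, tool: :content_search}
)

# Attach a handler once at startup to log or forward to LiveView:
:telemetry.attach(
  "swarm-agent-logger",
  [:swarm, :agent, :step],
  fn _event, measurements, metadata, _config ->
    IO.inspect({metadata.agent, measurements.duration_ms})
  end,
  nil
)
```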
Comparison to Other Approaches
| Framework | Coordination | Fault Tolerance | Observability |
|---|---|---|---|
| Loom (BEAM) | OTP message passing | Supervisors | LiveView + Telemetry |
| Claude Code | HTTP API | Retry loops | Logs |
| LangGraph | Python orchestrator | Try/catch | LangSmith |
| AutoGPT | Sequential executor | None | Print statements |
Next Steps
Multi-agent coding isn’t a feature to bolt on later. On the BEAM, it’s the natural evolution. The primitives are already here:
- `DynamicSupervisor` manages agent lifecycle
- `Registry` provides discovery
- `GenServer` handles message passing
- `Task.async_stream` runs agents in parallel
- Phoenix LiveView visualizes the swarm in real-time
- The decision graph provides shared memory
Learn More
- Architecture Deep Dive — Understand Loom’s OTP design
- Contributing — Help build agent swarms
- Jido Documentation — The agent framework powering Loom