Pentest Swarm AI is not a pipeline with a fancier name. Most “multi-agent” pentesting tools route work through a fixed sequence (recon feeds classify, classify feeds exploit, exploit feeds report) with a central planner dispatching each step. Pentest Swarm AI replaces that design with a genuine swarm: agents share an environment, each agent’s writes influence every other agent’s behaviour, and the useful attack paths emerge from that shared state rather than from any script that prescribes them.
The Pipeline Problem
A fixed recon → classify → exploit → report pipeline has a structural ceiling. Every path the campaign can take must be anticipated in advance and encoded in the orchestrator. Parallelism is limited to what the author of the pipeline planned for. If recon surfaces an unexpected technology stack, the pipeline can only follow routes already wired in. Partial results from one phase cannot begin feeding the next until the full phase completes.
More practically: pipelines are brittle. Adding a new capability means editing the orchestrator. Removing an agent risks breaking the sequencing logic. The whole thing is only as smart as whoever wrote the dispatch code.
Swarm intelligence addresses these limits at the architecture level. Because coordination happens through shared state rather than function calls, agents are genuinely independent. New agents can join the swarm by declaring a trigger predicate, with no changes to any existing code.
Three Swarm Primitives
Stigmergy
In biology, stigmergy is coordination through environmental modification: ants lay pheromone trails that other ants follow. Pentest Swarm AI uses the same mechanism. Agents do not talk to each other. They write findings to a shared blackboard, and every finding carries a pheromone weight that biases other agents toward it. When the classifier writes a high-severity CVE_MATCH, its pheromone weight signals the exploit agent that this finding is worth acting on immediately. When a SESSION token ages out, its decayed weight means agents naturally stop spending resources on it.
Coordination is an emergent property of writes and reads, not a property of the orchestrator.
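The write-then-read coordination described above can be sketched as a small in-memory blackboard. This is only an illustration: the `Finding` and `Blackboard` types, their fields, and the `Hottest` method are hypothetical names, not the project's actual API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical blackboard entry; the field names are assumptions.
type Finding struct {
	Type      string  // e.g. "CVE_MATCH" or "PORT_OPEN"
	Target    string  // what the finding refers to
	Pheromone float64 // higher weight attracts agents sooner
}

// Blackboard is a hypothetical shared store guarded by a mutex.
type Blackboard struct {
	mu       sync.Mutex
	findings []Finding
}

// Write appends a finding. Agents never call each other directly;
// they only leave entries here.
func (b *Blackboard) Write(f Finding) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.findings = append(b.findings, f)
}

// Hottest returns up to n findings ordered by descending pheromone weight.
func (b *Blackboard) Hottest(n int) []Finding {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]Finding, len(b.findings))
	copy(out, b.findings)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Pheromone > out[j].Pheromone
	})
	if n >= 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	bb := &Blackboard{}
	bb.Write(Finding{Type: "PORT_OPEN", Target: "10.0.0.5:443", Pheromone: 0.4})
	bb.Write(Finding{Type: "CVE_MATCH", Target: "10.0.0.5:443", Pheromone: 0.9})
	for _, f := range bb.Hottest(1) {
		fmt.Println(f.Type, f.Pheromone)
	}
}
```

In this sketch the reader of the board sees the CVE_MATCH entry first purely because of its weight, without the writer knowing who will read it.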
Emergence
Attack chains appear that no single agent planned. A recon finding wakes the classifier. A high-severity classification wakes the exploit agent. Exploit results land back on the board and wake the report agent. The sequence is not prescribed; it follows from the pheromone state of the blackboard at any given moment. In practice this means a 1,000-subdomain target can be processed without anyone writing a plan for it. The swarm self-organises around what the blackboard contains.
Decentralization
Each agent is defined by its trigger predicate: a set of conditions on the blackboard that, when satisfied, cause the agent to be dispatched against a matching finding. There is no central planner that routes work. The scheduler is a thin coordinator: it enforces concurrency caps, enforces scope, and listens for shutdown signals. Selection is emergent from trigger rules and pheromone weights. Adding a custom agent is a matter of implementing the Agent interface with a Trigger() predicate and a Handle() function. No orchestrator code changes.
Architecture Diagram
Two Execution Modes
- Sequential Runner (default)
- Stigmergic Swarm (--swarm)
The default execution mode runs a deterministic 5-phase pipeline: seed → recon → classify → exploit → report. Each phase completes before the next begins. This mode is marked stable in the feature table and is the battle-tested path for straightforward engagements where predictable sequencing is preferred over emergent behaviour. The runner lives in internal/engine/runner.go. Cleanup is always registered before execution begins, so SIGINT, crashes, and budget exhaustion all trigger reverse-order cleanup.
Key Behaviours
Agents are independent
Any agent can be removed, replaced, or added without rewiring the others. Each agent declares its own trigger predicate. The scheduler subscribes each agent to its predicate and dispatches findings — no agent needs to know any other agent exists.
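One way to picture a thin scheduler of this kind is a loop that matches findings against predicates and caps the number of handlers in flight. The sketch below is an assumption-heavy illustration: `Subscription`, `Dispatch`, and the choice of which finding types wake which agent are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Finding is a hypothetical blackboard entry.
type Finding struct {
	Type   string
	Target string
}

// Subscription pairs an agent's trigger predicate with its handler.
// The name and shape are assumptions for this sketch.
type Subscription struct {
	Name    string
	Trigger func(Finding) bool
	Handle  func(Finding)
}

// Dispatch runs every handler whose predicate matches a finding, keeping at
// most maxConcurrent handlers running at once. It returns the dispatch count.
func Dispatch(subs []Subscription, findings []Finding, maxConcurrent int) int {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	dispatched := 0
	for _, f := range findings {
		for _, s := range subs {
			if !s.Trigger(f) {
				continue
			}
			dispatched++
			wg.Add(1)
			sem <- struct{}{} // blocks while the cap is reached
			go func(s Subscription, f Finding) {
				defer wg.Done()
				defer func() { <-sem }()
				s.Handle(f)
			}(s, f)
		}
	}
	wg.Wait()
	return dispatched
}

func main() {
	var mu sync.Mutex
	handled := map[string]int{}
	record := func(name string) func(Finding) {
		return func(Finding) {
			mu.Lock()
			defer mu.Unlock()
			handled[name]++
		}
	}
	subs := []Subscription{
		{Name: "classifier", Trigger: func(f Finding) bool { return f.Type == "PORT_OPEN" }, Handle: record("classifier")},
		{Name: "exploit", Trigger: func(f Finding) bool { return f.Type == "CVE_MATCH" }, Handle: record("exploit")},
	}
	findings := []Finding{
		{Type: "PORT_OPEN", Target: "10.0.0.5:22"},
		{Type: "CVE_MATCH", Target: "10.0.0.5:22"},
		{Type: "TARGET_REGISTERED", Target: "10.0.0.5"},
	}
	n := Dispatch(subs, findings, 2)
	fmt.Println(n, handled["classifier"], handled["exploit"])
}
```

Removing a subscription from the slice changes what gets dispatched but requires no edits to `Dispatch` or to the other subscriptions.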
Pheromones decay per finding type
A PORT_OPEN stays hot for 1 hour. A TARGET_REGISTERED stays hot for 24 hours. A SESSION token decays in 15 minutes. Half-lives are config-driven via config/pheromones.yaml and can be overridden per deployment. Stale paths die naturally without any garbage collection logic.
Scope enforced at the tool layer and executor
The --scope flag is not bypassable. Every tool adapter routes through scope.ValidateAndLog() before spawning any subprocess. The executor performs a second validation pass. Violations emit a WARN subsystem=scope log event and return an error; they never silently continue.
Cleanup always registered before execution
Every exploit that creates artifacts — files, users, sessions — registers a cleanup entry before touching the target. Cleanup runs on normal exit, SIGINT, and scheduler crash. The cleanup context is detached from the run context so cancellation does not orphan cleanup jobs.
Prompt caching on Claude
System prompts are cached by default for the recon and classifier agents. Cache-hit metrics are emitted via Usage.CacheHitRate(). This cuts cost and latency on repeated prompts without any code change.
Comparison with the Ecosystem
| Tool | Architecture | Executes vs. suggests | Memory | Tools wired | MCP | Swarm? |
|---|---|---|---|---|---|---|
| Pentest Swarm AI | Stigmergic blackboard | Executes | pgvector + pheromones | 8 ProjectDiscovery + nmap; sqlmap / Burp MCP / Metasploit in roadmap | Yes | ✅ real |
| PentestGPT | Single-agent ReAct | Suggests | None | None native | No | No |
| HackingBuddyGPT | Single-agent | Executes | Run logs | Shell passthrough | No | No |
| PentAGI | 4 agents + planner | Executes | pgvector | 40+ via MCP/shell | Partial | Pipeline |
| Shannon | White-box + browser | Executes | Session state | Browser DOM | No | Pipeline |
| HexStrike | MCP tool wrapper | Delegates to client LLM | None (stateless) | 150+ via MCP | Yes | No |
| Pentest-R1 | RL-tuned LLM | Executes | Trajectory | CTF-scope | No | No |
Feature Status
| Feature | Status | Notes |
|---|---|---|
| Sequential 5-phase runner | stable | Default mode; battle-tested core |
| Stigmergic swarm scheduler | alpha | --swarm flag; memory-backed blackboard wired |
| ProjectDiscovery toolchain | stable | subfinder, httpx, nuclei, naabu, katana, dnsx, gau |
| nmap adapter | stable | XML parsed; scope-validated |
| Cleanup registry | stable | Always runs on SIGINT / exit / budget-cancel |
| Claude prompt caching | stable | Enabled for recon + classifier by default |
| --strict LLM mode | stable | Promotes LLM errors to fatal |
| CVSS v3.1 scoring | stable | FIRST spec |
| Postgres blackboard backend | beta | Migration shipped; runner uses memory-board for now |
| MCP server | beta | pentestswarm mcp serve |
| VS Code extension | beta | deploy/vscode/ |
| GitHub Action | beta | deploy/github-action/action.yml with SARIF |
| Swarm playbooks (5) | beta | playbooks/{bug-bounty,external-asm,ci-cd,internal-network,ctf-solver}.yaml |
| Live dashboard | alpha | web/; UI built, wiring to live campaigns in progress |
| Burp MCP bridge | planned | Wave 2 |
| Metasploit / ZAP / sqlmap adapters | planned | Wave 2 |
| Fine-tuned Pentest-Swarm model | planned | Wave 3 (Pentest-R1 recipe) |
| Cybench / AutoPenBench benchmarks | planned | Wave 3 |