Pentest Swarm AI is not a pipeline with a fancier name. Most “multi-agent” pentesting tools route work through a fixed sequence — recon feeds classify, classify feeds exploit, exploit feeds report — with a central planner dispatching each step. Pentest Swarm AI replaces that design with a genuine swarm: agents share an environment, each agent’s writes influence every other agent’s behaviour, and the useful attack paths emerge from that shared state rather than from any script that prescribes them.

The Pipeline Problem

A fixed recon → classify → exploit → report pipeline has a structural ceiling. Every path the campaign can take must be anticipated in advance and encoded in the orchestrator. Parallelism is limited to what the author of the pipeline planned for. If recon surfaces an unexpected technology stack, the pipeline can only follow routes already wired in. Partial results from one phase cannot begin feeding the next until the full phase completes.

More practically: pipelines are brittle. Adding a new capability means editing the orchestrator. Removing an agent risks breaking the sequencing logic. The whole thing is only as smart as whoever wrote the dispatch code.

Swarm intelligence addresses these limits at the architecture level. Because coordination happens through shared state rather than function calls, agents are genuinely independent. New agents can join the swarm by declaring a trigger predicate, with no changes to any existing code.

Three Swarm Primitives

Stigmergy

In biology, stigmergy is coordination through environmental modification — ants lay pheromone trails that other ants follow. Pentest Swarm AI uses the same mechanism. Agents do not talk to each other. They write findings to a shared blackboard, and every finding carries a pheromone weight that biases other agents toward it. When the classifier writes a high-severity CVE_MATCH, its pheromone weight signals the exploit agent that this finding is worth acting on immediately. When a SESSION token ages out, its decayed weight means agents naturally stop spending resources on it. Coordination is an emergent property of writes and reads, not a property of the orchestrator.

Emergence

Attack chains appear that no single agent planned. A recon finding wakes the classifier. A high-severity classification wakes the exploit agent. Exploit results land back on the board and wake the report agent. The sequence is not prescribed — it follows from the pheromone state of the blackboard at any given moment. In practice this means a 1,000-subdomain target can be processed without anyone writing a plan for it. The swarm self-organises around what the blackboard contains.

Decentralization

Each agent is defined by its trigger predicate — a set of conditions on the blackboard that, when satisfied, cause the agent to be dispatched against a matching finding. There is no central planner that routes work. The scheduler is a thin coordinator: it enforces concurrency caps, enforces scope, and listens for shutdown signals. Selection is emergent from trigger rules and pheromone weights. Adding a custom agent is a matter of implementing the Agent interface with a Trigger() predicate and a Handle() function. No orchestrator code changes.
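To make the trigger-predicate idea concrete, here is a minimal sketch of two agents and a thin dispatch loop. Only the method names Trigger() and Handle() come from the text above; their signatures, the `Finding` struct, the `Name` method, and the `Run` function are assumptions made for illustration. The thresholds 0.2 and 0.5 are the ones shown in the architecture diagram, and the "exploit" agent only writes a simulated result.

```go
package main

import "fmt"

// Finding is a hypothetical blackboard entry.
type Finding struct {
	Type      string
	Value     string
	Pheromone float64
}

// Agent uses the Trigger/Handle names from the docs; signatures are assumed.
type Agent interface {
	Name() string
	Trigger(f Finding) bool     // does this finding wake the agent?
	Handle(f Finding) []Finding // findings the agent writes back
}

type classifier struct{}

func (classifier) Name() string { return "classify" }
func (classifier) Trigger(f Finding) bool {
	return f.Type == "TECHNOLOGY" && f.Pheromone > 0.2
}
func (classifier) Handle(f Finding) []Finding {
	return []Finding{{Type: "CVE_MATCH", Value: "match for " + f.Value, Pheromone: 0.9}}
}

type exploiter struct{}

func (exploiter) Name() string { return "exploit" }
func (exploiter) Trigger(f Finding) bool {
	return f.Type == "CVE_MATCH" && f.Pheromone > 0.5
}
func (exploiter) Handle(f Finding) []Finding {
	// Simulated only: no real action is taken in this sketch.
	return []Finding{{Type: "EXPLOIT_RESULT", Value: "simulated result for " + f.Value, Pheromone: 1.0}}
}

// Run offers each finding to every agent. Whatever an agent writes back is
// queued like any other finding, so no agent ever calls another directly.
func Run(seed []Finding, agents []Agent) []string {
	var trace []string
	queue := append([]Finding(nil), seed...)
	for len(queue) > 0 {
		f := queue[0]
		queue = queue[1:]
		for _, a := range agents {
			if a.Trigger(f) {
				trace = append(trace, a.Name()+" <- "+f.Type)
				queue = append(queue, a.Handle(f)...)
			}
		}
	}
	return trace
}

func main() {
	seed := []Finding{{Type: "TECHNOLOGY", Value: "nginx", Pheromone: 0.6}}
	// Registration order is deliberately "wrong"; the chain still forms.
	for _, step := range Run(seed, []Agent{exploiter{}, classifier{}}) {
		fmt.Println(step)
	}
}
```

The classify-then-exploit ordering is never written down anywhere in `Run`; it falls out of which predicates match which findings. Adding a third agent in this sketch means appending one value to the slice passed to `Run`.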

Architecture Diagram

                     YOU
                      |
               pentestswarm scan example.com --swarm
                      |
           ┌──────────▼──────────┐
           │   SEED: TARGET_REG  │
           └──────────┬──────────┘

 ┌────────────────────────────────────────────────────────┐
 │              SHARED BLACKBOARD (pgvector)              │
 │                                                        │
 │   SUBDOMAIN · PORT_OPEN · HTTP_ENDPOINT · TECHNOLOGY   │
 │   CVE_MATCH · MISCONFIGURATION · EXPLOIT_CHAIN         │
 │   EXPLOIT_RESULT · CAMPAIGN_COMPLETE                   │
 │                                                        │
 │   (each finding has a pheromone weight that decays)    │
 └──┬─────────────┬─────────────┬─────────────┬───────────┘
    │             │             │             │
    │ triggers:   │ triggers:   │ triggers:   │ triggers:
    │ TARGET_REG  │ raw recon + │ CVE_MATCH   │ CAMPAIGN_
    │             │ pheromone>  │ pheromone>  │ COMPLETE
    │             │ 0.2         │ 0.5         │
    ▼             ▼             ▼             ▼
┌─────────┐  ┌─────────┐   ┌─────────┐   ┌─────────┐
│  RECON  │  │CLASSIFY │   │ EXPLOIT │   │ REPORT  │
│         │  │         │   │         │   │         │
│ runs 8  │  │ maps    │   │ builds  │   │ queries │
│ tools,  │  │ CVEs,   │   │ attack  │   │ board   │
│ writes  │  │ scores  │   │ chains  │   │ →md/    │
│ per     │  │ CVSS,   │   │ per     │   │ html/   │
│ finding │  │ writes  │   │ finding │   │ json/   │
└─────────┘  └─────────┘   └─────────┘   │ sarif   │
                                          └─────────┘

Two Execution Modes

Pentest Swarm AI has two execution modes: the default sequential runner, and the stigmergic swarm scheduler enabled with the --swarm flag, which the feature table marks alpha.

The default execution mode runs a deterministic 5-phase pipeline: seed → recon → classify → exploit → report. Each phase completes before the next begins.

pentestswarm scan example.com --scope example.com

This mode is marked stable in the feature table and is the battle-tested path for straightforward engagements where predictable sequencing is preferred over emergent behaviour. The runner lives in internal/engine/runner.go. Cleanup is always registered before execution begins, so SIGINT, crashes, and budget exhaustion all trigger reverse-order cleanup.

Key Behaviours

Any agent can be removed, replaced, or added without rewiring the others. Each agent declares its own trigger predicate. The scheduler subscribes each agent to its predicate and dispatches findings — no agent needs to know any other agent exists.
A PORT_OPEN stays hot for 1 hour. A TARGET_REGISTERED stays hot for 24 hours. A SESSION token decays in 15 minutes. Half-lives are config-driven via config/pheromones.yaml and can be overridden per deployment. Stale paths die naturally without any garbage collection logic.
The --scope flag is not bypassable. Every tool adapter routes through scope.ValidateAndLog() before spawning any subprocess. The executor performs a second validation pass. Violations emit a WARN subsystem=scope log event and return an error — they never silently continue.
Every exploit that creates artifacts — files, users, sessions — registers a cleanup entry before touching the target. Cleanup runs on normal exit, SIGINT, and scheduler crash. The cleanup context is detached from the run context so cancellation does not orphan cleanup jobs.
System prompts are cached by default for the recon and classifier agents. Cache-hit metrics are emitted via Usage.CacheHitRate(). This cuts cost and latency on repeated prompts without any code change.

Comparison with the Ecosystem

| Tool | Architecture | Executes vs. suggests | Memory | Tools wired | MCP | Swarm? |
|---|---|---|---|---|---|---|
| Pentest Swarm AI | Stigmergic blackboard | Executes | pgvector + pheromones | 8 ProjectDiscovery + nmap; sqlmap / Burp MCP / Metasploit in roadmap | Yes | ✅ real |
| PentestGPT | Single-agent ReAct | Suggests | None | None native | No | No |
| HackingBuddyGPT | Single-agent | Executes | Run logs | Shell passthrough | No | No |
| PentAGI | 4 agents + planner | Executes | pgvector | 40+ via MCP/shell | Partial | Pipeline |
| Shannon | White-box + browser | Executes | Session state | Browser DOM | No | Pipeline |
| HexStrike | MCP tool wrapper | Delegates to client LLM | None (stateless) | 150+ via MCP | Yes | No |
| Pentest-R1 | RL-tuned LLM | Executes | Trajectory | CTF-scope | No | No |

Feature Status

| Feature | Status | Notes |
|---|---|---|
| Sequential 5-phase runner | stable | Default mode; battle-tested core |
| Stigmergic swarm scheduler | alpha | --swarm flag; memory-backed blackboard wired |
| ProjectDiscovery toolchain | stable | subfinder, httpx, nuclei, naabu, katana, dnsx, gau |
| nmap adapter | stable | XML parsed; scope-validated |
| Cleanup registry | stable | Always runs on SIGINT / exit / budget-cancel |
| Claude prompt caching | stable | Enabled for recon + classifier by default |
| --strict LLM mode | stable | Promotes LLM errors to fatal |
| CVSS v3.1 scoring | stable | FIRST spec |
| Postgres blackboard backend | beta | Migration shipped; runner uses memory-board for now |
| MCP server | beta | pentestswarm mcp serve |
| VS Code extension | beta | deploy/vscode/ |
| GitHub Action | beta | deploy/github-action/action.yml with SARIF |
| Swarm playbooks (5) | beta | playbooks/{bug-bounty,external-asm,ci-cd,internal-network,ctf-solver}.yaml |
| Live dashboard | alpha | web/; UI built, wiring to live campaigns in progress |
| Burp MCP bridge | planned | Wave 2 |
| Metasploit / ZAP / sqlmap adapters | planned | Wave 2 |
| Fine-tuned Pentest-Swarm model | planned | Wave 3 (Pentest-R1 recipe) |
| Cybench / AutoPenBench benchmarks | planned | Wave 3 |
