Pentest Swarm AI is a multi-agent system where every agent reads and writes a shared blackboard. That architecture (many writers, many readers, one piece of shared state) is exactly the structure that memory-injection attacks target. Two recent papers formalize the threat: MINJA (arXiv:2503.03704) shows that an attacker who can write to an agent’s memory can inject payloads that bias downstream reasoning, causing agents to retrieve injected findings as if they were legitimate. MemoryGraft (arXiv:2512.16962) extends this with variants where the attacker plants artefacts that look like high-confidence findings, drowning real signal in noise. Because the swarm’s blackboard is designed to bias agent reasoning toward interesting findings, it is an inherently attractive attack surface, and the output (a vulnerability report submitted to a bug bounty platform) is high-value enough to be worth poisoning.
The Threat Model
The attacker in this model is a process that can write valid JSON to the blackboard: a compromised agent with a legitimate keypair, a rogue process injected into the same host, or any code path that bypasses upstream validation. From that position, an attacker can:
- Inflate pheromone scores: write a finding with an arbitrarily large PheromoneBase to dominate every ranked query, forcing the swarm to prioritize a fabricated finding over real ones.
- Impersonate a trusted agent: write a finding with AgentName: "classifier" without holding the classifier’s private key, making other agents believe a trusted source authored the finding.
- Flood the board: write hundreds of findings per second to exhaust the scheduler and saturate the LLM provider budget.
- Graft plausible-looking payloads: write structurally valid but semantically wrong findings that slip past type-based filtering (e.g., a recon agent writing CVE_MATCH findings).
Layer 1 — Pheromone Clamp
Attack: write a finding with PheromoneBase: 9999 to permanently dominate every ranked query.
Defense: MemoryBoard.Write clamps PheromoneBase to [0, 1] before storage. Before this clamp existed, a single bogus write could sit at the top of every “what’s most interesting” query indefinitely. After the clamp, the attacker must compete in the same [0, 1] band as every legitimate agent, where real signal from genuine findings wins through normal pheromone decay.
The test that first caught this defense gap during development, and that now guards against regression in CI, is TestMINJA_PheromoneFloodIsClamped. A companion test, TestMINJA_MinPheromoneFiltersStaleInjection, validates that MinPheromone query predicates correctly reject low-confidence injected findings while preserving genuine high-confidence ones.
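The clamp and the MinPheromone predicate can be illustrated with a minimal sketch. This is not the project's MemoryBoard code: the Finding and Board types, the clampPheromone helper, the QueryMinPheromone method, and the PORT_OPEN finding type are hypothetical stand-ins, and treating NaN as 0 is an assumption made here so a malformed score cannot slip past the comparisons.

```go
package main

import (
	"fmt"
	"math"
)

// Finding is a hypothetical, trimmed-down blackboard entry.
type Finding struct {
	AgentName     string
	Type          string
	PheromoneBase float64
}

// Board is a hypothetical in-memory stand-in for the blackboard.
type Board struct {
	findings []Finding
}

// clampPheromone forces a score into [0, 1]. NaN is mapped to 0 (an
// assumption of this sketch) because NaN fails every comparison.
func clampPheromone(p float64) float64 {
	if math.IsNaN(p) || p < 0 {
		return 0
	}
	if p > 1 {
		return 1
	}
	return p
}

// Write clamps the score before the finding is stored.
func (b *Board) Write(f Finding) {
	f.PheromoneBase = clampPheromone(f.PheromoneBase)
	b.findings = append(b.findings, f)
}

// QueryMinPheromone returns the findings whose score is at least threshold.
func (b *Board) QueryMinPheromone(threshold float64) []Finding {
	var out []Finding
	for _, f := range b.findings {
		if f.PheromoneBase >= threshold {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	b := &Board{}
	b.Write(Finding{AgentName: "attacker", Type: "CVE_MATCH", PheromoneBase: 9999})
	b.Write(Finding{AgentName: "recon", Type: "PORT_OPEN", PheromoneBase: 0.2})

	// The inflated score is stored as 1.0, so it can no longer outrank
	// every legitimate finding by an unbounded margin.
	for _, f := range b.QueryMinPheromone(0.5) {
		fmt.Printf("%s %.1f\n", f.AgentName, f.PheromoneBase)
	}
}
```

In this sketch the clamp happens inside Write, so no caller can store an out-of-range score regardless of which code path produced the finding.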
Layer 2 — Ed25519 Provenance Signing
Attack: write a finding with AgentName: "classifier" from a process that is not the real classifier. Agents that filter on AgentName trust the forged label.
Defense: every write is signed with an Ed25519 keypair held by the originating agent. The signature covers the canonical tuple (campaign | agent | type | target | data | createdUnix). Verifying the signature detects two attack classes simultaneously:
- Tampering: any byte changed in any field after signing causes signature verification to fail.
- Impersonation: signing with a different keypair under a stolen agent name fails against the real agent’s registered public key.
Keypairs are generated by NewSigner() and are bound to the agent name on first write. Subsequent writes from a different keypair under the same agent name fail verification.
Layer 3 — MemoryGraft Heuristic Detector
Provenance and pheromone clamping catch a single malicious write. They do not catch patterns of malicious behavior. A compromised legitimate agent, one that holds a valid keypair, can still write nonsense, and provenance won’t flag it.
internal/swarm/memorygraft.Scan reads recent findings and emits alerts for four patterns characteristic of MemoryGraft attacks. The detector is pure-read (it never writes to the board) and is safe to call from a watchdog goroutine on a ticker.
| Pattern | Severity | What it catches | Default threshold |
|---|---|---|---|
| burst | medium | An agent writes ≥ N findings inside a short window. Real agents pulse; attackers spam. | 50 writes / 60 s |
| repeat-title | low | The same title fingerprint (first 80 bytes of Data) repeats from one agent. Real findings vary in shape. | 5 identical fingerprints |
| duplicate-data | high | Byte-identical Data payloads from one agent. Real findings carry different evidence per finding. | 3 identical payloads |
| type-mismatch | medium | An agent emits a finding under a type owned by another agent (e.g., recon writing CVE_MATCH). | 1 occurrence |
The default thresholds above can be overridden through Config{}.
The expectedAuthors map encodes which agent types are allowed to emit each finding type.
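Two of the four patterns, type-mismatch and duplicate-data, can be sketched as a pure-read pass over recent findings. This is an illustration, not the memorygraft package itself: the Finding, Alert, and Config shapes, the field names DuplicateData and ExpectedAuthors, the scan and allowed helpers, and the assumption that the classifier agent owns CVE_MATCH are all hypothetical.

```go
package main

import "fmt"

// Finding and Alert are hypothetical shapes used only by this sketch.
type Finding struct {
	AgentName, Type, Data string
}

type Alert struct {
	Pattern, Severity, AgentName, Detail string
}

// Config holds detector settings. The field names are assumptions.
type Config struct {
	DuplicateData   int                 // identical payloads per agent before alerting
	ExpectedAuthors map[string][]string // finding type -> agents allowed to emit it
}

// allowed reports whether name appears in agents.
func allowed(agents []string, name string) bool {
	for _, a := range agents {
		if a == name {
			return true
		}
	}
	return false
}

// scan reads findings and returns alerts. It never modifies its input.
func scan(findings []Finding, cfg Config) []Alert {
	type key struct{ agent, data string }
	seen := map[key]int{}
	var alerts []Alert
	for _, f := range findings {
		// type-mismatch: the type has a declared owner list and this
		// agent is not on it. Types without an entry are not checked.
		if owners, ok := cfg.ExpectedAuthors[f.Type]; ok && !allowed(owners, f.AgentName) {
			alerts = append(alerts, Alert{Pattern: "type-mismatch",
				Severity: "medium", AgentName: f.AgentName, Detail: f.Type})
		}
		// duplicate-data: alert once, when the per-agent count of a
		// byte-identical payload first reaches the threshold.
		k := key{f.AgentName, f.Data}
		seen[k]++
		if seen[k] == cfg.DuplicateData {
			alerts = append(alerts, Alert{Pattern: "duplicate-data",
				Severity: "high", AgentName: f.AgentName, Detail: f.Data})
		}
	}
	return alerts
}

func main() {
	cfg := Config{
		DuplicateData:   3,
		ExpectedAuthors: map[string][]string{"CVE_MATCH": {"classifier"}},
	}
	findings := []Finding{
		{AgentName: "recon", Type: "CVE_MATCH", Data: "a"},
		{AgentName: "exploit", Type: "SHELL", Data: "same"},
		{AgentName: "exploit", Type: "SHELL", Data: "same"},
		{AgentName: "exploit", Type: "SHELL", Data: "same"},
	}
	for _, a := range scan(findings, cfg) {
		fmt.Printf("%s (%s) from %s\n", a.Pattern, a.Severity, a.AgentName)
	}
}
```

The sketch trusts AgentName as written, which is why a detector of this kind only makes sense on top of signature verification.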
Scan is a regression alarm, not a hard security boundary. A determined attacker can use any agent name; provenance (Layer 2) is the trust boundary. Type-mismatch alerts fire when something unexpected happens; they are not a substitute for signature verification.
Layer 4 — Per-Agent Rate Limit
Attack: an agent writes findings that wake itself, creating a tight feedback loop that saturates the LLM provider within seconds, even without malicious intent.
Defense: ratelimit.New(perSecond, burst float64) builds a token-bucket limiter that the scheduler installs at the dispatch boundary. The scheduler calls lim.Take(ctx) before dispatching each finding to an agent. This is not at the write boundary: any agent can write whatever it wants (subject to layers 1–3). The rate limit only governs how fast the scheduler drains that agent’s queue.
The limiter is implemented without external dependencies, in roughly 50 lines of Go. golang.org/x/time/rate was considered and rejected to keep go.mod minimal.
Scope Enforcement
Scope validation is a defense-in-depth measure that operates independently of the blackboard security layers. It runs at two points in the execution path:
- Tool layer: scope.Validate(target, scope) is called before any security tool executes against a host. It accepts IPs, host:port strings, and full URLs, resolving each to either an allowed CIDR range or an allowed domain (with wildcard and subdomain matching).
- Executor layer: scope.ValidateCommand(cmd, scope) scans every command string with a regex that extracts all IPs and domain-like strings, then validates each against the scope definition before the command runs. No exceptions.
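The two checks can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in for the scope package: the Scope struct, the mustScope and hostOf helpers, the regex, and the matching rules (a plain domain matches itself and its subdomains, a "*." entry matches subdomains only) are illustrative choices, not the project's actual behavior.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"regexp"
	"strings"
)

// Scope is a hypothetical scope definition: allowed networks and domains.
type Scope struct {
	CIDRs   []*net.IPNet
	Domains []string // "example.com" or "*.example.com"
}

// mustScope parses CIDR strings and panics on malformed input.
func mustScope(cidrs, domains []string) Scope {
	s := Scope{Domains: domains}
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		s.CIDRs = append(s.CIDRs, n)
	}
	return s
}

// hostOf extracts the host from a URL, a host:port pair, or a bare host.
func hostOf(target string) string {
	if strings.Contains(target, "://") {
		u, err := url.Parse(target)
		if err != nil {
			return ""
		}
		return u.Hostname()
	}
	if h, _, err := net.SplitHostPort(target); err == nil {
		return h
	}
	return target
}

// validate returns nil only if the target's host falls inside the scope.
func validate(target string, s Scope) error {
	host := strings.ToLower(strings.TrimSuffix(hostOf(target), "."))
	if host == "" {
		return fmt.Errorf("scope: cannot parse target %q", target)
	}
	if ip := net.ParseIP(host); ip != nil {
		for _, n := range s.CIDRs {
			if n.Contains(ip) {
				return nil
			}
		}
		return fmt.Errorf("scope: IP %s is out of scope", host)
	}
	for _, d := range s.Domains {
		d = strings.ToLower(d)
		if strings.HasPrefix(d, "*.") {
			if strings.HasSuffix(host, d[1:]) { // d[1:] is ".example.com"
				return nil
			}
		} else if host == d || strings.HasSuffix(host, "."+d) {
			return nil
		}
	}
	return fmt.Errorf("scope: host %s is out of scope", host)
}

// hostLike matches IPv4-looking strings and dotted names ending in letters.
// It is deliberately broad: a file name such as out.txt also matches.
var hostLike = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b|\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b`)

// validateCommand rejects a command if any host-like token is out of scope.
func validateCommand(cmd string, s Scope) error {
	for _, m := range hostLike.FindAllString(cmd, -1) {
		if err := validate(m, s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	s := mustScope([]string{"10.0.0.0/24"}, []string{"example.com"})
	fmt.Println(validate("https://api.example.com/login", s))
	fmt.Println(validate("8.8.8.8:53", s))
	fmt.Println(validateCommand("nmap -sV 10.0.0.5 evil.org", s))
}
```

A regex this broad errs toward rejecting commands, which fits a check that is meant to fail closed.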
scope.enforce_strict is hardcoded to true in config.example.yaml and cannot be disabled at runtime. The comment in the config file reads: ALWAYS true — cannot be disabled. Safety constraint. Any PR that attempts to make this configurable will be rejected at review.
Cleanup Registry
Every exploitation command is registered in a cleanup registry before it executes. If the campaign ends for any reason (clean exit, SIGINT, budget exhaustion, or crash), pending cleanup actions run in reverse registration order, undoing the most recent actions first. Two implementations are available:
- CleanupRegistry (pipeline/cleanup.go): Postgres-backed, durable across restarts. Use in production.
- MemoryCleanupRegistry (pipeline/cleanup_memory.go): in-memory, non-durable. Suitable for tests and short-lived runs. Actions are replayed in reverse order on RunCleanup.
Test Coverage
| Defense | Test file | Tests |
|---|---|---|
| Pheromone clamp | internal/swarm/blackboard/injection_test.go | TestMINJA_PheromoneFloodIsClamped, TestMINJA_QueryByTypeIsolatesPayload, TestMINJA_MinPheromoneFiltersStaleInjection |
| Ed25519 provenance | internal/swarm/provenance/provenance_test.go | 4 tests: roundtrip, tamper, impersonation, malformed key/sig |
| MemoryGraft detector | internal/swarm/memorygraft/detector_test.go | 4 tests: burst, duplicate-data, type-mismatch, quiet-board |
| Rate limit | internal/swarm/ratelimit/ratelimit_test.go | 4 tests: zero-rate, burst-then-throttle, ctx-cancel, nil-safe |
What Was NOT Done and Why
Cross-agent communication encryption
All agent traffic is local-process. The threat model is misbehaving code on the same host, not a network adversary. Encrypting local-process communication is theatre with no security benefit against the actual attack class.
Anomaly-detection ML
Conservative heuristics catch the 80% of attacks worth catching at zero false-positive cost. ML adds opacity that is hard to justify in a security review, adds a model dependency, and produces false positives that degrade the operator experience more than the attacks themselves.
Full audit log signing
Every write already carries an Ed25519 signature stored alongside the finding. Signing the log of writes would be double-counting and would add an HSM dependency. The per-write signatures are the audit trail.
References
- MINJA — arXiv:2503.03704 — memory injection against multi-agent systems via planted high-confidence payloads
- MemoryGraft — arXiv:2512.16962 — variants that graft plausible-looking artefacts onto shared agent memory
- Dark Side of LLMs — arXiv:2507.06850