Pentest Swarm AI is a multi-agent system where every agent reads and writes a shared blackboard. That architecture — many writers, many readers, one piece of shared state — is exactly the structure that memory-injection attacks target. Two recent papers formalize the threat: MINJA (arXiv:2503.03704) shows that an attacker who can write to an agent’s memory can inject payloads that bias downstream reasoning, causing agents to retrieve injected findings as if they were legitimate. MemoryGraft (arXiv:2512.16962) extends this with variants where the attacker plants artefacts that look like high-confidence findings, drowning real signal in noise. Because the swarm’s blackboard is designed to bias agent reasoning toward interesting findings, it is an inherently attractive attack surface — and the output (a vulnerability report submitted to a bug bounty platform) is high-value enough to be worth poisoning.

The Threat Model

The attacker in this model is a process that can write valid JSON to the blackboard — either a compromised agent with a legitimate keypair, a rogue process injected into the same host, or any code path that bypasses upstream validation. From that position, an attacker can:
  • Inflate pheromone scores — write a finding with an arbitrarily large PheromoneBase to dominate every ranked query, forcing the swarm to prioritize a fabricated finding over real ones.
  • Impersonate a trusted agent — write a finding with AgentName: "classifier" without holding the classifier’s private key, making other agents believe a trusted source authored the finding.
  • Flood the board — write hundreds of findings per second to exhaust the scheduler and saturate the LLM provider budget.
  • Graft plausible-looking payloads — write structurally valid but semantically wrong findings that slip past type-based filtering (e.g., a recon agent writing CVE_MATCH findings).
The four defense layers below each close a specific attack class.

Layer 1 — Pheromone Clamp

Attack: write a finding with PheromoneBase: 9999 to permanently dominate every ranked query.
Defense: MemoryBoard.Write clamps PheromoneBase to [0, 1] before storage.
Before this clamp existed, a single bogus write could sit at the top of every “what’s most interesting” query indefinitely. After the clamp, the attacker must compete in the same [0, 1] band as every legitimate agent, where genuine findings win out through normal pheromone decay. The test that first exposed this gap during development, and that now guards against regression in CI, is TestMINJA_PheromoneFloodIsClamped:
// internal/swarm/blackboard/injection_test.go
func TestMINJA_PheromoneFloodIsClamped(t *testing.T) {
    b := NewMemoryBoard(nil)
    ctx := context.Background()
    camp := uuid.New()

    // Attacker tries to set pheromone way above the legitimate range to
    // dominate ranking. Either the board must clamp to 1.0 or refuse.
    id, _ := b.Write(ctx, Finding{
        CampaignID:    camp,
        AgentName:     "recon",
        Type:          TypeCVEMatch,
        Target:        "victim",
        PheromoneBase: 9999.0,
        HalfLifeSec:   3600,
    })
    results, _ := b.Query(ctx, Predicate{Types: []FindingType{TypeCVEMatch}})
    var got float64
    for _, r := range results {
        if r.ID == id {
            got = r.Pheromone
        }
    }
    if got > 1.0 {
        t.Errorf("pheromone-flood not clamped: got %f, want ≤ 1.0", got)
    }
}
A related test, TestMINJA_MinPheromoneFiltersStaleInjection, validates that MinPheromone query predicates correctly reject low-confidence injected findings while preserving genuine high-confidence ones.
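The clamp, and the decay it relies on, can be sketched as follows. This is a minimal illustration rather than the project's code: clampPheromone and decayedPheromone are hypothetical names, and the exponential half-life formula is an assumption inferred from the HalfLifeSec field, not something the source confirms.

```go
package main

import (
	"fmt"
	"math"
)

// clampPheromone forces a caller-supplied base score into [0, 1].
// NaN is mapped to 0 so it cannot poison later comparisons.
// Hypothetical helper; the real clamp lives inside MemoryBoard.Write.
func clampPheromone(base float64) float64 {
	if math.IsNaN(base) || base < 0 {
		return 0
	}
	if base > 1 {
		return 1
	}
	return base
}

// decayedPheromone ASSUMES exponential decay: the score halves every
// halfLifeSec seconds. The project's actual decay curve may differ.
func decayedPheromone(base, ageSec, halfLifeSec float64) float64 {
	base = clampPheromone(base)
	if halfLifeSec <= 0 {
		return base // no decay configured
	}
	return base * math.Pow(0.5, ageSec/halfLifeSec)
}

func main() {
	fmt.Println(clampPheromone(9999))               // 1
	fmt.Println(decayedPheromone(9999, 3600, 3600)) // 0.5
}
```

Under these assumptions, an injected score of 9999 starts at 1.0 like any other maximally confident finding and has dropped to 0.5 one half-life later, so it cannot outrank fresher genuine findings indefinitely.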

Layer 2 — Ed25519 Provenance Signing

Attack: write a finding with AgentName: "classifier" from a process that is not the real classifier. Agents that filter on AgentName trust the forged label.
Defense: every write is signed with an Ed25519 keypair held by the originating agent. The signature covers the canonical tuple (campaign | agent | type | target | data | createdUnix). Verifying the signature detects two attack classes simultaneously:
  • Tampering: any byte changed in any field after signing causes signature verification to fail.
  • Impersonation: signing with a different keypair under a stolen agent name fails against the real agent’s registered public key.
Keys are generated per-process at startup via NewSigner() and are bound to the agent name on first write. Subsequent writes from a different keypair under the same agent name fail verification.
// internal/swarm/provenance/provenance.go

// NewSigner generates a fresh keypair. Call once per agent per process.
func NewSigner() (*Signer, error) {
    pub, priv, err := ed25519.GenerateKey(rand.Reader)
    if err != nil {
        return nil, fmt.Errorf("generate ed25519 key: %w", err)
    }
    return &Signer{priv: priv, pub: pub}, nil
}

// Sign produces a signature over the canonical bytes of a finding.
func (s *Signer) Sign(campaignID, agentName, findingType, target string,
    data []byte, createdUnix int64) []byte {
    msg := canonicalBytes(campaignID, agentName, findingType, target, data, createdUnix)
    return ed25519.Sign(s.priv, msg)
}

// Verify checks a signature against the public key + canonical bytes.
// Returns nil if the signature is valid; an error otherwise.
func Verify(pub []byte, sig []byte, campaignID, agentName, findingType,
    target string, data []byte, createdUnix int64) error {
    if len(pub) != ed25519.PublicKeySize {
        return errors.New("provenance: bad public key length")
    }
    if len(sig) != ed25519.SignatureSize {
        return errors.New("provenance: bad signature length")
    }
    msg := canonicalBytes(campaignID, agentName, findingType, target, data, createdUnix)
    if !ed25519.Verify(ed25519.PublicKey(pub), msg, sig) {
        return errors.New("provenance: signature verification failed (possible tamper)")
    }
    return nil
}
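The "bound to the agent name on first write" behavior described above can be sketched as a trust-on-first-use registry. Everything here is an assumption for illustration: keyRegistry, checkWrite, and runDemo are hypothetical names, and the source does not show how the project actually stores the binding.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
	"sync"
)

// keyRegistry is a hypothetical trust-on-first-use map from agent name
// to the public key seen on that agent's first valid write.
type keyRegistry struct {
	mu   sync.Mutex
	keys map[string]ed25519.PublicKey
}

func newKeyRegistry() *keyRegistry {
	return &keyRegistry{keys: make(map[string]ed25519.PublicKey)}
}

// checkWrite verifies sig over msg, then binds pub to agent on first
// sight and rejects any later write signed by a different key.
func (r *keyRegistry) checkWrite(agent string, pub ed25519.PublicKey, msg, sig []byte) error {
	if len(pub) != ed25519.PublicKeySize {
		return errors.New("registry: bad public key length")
	}
	if !ed25519.Verify(pub, msg, sig) {
		return errors.New("registry: signature verification failed")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	known, seen := r.keys[agent]
	if !seen {
		r.keys[agent] = append(ed25519.PublicKey(nil), pub...)
		return nil
	}
	if !bytes.Equal(known, pub) {
		return fmt.Errorf("registry: key mismatch for agent %q", agent)
	}
	return nil
}

// runDemo returns the result of a legitimate write and of a forged one.
func runDemo() (legit, forged error) {
	reg := newKeyRegistry()
	msg := []byte(`{"a":"classifier","t":"CVE_MATCH"}`)

	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return err, nil
	}
	legit = reg.checkWrite("classifier", pub, msg, ed25519.Sign(priv, msg))

	// The attacker signs correctly with its own key but claims the same name.
	evilPub, evilPriv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return legit, nil
	}
	forged = reg.checkWrite("classifier", evilPub, msg, ed25519.Sign(evilPriv, msg))
	return legit, forged
}

func main() {
	legit, forged := runDemo()
	fmt.Println("legitimate write:", legit)
	fmt.Println("forged write:", forged)
}
```

The forged write carries a valid signature, so signature checking alone would accept it; it is the name-to-key binding that rejects it. A trust-on-first-use scheme like this one only helps if the legitimate agent writes before the attacker does.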
The canonical message uses JSON encoding over a fixed struct rather than raw string concatenation. Delimiter-separated strings are error-prone whenever a field can legitimately contain the delimiter, because two different tuples can then produce the same bytes:
func canonicalBytes(campaignID, agentName, findingType, target string,
    data []byte, createdUnix int64) []byte {
    v := struct {
        C string `json:"c"` // campaign id
        A string `json:"a"` // agent name
        T string `json:"t"` // finding type
        G string `json:"g"` // tarGet
        D []byte `json:"d"` // data
        U int64  `json:"u"` // unix-ts
    }{C: campaignID, A: agentName, T: findingType, G: target, D: data, U: createdUnix}
    out, _ := json.Marshal(v)
    return out
}
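A small demonstration of why the encoding matters: with delimiter-joined fields, two different (agent, target) pairs can serialize to the same bytes, so one signature would cover both. The helper names joinFields and jsonFields are illustrative only, and the two-field struct is a cut-down version of the six-field one above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// joinFields is the naive encoding: fields joined with a delimiter.
func joinFields(agent, target string) []byte {
	return []byte(strings.Join([]string{agent, target}, ":"))
}

// jsonFields mirrors the struct-based approach: each field is encoded
// as its own JSON string, so field boundaries are unambiguous.
func jsonFields(agent, target string) []byte {
	out, _ := json.Marshal(struct {
		A string `json:"a"`
		G string `json:"g"`
	}{A: agent, G: target})
	return out
}

func main() {
	// Two different (agent, target) pairs.
	a1, t1 := "recon", "host:8080"
	a2, t2 := "recon:host", "8080"

	fmt.Println(bytes.Equal(joinFields(a1, t1), joinFields(a2, t2))) // true: collision
	fmt.Println(bytes.Equal(jsonFields(a1, t1), jsonFields(a2, t2))) // false
}
```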

Layer 3 — MemoryGraft Heuristic Detector

Provenance and pheromone clamping catch a single malicious write. They do not catch patterns of malicious behavior. A compromised legitimate agent — one that holds a valid keypair — can still write nonsense, and provenance won’t flag it. internal/swarm/memorygraft.Scan reads recent findings and emits alerts for four patterns characteristic of MemoryGraft attacks. The detector is pure-read — it never writes to the board — and is safe to call from a watchdog goroutine on a ticker.
| Pattern | Severity | What it catches | Default threshold |
| --- | --- | --- | --- |
| burst | medium | An agent writes ≥ N findings inside a short window. Real agents pulse; attackers spam. | 50 writes / 60 s |
| repeat-title | low | The same title fingerprint (first 80 bytes of Data) repeats from one agent. Real findings vary in shape. | 5 identical fingerprints |
| duplicate-data | high | Byte-identical Data payloads from one agent. Real findings carry different evidence per finding. | 3 identical payloads |
| type-mismatch | medium | An agent emits a finding under a type owned by another agent (e.g., recon writing CVE_MATCH). | 1 occurrence |
The thresholds are deliberately conservative. False positives drown signal worse than missed catches, so the detector prefers letting subtle attacks slide over spamming the operator with normal swarm noise. Operators can tighten the burst, repeat-title, and duplicate-data thresholds via Config{}:
// internal/swarm/memorygraft/detector.go

// Config tunes the detector. Zero-values pick conservative defaults.
type Config struct {
    BurstWindow            time.Duration // default 60s
    BurstThreshold         int           // default 50
    RepeatTitleThreshold   int           // default 5
    DuplicateDataThreshold int           // default 3
}

// Alert describes one suspicious pattern detected on the blackboard.
type Alert struct {
    Kind        string    // "burst" | "repeat-title" | "duplicate-data" | "type-mismatch"
    AgentName   string
    Description string
    Severity    string    // "low" | "medium" | "high"
    FirstSeen   time.Time
    Count       int
}
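The "zero-values pick conservative defaults" rule might be implemented along these lines. This is a sketch under assumptions: withDefaults is a hypothetical helper, and treating negative values like zero is a choice made here, not something the source specifies. The default values themselves come from the table above.

```go
package main

import (
	"fmt"
	"time"
)

// Config repeats the fields shown above so the sketch is self-contained.
type Config struct {
	BurstWindow            time.Duration
	BurstThreshold         int
	RepeatTitleThreshold   int
	DuplicateDataThreshold int
}

// withDefaults returns a copy of c in which every unset (zero or
// negative) field is replaced by the documented default.
func withDefaults(c Config) Config {
	if c.BurstWindow <= 0 {
		c.BurstWindow = 60 * time.Second
	}
	if c.BurstThreshold <= 0 {
		c.BurstThreshold = 50
	}
	if c.RepeatTitleThreshold <= 0 {
		c.RepeatTitleThreshold = 5
	}
	if c.DuplicateDataThreshold <= 0 {
		c.DuplicateDataThreshold = 3
	}
	return c
}

func main() {
	// Tighten only the burst threshold; everything else keeps its default.
	strict := withDefaults(Config{BurstThreshold: 20})
	fmt.Println(strict.BurstWindow, strict.BurstThreshold, strict.DuplicateDataThreshold) // 1m0s 20 3
}
```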
The expectedAuthors map encodes which agent types are allowed to emit each finding type:
var expectedAuthors = map[blackboard.FindingType]map[string]bool{
    blackboard.TypeCVEMatch:      {"classifier": true},
    blackboard.TypeMisconfig:     {"classifier": true},
    blackboard.TypeExploitChain:  {"exploit": true},
    blackboard.TypeExploitResult: {"exploit": true},
    blackboard.TypePortOpen:      {"recon": true, "nmap": true},
    blackboard.TypeSubdomain:     {"recon": true},
}
Scan is a regression alarm, not a hard security boundary. A determined attacker can use any agent name — provenance (Layer 2) is the trust boundary. Type-mismatch alerts fire when something unexpected happens; they are not a substitute for signature verification.
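To make three of the patterns concrete, here is a simplified, read-only sketch of the type-mismatch, duplicate-data, and burst checks. It is not the project's Scan: the finding struct, the owners map, the "PORT_OPEN" type string, and all function names are stand-ins invented for illustration, and the demo uses a burst threshold of 3 in 10 s rather than the documented default.

```go
package main

import (
	"fmt"
	"time"
)

// finding is a simplified stand-in for a blackboard finding.
type finding struct {
	Agent, Type string
	Data        []byte
	Created     time.Time
}

// owners is a cut-down version of the expectedAuthors idea.
var owners = map[string]map[string]bool{
	"CVE_MATCH": {"classifier": true},
	"PORT_OPEN": {"recon": true, "nmap": true},
}

// typeMismatches lists the agent of every finding whose type it does not own.
func typeMismatches(fs []finding) []string {
	var out []string
	for _, f := range fs {
		if allowed, ok := owners[f.Type]; ok && !allowed[f.Agent] {
			out = append(out, f.Agent)
		}
	}
	return out
}

// duplicateData reports, per agent, the largest count of byte-identical
// payloads, but only for agents that reach the threshold.
func duplicateData(fs []finding, threshold int) map[string]int {
	type key struct{ agent, data string }
	counts := map[key]int{}
	worst := map[string]int{}
	for _, f := range fs {
		k := key{f.Agent, string(f.Data)}
		counts[k]++
		if counts[k] >= threshold && counts[k] > worst[f.Agent] {
			worst[f.Agent] = counts[k]
		}
	}
	return worst
}

// burstAgents reports agents with at least threshold writes inside any
// interval of length window. fs must be sorted by Created, ascending.
func burstAgents(fs []finding, window time.Duration, threshold int) map[string]bool {
	recent := map[string][]time.Time{}
	out := map[string]bool{}
	for _, f := range fs {
		ts := append(recent[f.Agent], f.Created)
		for len(ts) > 0 && f.Created.Sub(ts[0]) > window {
			ts = ts[1:] // drop writes that fell out of the window
		}
		recent[f.Agent] = ts
		if len(ts) >= threshold {
			out[f.Agent] = true
		}
	}
	return out
}

type report struct {
	Mismatched []string
	Duplicates map[string]int
	Bursting   map[string]bool
}

// demo runs the checks on a tiny board: recon spams three identical
// CVE_MATCH findings one second apart; classifier writes one.
func demo() report {
	base := time.Unix(0, 0)
	var fs []finding
	for i := 0; i < 3; i++ {
		fs = append(fs, finding{Agent: "recon", Type: "CVE_MATCH",
			Data: []byte("same payload"), Created: base.Add(time.Duration(i) * time.Second)})
	}
	fs = append(fs, finding{Agent: "classifier", Type: "CVE_MATCH",
		Data: []byte("evidence-1"), Created: base.Add(5 * time.Second)})
	return report{typeMismatches(fs), duplicateData(fs, 3), burstAgents(fs, 10*time.Second, 3)}
}

func main() {
	r := demo()
	fmt.Println(r.Mismatched, r.Duplicates, r.Bursting)
}
```

Each check only reads the slice it is given, which matches the pure-read property described above and is what makes it safe to run from a ticker-driven watchdog.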

Layer 4 — Per-Agent Rate Limit

Attack: an agent writes findings that wake itself, creating a tight feedback loop that saturates the LLM provider within seconds, even without malicious intent.
Defense: ratelimit.New(perSecond, burst float64) builds a token-bucket limiter that the scheduler installs at the dispatch boundary. The scheduler calls lim.Take(ctx) before dispatching each finding to an agent. The limit is deliberately not applied at the write boundary: any agent can write whatever it wants (subject to layers 1–3), and the rate limit only governs how fast the scheduler drains that agent’s queue.
The limiter is implemented without external dependencies, in roughly 50 lines of Go; golang.org/x/time/rate was considered and rejected to keep go.mod minimal:
// internal/swarm/ratelimit/ratelimit.go

// New builds a limiter that fills at perSecond tokens/second,
// with a bucket cap of burst tokens.
func New(perSecond, burst float64) *Limiter {
    if burst <= 0 {
        burst = perSecond
    }
    return &Limiter{
        rate:   perSecond,
        burst:  burst,
        tokens: burst,
        last:   time.Now(),
    }
}

// Take blocks until one token is available (or ctx is done).
func (l *Limiter) Take(ctx context.Context) error {
    if l == nil || l.rate <= 0 {
        return nil
    }
    for {
        l.mu.Lock()
        now := time.Now()
        elapsed := now.Sub(l.last).Seconds()
        l.tokens += elapsed * l.rate
        if l.tokens > l.burst {
            l.tokens = l.burst
        }
        l.last = now
        if l.tokens >= 1 {
            l.tokens--
            l.mu.Unlock()
            return nil
        }
        need := 1 - l.tokens
        wait := time.Duration(need / l.rate * float64(time.Second))
        l.mu.Unlock()
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(wait):
        }
    }
}
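The refill and wait arithmetic inside Take is easier to follow with the clock taken out. The sketch below is a clock-free model of the same token bucket, with the caller passing elapsed time explicitly; bucket, advance, and take are illustrative names, not part of the ratelimit package, and the model assumes a positive rate.

```go
package main

import (
	"fmt"
	"time"
)

// bucket models the token bucket without reading the wall clock.
type bucket struct {
	rate, burst, tokens float64 // rate is tokens per second, assumed > 0
}

// advance refills the bucket for the elapsed duration, capped at burst.
func (b *bucket) advance(elapsed time.Duration) {
	b.tokens += elapsed.Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
}

// take consumes one token if available; otherwise it reports how long
// the caller would have to wait for the missing fraction of a token.
func (b *bucket) take() (ok bool, wait time.Duration) {
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	need := 1 - b.tokens
	return false, time.Duration(need / b.rate * float64(time.Second))
}

func main() {
	b := &bucket{rate: 2, burst: 5, tokens: 5} // 2 dispatches/s, burst of 5
	granted := 0
	for i := 0; i < 6; i++ {
		if ok, _ := b.take(); ok {
			granted++
		}
	}
	_, wait := b.take()
	fmt.Println(granted, wait) // 5 500ms
}
```

With a rate of 2 per second and a burst of 5, a self-waking agent gets five immediate dispatches and is then held to one every 500 ms, which bounds the feedback loop without blocking its writes.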

Scope Enforcement

Scope validation is a defense-in-depth measure that operates independently of the blackboard security layers. It runs at two points in the execution path:
  1. Tool layer: scope.Validate(target, scope) is called before any security tool executes against a host. It accepts IPs, host:port strings, and full URLs, resolving each to either an allowed CIDR range or an allowed domain (with wildcard and subdomain matching).
  2. Executor layer: scope.ValidateCommand(cmd, scope) scans every command string with a regex that extracts all IPs and domain-like strings, then validates each against the scope definition before the command runs. No exceptions.
// internal/scope/validator.go

// ValidateCommand extracts all IPs and domains from a command string and
// validates each against the scope. Returns ErrScopeViolation if any
// target is out of scope. Called before every command execution.
func ValidateCommand(cmd string, scope ScopeDefinition) error {
    matches := ipAndDomainPattern.FindAllString(cmd, -1)
    for _, match := range matches {
        if isCommonNonTarget(match) {
            continue
        }
        if err := Validate(match, scope); err != nil {
            return fmt.Errorf("command contains out-of-scope target: %w", err)
        }
    }
    return nil
}
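The target normalization that Validate performs can be sketched as follows. This is an assumption-laden illustration, not the project's validator: scopeDef, hostOf, and inScope are hypothetical, and the matching rules chosen here (a "*.example.com" entry matches subdomains only, a bare entry matches exactly) may differ from the real subdomain semantics.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// scopeDef is a simplified, hypothetical stand-in for ScopeDefinition.
type scopeDef struct {
	CIDRs   []string // e.g. "10.0.0.0/24"
	Domains []string // e.g. "example.com" or "*.example.com"
}

// hostOf reduces a URL, a host:port pair, or a bare host/IP to the host.
func hostOf(target string) string {
	if strings.Contains(target, "://") {
		if u, err := url.Parse(target); err == nil {
			return u.Hostname()
		}
	}
	if h, _, err := net.SplitHostPort(target); err == nil {
		return h
	}
	return target
}

// inScope reports whether target falls in an allowed CIDR or domain.
func inScope(target string, s scopeDef) bool {
	host := strings.ToLower(hostOf(target))
	if ip := net.ParseIP(host); ip != nil {
		for _, c := range s.CIDRs {
			if _, n, err := net.ParseCIDR(c); err == nil && n.Contains(ip) {
				return true
			}
		}
		return false
	}
	for _, d := range s.Domains {
		d = strings.ToLower(d)
		if strings.HasPrefix(d, "*.") {
			// Compare against ".example.com" so that "evil-example.com"
			// does not match "*.example.com".
			if strings.HasSuffix(host, d[1:]) {
				return true
			}
			continue
		}
		if host == d {
			return true
		}
	}
	return false
}

func main() {
	s := scopeDef{CIDRs: []string{"10.0.0.0/24"}, Domains: []string{"*.example.com"}}
	for _, t := range []string{"10.0.0.5:443", "https://api.example.com/login", "10.0.1.5", "evil-example.com"} {
		fmt.Println(t, inScope(t, s))
	}
}
```

Matching on the suffix with its leading dot is the detail that matters: a plain suffix comparison against "example.com" would wrongly admit lookalike domains.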
scope.enforce_strict is hardcoded to true in config.example.yaml and cannot be disabled at runtime. The comment in the config file reads: ALWAYS true — cannot be disabled. Safety constraint. Any PR that attempts to make this configurable will be rejected at review.

Cleanup Registry

Every exploitation command is registered in a cleanup registry before it executes. If the campaign ends for any reason — clean exit, SIGINT, budget exhaustion, or crash — pending cleanup actions run in reverse registration order, undoing the most recent actions first. Two implementations are available:
  • CleanupRegistry (pipeline/cleanup.go) — Postgres-backed, durable across restarts. Use in production.
  • MemoryCleanupRegistry (pipeline/cleanup_memory.go) — in-memory, non-durable. Suitable for tests and short-lived runs. Actions are replayed in reverse order on RunCleanup.
// internal/pipeline/cleanup_memory.go

// RunCleanup executes all pending cleanup actions for a campaign in reverse order.
func (m *MemoryCleanupRegistry) RunCleanup(ctx context.Context, campaignID uuid.UUID) *CleanupReport {
    m.mu.Lock()
    actions := m.actions[campaignID]
    delete(m.actions, campaignID)
    m.mu.Unlock()

    report := &CleanupReport{CampaignID: campaignID, TotalCount: len(actions)}
    for i := len(actions) - 1; i >= 0; i-- {
        // ... execute action[i], append to report.Executed or report.Failed
    }
    return report
}
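The register-then-unwind behavior can be shown end to end with a stripped-down registry. This sketch assumes string campaign IDs and plain func() error actions to stay dependency-free; the real registries use uuid.UUID and return a *CleanupReport, and the names registry, Register, and newRegistry here are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a simplified in-memory cleanup registry.
type registry struct {
	mu      sync.Mutex
	actions map[string][]func() error
}

func newRegistry() *registry {
	return &registry{actions: make(map[string][]func() error)}
}

// Register records an undo action; call it before the command it undoes.
func (r *registry) Register(campaign string, undo func() error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.actions[campaign] = append(r.actions[campaign], undo)
}

// RunCleanup removes the campaign's actions from the registry and runs
// them newest first. A failing action does not stop the remaining ones.
func (r *registry) RunCleanup(campaign string) (executed, failed int) {
	r.mu.Lock()
	actions := r.actions[campaign]
	delete(r.actions, campaign)
	r.mu.Unlock()

	for i := len(actions) - 1; i >= 0; i-- {
		if err := actions[i](); err != nil {
			failed++
			continue
		}
		executed++
	}
	return executed, failed
}

func main() {
	r := newRegistry()
	var order []string
	for _, step := range []string{"create-user", "upload-shell", "open-listener"} {
		step := step // capture the per-iteration value
		r.Register("campaign-1", func() error {
			order = append(order, "undo "+step)
			return nil
		})
	}
	executed, failed := r.RunCleanup("campaign-1")
	fmt.Println(executed, failed, order)
}
```

Deleting the entry under the lock before running the actions means a second RunCleanup for the same campaign is a no-op, so a signal handler and a normal exit path can both call it safely.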

Test Coverage

| Defense | Test file | Tests |
| --- | --- | --- |
| Pheromone clamp | internal/swarm/blackboard/injection_test.go | TestMINJA_PheromoneFloodIsClamped, TestMINJA_QueryByTypeIsolatesPayload, TestMINJA_MinPheromoneFiltersStaleInjection |
| Ed25519 provenance | internal/swarm/provenance/provenance_test.go | 4 tests: roundtrip, tamper, impersonation, malformed key/sig |
| MemoryGraft detector | internal/swarm/memorygraft/detector_test.go | 4 tests: burst, duplicate-data, type-mismatch, quiet-board |
| Rate limit | internal/swarm/ratelimit/ratelimit_test.go | 4 tests: zero-rate, burst-then-throttle, ctx-cancel, nil-safe |
CI fails if any of these regress. The MINJA pheromone-flood test in particular was the test that first exposed the original defense gap during development.

What Was NOT Done and Why

Encryption of agent traffic: all agent traffic is local-process. The threat model is misbehaving code on the same host, not a network adversary. Encrypting local-process communication is theatre with no security benefit against the actual attack class.
ML-based anomaly detection: conservative heuristics catch the 80% of attacks worth catching at zero false-positive cost. ML adds opacity that is hard to justify in a security review, adds a model dependency, and produces false positives that degrade the operator experience more than the attacks themselves.
A signed log of writes: every write already carries an Ed25519 signature stored alongside the finding. Signing the log of writes would be double-counting and would add an HSM dependency. The per-write signatures are the audit trail.

References

  • MINJA — arXiv:2503.03704 — memory injection against multi-agent systems via planted high-confidence payloads
  • MemoryGraft — arXiv:2512.16962 — variants that graft plausible-looking artefacts onto shared agent memory
  • Dark Side of LLMs — arXiv:2507.06850
