Autonomous CTF Solving and Flag Capture with Pentest Swarm

CTF mode deploys the swarm against a single machine IP with one goal: capture all flags. The solver iterates through enumeration, initial foothold, privilege escalation, and flag collection, then generates a structured writeup. It is designed for retired HackTheBox machines, TryHackMe rooms, and personal lab targets that you are explicitly authorized to attack.

Supported platforms

Platform	Short code	Notes
HackTheBox	`htb`	Uses the HTB API (`labs.hackthebox.com/api/v4`). Retired machines only by default.
TryHackMe	`thm`	Uses the THM API. Requires your TryHackMe API key.
Generic / lab	`generic`	No platform API calls — just solve against the raw IP.

Solve a CTF machine

Connect to the platform VPN

The solver does not manage VPN connections. You must connect to the HackTheBox or TryHackMe VPN before running the solver — the machine’s IP must be reachable from your host.

# HackTheBox — download your .ovpn from the HTB dashboard
sudo openvpn --config ~/htb.ovpn &

# TryHackMe — download from tryhackme.com/access
sudo openvpn --config ~/thm.ovpn &

Wait for the TUN interface to come up (tun0 or similar) before continuing.
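If you script the VPN step, a small readiness check avoids racing the tunnel. The sketch below is not part of Pentest Swarm: it is a hypothetical helper that polls the kernel's interface table with Python's standard `socket.if_nametoindex` (Unix only) until the named interface exists. The interface name `tun0` and the timeout values are assumptions; adjust them for your setup.

```python
import socket
import time


def wait_for_interface(name: str = "tun0", timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until the network interface `name` exists, or give up after `timeout` seconds.

    Returns True if the interface appeared, False on timeout.
    socket.if_nametoindex raises OSError for unknown interface names (Unix only).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            socket.if_nametoindex(name)  # succeeds once the interface is registered
            return True
        except OSError:
            pass  # not there yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_for_interface("tun0")` after starting OpenVPN and abort if it returns False. An existing interface does not prove the target is routable, so a quick ping of the machine IP is still a sensible final check.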

Run the CTF solver

pentestswarm ctf solve 10.10.10.1 --platform htb --machine Lame --follow

The --follow flag streams live agent events — recon findings, exploitation attempts, privilege escalation chains — as they happen.

  [recon]   nmap: open ports 21/ftp, 22/ssh, 139/smb, 445/smb
  [recon]   httpx: no web service on common ports
  [classify] CVE-2007-2447 matched: Samba 3.0.20 usermap script RCE
  [exploit]  attack chain: SMB → usermap → reverse shell
  [exploit]  foothold established as daemon
  [privesc]  checking SUID binaries, cron, writable PATH...
  [flags]    user.txt captured: /home/makis/user.txt
  [flags]    root.txt captured: /root/root.txt
  [report]   writeup written to ./reports/Lame-htb-writeup.md
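If you capture the `--follow` stream to a file (for example by piping it through `tee`), each line appears to follow a `[tag] message` shape. The sketch below assumes exactly that shape, inferred from the sample output above rather than from a documented format, and groups messages by tag so you can, for instance, pull out only the `[flags]` lines. The function name `group_events` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the sample --follow output: "[tag]   message"
EVENT_LINE = re.compile(r"^\s*\[(?P<tag>[a-z]+)\]\s+(?P<msg>.*\S)\s*$")


def group_events(lines):
    """Return {tag: [message, ...]} for every line matching the assumed event shape.

    Lines that do not match (banners, blank lines) are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = EVENT_LINE.match(line)
        if match:
            grouped[match.group("tag")].append(match.group("msg"))
    return dict(grouped)


sample = [
    "  [recon]   nmap: open ports 21/ftp, 22/ssh, 139/smb, 445/smb",
    "  [flags]    user.txt captured: /home/makis/user.txt",
    "  [flags]    root.txt captured: /root/root.txt",
]
events = group_events(sample)
print(events["flags"])
# ['user.txt captured: /home/makis/user.txt', 'root.txt captured: /root/root.txt']
```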

The solver sets the campaign objective to:

CTF: Find all flags (user.txt and root.txt). Focus on privilege escalation
chains, SUID binaries, cron jobs, writable scripts, kernel exploits, and
password reuse. Check /home/*/user.txt and /root/root.txt.

Review captured flags and writeup

When the solver completes, flags and the full writeup are written to ./reports/:

# View the writeup
cat ./reports/Lame-htb-writeup.md

# Retrieve the writeup for a previously-completed campaign
pentestswarm ctf writeup <campaign-id>

The generated writeup follows a standard structure:

# Lame — HTB Writeup

**Time:** 4m32s
**Result:** Solved

---

## Enumeration
- [recon] nmap: open ports 21/ftp 22/ssh 139/smb 445/smb
...

## Exploitation
- [exploit] attack chain: SMB → usermap → reverse shell
...

## Flags
- **user flag**: `...`
  - Path: /home/makis/user.txt
  - Method: ...

- **root flag**: `...`
  - Path: /root/root.txt
  - Method: ...
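To make the mapping from event stream to document concrete, here is a minimal sketch in the spirit of the GenerateWriteup step described under CTF-specific behaviours. It is not the project's implementation: the `Event` and `Flag` types, the `render_writeup` name, and the "Unsolved" label are hypothetical. It buckets events by tag into Enumeration and Exploitation, using the tag lists this page gives, and lists each captured flag with its path and method.

```python
from dataclasses import dataclass

# Tag groupings taken from this page's description of writeup generation.
ENUM_TAGS = {"recon", "subfinder", "naabu", "httpx"}
EXPLOIT_TAGS = {"exploit", "attack", "chain"}


@dataclass
class Event:
    tag: str       # e.g. "recon", "exploit"
    message: str


@dataclass
class Flag:
    kind: str      # "user" or "root"
    value: str
    path: str
    method: str


def render_writeup(machine, platform, duration, solved, events, flags):
    """Render a markdown writeup with the section layout shown above (hypothetical helper)."""
    enum = [e for e in events if e.tag in ENUM_TAGS]
    exploit = [e for e in events if e.tag in EXPLOIT_TAGS]
    out = [
        f"# {machine} - {platform.upper()} Writeup",
        "",
        f"**Time:** {duration}",
        # "Unsolved" is a placeholder; this page only shows the Solved case.
        f"**Result:** {'Solved' if solved else 'Unsolved'}",
        "",
        "---",
        "",
        "## Enumeration",
    ]
    out += [f"- [{e.tag}] {e.message}" for e in enum]
    out += ["", "## Exploitation"]
    out += [f"- [{e.tag}] {e.message}" for e in exploit]
    out += ["", "## Flags"]
    for flag in flags:
        out += [
            f"- **{flag.kind} flag**: `{flag.value}`",
            f"  - Path: {flag.path}",
            f"  - Method: {flag.method}",
            "",
        ]
    return "\n".join(out)
```

Events whose tags fall in neither group (such as `privesc` in the sample stream) are simply left out of both sections in this sketch.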

CTF solver playbook

The built-in ctf-solver.yaml playbook defines the four-phase solve loop. The difficulty variable controls the agent-hour budget allocated to the campaign:

name: CTF Solver Swarm
description: >
  Autonomous CTF machine solver for retired HackTheBox / TryHackMe boxes
  (or your own lab targets). The swarm iterates: enumerate -> exploit ->
  stabilize foothold -> escalate -> collect flag. Retired boxes only by
  default — live competition boxes require explicit opt-in.
author:
  name: Armur AI
  github: Armur-Ai
version: 1.0.0
tags: [ctf, htb, thm, benchmark, autonomous]

variables:
  target_ip:
    type: string
    required: true
  difficulty:
    type: string
    default: easy
    description: "easy | medium | hard — affects agent-hour budget"
  flag_path_hints:
    type: string
    default: "/root/root.txt,/home/*/user.txt"
    description: Comma-separated list of likely flag paths

phases:
  - name: enumeration
    tools:
      - name: nmap
        options: { scan_type: "-sV", top_ports: 1000, timing: "-T4" }
      - name: httpx
      - name: katana
        options: { depth: 3 }
      - name: nuclei
        options: { severity: [critical, high, medium] }
    post_analysis: |
      Pull the enum into one coherent picture. CTF boxes have a
      single intended path 95% of the time — find the weirdest thing
      and follow it.

  - name: initial_foothold
    tools:
      - name: sqlmap
        options: { risk: 2, level: 3 }
    post_analysis: |
      Try the obvious first (default creds, exposed config files,
      git leaks, known CVEs for the exact banner version).
      Stabilize any shell into a proper TTY before privesc.

  - name: privilege_escalation
    tools:
      - name: nuclei
        options: { templates: ["network/enumeration/"] }
    post_analysis: |
      SUID binaries, sudo -l, cron jobs, writable PATH entries,
      capabilities, kernel exploits (last resort — crashes boxes).

  - name: flag_collection
    tools: []
    post_analysis: |
      Read flags from flag_path_hints. Submit via box's grader
      if credentials available; otherwise surface to report.
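
The `difficulty` and `flag_path_hints` variables are plain strings, so a wrapper script that launches the playbook may want to check them first. The sketch below is a hypothetical pre-flight helper, not part of Pentest Swarm: it validates `difficulty` against the three documented values and matches the comma-separated `flag_path_hints` globs (such as `/home/*/user.txt`) against a list of paths, for example a file listing gathered on the target.

```python
from pathlib import PurePosixPath

ALLOWED_DIFFICULTIES = ("easy", "medium", "hard")  # values documented for `difficulty`
DEFAULT_HINTS = "/root/root.txt,/home/*/user.txt"   # the playbook's default flag_path_hints


def parse_hints(raw: str = DEFAULT_HINTS) -> list[str]:
    """Split the comma-separated hint string, dropping blanks and surrounding whitespace."""
    return [h.strip() for h in raw.split(",") if h.strip()]


def check_difficulty(value: str) -> str:
    """Normalize and validate a difficulty string."""
    value = value.strip().lower()
    if value not in ALLOWED_DIFFICULTIES:
        raise ValueError(f"difficulty must be one of {ALLOWED_DIFFICULTIES}, got {value!r}")
    return value


def match_flag_paths(paths, raw_hints: str = DEFAULT_HINTS) -> list[str]:
    """Return the paths that match any hint glob, in input order.

    PurePosixPath.match with an absolute pattern must match the whole path,
    and "*" does not cross "/", so /home/*/user.txt matches exactly one directory level.
    """
    hints = parse_hints(raw_hints)
    return [p for p in paths if any(PurePosixPath(p).match(h) for h in hints)]
```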

Run the playbook directly against an IP:

pentestswarm playbook run ctf-solver --target 10.10.10.1

CTF-specific behaviours

Flag detection: The solver monitors all agent events for strings containing flag, user.txt, or root.txt. Matching events are parsed into typed Flag objects (user or root) that record the capture path and method.

Writeup generation: GenerateWriteup builds a structured markdown document from the event stream, separating enumeration events (recon, subfinder, naabu, httpx) from exploitation events (exploit, attack, chain). Because the structure is consistent, the writeup is easy to parse programmatically and suitable for publishing on writeup sites.

Budget by difficulty: The difficulty variable maps to agent-hour budgets. easy gets the tightest budget, while hard allocates significantly more time for complex multi-step chains.

Shell stabilisation: The initial_foothold post-analysis prompt explicitly requires stabilizing any shell into a proper TTY before moving on to privilege escalation. This prevents a common failure mode in which privesc tools misbehave without a real terminal.
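The flag-detection rule can be approximated in a few lines. This is an illustrative sketch, not the solver's parser: the `Flag` fields, the `detect_flag` name, the path regex, and the rule that a path ending in `root.txt` means a root flag are all assumptions. It keys on the same substrings named above (flag, user.txt, root.txt) and records the capture path when the event text contains one.

```python
import re
from dataclasses import dataclass
from typing import Optional

KEYWORDS = ("flag", "user.txt", "root.txt")
# Assumed path shape: an absolute path ending in user.txt or root.txt.
PATH_RE = re.compile(r"(/\S*(?:user|root)\.txt)")


@dataclass
class Flag:
    kind: str      # "user" or "root"
    path: str
    method: str    # free-text note on how it was captured


def detect_flag(event_text: str, method: str = "") -> Optional[Flag]:
    """Return a Flag if the event mentions a flag keyword and a recognisable flag path."""
    lowered = event_text.lower()
    if not any(k in lowered for k in KEYWORDS):
        return None
    match = PATH_RE.search(event_text)
    if match is None:
        return None  # mentions a flag but gives no path we can record
    path = match.group(1)
    kind = "root" if path.endswith("root.txt") else "user"
    return Flag(kind=kind, path=path, method=method)
```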

List available machines

pentestswarm ctf list --platform htb --difficulty easy

This calls the HTB API (/api/v4/machine/list) and returns machines filtered by difficulty:

  NAME        OS       DIFFICULTY   ID
  Lame        Linux    Easy         1
  Jerry       Windows  Easy         144
  Blue        Windows  Easy         51
  Legacy      Windows  Easy         2
  Devel       Windows  Easy         3
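If you want to script machine selection, the table printed by ctf list is whitespace-aligned. The sketch below is a hypothetical parser that assumes exactly this column layout (NAME, OS, DIFFICULTY, ID, each value a single token), inferred from the sample output rather than from a documented format; a machine-readable output mode, if the CLI offers one, would be a sturdier source.

```python
def parse_machine_table(text: str) -> list[dict]:
    """Parse whitespace-aligned `ctf list` output into dicts keyed by lowercase header."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = [h.lower() for h in rows[0]]
    machines = []
    for cols in rows[1:]:
        if len(cols) != len(header):
            continue  # values containing spaces cannot be split by this simple parser
        record = dict(zip(header, cols))
        record["id"] = int(record["id"])
        machines.append(record)
    return machines


SAMPLE = """\
  NAME        OS       DIFFICULTY   ID
  Lame        Linux    Easy         1
  Jerry       Windows  Easy         144
"""
linux = [m["name"] for m in parse_machine_table(SAMPLE) if m["os"] == "Linux"]
print(linux)  # ['Lame']
```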

Retrieve a writeup

pentestswarm ctf writeup <campaign-id>

Retrieves and prints the writeup for a completed CTF campaign by its UUID. Campaign IDs are printed at the end of every ctf solve run.
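Because campaign IDs are UUIDs, a wrapper that fetches writeups in bulk can reject malformed IDs before invoking the CLI. This is a hypothetical helper built on Python's standard `uuid` and `subprocess` modules; only the `pentestswarm ctf writeup <campaign-id>` invocation comes from this page.

```python
import subprocess
import uuid


def writeup_command(campaign_id: str) -> list[str]:
    """Build the argv for `pentestswarm ctf writeup`, rejecting IDs that are not UUIDs.

    uuid.UUID raises ValueError on malformed input and str() yields the canonical
    lowercase hyphenated form (assumed to be accepted by the CLI).
    """
    canonical = str(uuid.UUID(campaign_id))
    return ["pentestswarm", "ctf", "writeup", canonical]


def fetch_writeup(campaign_id: str) -> str:
    """Run the CLI and return the printed writeup (requires pentestswarm on PATH)."""
    result = subprocess.run(
        writeup_command(campaign_id), capture_output=True, text=True, check=True
    )
    return result.stdout
```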

CTF mode may only be used against machines you own or have explicit authorization to attack. For HackTheBox and TryHackMe, this means machines you have started (spawned/joined), operating within the platform’s terms of service. Never run the solver against live competition machines unless the competition explicitly permits automated tools. Unauthorized access to computer systems is illegal.

Playbooks

Customize the CTF solver playbook or author your own for specific box types.

MCP Integration

Drive CTF solves interactively from Claude Desktop with live event streaming.

Bug Bounty

Apply the same recon and exploitation skills to real bug bounty programs.

GitHub Actions

Run the CI/CD security playbook to find vulns in your own repos.
