Agentic-AFL: Neuro-Symbolic Fuzzing for ICS/OT Binaries

Agentic-AFL is a neuro-symbolic fuzzing orchestration framework that runs alongside AFL++ as an asynchronous co-process. When AFL++ stalls on a complex arithmetic or checksum constraint — the kind of math wall that standard mutation strategies cannot cross — Agentic-AFL intercepts the stall, extracts the relevant binary logic via Ghidra P-Code analysis, translates that logic into a Z3 SMT constraint using an LLM, solves it mathematically, and injects the concrete satisfying payload back into AFL++‘s sync directory without ever pausing the fuzzer.

The Problem

Coverage-guided fuzzers like AFL++ excel at exploring branching logic through random mutation, but they break down whenever a branch depends on a value that must satisfy a precise arithmetic relationship. In Industrial Control System (ICS) and Operational Technology (OT) protocol implementations, this happens constantly. Consider a Modbus RTU parser that validates an incoming frame with CRC-16: unless all 16 bits of the checksum are exactly right, the parser immediately rejects the frame and AFL++ never reaches the protocol logic behind it. The same problem appears in DNP3 frames with CRC fields in every data block, in IEC 60870-5 frames with link-layer checksums, and in proprietary SCADA protocols that use custom polynomial or bitmask validation. AFL++‘s built-in mutation strategies — bitflip, havoc, splice, and even its REDQUEEN-style equality matching — cannot reliably satisfy these constraints because they involve non-trivial multivariate arithmetic over all the bytes contributing to the checksum. The result is a coverage plateau: AFL++ keeps running, keeps mutating, but the edge count stops growing because every generated input fails the same math check at the same address.

The Solution

Agentic-AFL attacks these math walls with a neuro-symbolic pipeline: it uses symbolic program analysis to extract what the constraint is, uses an LLM to translate that constraint into formal Z3 SMT solver syntax, and uses Z3 itself to solve for a concrete byte sequence that satisfies it. The key insight is the division of labor: the LLM is a translator, not a solver. Large language models are good at reading structured intermediate representation and writing syntactically correct Z3Py code, but they are not reliable arithmetic solvers. Z3, on the other hand, is a proven decision procedure — it can find a satisfying assignment to a system of bit-vector constraints in milliseconds or prove that none exists. By combining Ghidra’s P-Code intermediate representation (architecture-neutral, semantics-preserving) with LLM-driven Z3Py synthesis and SMT solving, Agentic-AFL achieves reliable constraint bypass across a wide range of real-world ICS protocol implementations.

Key Features

K-Way Voting

Generates K independent Z3Py scripts in parallel per stall (default K=3) and accepts the first SAT result. Mitigates the LLM’s syntax-error rate without incurring sequential latency.

ReAct Self-Repair Loop

When all K scripts fail, the best error message is automatically fed back to the LLM for iterative repair. Up to max_react_turns (default 5) attempts are made before deferring back to AFL++.

CARM Retrieval

Constraint-Aware Retrieval Memory uses server-side Jaccard similarity on PostgreSQL to find previously successful Z3 templates with matching structural constraint profiles. Reuses knowledge across binaries and sessions.

Taint-Bounded P-Code Slicing

PCodeSlicer invokes Ghidra headless analysis and performs backward slicing bounded by taint propagation from the input buffer. Only instructions data-dependent on fuzzer-controlled bytes are included, keeping prompts concise.

Async Payload Injection

PayloadInjector writes solved payloads atomically into AFL++‘s sync directory. AFL++ discovers and ingests them on its next cycle with zero interruption to the fuzzer process.

Harvest Mode

A self-bootstrapping corpus mode that disables CARM retrieval to force zero-shot Z3 generation, verifies each solve via edge-count delta after injection, and auto-commits verified (P-Code, tags, Z3 script) triplets back to the CARM database.

Multi-Architecture Support

Supports ARM32, ARM64, X86, X86_64, MIPS32, and PPC32. Architecture determines BitVec widths in generated Z3 scripts and is tracked per constraint profile for architecture-filtered CARM queries.

Multi-Provider LLM

Supports OpenAI GPT-4.1 and Google Gemini via a unified LLMClient interface. The provider and model are configured through environment variables with no code changes required.

When to Use Agentic-AFL

Agentic-AFL is designed for targets where AFL++ reliably stalls on the same address despite extended fuzzing campaigns. The most common scenarios in ICS/OT security research are:

Protocol parsers with frame-level integrity checks — Modbus RTU (CRC-16), DNP3 (CRC-16 per block), IEC 60870-5 (FCS), S7comm, EtherNet/IP, and proprietary SCADA protocols. These parsers validate a checksum before doing anything else, so AFL++ never reaches the actual protocol logic without a valid checksum.
Firmware with multi-layer validation — Embedded firmware that checks a magic number, a length field, and a CRC in sequence. AFL++ may pass the first check by luck but then stall on the second or third.
Custom or proprietary checksum algorithms — Not just standard CRC-16/32 but any bitmask-loop or BITWISE_LOOP pattern. CARM’s structural tags are algorithm-agnostic: a proprietary polynomial checksum and a standard CRC-16 produce the same structural fingerprint and can share templates.
Targets where REDQUEEN/CmpLog has already been tried — AFL++‘s built-in equality-matching extensions work well for simple constant comparisons but cannot handle multi-byte checksums computed across the entire input. Agentic-AFL is the next step after CmpLog reaches its limits.

Prerequisites

Before installing Agentic-AFL, ensure the following are available on your system:

AFL++ 4.x — afl-fuzz and afl-cc must be on your PATH. AFL++‘s sync directory protocol is what Agentic-AFL uses for payload injection.
Ghidra 11.x — Required for headless P-Code extraction and decompilation. Set the GHIDRA_INSTALL_DIR environment variable to your Ghidra installation path (default: /opt/ghidra).
Python 3.11+ — The orchestration framework, LLM client, and Z3 sandbox all require Python 3.11 or newer.
PostgreSQL — Stores constraint profiles and Z3 templates for CARM retrieval. Any PostgreSQL 14+ instance works; a Docker one-liner is provided in the quickstart.
LLM API Key — An OpenAI API key (for GPT-4.1) or a Google Gemini API key. Set LLM_API_KEY in your .env file.

Get Started

Configuration

Guides

Architecture

Agentic-AFL: Neuro-Symbolic Fuzzing for ICS/OT Binaries

The Problem

The Solution

Key Features

K-Way Voting

ReAct Self-Repair Loop

CARM Retrieval

Taint-Bounded P-Code Slicing

Async Payload Injection

Harvest Mode

Multi-Architecture Support

Multi-Provider LLM

When to Use Agentic-AFL

Prerequisites

Build docs developers (and LLMs) love

Get Started

Configuration

Guides

Architecture

Documentation Index

​The Problem

​The Solution

​Key Features

K-Way Voting

ReAct Self-Repair Loop

CARM Retrieval

Taint-Bounded P-Code Slicing

Async Payload Injection

Harvest Mode

Multi-Architecture Support

Multi-Provider LLM

​When to Use Agentic-AFL

​Prerequisites

Build docs developers (and LLMs) love

The Problem

The Solution

Key Features

When to Use Agentic-AFL

Prerequisites