The extractor pipeline is the first phase of the Agentic-AFL processing loop. It takes two inputs — a binary path and a stall address — and produces a VulnerabilitySpec persisted to PostgreSQL. The pipeline is entirely deterministic: no LLM calls are made during extraction or profiling. The three components run sequentially: PCodeSlicer produces a PCodeSlice, ConstraintProfiler analyzes it into a ConstraintProfile, and SpecExporter packages both into a VulnerabilitySpec and upserts it to the database.

PCodeSlicer

PCodeSlicer extracts a taint-bounded backward slice of Ghidra P-Code at the stall address. P-Code is Ghidra’s architecture-neutral intermediate representation: it preserves the full semantic content of machine instructions while abstracting over CPU-specific details such as register names and instruction encodings. Taint-bounded backward slicing means the slice contains only P-Code operations that are data-dependent on the fuzzer’s input buffer (the taint source). Operations that resolve to global state unconnected to the input are pruned. This prevents memory-state explosion; without taint bounding, a backward slice would pull in the entire firmware binary.

Step 1: Invoke Ghidra headless

PCodeSlicer constructs a pyghidraRun command that imports the binary, runs the extract_pcode.py Jython script as a post-analysis script, and passes the stall address, taint source, and max_slice_depth as script arguments. The process is given a 180-second timeout to accommodate large firmware images.
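
The invocation can be pictured as an ordinary subprocess call. The sketch below assumes Ghidra’s standard headless-analyzer arguments (-import, -scriptPath, -postScript followed by the script’s own arguments, -deleteProject) and that pyghidraRun accepts -H to run headless. The helper names, the throwaway project name, and the positional order of the script arguments are hypothetical, not Agentic-AFL’s actual code.

```python
import subprocess
from pathlib import Path
from typing import List


def build_headless_command(
    ghidra_home: Path,
    project_dir: Path,
    binary_path: Path,
    script_dir: Path,
    stall_address: str,
    taint_source: str,
    max_slice_depth: int,
) -> List[str]:
    """Assemble a pyghidraRun headless invocation (hypothetical helper)."""
    return [
        str(ghidra_home / "support" / "pyghidraRun"),
        "-H",                              # assumed headless-mode switch
        str(project_dir),                  # temporary project location
        "agentic_afl_tmp",                 # hypothetical project name
        "-import", str(binary_path),
        "-scriptPath", str(script_dir),    # where extract_pcode.py lives
        "-postScript", "extract_pcode.py",
        # Script arguments follow the script name (order is an assumption).
        stall_address, taint_source, str(max_slice_depth),
        "-deleteProject",                  # discard the temporary project
    ]


def run_headless(cmd: List[str], timeout_s: int = 180) -> str:
    """Run Ghidra and return stdout + stderr; raises TimeoutExpired on timeout."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return proc.stdout + proc.stderr
```

Returning stdout and stderr together matches the next step, which searches both streams for the JSON markers.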

Step 2: Parse delimited JSON

The Jython script emits a JSON block to stdout between the delimiters ===PCODE_JSON_START=== and ===PCODE_JSON_END===. PCodeSlicer searches the combined stdout and stderr for these markers, extracts the JSON, and parses it into a list of PCodeInstruction dataclasses.
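
A minimal sketch of the marker search and parsing. The delimiters come from the description above, and the PCodeInstruction fields mirror the per-instruction fields the script is documented to emit, but the exact JSON key names used here (instructions, address, mnemonic, inputs, output, raw, call_target) are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

START = "===PCODE_JSON_START==="
END = "===PCODE_JSON_END==="


@dataclass
class PCodeInstruction:
    # Hypothetical field set modeled on the script's documented output.
    address: str
    mnemonic: str
    inputs: List[str] = field(default_factory=list)
    output: Optional[str] = None
    raw: str = ""
    call_target: Optional[str] = None


def parse_pcode_output(stdout: str, stderr: str) -> Tuple[Dict[str, Any], List[PCodeInstruction]]:
    """Find the delimited JSON block in either stream and parse it."""
    combined = stdout + "\n" + stderr
    start = combined.find(START)
    end = combined.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        raise ValueError("P-Code JSON delimiters not found in Ghidra output")
    payload = json.loads(combined[start + len(START):end])
    instructions = [
        PCodeInstruction(
            address=item["address"],
            mnemonic=item["mnemonic"],
            inputs=list(item.get("inputs", [])),
            output=item.get("output"),
            raw=item.get("raw", ""),
            call_target=item.get("call_target"),
        )
        for item in payload.get("instructions", [])
    ]
    return payload, instructions
```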

Step 3: Apply taint bounding (defense-in-depth)

Although the Ghidra Jython script performs taint-bounded slicing internally, PCodeSlicer applies a second pass in Python. Starting from the stall instruction’s inputs, it walks backward through the instruction list and prunes any instruction whose output varnode is not in the taint set. This is a defense-in-depth filter against overly inclusive Ghidra output.
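
The defense-in-depth pass is a backward data-dependency walk. The minimal sketch below assumes each instruction exposes a mnemonic, a list of input varnode names, and an optional output varnode name (the PCodeOp type is hypothetical). It also assumes that when an instruction is kept, its output is removed from the taint set and its inputs are added, so the set grows backward from the stall instruction; the source does not spell out this update rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PCodeOp:
    """Hypothetical minimal stand-in for a P-Code instruction."""
    mnemonic: str
    inputs: List[str] = field(default_factory=list)
    output: Optional[str] = None


def taint_bound(ops: List[PCodeOp], stall_index: int) -> List[PCodeOp]:
    """Keep only ops that the stall instruction transitively depends on."""
    stall = ops[stall_index]
    taint = set(stall.inputs)          # seed: varnodes read at the stall
    kept: List[PCodeOp] = []
    # Walk backward from the instruction just before the stall.
    for op in reversed(ops[:stall_index]):
        if op.output is None or op.output not in taint:
            continue                   # prune: result never reaches the stall
        kept.append(op)
        taint.discard(op.output)       # this op is the reaching definition
        taint.update(op.inputs)        # its operands now matter too
    kept.reverse()                     # restore program order
    return kept + [stall]
```

For example, given LOAD r1 ← [r0], an unrelated COPY r5 ← const, INT_AND r2 ← r1, 0xff, INT_XOR r3 ← r2, r4 and a CBRANCH on r3, the pass keeps LOAD, INT_AND, INT_XOR and CBRANCH and drops the COPY.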

Step 4: Truncate if needed

If the bounded slice still exceeds max_pcode_instructions (default 200), the earliest instructions are dropped from the front of the list. This keeps the instructions closest to the stall address, which carry the most relevant constraints, and sets the PCodeSlice.truncated flag to True. The cap keeps the P-Code within approximately 4K LLM tokens.
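
Truncation reduces to keeping the tail of the ordered instruction list. A minimal sketch, assuming the list is in program order with the stall instruction last; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def truncate_slice(instructions: Sequence[T], max_pcode_instructions: int = 200) -> Tuple[List[T], bool]:
    """Return (kept instructions, truncated flag), keeping those nearest the stall."""
    if max_pcode_instructions <= 0:
        raise ValueError("max_pcode_instructions must be positive")
    if len(instructions) <= max_pcode_instructions:
        return list(instructions), False
    # Drop from the front: the earliest instructions are furthest from the stall.
    return list(instructions[-max_pcode_instructions:]), True
```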

Step 5: Return PCodeSlice

A PCodeSlice dataclass is returned, carrying the ordered instruction list, function name, function entry address, taint source, slice depth, architecture, truncation flag, and Ghidra’s decompiled C pseudocode.
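
```python
from pathlib import Path

from agentic_afl.constants import Architecture
from agentic_afl.extractor.pcode_slicer import PCodeSlicer

# Configure the slicer for a 32-bit ARM firmware image.
slicer = PCodeSlicer(
    architecture=Architecture.ARM32,
    max_slice_depth=20,      # slice depth passed to the Ghidra script
    max_instructions=200,    # cap on slice length before truncation
)

# Runs Ghidra headless and returns a PCodeSlice for the stall site.
pcode_slice = slicer.extract_slice(
    binary_path=Path("./firmware.bin"),
    stall_address="0x08001234",
    taint_source="r0",       # input-buffer register; r0 is the first argument register on ARM32
)
```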

ConstraintProfiler

ConstraintProfiler is a deterministic, heuristic-based engine that analyzes a PCodeSlice and produces a ConstraintProfile. It makes no LLM calls and produces the same output for the same input every time. This determinism is essential: the ConstraintProfile must be stable across runs so that Jaccard similarity scores in CARM are consistent.

The profiler runs a sequence of independent detector functions over the instruction list. Each detector looks for a specific P-Code mnemonic pattern and returns a ConstraintTag enum value if the pattern is found, or None if it is not. The full set of detected tags is collected into a frozenset[ConstraintTag].

Tags are algorithm-agnostic. A proprietary checksum routine and a standard CRC-16 both produce BITWISE_LOOP (both contain a loop dominated by XOR and shift operations) and INDEXED_LOOKUP (both may use a lookup table indexed by input bytes). Because their structural fingerprints match, CARM can retrieve a CRC template to help solve an unknown proprietary checksum.

The mnemonic classification sets used by the detectors are:

| Category | P-Code mnemonics |
|---|---|
| BITWISE_OPS | INT_AND, INT_OR, INT_XOR, INT_NEGATE, INT_LEFT, INT_RIGHT, INT_SRIGHT |
| ARITHMETIC_OPS | INT_ADD, INT_SUB, INT_MULT, INT_DIV, INT_SDIV, INT_REM, INT_SREM |
| COMPARISON_OPS | INT_EQUAL, INT_NOTEQUAL, INT_LESS, INT_SLESS, INT_LESSEQUAL, INT_SLESSEQUAL |
| BRANCH_OPS | BRANCH, CBRANCH, BRANCHIND, CALL, CALLIND, RETURN |
| MEMORY_OPS | LOAD, STORE |

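
Each detector can be modeled as a function from the instruction list to an optional tag. The sketch below uses the BITWISE_OPS set from the table. The two detection rules (a MULTIEQUAL loop header plus a bitwise share of at least 25% for BITWISE_LOOP, and a LOAD whose address operand came from an earlier LOAD for CHAINED_LOAD), the Op type, and the enum values are illustrative assumptions, not the profiler’s actual heuristics.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, Iterable, Optional, Sequence, Tuple

BITWISE_OPS = frozenset({
    "INT_AND", "INT_OR", "INT_XOR", "INT_NEGATE",
    "INT_LEFT", "INT_RIGHT", "INT_SRIGHT",
})


class ConstraintTag(Enum):
    """Hypothetical subset of tags with made-up values."""
    BITWISE_LOOP = 1
    CHAINED_LOAD = 2


@dataclass(frozen=True)
class Op:
    mnemonic: str
    inputs: Tuple[str, ...] = ()
    output: Optional[str] = None


Detector = Callable[[Sequence[Op]], Optional[ConstraintTag]]


def detect_bitwise_loop(ops: Sequence[Op]) -> Optional[ConstraintTag]:
    """A loop header (MULTIEQUAL) plus a high share of bitwise ops."""
    if not ops or not any(op.mnemonic == "MULTIEQUAL" for op in ops):
        return None
    bitwise = sum(op.mnemonic in BITWISE_OPS for op in ops)
    # The 0.25 threshold is an illustrative assumption.
    return ConstraintTag.BITWISE_LOOP if bitwise / len(ops) >= 0.25 else None


def detect_chained_load(ops: Sequence[Op]) -> Optional[ConstraintTag]:
    """A LOAD whose address operand was produced by an earlier LOAD."""
    loaded = set()
    for op in ops:
        if op.mnemonic == "LOAD":
            if any(v in loaded for v in op.inputs):
                return ConstraintTag.CHAINED_LOAD
            if op.output is not None:
                loaded.add(op.output)
    return None


DETECTORS: Tuple[Detector, ...] = (detect_bitwise_loop, detect_chained_load)


def collect_tags(ops: Sequence[Op], detectors: Iterable[Detector] = DETECTORS) -> FrozenSet[ConstraintTag]:
    """Run every detector and gather the non-None results."""
    tags = set()
    for detect in detectors:
        tag = detect(ops)
        if tag is not None:
            tags.add(tag)
    return frozenset(tags)
```

Keeping detectors independent means adding a new structural pattern is one more function in the tuple, and the result stays deterministic for a given slice.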
In addition to P-Code analysis, ConstraintProfiler performs a supplementary scan of the decompiled C pseudocode attached to the PCodeSlice. This catches structural patterns that taint bounding or the decompiler may have obscured in the P-Code: for example, a for loop whose bitwise operators are clearly visible in C but whose loop-header PHI node was pruned from the slice. Beyond tags, the profiler computes five numerical metrics:
  • bitwise_density — ratio of bitwise ops to total ops (0.0–1.0)
  • arithmetic_density — ratio of arithmetic ops to total ops (0.0–1.0)
  • loop_depth — count of MULTIEQUAL (PHI) nodes, which indicate loop headers in SSA form
  • register_count — number of distinct register varnodes in the slice (correlates with the number of BitVec variables the LLM must declare)
  • estimated_complexity — heuristic difficulty score (0–100); BITWISE_LOOP adds 30 points, INPUT_DEPENDENT_LOOP adds 20, CHAINED_LOAD adds 15, each tag adds 10, each loop depth level adds 5, and each register adds 2
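
These metrics are simple enough to write down directly. The sketch below is one reading of the list above, with its assumptions stated: densities are computed over all ops in the slice, a varnode counts as a register when it carries no address-space prefix (a hypothetical naming convention), "each tag adds 10" applies to every tag including the three weighted ones, and the total is clamped at 100 to stay in the documented 0-100 range.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple

BITWISE_OPS = frozenset({"INT_AND", "INT_OR", "INT_XOR", "INT_NEGATE",
                         "INT_LEFT", "INT_RIGHT", "INT_SRIGHT"})
ARITHMETIC_OPS = frozenset({"INT_ADD", "INT_SUB", "INT_MULT", "INT_DIV",
                            "INT_SDIV", "INT_REM", "INT_SREM"})

# Per-tag bonuses on top of the flat 10 points every tag contributes.
TAG_BONUS = {"BITWISE_LOOP": 30, "INPUT_DEPENDENT_LOOP": 20, "CHAINED_LOAD": 15}


@dataclass(frozen=True)
class Op:
    mnemonic: str
    inputs: Tuple[str, ...] = ()
    output: Optional[str] = None


@dataclass(frozen=True)
class Metrics:
    bitwise_density: float
    arithmetic_density: float
    loop_depth: int
    register_count: int
    estimated_complexity: int


def is_register(varnode: str) -> bool:
    # Assumed naming: registers are bare names ("r0"); constants and
    # RAM/unique-space varnodes carry a "space:" prefix.
    return ":" not in varnode


def compute_metrics(ops: Sequence[Op], tags: FrozenSet[str]) -> Metrics:
    total = len(ops)
    bitwise = sum(op.mnemonic in BITWISE_OPS for op in ops)
    arithmetic = sum(op.mnemonic in ARITHMETIC_OPS for op in ops)
    loop_depth = sum(op.mnemonic == "MULTIEQUAL" for op in ops)  # PHI nodes
    registers = set()
    for op in ops:
        for v in op.inputs + ((op.output,) if op.output else ()):
            if is_register(v):
                registers.add(v)
    score = sum(TAG_BONUS.get(t, 0) for t in tags)
    score += 10 * len(tags) + 5 * loop_depth + 2 * len(registers)
    return Metrics(
        bitwise_density=bitwise / total if total else 0.0,
        arithmetic_density=arithmetic / total if total else 0.0,
        loop_depth=loop_depth,
        register_count=len(registers),
        estimated_complexity=min(score, 100),  # assumed clamp to 0-100
    )
```

For a four-op slice with one MULTIEQUAL, two bitwise ops, one INT_ADD, five distinct registers, and the single tag BITWISE_LOOP, this gives densities of 0.5 and 0.25 and a complexity of 30 + 10 + 5 + 10 = 55.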
```python
from agentic_afl.extractor.constraint_profiler import ConstraintProfiler

# pcode_slice is the PCodeSlice returned by PCodeSlicer.extract_slice()
profiler = ConstraintProfiler()
profile = profiler.analyze(pcode_slice)
print(profile.tags)                  # frozenset of ConstraintTag enums
print(profile.estimated_complexity)  # 0-100
```
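
Because a ConstraintProfile’s tags form a set, the structural similarity of two profiles can be scored with the standard Jaccard index, |A ∩ B| / |A ∪ B|. How CARM weights or thresholds this score is not described here. The sketch only shows the index itself over stand-in ConstraintTag values (the enum numbering is hypothetical) and treats two empty tag sets as identical by convention.

```python
from enum import Enum
from typing import FrozenSet


class ConstraintTag(Enum):
    """Hypothetical stand-in; real enum values may differ."""
    BITWISE_LOOP = 1
    INDEXED_LOOKUP = 2
    INPUT_DEPENDENT_LOOP = 3
    CHAINED_LOAD = 4


def jaccard(a: FrozenSet[ConstraintTag], b: FrozenSet[ConstraintTag]) -> float:
    """Size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)


# A CRC-16 and a proprietary checksum share the same structural fingerprint.
crc16 = frozenset({ConstraintTag.BITWISE_LOOP, ConstraintTag.INDEXED_LOOKUP})
proprietary = frozenset({ConstraintTag.BITWISE_LOOP, ConstraintTag.INDEXED_LOOKUP})
pointer_walk = frozenset({ConstraintTag.BITWISE_LOOP, ConstraintTag.CHAINED_LOAD})
```

Here jaccard(crc16, proprietary) is 1.0, while jaccard(crc16, pointer_walk) is 1/3 because the sets share one tag out of three distinct tags.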

SpecExporter

SpecExporter is the final step of the extractor pipeline. It takes a PCodeSlice and a ConstraintProfile, constructs a VulnerabilitySpec, and persists it.

The spec_id is deterministic: it is the first 16 hex characters of the SHA-256 hash of str(binary_path.resolve()) + ":" + stall_address. Re-running the extractor on the same binary and stall address therefore updates the existing PostgreSQL record rather than creating a duplicate; SpecStore.save_spec() provides these upsert semantics.

The persisted VulnerabilitySpec also carries a correction_history list. Each time the Orchestrator attempts to solve a stall and the Z3 sandbox returns a non-SAT verdict, a CorrectionEntry (error message plus failed script) is appended. When the same spec is retrieved later, this history is fed to the LLM as negative examples to guide self-repair.

SpecExporter supports two persistence backends:

PostgreSQL (production)

The preferred backend. SpecStore.save_spec() upserts the spec as a row with INTEGER[] constraint tags (for GIN-indexed Jaccard queries), a JSONB profile data field, and the raw P-Code text. Required for CARM retrieval to function.

JSON file (fallback)

If no SpecStore is provided, the spec is serialized to a JSON file in the configured json_dir. Useful for development and testing without a PostgreSQL instance. CARM retrieval is unavailable in this mode.
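
The spec_id rule is fully specified, so it can be written out directly; the only assumption is that the string is UTF-8 encoded before hashing. The JSON-fallback half of the sketch is hypothetical (the save_spec_json name, the <spec_id>.json file naming, and a plain dict standing in for VulnerabilitySpec), but it shows why a deterministic ID gives upsert-like behavior even without a database: re-exporting overwrites the same file.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def compute_spec_id(binary_path: Path, stall_address: str) -> str:
    """First 16 hex chars of SHA-256(resolved path + ':' + stall address)."""
    key = str(binary_path.resolve()) + ":" + stall_address
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def save_spec_json(spec: Dict[str, Any], json_dir: Path) -> Path:
    """Write the spec to <json_dir>/<spec_id>.json, replacing any earlier export."""
    json_dir.mkdir(parents=True, exist_ok=True)
    path = json_dir / f"{spec['spec_id']}.json"
    path.write_text(json.dumps(spec, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Because the path is resolved before hashing, "./firmware.bin" and "firmware.bin" run from the same directory map to the same spec_id.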

Ghidra Jython Script

The extract_pcode.py Jython script runs inside Ghidra’s JVM during headless analysis. It has access to Ghidra’s full Java API for decompilation, basic block analysis, and P-Code extraction — functionality not available from outside the JVM.
The script is located at agentic-afl/extractor/ghidra_scripts/extract_pcode.py. It must be placed in Ghidra’s script search path or the directory configured by the GHIDRA_SCRIPT_DIR environment variable. PCodeSlicer passes -scriptPath <dir> to pyghidraRun to locate it.
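
Locating the script amounts to choosing a directory and passing it through -scriptPath. A minimal sketch, assuming that GHIDRA_SCRIPT_DIR, when set, takes precedence over a packaged default directory; the precedence rule and the helper names are assumptions, not documented behavior.

```python
import os
from pathlib import Path
from typing import List


def resolve_script_dir(default: Path, env_var: str = "GHIDRA_SCRIPT_DIR") -> Path:
    """Pick the directory holding extract_pcode.py (env var wins if set)."""
    override = os.environ.get(env_var)
    script_dir = Path(override) if override else default
    if not (script_dir / "extract_pcode.py").is_file():
        raise FileNotFoundError(f"extract_pcode.py not found in {script_dir}")
    return script_dir


def script_path_args(script_dir: Path) -> List[str]:
    """Arguments that point the headless analyzer at the script directory."""
    return ["-scriptPath", str(script_dir)]
```

Failing early when the script is missing gives a clearer error than letting Ghidra run for minutes and then report that the post-script could not be found.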
The script outputs a single JSON block to stdout, delimited by ===PCODE_JSON_START=== and ===PCODE_JSON_END===. The JSON includes the function name, function entry address, instruction list (each entry with address, mnemonic, inputs, output, raw P-Code, and resolved call target), slice depth, truncation flag, list of pruned LOADs, and optional decompiled C pseudocode. The pseudocode covers the target function and up to two levels of callees. When the stall is at a function entry point, the script also includes C for the calling function, which shows how input buffer bytes are parsed into the function’s arguments; this context is invisible to the P-Code slice alone.
The extractor caches results via PostgreSQL. If SpecExporter.export() is called a second time for the same (binary_path, stall_address) pair, the existing PostgreSQL row is updated with the latest profile data and the VulnerabilitySpec is returned without re-invoking Ghidra. Ghidra is re-run only when PCodeSlicer is called directly.
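
In PostgreSQL, this update-instead-of-duplicate behavior is typically expressed with INSERT ... ON CONFLICT ... DO UPDATE on a unique key. The sketch below assumes spec_id is the table’s primary key. The table name, column names, and helper are hypothetical, and the real SpecStore.save_spec() may be structured differently.

```python
import json
from typing import Any, Dict, Iterable, Tuple

# Hypothetical table and column names; the ON CONFLICT pattern is the point.
UPSERT_SQL = """
INSERT INTO vulnerability_specs
    (spec_id, binary_path, stall_address, constraint_tags, profile_data, pcode_text)
VALUES (%s, %s, %s, %s, %s::jsonb, %s)
ON CONFLICT (spec_id) DO UPDATE SET
    constraint_tags = EXCLUDED.constraint_tags,
    profile_data    = EXCLUDED.profile_data,
    pcode_text      = EXCLUDED.pcode_text
"""


def upsert_params(spec_id: str, binary_path: str, stall_address: str,
                  tag_values: Iterable[int], profile_data: Dict[str, Any],
                  pcode_text: str) -> Tuple[Any, ...]:
    """Build DB-API parameters; psycopg adapts a Python list to INTEGER[]."""
    return (
        spec_id,
        binary_path,
        stall_address,
        sorted(tag_values),                        # stable INTEGER[] ordering
        json.dumps(profile_data, sort_keys=True),  # cast to JSONB in the SQL
        pcode_text,
    )
```

With a psycopg cursor this would run as cur.execute(UPSERT_SQL, upsert_params(...)). Columns not named in DO UPDATE SET keep their stored values on conflict, so any other per-spec state in the row is left alone by a re-export.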
