Data Models — Agentic-AFL Pipeline Type Contracts

Agentic-AFL uses a strict dataclass pipeline. Each stage of the pipeline consumes the output of the previous stage, and no component reaches across stage boundaries. These types are the public data contracts between components — when adding a new field to any dataclass, update both the producer that populates it and the consumer that reads it.

Data Flow

Binary + StallAddr
     ↓
PCodeSlice         (pcode_slicer.py produces)
     ↓
ConstraintProfile  (constraint_profiler.py produces)
     ↓
VulnerabilitySpec  (spec_exporter.py produces — persisted to PostgreSQL)
     ↓
StallReport        (stall_detector.py produces)
     ↓
Z3GenerationRequest (agent_loop.py produces; llm_client.py consumes)
     ↓
Z3Script           (llm_client.py produces — K scripts for voting)
     ↓
Z3Result           (z3_sandbox.py produces)
     ↓
SolvedPayload      (extracted from Z3Result.model)
     ↓
sync_dir/          (payload_injector.py writes)

PCodeInstruction

A single Ghidra P-Code operation extracted from a basic block. P-Code is Ghidra’s architecture-neutral intermediate representation that preserves all semantic information from the original machine code. PCodeInstruction is frozen (immutable).

address

str

The original machine code address as a hex string (e.g., "0x08001234").

mnemonic

str

The P-Code operation mnemonic (e.g., "INT_ADD", "CBRANCH", "LOAD").

inputs

list[str]

List of input varnodes as strings (e.g., ["r0", "0x10", "(ram, 0x20001000, 4)"]).

output

str | None

The output varnode, or None for operations like BRANCH and STORE that produce no output varnode.

raw_pcode

str

The full Ghidra P-Code text line, preserved verbatim for debugging and prompt injection.

call_target

str | None

Resolved function name for CALL operations (e.g., "crc16_modbus"). None for non-call operations or unresolved indirect calls.
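The fields above can be sketched as a frozen dataclass. This is a minimal illustration, not the project's actual definition: the default for call_target and the sample raw_pcode text are assumptions. It also shows that freezing blocks attribute rebinding but does not make the inputs list itself immutable.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class PCodeInstruction:
    """Sketch of the documented fields; the real class may differ."""
    address: str
    mnemonic: str
    inputs: list[str]
    output: Optional[str]
    raw_pcode: str
    call_target: Optional[str] = None  # default value is an assumption


insn = PCodeInstruction(
    address="0x08001234",
    mnemonic="INT_ADD",
    inputs=["r0", "0x10"],
    output="r1",
    raw_pcode="r1 = INT_ADD r0, 0x10",  # illustrative text, not real Ghidra output
)

# Rebinding an attribute on a frozen dataclass raises FrozenInstanceError.
try:
    insn.mnemonic = "INT_SUB"
    mutated = True
except FrozenInstanceError:
    mutated = False

# The list object is still mutable in place; frozen only guards the attribute.
insn.inputs.append("0x20")
```

If true immutability matters, storing inputs as a tuple would be one option, at the cost of deviating from the documented list[str] type.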

PCodeSlice

A taint-bounded backward slice of P-Code instructions from a stall site. The slice contains only P-Code instructions that are data-dependent on the fuzzer’s input buffer (taint source). Instructions resolving to global state not connected to the input are pruned.

Produced by: extractor/pcode_slicer.py
Consumed by: extractor/constraint_profiler.py, orchestrator/llm_client.py

binary_path

Path

Path to the analyzed binary.

stall_address

str

The address where AFL++ coverage stalled (hex string, e.g., "0x00401a20").

function_name

str

Ghidra’s decompiled function name. May be "FUN_xxxxx" for stripped binaries.

function_entry

str

Entry point address of the function containing the stall site (hex string).

instructions

list[PCodeInstruction]

Ordered list of P-Code instructions in the backward slice.

taint_source

str

Description of the taint origin, e.g., "RDI" (x86_64 first argument register) or "input_buffer @ 0x20001000".

slice_depth

int

Number of basic blocks traversed backward to build this slice.

truncated

bool

True if the slice was truncated (via assuming(0)) to fit within the LLM token budget.

architecture

Architecture

Target CPU architecture enum value. Determines BitVec widths in generated Z3 scripts.

decompiled_c

str

Ghidra’s decompiled C pseudocode for the containing function. Best-effort; may be empty for obfuscated or heavily optimized binaries.

Computed properties

instruction_count

int

len(self.instructions) — number of P-Code instructions in the slice.

pcode_text

str

Concatenated raw_pcode lines joined by newlines. Used directly for LLM prompt injection.

unique_mnemonics

set[str]

Set of distinct P-Code operation mnemonics in the slice. Used by ConstraintProfiler to detect bitwise loops, arithmetic patterns, etc.
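The three computed properties follow directly from the instructions list. A minimal sketch, using hypothetical stand-in classes (InsnSketch, SliceSketch) that carry only the fields the properties need:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InsnSketch:
    """Hypothetical stand-in for PCodeInstruction (two fields only)."""
    mnemonic: str
    raw_pcode: str


@dataclass
class SliceSketch:
    """Hypothetical stand-in for PCodeSlice (instructions only)."""
    instructions: list[InsnSketch] = field(default_factory=list)

    @property
    def instruction_count(self) -> int:
        return len(self.instructions)

    @property
    def pcode_text(self) -> str:
        # One raw P-Code line per instruction, in slice order.
        return "\n".join(i.raw_pcode for i in self.instructions)

    @property
    def unique_mnemonics(self) -> set[str]:
        return {i.mnemonic for i in self.instructions}


# The raw_pcode strings below are illustrative, not real Ghidra output.
demo = SliceSketch([
    InsnSketch("LOAD", "u1 = LOAD ram(r0)"),
    InsnSketch("INT_XOR", "u2 = INT_XOR u1, 0xff"),
    InsnSketch("INT_XOR", "u3 = INT_XOR u2, r1"),
])
```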

ConstraintProfile

A structural fingerprint of a stall site’s mathematical constraint type. Produced from P-Code analysis; consumed by CARM retrieval for Jaccard similarity matching. ConstraintProfile is frozen (immutable) and hashable.

Produced by: extractor/constraint_profiler.py
Consumed by: orchestrator/retrieval_carm.py
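Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to sets of feature strings. Which features a ConstraintProfile actually exposes is not documented here, so the mnemonic sets and the convention that two empty sets score 1.0 are assumptions.

```python
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Return |a ∩ b| / |a ∪ b|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical feature sets for two stall sites.
stored = frozenset({"INT_XOR", "INT_RIGHT", "CBRANCH"})
query = frozenset({"INT_XOR", "CBRANCH", "LOAD"})

score = jaccard(stored, query)  # 2 shared features out of 4 distinct ones
```

Because frozenset is hashable, feature sets like these can also serve as dictionary keys, which fits the requirement that the profile itself be hashable.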

VulnerabilitySpec

A self-contained, JSON-serializable specification for a single stall site. This is the primary data artifact stored in PostgreSQL. It bundles the P-Code slice, constraint profile, and all metadata needed for the Orchestrator to generate a Z3 script without re-running the Extractor.

Produced by: extractor/spec_exporter.py
Consumed by: database/spec_store.py, orchestrator/retrieval_carm.py

spec_id

str

Unique 16-character identifier derived as the first 16 hex characters of SHA-256(binary_path + stall_address). Deterministic — the same binary and address always produce the same ID.

binary_path

Path

Absolute path to the analyzed binary.

stall_address

str

The stall site address (hex string).

function_name

str

Containing function name from Ghidra.

pcode_slice

PCodeSlice

The full extracted P-Code backward slice.

constraint_profile

ConstraintProfile

The structural constraint fingerprint computed from the slice.

architecture

Architecture

Target CPU architecture.

z3_template_hint

str | None

A previously successful Z3 script for a structurally similar constraint profile, retrieved from CARM. Injected into the LLM prompt as a starting point.

correction_history

list[CorrectionEntry]

Ordered list of past error→correction pairs accumulated across all ReAct turns for this stall. Fed to the LLM as negative examples to prevent repeated mistakes.

created_at

datetime

UTC timestamp when this spec was first created by SpecExporter.

last_attempted

datetime | None

UTC timestamp of the most recent solve attempt, or None if never attempted.

solve_count

int

Number of times a payload has been successfully injected for this spec.

Methods

to_dict() -> dict

Serialize to a JSON-compatible dictionary for PostgreSQL persistence. Includes all fields, with Path objects converted to strings and datetime objects to ISO 8601 strings.

generate_id(binary_path: Path, stall_address: str) -> str (static)

Compute the deterministic spec_id from a binary path and stall address.
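A minimal sketch of the two behaviors described for VulnerabilitySpec: deriving a 16-character ID from SHA-256, and converting Path and datetime values for JSON. The UTF-8 encoding and plain string concatenation in generate_id are assumptions, as is the helper name to_jsonable; the real implementation may hash different bytes and so produce different IDs.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def generate_id(binary_path: Path, stall_address: str) -> str:
    """First 16 hex characters of SHA-256 over path + address (encoding assumed)."""
    material = f"{binary_path}{stall_address}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


def to_jsonable(value):
    """Hypothetical helper: recursively convert Path and datetime values."""
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()  # ISO 8601
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


binary = Path("/opt/targets/fw.bin")  # illustrative path
record = to_jsonable({
    "spec_id": generate_id(binary, "0x00401a20"),
    "binary_path": binary,
    "created_at": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "last_attempted": None,
})
encoded = json.dumps(record)  # succeeds because every value is now JSON-native
```

Since the ID depends on the path string, the same binary reached through a different path (for example a symlink or a relative path) would hash to a different ID under this sketch; normalizing to an absolute path before hashing avoids that.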

CorrectionEntry

A single error→correction pair from a past Z3 generation attempt. Stored in VulnerabilitySpec.correction_history to give the LLM a record of prior failures (negative examples) when generating repair prompts. CorrectionEntry is frozen.

error_message

str

The error string returned by the Z3 sandbox or the AgentLoop’s incomplete-model checker.

corrected_script

str

The full Z3Py script that was submitted when this error occurred (not necessarily a corrected version — it is the script that produced the error).

timestamp

datetime

UTC timestamp when this correction entry was created.

StallReport

A report from the stall detector indicating a coverage plateau at a specific address. Placed on the AgentLoop’s priority queue for processing.

Produced by: fuzzer_bridge/stall_detector.py
Consumed by: orchestrator/agent_loop.py

stall_address

str

The address where coverage stalled (hex string).

binary_path

Path

Path to the binary being fuzzed.

severity

StallSeverity

Priority classification: CRITICAL, HIGH, MEDIUM, or LOW. Determines queue ordering — CRITICAL stalls are dequeued first.

cycles_stalled

int

Number of AFL++ cycles that have passed with no new edges at this address.

seed_input

bytes

Raw bytes of the AFL++ queue entry that most recently reached this address. Used as the concrete input context for the LLM prompt.

seed_input_path

Path

Filesystem path to the seed file in AFL++’s queue/ directory.

coverage_bitmap

bytes | None

Snapshot of AFL++’s coverage bitmap for diffing. None if not captured.

detected_at

datetime

UTC timestamp when the stall was first detected.
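Severity-ordered dequeuing can be sketched with the standard-library heapq module. The integer values assigned to StallSeverity and the first-in-first-out tie-breaking counter are assumptions; the text only states that CRITICAL stalls are dequeued first.

```python
import heapq
import itertools
from enum import IntEnum


class StallSeverity(IntEnum):
    # Lower value = higher priority. The numeric values are an assumption.
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


_sequence = itertools.count()  # tie-breaker so equal severities stay FIFO
queue: list[tuple[int, int, str]] = []


def push(severity: StallSeverity, stall_address: str) -> None:
    heapq.heappush(queue, (int(severity), next(_sequence), stall_address))


def pop() -> str:
    return heapq.heappop(queue)[2]


push(StallSeverity.LOW, "0x00401a20")
push(StallSeverity.CRITICAL, "0x00401b00")
push(StallSeverity.HIGH, "0x00401c40")
push(StallSeverity.CRITICAL, "0x00401d10")

order = [pop() for _ in range(4)]
```

The sequence counter also keeps the heap from ever comparing the payload itself, which matters once the queued item is a full StallReport rather than a string.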

Z3GenerationRequest

A request to the LLM to generate Z3 scripts for a specific stall. Bundles all context the LLM needs: the P-Code slice (via VulnerabilitySpec), seed input, retrieved templates, correction history, and GDB runtime state.

Produced by: orchestrator/agent_loop.py
Consumed by: orchestrator/llm_client.py

vuln_spec

VulnerabilitySpec

The full vulnerability specification for the stall site, including P-Code slice and constraint profile.

seed_input

bytes

The closest seed input from the AFL++ queue. Provides concrete byte context and determines the input length used for byte-variable counting.

retrieved_templates

list[str]

Previously successful Z3 scripts retrieved from CARM for structurally similar stalls. Injected into the LLM prompt as positive examples.

correction_history

list[CorrectionEntry]

Error→correction pairs from prior ReAct turns. Grows by one entry per failed turn.

k_vote_count

int

Number of parallel Z3 script candidates to generate (K-way voting). Defaults to settings.k_vote_count.

base_offset

int

File byte offset where the function’s input pointer begins. Discovered by the REDQUEEN-style offset probe. When > 0, the LLM is told that input[0] maps to byte_{base_offset} in the full file.

runtime_state

dict[str, str]

GDB-captured memory/register values at function entry, keyed by names such as "rdi_ptr", "rdi_hex", "rsi_value". Used by the LLM to determine concrete struct field values invisible to static analysis.
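The base_offset mapping can be illustrated with a small helper. The text states only that input[0] maps to byte_{base_offset}; extending that to input[i] mapping to byte_{base_offset + i} is an assumption, and file_variable_name is a hypothetical name.

```python
def file_variable_name(input_index: int, base_offset: int = 0) -> str:
    """Map an index into the function's input buffer to a whole-file byte_N name."""
    if input_index < 0 or base_offset < 0:
        raise ValueError("indices and offsets must be non-negative")
    return f"byte_{base_offset + input_index}"


# With a 12-byte header in front of the function's input, input[0..2]
# would correspond to file bytes 12, 13, and 14.
names = [file_variable_name(i, base_offset=12) for i in range(3)]
```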

Z3Script

A Z3Py script generated by the LLM. One Z3Script is produced per voting candidate per ReAct turn.

Produced by: orchestrator/llm_client.py
Consumed by: orchestrator/z3_sandbox.py

script_text

str

The full Z3Py Python code, sanitized and ready for sandbox execution. The sandbox strips duplicate from z3 import *, s = Solver(), and s.check() calls before wrapping.

generation_idx

int

Which of the K voting candidates this script is (zero-indexed, 0 to k_vote_count - 1).

attempt_number

int

Which ReAct turn produced this script (one-indexed). 1 for the initial generation; higher for repairs.

prompt_tokens

int

Token count of the prompt submitted to the LLM. Used for API cost tracking.

completion_tokens

int

Token count of the LLM’s response. Used for API cost tracking.

model_name

str

The LLM model identifier that produced this script (e.g., "gpt-4.1", "gemini-2.0-flash").
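The stripping step mentioned under script_text could look roughly like the line filter below. This is a sketch under stated assumptions: the real sandbox's matching rules are not documented here, the solver variable is assumed to be named s, and only lines consisting solely of one of the three statements are removed.

```python
import re

# Each pattern matches a whole line that is just the boilerplate statement.
_STRIP_PATTERNS = [
    re.compile(r"^\s*from\s+z3\s+import\s+\*\s*$"),
    re.compile(r"^\s*s\s*=\s*Solver\(\)\s*$"),
    re.compile(r"^\s*s\.check\(\)\s*$"),
]


def strip_wrapper_lines(script_text: str) -> str:
    """Drop import, solver-creation, and check lines the wrapper will supply."""
    kept = [
        line
        for line in script_text.splitlines()
        if not any(pattern.match(line) for pattern in _STRIP_PATTERNS)
    ]
    return "\n".join(kept)


raw = "\n".join([
    "from z3 import *",
    "s = Solver()",
    "byte_0 = BitVec('byte_0', 8)",
    "s.add(byte_0 == 0x41)",
    "s.check()",
])
body = strip_wrapper_lines(raw)
```

A line such as print(s.check()) is left in place by this filter, since it does more than call the solver.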

Z3Result

The result of executing a Z3Script in the sandbox. Carries the solver verdict, the concrete variable model (if satisfiable), error details (if not), and timing.

Produced by: orchestrator/z3_sandbox.py
Consumed by: orchestrator/agent_loop.py

verdict

Z3Verdict

One of SAT, UNSAT, TIMEOUT, SYNTAX_ERROR, RUNTIME_ERROR, or UNKNOWN. See constants.Z3Verdict for semantics.

model

dict[str, int] | None

When verdict == Z3Verdict.SAT, a dict mapping Z3 variable names to concrete integer values. Variable names follow the byte_N convention so AgentLoop._model_to_payload() can reconstruct the input buffer. None for all non-SAT verdicts.

error_message

str | None

Error string when verdict is not SAT. This string is fed back to the LLM in repair prompts. None when verdict is SAT.

execution_time

float

Wall-clock seconds the sandbox subprocess ran (including subprocess startup overhead).

script

Z3Script

The Z3Script that produced this result. Used for pairing errors with the scripts that caused them during the repair selection step.
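The pairing of an error with the script that caused it can be sketched as a conversion from a non-SAT result into a CorrectionEntry. The enum's string values, the function name correction_from_result, and the fallback to the verdict name when no error string is present are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Z3Verdict(Enum):
    # Member names come from the text above; the string values are assumptions.
    SAT = "sat"
    UNSAT = "unsat"
    TIMEOUT = "timeout"
    SYNTAX_ERROR = "syntax_error"
    RUNTIME_ERROR = "runtime_error"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class CorrectionEntry:
    error_message: str
    corrected_script: str  # the script that produced the error
    timestamp: datetime


def correction_from_result(
    verdict: Z3Verdict, error_message: Optional[str], script_text: str
) -> Optional[CorrectionEntry]:
    """Return a CorrectionEntry for a failed run, or None for a SAT result."""
    if verdict is Z3Verdict.SAT:
        return None
    return CorrectionEntry(
        error_message=error_message or verdict.name,
        corrected_script=script_text,
        timestamp=datetime.now(timezone.utc),
    )


failed = correction_from_result(
    Z3Verdict.SYNTAX_ERROR, "SyntaxError: invalid syntax", "s.add(byte_0 == )"
)
solved = correction_from_result(Z3Verdict.SAT, None, "s.add(byte_0 == 0x41)")
```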

SolvedPayload

A concrete byte-array payload extracted from a SAT model and ready for injection into AFL++’s sync directory.

Produced by: orchestrator/agent_loop.py (via _model_to_payload())
Consumed by: fuzzer_bridge/payload_injector.py

raw_bytes

bytes

The payload bytes to write to the sync directory. Constructed by overlaying Z3-solved byte_N values onto the original seed input, preserving seed bytes at positions not covered by the model.

source_spec_id

str

The VulnerabilitySpec.spec_id that this payload solves. Used by the CARM retriever to update the winning template.

stall_address

str

The stall address this payload is designed to bypass (hex string).

z3_model

dict[str, int]

The raw Z3 model dict, preserved for audit logging and harvest-mode verification.

confidence

float

Score from 0.0 to 1.0 representing solve confidence. Currently 1.0 for all single-SAT accepts; future K-way agreement scoring will modulate this.

Properties

filename -> str

Generates a descriptive filename for the sync directory in the format agentic_<spec_id_prefix>_<stall_addr>_<YYYYMMDD_HHMMSS>.bin. AFL++ ingests any file written to the sync directory on its next cycle.
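The overlay described for raw_bytes and the filename format can be sketched as two small functions. Several details are assumptions rather than documented behavior: zero-padding when a model index lies beyond the seed, skipping variables that do not match byte_N, masking values to 8 bits, and an 8-character spec_id prefix. The names model_to_payload and payload_filename are hypothetical.

```python
from datetime import datetime


def model_to_payload(seed: bytes, model: dict[str, int]) -> bytes:
    """Overlay byte_N values from a SAT model onto the seed input."""
    buffer = bytearray(seed)
    for name, value in model.items():
        suffix = name[len("byte_"):]
        if not name.startswith("byte_") or not suffix.isdigit():
            continue  # ignore auxiliary solver variables (assumed behavior)
        index = int(suffix)
        if index >= len(buffer):
            # Zero-pad so the index exists (assumed behavior).
            buffer.extend(b"\x00" * (index + 1 - len(buffer)))
        buffer[index] = value & 0xFF
    return bytes(buffer)


def payload_filename(
    spec_id: str, stall_address: str, now: datetime, prefix_len: int = 8
) -> str:
    """Build agentic_<spec_id_prefix>_<stall_addr>_<YYYYMMDD_HHMMSS>.bin."""
    return f"agentic_{spec_id[:prefix_len]}_{stall_address}_{now:%Y%m%d_%H%M%S}.bin"


payload = model_to_payload(b"AAAA", {"byte_1": 0x42, "byte_5": 0x43, "tmp": 7})
name = payload_filename(
    "0123456789abcdef", "0x00401a20", datetime(2024, 1, 2, 3, 4, 5)
)
```

Seed bytes at positions 0, 2, and 3 survive unchanged in this example, which is the point of overlaying instead of building the payload from the model alone.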
