Orchestrator: AgentLoop, CARM, LLM, and Z3 Sandbox

The orchestrator subsystem is the brain of Agentic-AFL. It receives StallReport objects from the fuzzer bridge, drives the full Extract → CARM → Generate → Solve → Inject pipeline, and manages the ReAct self-correction loop when Z3 scripts fail. All orchestrator components are async-native: they use asyncio.gather for parallel LLM generation, asyncio.wait_for for sandbox timeouts, and asyncio.PriorityQueue for stall ordering — ensuring the entire pipeline fits inside a single event loop without blocking AFL++.

AgentLoop Lifecycle

AgentLoop is the central coordinator. It is instantiated once per fuzzing campaign, initialized via setup(), and then run as a daemon via run().

setup()

Instantiates all sub-components in dependency order:

SpecStore — connects to PostgreSQL and runs schema migrations
PCodeSlicer, ConstraintProfiler, SpecExporter — the extractor pipeline
CARMRetriever, LLMClient, Z3Sandbox — the orchestrator pipeline
StallDetector, PayloadInjector — the fuzzer bridge

Configuration is sourced from settings (which reads from .env / environment variables). The afl_output_dir, afl_sync_dir, target_binary, and min_stall_cycles parameters passed to AgentLoop.__init__() override their corresponding settings values.

run() — main asyncio loop

Runs indefinitely until shutdown() is called (by a SIGINT or SIGTERM signal handler registered during run()). Each iteration:

Poll — StallDetector.detect() is called; any new stalls not already in _active_stalls or _solved_stalls are enqueued in _stall_queue at their severity priority
Dequeue — the highest-priority item is dequeued
Process — _process_stall() is called (see below)
Sleep — asyncio.sleep(settings.stall_poll_interval) before the next poll

_process_stall() — the full pipeline

This is the core logic method. It runs the complete pipeline for a single stall:

Extract a PCodeSlice via PCodeSlicer
Produce a ConstraintProfile via ConstraintProfiler
Upsert a VulnerabilitySpec to PostgreSQL via SpecExporter
Query CARM for historical templates via CARMRetriever
Run a REDQUEEN-style GDB offset probe to determine where in the file the function’s input pointer starts
Build a Z3GenerationRequest and pass it to _generate_and_solve()
If a SolvedPayload is returned, inject it via PayloadInjector and trigger DiversityGenerator
Update the winning Z3 script as a CARM template via CARMRetriever.update_template()

Graceful shutdown

shutdown() sets _running = False, which causes the run() loop to exit after the current iteration. The SpecStore connection pool is then closed cleanly. Final runtime metrics are logged.

Solved stalls are permanently suppressed: once _process_stall() returns True, the stall address is added to _solved_stalls and StallDetector.resolve_stall() is called, preventing re-detection on future polls.

CARMRetriever

The Constraint-Aware Retrieval Module (CARM) finds historical VulnerabilitySpec records whose ConstraintProfile tag sets are most similar to the current stall site. Retrieval is two-stage: Stage 1 — PostgreSQL server-side Jaccard filter CARMRetriever.query() converts the current ConstraintProfile’s frozenset[ConstraintTag] to an INTEGER[] array via SpecStore.tags_to_int_array(), then calls SpecStore.query_by_jaccard(). PostgreSQL computes jaccard_similarity(stored_tags, query_tags) server-side using a GIN-indexed INTEGER[] column and returns only rows with score ≥ similarity_threshold (default configurable), up to max_results rows. This is an O(log N) indexed operation — not a full-table scan. Stage 2 — Python re-ranking The small result set returned by PostgreSQL (at most max_results rows) is re-ranked in Python using a weighted relevance score:

relevance = 0.6 × jaccard_score
          + 0.2 × complexity_similarity
          + 0.2 × solve_score

complexity_similarity penalizes specs whose estimated_complexity differs significantly from the query’s. solve_score rewards specs that have been successfully solved before (capped at 5 solves to prevent runaway bias). The final list is sorted by relevance_score descending.

from agentic_afl.orchestrator.retrieval_carm import CARMRetriever
from agentic_afl.database.spec_store import SpecStore
from agentic_afl.constants import Architecture

store = SpecStore(dsn="postgresql://...")
await store.initialize()
retriever = CARMRetriever(store=store)
results = await retriever.query(
    query_profile=profile,
    architecture=Architecture.ARM32,
)
for r in results:
    print(r.jaccard_score, r.z3_template[:80])

The optional architecture argument filters results to matching CPU architectures. This prevents ARM32 templates from being suggested for MIPS32 targets, where different register widths would produce incompatible BitVec declarations.

LLMClient and K-way Voting

LLMClient is the translation engine. Its only job is to convert a Z3GenerationRequest (containing a VulnerabilitySpec, seed input, retrieved templates, and correction history) into K candidate Z3Script objects. The LLM is used strictly as a translator — it never reasons about satisfiability directly. Provider selection is controlled by settings.llm_api_provider:

OpenAIProvider

Uses OpenAI’s chat.completions.create API. Supports any model accessible via the OpenAI API. Configured via LLM_API_KEY and LLM_MODEL_NAME settings.

GeminiProvider

Uses Google’s genai SDK with aio.models.generate_content. Generates K independent completions via asyncio.gather to maximize candidate variance. Configured via GEMINI_API_KEY and LLM_MODEL_NAME settings.

LocalProvider

Stub for future vLLM/Ollama integration. Not implemented in the current release — raises NotImplementedError. Intended for offline campaigns without API costs.

K-way voting generates k_vote_count completions in a single provider call (or via asyncio.gather for Gemini, which issues K independent API requests). All K Z3Script objects are then executed in parallel by the sandbox. The first script to return SAT is accepted. If multiple scripts return SAT, the one with the most byte-indexed variables in its model is preferred, ensuring the most complete constraint coverage. Code extraction uses a three-strategy fallback:

Standard ```python ... ``` regex
Relaxed regex (no newline requirement after language tag)
Fence stripping with heuristic detection of the first code-like line

Z3Sandbox

Z3Sandbox executes LLM-generated Z3Py scripts in a separate Python subprocess. This isolation is non-negotiable: LLM-generated code is untrusted and must not be executed in the main process. Subprocess isolation is implemented via asyncio.create_subprocess_exec(sys.executable, script_path). Each script runs in a fresh Python interpreter with no shared state from the main process. Resource limits are applied inside the subprocess before user code runs:

RLIMIT_AS — virtual address space capped at 512 MB
RLIMIT_CPU — CPU time limited by the Z3 solver timeout

Import whitelist — after from z3 import * is loaded (Z3’s C extension internally uses os, ctypes, etc. during module initialization), a restricted __import__ function is installed that blocks imports of os, subprocess, shutil, pathlib, socket, http, urllib, requests, httpx, multiprocessing, threading, and related modules. RLIMIT_AS and RLIMIT_CPU serve as the true security backstop. Script sanitization — before wrapping, the sandbox strips any lines from the LLM-generated script that would conflict with the harness: from z3 import *, s = Solver(), s.set("timeout", ...), s.check(), m = s.model(), and model-printing blocks. The harness provides these itself and invokes them after the LLM’s constraint-building code. Verdict mapping:

Verdict	Meaning
`SAT`	Solver found a satisfying model; `result.model` is a `dict[str, int]`
`UNSAT`	Constraints are unsatisfiable — no solution exists
`TIMEOUT`	`s.check()` exceeded the timeout budget
`SYNTAX_ERROR`	Script failed to parse or compile; subprocess returned non-zero
`RUNTIME_ERROR`	Script executed but raised a Python exception
`UNKNOWN`	Z3 returned `unknown` (typically non-linear arithmetic)

Results are printed as JSON between ===Z3_RESULT_JSON=== and ===Z3_RESULT_END=== delimiters on stdout, then parsed by the sandbox into a Z3Result dataclass.

ReAct Loop

The _generate_and_solve() method in AgentLoop implements a Reason-Act loop that drives K-way generation and self-repair across up to max_react_turns iterations (default 5).

For each turn (1 … max_react_turns):
  1. Generate K Z3 scripts in parallel via LLMClient
  2. Execute all K scripts in parallel via Z3Sandbox
  3. If any result is SAT:
       – Pick the SAT result with the most byte-indexed variables
       – Verify it meets the minimum coverage threshold
       – Convert the model to a SolvedPayload and return
  4. If all K results are non-SAT:
       – Select the "best" failure (see preference order below)
       – Append a CorrectionEntry to the request's correction_history
       – Persist the correction to PostgreSQL for future learning
       – Continue to the next turn
  5. If all turns exhausted → return None (defer to AFL++)

Best-failure preference for self-repair feedback (most informative first):

UNSAT — the script compiled and ran, but the constraints are logically inconsistent; the most structurally informative failure
RUNTIME_ERROR — the script executed but hit a Python exception; error message often pinpoints the specific Z3 API misuse
SYNTAX_ERROR — the script failed to parse; error message identifies the syntax problem
TIMEOUT — Z3 couldn’t solve within the budget; suggests the constraint model may need concretization
UNKNOWN — Z3 returned unknown; typically indicates non-linear arithmetic

Partial model acceptance prevents the agent from accepting a trivially incomplete SAT result. A SAT model is only accepted if it constrains at least min(8, max(input_len // 2, 1)) byte-indexed variables. For a zero-length seed input the threshold is 1. If a model falls below this threshold, a CorrectionEntry is appended describing which byte positions are missing, and the loop continues to the next turn.

Set DEBUG_MODE=true in your environment to have the agent write each Z3 script and its result to /tmp/agentic_afl_debug/ during a campaign. Files are named turn{N}_k{K}_{verdict}.py and .txt respectively.

Get Started

Configuration

Guides

Architecture

Orchestrator: AgentLoop, CARM, LLM, and Z3 Sandbox

AgentLoop Lifecycle

CARMRetriever

LLMClient and K-way Voting

OpenAIProvider

GeminiProvider

LocalProvider

Z3Sandbox

ReAct Loop

Build docs developers (and LLMs) love

Get Started

Configuration

Guides

Architecture

Documentation Index

​AgentLoop Lifecycle

​CARMRetriever

​LLMClient and K-way Voting

OpenAIProvider

GeminiProvider

LocalProvider

​Z3Sandbox

​ReAct Loop

Build docs developers (and LLMs) love

AgentLoop Lifecycle

CARMRetriever

LLMClient and K-way Voting

Z3Sandbox

ReAct Loop