The orchestrator subsystem is the brain of Agentic-AFL. It receivesDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/AdithyaaSivamal/Agentic-AFL/llms.txt
Use this file to discover all available pages before exploring further.
StallReport objects from the fuzzer bridge, drives the full Extract → CARM → Generate → Solve → Inject pipeline, and manages the ReAct self-correction loop when Z3 scripts fail. All orchestrator components are async-native: they use asyncio.gather for parallel LLM generation, asyncio.wait_for for sandbox timeouts, and asyncio.PriorityQueue for stall ordering — ensuring the entire pipeline fits inside a single event loop without blocking AFL++.
AgentLoop Lifecycle
AgentLoop is the central coordinator. It is instantiated once per fuzzing campaign, initialized via setup(), and then run as a daemon via run().
setup()
Instantiates all sub-components in dependency order:
SpecStore— connects to PostgreSQL and runs schema migrationsPCodeSlicer,ConstraintProfiler,SpecExporter— the extractor pipelineCARMRetriever,LLMClient,Z3Sandbox— the orchestrator pipelineStallDetector,PayloadInjector— the fuzzer bridge
settings (which reads from .env / environment variables). The afl_output_dir, afl_sync_dir, target_binary, and min_stall_cycles parameters passed to AgentLoop.__init__() override their corresponding settings values.run() — main asyncio loop
Runs indefinitely until
shutdown() is called (by a SIGINT or SIGTERM signal handler registered during run()). Each iteration:- Poll —
StallDetector.detect()is called; any new stalls not already in_active_stallsor_solved_stallsare enqueued in_stall_queueat their severity priority - Dequeue — the highest-priority item is dequeued
- Process —
_process_stall()is called (see below) - Sleep —
asyncio.sleep(settings.stall_poll_interval)before the next poll
_process_stall() — the full pipeline
This is the core logic method. It runs the complete pipeline for a single stall:
- Extract a
PCodeSliceviaPCodeSlicer - Produce a
ConstraintProfileviaConstraintProfiler - Upsert a
VulnerabilitySpecto PostgreSQL viaSpecExporter - Query CARM for historical templates via
CARMRetriever - Run a REDQUEEN-style GDB offset probe to determine where in the file the function’s input pointer starts
- Build a
Z3GenerationRequestand pass it to_generate_and_solve() - If a
SolvedPayloadis returned, inject it viaPayloadInjectorand triggerDiversityGenerator - Update the winning Z3 script as a CARM template via
CARMRetriever.update_template()
_process_stall() returns True, the stall address is added to _solved_stalls and StallDetector.resolve_stall() is called, preventing re-detection on future polls.
CARMRetriever
The Constraint-Aware Retrieval Module (CARM) finds historicalVulnerabilitySpec records whose ConstraintProfile tag sets are most similar to the current stall site. Retrieval is two-stage:
Stage 1 — PostgreSQL server-side Jaccard filter
CARMRetriever.query() converts the current ConstraintProfile’s frozenset[ConstraintTag] to an INTEGER[] array via SpecStore.tags_to_int_array(), then calls SpecStore.query_by_jaccard(). PostgreSQL computes jaccard_similarity(stored_tags, query_tags) server-side using a GIN-indexed INTEGER[] column and returns only rows with score ≥ similarity_threshold (default configurable), up to max_results rows. This is an O(log N) indexed operation — not a full-table scan.
Stage 2 — Python re-ranking
The small result set returned by PostgreSQL (at most max_results rows) is re-ranked in Python using a weighted relevance score:
complexity_similarity penalizes specs whose estimated_complexity differs significantly from the query’s. solve_score rewards specs that have been successfully solved before (capped at 5 solves to prevent runaway bias). The final list is sorted by relevance_score descending.
architecture argument filters results to matching CPU architectures. This prevents ARM32 templates from being suggested for MIPS32 targets, where different register widths would produce incompatible BitVec declarations.
LLMClient and K-way Voting
LLMClient is the translation engine. Its only job is to convert a Z3GenerationRequest (containing a VulnerabilitySpec, seed input, retrieved templates, and correction history) into K candidate Z3Script objects. The LLM is used strictly as a translator — it never reasons about satisfiability directly.
Provider selection is controlled by settings.llm_api_provider:
OpenAIProvider
Uses OpenAI’s
chat.completions.create API. Supports any model accessible via the OpenAI API. Configured via LLM_API_KEY and LLM_MODEL_NAME settings.GeminiProvider
Uses Google’s
genai SDK with aio.models.generate_content. Generates K independent completions via asyncio.gather to maximize candidate variance. Configured via GEMINI_API_KEY and LLM_MODEL_NAME settings.LocalProvider
Stub for future vLLM/Ollama integration. Not implemented in the current release — raises
NotImplementedError. Intended for offline campaigns without API costs.k_vote_count completions in a single provider call (or via asyncio.gather for Gemini, which issues K independent API requests). All K Z3Script objects are then executed in parallel by the sandbox. The first script to return SAT is accepted. If multiple scripts return SAT, the one with the most byte-indexed variables in its model is preferred, ensuring the most complete constraint coverage.
Code extraction uses a three-strategy fallback:
- Standard
```python ... ```regex - Relaxed regex (no newline requirement after language tag)
- Fence stripping with heuristic detection of the first code-like line
Z3Sandbox
Z3Sandbox executes LLM-generated Z3Py scripts in a separate Python subprocess. This isolation is non-negotiable: LLM-generated code is untrusted and must not be executed in the main process.
Subprocess isolation is implemented via asyncio.create_subprocess_exec(sys.executable, script_path). Each script runs in a fresh Python interpreter with no shared state from the main process.
Resource limits are applied inside the subprocess before user code runs:
RLIMIT_AS— virtual address space capped at 512 MBRLIMIT_CPU— CPU time limited by the Z3 solver timeout
from z3 import * is loaded (Z3’s C extension internally uses os, ctypes, etc. during module initialization), a restricted __import__ function is installed that blocks imports of os, subprocess, shutil, pathlib, socket, http, urllib, requests, httpx, multiprocessing, threading, and related modules. RLIMIT_AS and RLIMIT_CPU serve as the true security backstop.
Script sanitization — before wrapping, the sandbox strips any lines from the LLM-generated script that would conflict with the harness: from z3 import *, s = Solver(), s.set("timeout", ...), s.check(), m = s.model(), and model-printing blocks. The harness provides these itself and invokes them after the LLM’s constraint-building code.
Verdict mapping:
| Verdict | Meaning |
|---|---|
SAT | Solver found a satisfying model; result.model is a dict[str, int] |
UNSAT | Constraints are unsatisfiable — no solution exists |
TIMEOUT | s.check() exceeded the timeout budget |
SYNTAX_ERROR | Script failed to parse or compile; subprocess returned non-zero |
RUNTIME_ERROR | Script executed but raised a Python exception |
UNKNOWN | Z3 returned unknown (typically non-linear arithmetic) |
===Z3_RESULT_JSON=== and ===Z3_RESULT_END=== delimiters on stdout, then parsed by the sandbox into a Z3Result dataclass.
ReAct Loop
The_generate_and_solve() method in AgentLoop implements a Reason-Act loop that drives K-way generation and self-repair across up to max_react_turns iterations (default 5).
UNSAT— the script compiled and ran, but the constraints are logically inconsistent; the most structurally informative failureRUNTIME_ERROR— the script executed but hit a Python exception; error message often pinpoints the specific Z3 API misuseSYNTAX_ERROR— the script failed to parse; error message identifies the syntax problemTIMEOUT— Z3 couldn’t solve within the budget; suggests the constraint model may need concretizationUNKNOWN— Z3 returned unknown; typically indicates non-linear arithmetic
min(8, max(input_len // 2, 1)) byte-indexed variables. For a zero-length seed input the threshold is 1. If a model falls below this threshold, a CorrectionEntry is appended describing which byte positions are missing, and the loop continues to the next turn.