Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/AdithyaaSivamal/Agentic-AFL/llms.txt

Use this file to discover all available pages before exploring further.

Z3Sandbox executes LLM-generated Z3Py scripts in an isolated subprocess with strict timeouts and resource limits. LLM-generated code is untrusted; the sandbox enforces both time and memory budgets and intercepts dangerous imports before any user code runs. All K voting candidates from LLMClient are executed in parallel via asyncio.gather, making the sandbox the primary throughput bottleneck of the pipeline.

Import

from agentic_afl.orchestrator.z3_sandbox import Z3Sandbox

Security Model

Every script execution goes through the following protection layers in order:
  1. Subprocess isolation — Scripts run in a separate Python subprocess (asyncio.create_subprocess_exec). A crash, infinite loop, or sys.exit() inside the script cannot affect the main Agentic-AFL process.
  2. RLIMIT_AS (512 MB memory cap) — Set via resource.setrlimit(resource.RLIMIT_AS, ...) inside the subprocess before any user code executes. A Z3 path explosion that allocates more than 512 MB kills the subprocess with MemoryError.
  3. RLIMIT_CPU (CPU time limit) — The subprocess’s CPU time is bounded. This acts as a backstop if the asyncio timeout fires before the OS scheduler has a chance to kill the process.
  4. Import whitelist — A _restricted_import hook is installed on __builtins__.__import__ after z3 has finished loading (Z3’s C extension loads os, ctypes, etc. internally at import time, so the hook is installed post-import to avoid breaking Z3 itself). Blocked modules include os, subprocess, socket, http, urllib, requests, threading, and multiprocessing.
  5. Strict solver timeout — The execution harness passes s.set("timeout", <ms>) to the Z3 Solver instance before any LLM-generated constraints are added. This bounds s.check() to z3_timeout_seconds (default 5 s).
The import whitelist is a first-line defense. RLIMIT_AS and RLIMIT_CPU are the true security backstop. Do not run Agentic-AFL with root privileges, as resource limits may not be enforced correctly in some containerized environments.

Constructor

Z3Sandbox is constructed with no required arguments. All parameters have sensible defaults sourced from settings.
timeout_seconds
int
default:"settings.z3_timeout_seconds (5)"
Wall-clock seconds before the subprocess is killed with asyncio.TimeoutError. An additional 5-second overhead buffer is added for subprocess startup, so the actual asyncio.wait_for timeout is timeout_seconds + 5.
sandbox_dir
Path
default:"settings.z3_sandbox_dir"
Directory where temporary script files are written before execution. Each script is given a random UUID-based filename (z3_exec_<uuid8>.py) and deleted after execution unless keep_scripts=True.
keep_scripts
bool
default:"false"
When True, temporary script files are not deleted after execution. Useful for post-mortem debugging — combine with settings.debug_mode which also writes raw LLM completions and sandbox results to /tmp/agentic_afl_debug/.

async execute(script: Z3Script) -> Z3Result

Execute a Z3 script in an isolated subprocess. This is the only public method of Z3Sandbox. Steps performed:
  1. Wrap the raw LLM script in the execution harness template (strip duplicate from z3 import *, s = Solver(), s.check(), and model extraction boilerplate that the harness already provides)
  2. Write the wrapped script to a temporary file in sandbox_dir
  3. Execute the file with asyncio.create_subprocess_exec(sys.executable, script_path)
  4. Wait up to timeout_seconds + 5 seconds for the subprocess to finish
  5. Parse the JSON printed between ===Z3_RESULT_JSON=== and ===Z3_RESULT_END=== delimiters
  6. Construct and return a Z3Result
  7. Delete the temporary file (unless keep_scripts=True)
If the subprocess exits without printing the JSON delimiters and returns a non-zero exit code, the result is classified as SYNTAX_ERROR. If the subprocess exits without the delimiters but with exit code 0, the result is classified as RUNTIME_ERROR.

Z3Verdict values

VerdictEnum valueMeaning
SAT"sat"Z3 found a satisfying assignment — model dict is populated
UNSAT"unsat"Z3 proved no solution exists — constraints are structurally wrong
TIMEOUT"timeout"s.check() exceeded timeout_seconds — constraints may need concretization
SYNTAX_ERROR"syntax_error"Script failed to parse/compile before _execute() ran
RUNTIME_ERROR"runtime_error"Script executed but raised a Python exception inside _execute()
UNKNOWN"unknown"Z3 returned unknown (e.g., non-linear arithmetic beyond Z3’s decision procedures)

Z3 Model Format

When verdict is SAT, the model field is a dict[str, int] mapping Z3 variable names to concrete integer values. The naming convention enforced by LLMClient’s system prompt ensures variables are named byte_0, byte_1, etc., one per input byte position.
# Example model from a successful solve of a DNP3 checksum constraint:
{
    "byte_0": 5,    # first byte of input
    "byte_1": 1,    # second byte
    "byte_2": 0,    # third byte
    "byte_3": 1,    # fourth byte
    "byte_4": 100,  # CRC low byte
    "byte_5": 42,   # CRC high byte
}
AgentLoop._model_to_payload() reads this dict and overlays the solved values onto the original seed input. Positions not covered by the model retain their original seed bytes, allowing the solver to address only the constrained header while leaving the packet body intact.
Scripts that print nothing to stdout (due to early exit, uncaught exception before the _execute() wrapper, or a bare sys.exit()) are classified as RUNTIME_ERROR. The error message in Z3Result.error_message is capped at 2000 characters and is fed back to the LLM as the correction prompt in the next ReAct turn.

Parallel Execution

The sandbox is designed for parallel execution. AgentLoop runs all K Z3Sandbox.execute() calls concurrently with asyncio.gather, so total latency per ReAct turn is approximately the latency of the slowest of the K scripts rather than K times the average latency.
import asyncio
from agentic_afl.orchestrator.z3_sandbox import Z3Sandbox

sandbox = Z3Sandbox()
results = await asyncio.gather(*[
    sandbox.execute(script) for script in scripts
])
sat_results = [r for r in results if r.verdict.value == "sat"]
Each execute() call writes its own uniquely named temp file, so concurrent executions never collide on the filesystem. The subprocess isolation also means that a timeout or crash in one candidate does not affect the others.
In settings.debug_mode, AgentLoop writes each script and its result to /tmp/agentic_afl_debug/ as turn{N}_k{M}_{verdict}.py and turn{N}_k{M}_{verdict}.txt. Enable this to inspect exactly what the LLM generated and what the sandbox returned for each voting candidate.

Build docs developers (and LLMs) love