Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/AdithyaaSivamal/Agentic-AFL/llms.txt

Use this file to discover all available pages before exploring further.

Agentic-AFL uses PostgreSQL as its persistent store for VulnerabilitySpec objects and Z3 templates. The schema is optimized for Jaccard similarity queries on INTEGER[] constraint tag arrays using the intarray extension and a custom jaccard_similarity() SQL function. All similarity computation happens server-side — Python receives only the pre-ranked top-N results, never the full table.

Why PostgreSQL

The choice of PostgreSQL over a vector database (such as mem0, Qdrant, or Chroma) is a deliberate architectural decision documented in spec_store.py. CARM retrieval requires Jaccard similarity on formal logic tag sets — a set-intersection-over-union operation on small integer arrays. This is fundamentally different from cosine similarity on high-dimensional embedding vectors, which is what vector databases are optimized for.
ApproachJaccard computationMemory usageQuery complexity
mem0 / vector-DBPull ALL records to Python RAM, loop in PythonO(N) RAMO(N)
PostgreSQL + intarrayjaccard_similarity() runs in the database engineConstant in PythonO(log N) with GIN index
A second reason is forward compatibility: the pgvector extension can be added to the same PostgreSQL instance to enable a two-stage retrieval strategy (ConstraintLLM §2.2) — embedding pre-filter to narrow candidates, followed by Jaccard re-ranking — all in a single database query without introducing a separate service.
The intarray extension is required for Jaccard similarity queries. It ships with standard PostgreSQL distributions. Run CREATE EXTENSION IF NOT EXISTS intarray; if it is not already enabled on your instance.

Schema

The full initialization script is mounted into the PostgreSQL container via docker-compose.yml and runs automatically on first startup. It is idempotent — safe to run against an existing database.
init.sql
-- Enable the intarray extension for INTEGER[] operations.
CREATE EXTENSION IF NOT EXISTS intarray;

-- Server-side Jaccard similarity function.
-- J(A,B) = |A ∩ B| / |A ∪ B|
CREATE OR REPLACE FUNCTION jaccard_similarity(a INTEGER[], b INTEGER[])
RETURNS FLOAT AS $$
DECLARE
    intersection_size INTEGER;
    union_size INTEGER;
BEGIN
    SELECT COUNT(*) INTO intersection_size
    FROM (SELECT UNNEST(a) INTERSECT SELECT UNNEST(b)) t;

    SELECT COUNT(*) INTO union_size
    FROM (SELECT UNNEST(a) UNION SELECT UNNEST(b)) t;

    IF union_size = 0 THEN RETURN 0.0; END IF;
    RETURN intersection_size::FLOAT / union_size::FLOAT;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE TABLE IF NOT EXISTS vulnerability_specs (
    spec_id             TEXT PRIMARY KEY,
    binary_path         TEXT NOT NULL,
    stall_address       TEXT NOT NULL,
    function_name       TEXT NOT NULL,
    architecture        TEXT NOT NULL,
    constraint_tags     INTEGER[] NOT NULL DEFAULT '{}',
    pcode_text          TEXT NOT NULL,
    profile_data        JSONB NOT NULL DEFAULT '{}',
    z3_template         TEXT,
    correction_history  JSONB NOT NULL DEFAULT '[]',
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_attempted      TIMESTAMPTZ,
    solve_count         INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_specs_architecture
    ON vulnerability_specs (architecture);
CREATE INDEX IF NOT EXISTS idx_specs_stall_address
    ON vulnerability_specs (stall_address);
CREATE INDEX IF NOT EXISTS idx_specs_tags
    ON vulnerability_specs USING GIN (constraint_tags);

jaccard_similarity(a INTEGER[], b INTEGER[])

The core function that justifies the PostgreSQL architecture. It implements ConstraintLLM Eq. 3 directly in PL/pgSQL:
  • UNNEST(a) INTERSECT UNNEST(b) — set intersection of the two tag arrays
  • UNNEST(a) UNION UNNEST(b) — set union of the two tag arrays
  • Returns 0.0 when both arrays are empty (avoids division by zero)
  • Declared IMMUTABLE — PostgreSQL can inline and cache calls within a single query

Indexes

Three indexes are created to accelerate the common query patterns:
IndexTypePurpose
idx_specs_architectureB-TreeEquality filter on the architecture column; eliminates cross-arch template suggestions
idx_specs_stall_addressB-TreeDirect lookup by hex stall address for SpecStore.load_spec()
idx_specs_tagsGINFast set membership and containment queries on the constraint_tags INTEGER[] column; assists the jaccard_similarity() function

Column Reference

spec_id
TEXT PRIMARY KEY
required
SHA-256 hash (truncated to 16 hex characters) of binary_path + stall_address. Uniquely identifies a stall site within a specific binary. Used by SpecStore.load_spec() and all update operations.
binary_path
TEXT
required
Absolute filesystem path to the analyzed binary (e.g., /targets/firmware.elf). Preserved for provenance — the orchestrator can re-analyze the binary if the spec needs updating.
stall_address
TEXT
required
Hex string address of the AFL++ stall site (e.g., "0x08001234"). Stored as text to preserve leading zeros and architecture-specific formatting. Indexed for direct lookup.
function_name
TEXT
required
Ghidra decompiled function name containing the stall site (e.g., "validate_header", "FUN_08001200"). Used by the LLM prompt to provide human-readable context when no decompiled name is available.
architecture
TEXT
required
Architecture enum value for the target binary (e.g., "arm32", "x86_64"). Used as a mandatory filter in CARM queries to prevent cross-architecture template suggestions — ARM32 templates use BitVec(32) and are incompatible with X86_64 targets.
constraint_tags
INTEGER[]
required
Sorted array of ConstraintTag enum integer values derived from P-Code analysis of the stall site’s backward slice. This is the primary input to the server-side jaccard_similarity() function. Empty array '{}' means no structural tags were detected. The GIN index (idx_specs_tags) accelerates set operations on this column.
pcode_text
TEXT
required
Concatenated P-Code instruction text from the backward slice rooted at the stall site branch. Included verbatim in the LLM prompt so the model can reason about the exact operations performed on the input before the branch.
profile_data
JSONB
required
Full ConstraintProfile metrics serialized as a JSON object. Contains the following fields populated by constraint_profiler.py:
  • bitwise_density — Fraction of slice ops that are bitwise operations (float, 0.0–1.0)
  • arithmetic_density — Fraction of slice ops that are arithmetic operations (float, 0.0–1.0)
  • loop_depth — Maximum loop nesting depth observed in the slice (integer)
  • register_count — Number of distinct registers read in the slice (integer)
  • estimated_complexity — Heuristic complexity score (0–100) used for Stage 2 re-ranking in CARMRetriever._compute_relevance()
z3_template
TEXT
The last successful Z3Py script for this stall site — i.e., the script that produced a SAT verdict and a valid concrete model. NULL for stall sites that have never been solved. Retrieved by CARM and passed to the LLM as a starting-point template for similar stall sites.
correction_history
JSONB
required
JSON array of {error_message, corrected_script, timestamp} objects accumulated across all past self-repair attempts. New entries are appended (not overwritten) via PostgreSQL’s || JSONB concatenation operator, preserving the full repair history. Used to provide the LLM with prior failure context during the self-repair cycle (SAILOR §4.4).
created_at
TIMESTAMPTZ
required
Timestamp (with timezone) when this spec was first inserted. Set server-side to NOW() on insert; not updated on subsequent upserts.
last_attempted
TIMESTAMPTZ
Timestamp of the most recent agent attempt to solve this spec — updated on every upsert and on update_template() and append_correction() calls. NULL for specs that have been stored but not yet attempted. Useful for prioritizing stall sites that have not been re-attempted recently.
solve_count
INTEGER
required
Number of times this spec has been successfully solved (SAT verdict with a valid concrete model). Incremented by SpecStore.update_template(). Used as a quality signal in Stage 2 re-ranking — specs with solve_count > 0 are preferred as template sources because their Z3 scripts have been empirically validated.

CARM Query Pattern

CARMRetriever calls SpecStore.query_by_jaccard(), which executes the following parameterized query. The jaccard_similarity() call and the ORDER BY happen entirely inside PostgreSQL.
SELECT
    spec_id,
    stall_address,
    z3_template,
    constraint_tags,
    profile_data,
    jaccard_similarity(constraint_tags, $1::integer[]) AS score
FROM vulnerability_specs
WHERE architecture = $2
  AND jaccard_similarity(constraint_tags, $1::integer[]) >= $3
ORDER BY score DESC
LIMIT $4;
ParameterTypeDescription
$1INTEGER[]Query tag array for the current stall site (ConstraintTag values)
$2TEXTArchitecture filter (e.g., "arm32")
$3FLOATMinimum Jaccard threshold (default 0.3 from carm_similarity_threshold)
$4INTEGERMaximum rows to return (default 5 from carm_max_results)
After the database returns at most $4 rows, CARMRetriever._compute_relevance() applies a Python-side re-ranking combining Jaccard score (weight 0.6), complexity similarity (weight 0.2), and solve history (weight 0.2) to produce the final relevance_score ordering.
When architecture is None, CARMRetriever omits the AND architecture = $4 clause, falling back to cross-architecture retrieval. This is useful for bootstrapping a new architecture where the spec store has no same-architecture records yet.

Deployment Options

Local Docker (development)

The recommended way to run the database locally. The init.sql file is mounted as a Docker entrypoint initialization script and runs automatically on first startup.
docker run -d --name agentic-afl-db \
  -e POSTGRES_USER=agentic_afl \
  -e POSTGRES_PASSWORD=agentic_afl \
  -e POSTGRES_DB=agentic_afl \
  -p 5432:5432 \
  -v ./database/init.sql:/docker-entrypoint-initdb.d/init.sql \
  postgres:16
The corresponding DATABASE_URL for config.py is:
postgresql://agentic_afl:agentic_afl@localhost:5432/agentic_afl

Managed PostgreSQL (production)

Any PostgreSQL 14+ instance with the intarray extension available is supported. Tested providers include:
  • Supabaseintarray available; connection pooling via PgBouncer recommended for the asyncpg pool
  • Neon — Serverless PostgreSQL; intarray available; branch-per-environment workflow fits the fuzzing campaign model
  • Self-hosted Proxmox LXC/VM — Provisioned via Terraform as documented in infra/
The jaccard_similarity() function must be created after intarray is enabled. If deploying to a managed instance where you do not have SUPERUSER privileges to CREATE EXTENSION, ask your DBA or provider support to enable intarray before running init.sql. The SpecStore.initialize() method silently skips the CREATE EXTENSION step on InsufficientPrivilegeError and proceeds, assuming the extension was pre-installed.

SpecStore Initialization

SpecStore.initialize() is called once at agent startup (agent_loop.setup()). It creates the connection pool, registers the jaccard_similarity() function, and creates the table and indexes — all idempotently via CREATE ... IF NOT EXISTS.
from agentic_afl.database.spec_store import SpecStore

store = SpecStore()
await store.initialize()  # Idempotent — safe to call on every startup
The asyncpg connection pool is configured with min_size=2, max_size=10, suitable for a single-agent fuzzing process. Increase max_size for multi-agent parallel deployments.

Build docs developers (and LLMs) love