Hybrid execution: deterministic and LLM-assisted

SAW’s threat detection layer operates in one of two modes controlled by the ASA_MODE environment variable. In HYBRID mode (the default), every log is first evaluated against a set of fast string-matching heuristics. If a heuristic fires, detection completes in microseconds with high confidence and no external API call. If no heuristic matches, the system escalates the log to a Gemini model via the google-genai SDK for deeper contextual analysis. In SAFE mode, LLM calls are disabled entirely and the pipeline runs deterministic heuristics only, returning "type": "None" for anything that doesn’t match a pattern.


How heuristic detection works

detect_threat() in threat_detector.py calls heuristic_detect() first. If heuristic_detect() returns a result (anything other than None), that result is returned immediately and the LLM is never called.

heuristic_detect() runs canonicalize_signal() on the input before matching. Canonicalization URL-decodes the string, lowercases it, and strips common obfuscation tokens (/**/, %2f%2a%2a%2f, tab/newline characters) so that simple bypasses such as UNION/**/SELECT still match a pattern.

The four pattern groups and their outputs:

SQL Injection

```python
if any(pattern in s for pattern in [
    "or '1'='1", "or 1=1", "or1=1",
    "union select", "unionselect",
    "drop table", "'--",
    "sleep(", "benchmark(", "xp_cmdshell",
]):
    return {
        "type": "SQL Injection",
        "confidence": 0.95,
        "severity": "HIGH",
        "detection_mode": "deterministic",
        "reason_source": "heuristic_rule",
        "reason": "Detected SQL injection pattern",
    }
```

XSS

```python
if "<script>" in s or "javascript:" in s or "onerror=" in s or "onload=" in s:
    return {
        "type": "XSS",
        "confidence": 0.90,
        "severity": "HIGH",
        "detection_mode": "deterministic",
        "reason_source": "heuristic_rule",
        "reason": "Detected script injection pattern",
    }
```

Path Traversal

```python
if "../" in s or "..\\" in s:
    return {
        "type": "Path Traversal",
        "confidence": 0.92,
        "severity": "HIGH",
        "detection_mode": "deterministic",
        "reason_source": "heuristic_rule",
        "reason": "Detected path traversal pattern",
    }
```

Brute Force

```python
if "login failed" in s or "invalid password" in s:
    return {
        "type": "Brute Force",
        "confidence": 0.85,
        "severity": "MEDIUM",
        "detection_mode": "deterministic",
        "reason_source": "heuristic_rule",
        "reason": "Repeated login failure pattern",
    }
```
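Canonicalization and the four rule groups can be condensed into one runnable sketch, which also shows why collapsed forms such as "unionselect" and "or1=1" appear in the SQL list. This is a hypothetical reconstruction, not the code in threat_detector.py: canonicalize(), RULES, and heuristic_sketch() are illustrative names, the decode-then-lowercase-then-strip order and the token list are assumptions, and checking the groups in the order listed above is assumed to match the real evaluation order. Only the pattern strings and result fields are taken from the snippets.

```python
from typing import Optional
from urllib.parse import unquote

# Assumed token list, based on the description of canonicalize_signal().
OBFUSCATION_TOKENS = ["/**/", "%2f%2a%2a%2f", "\t", "\n"]


def canonicalize(signal: str) -> str:
    """URL-decode, lowercase, then remove obfuscation tokens (assumed order)."""
    s = unquote(signal).lower()
    for token in OBFUSCATION_TOKENS:
        s = s.replace(token, "")
    return s


# (patterns, type, confidence, severity, reason), in the order listed above.
RULES = [
    (["or '1'='1", "or 1=1", "or1=1", "union select", "unionselect",
      "drop table", "'--", "sleep(", "benchmark(", "xp_cmdshell"],
     "SQL Injection", 0.95, "HIGH", "Detected SQL injection pattern"),
    (["<script>", "javascript:", "onerror=", "onload="],
     "XSS", 0.90, "HIGH", "Detected script injection pattern"),
    (["../", "..\\"],
     "Path Traversal", 0.92, "HIGH", "Detected path traversal pattern"),
    (["login failed", "invalid password"],
     "Brute Force", 0.85, "MEDIUM", "Repeated login failure pattern"),
]


def heuristic_sketch(signal: str) -> Optional[dict]:
    s = canonicalize(signal)
    for patterns, threat_type, confidence, severity, reason in RULES:
        if any(p in s for p in patterns):
            return {
                "type": threat_type,
                "confidence": confidence,
                "severity": severity,
                "detection_mode": "deterministic",
                "reason_source": "heuristic_rule",
                "reason": reason,
            }
    return None  # no rule fired: the caller moves on to the LLM path
```

For example, "UNION/**/SELECT" canonicalizes to "unionselect" and "OR\t1=1" to "or1=1", so both hit the SQL group even though neither contains "union select" or "or 1=1" literally.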

A successful heuristic match sets detection_mode = "deterministic" and reason_source = "heuristic_rule". When validate_schema() later calls calibrate_confidence(), these results land in the VERY_HIGH bucket if the rule's confidence is ≥ 0.9 (SQL Injection, XSS, Path Traversal) and in HIGH otherwise (Brute Force, 0.85). Because neither bucket is LOW or MEDIUM, RiskAgent does not request an ADK advisory, and the deterministic result remains fully authoritative.

You can disable heuristics entirely by setting ASA_ENABLE_HEURISTICS=false. Every log will then go straight to the LLM path (or return a fallback if ASA_MODE=SAFE).
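The routing rules so far (ASA_MODE, ASA_ENABLE_HEURISTICS, and the API-key requirement for the LLM path) can be summarized as a small decision function. This is an illustrative sketch, not SAW's code: env_flag() and choose_path() are hypothetical names, the set of strings treated as "false" is an assumption, and the fallback when HYBRID mode has no API key is inferred rather than documented.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Hypothetical parser: 'false', '0', 'no', 'off' (any case) disable a flag."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"false", "0", "no", "off"}


def choose_path(mode: str, heuristics_enabled: bool, heuristic_hit: bool,
                has_api_key: bool) -> str:
    """Return which path a log takes under the rules described in this page."""
    if heuristics_enabled and heuristic_hit:
        return "deterministic"  # a heuristic fired: no LLM call
    if mode.upper() == "SAFE":
        # SAFE never calls the LLM: unmatched logs get "type": "None" when
        # heuristics ran, and a fallback result when heuristics are disabled.
        return "deterministic" if heuristics_enabled else "fallback"
    if has_api_key:
        return "llm"  # HYBRID escalation to Gemini
    return "fallback"  # assumption: without a key there is no LLM call
```

In practice the arguments would come from the environment, e.g. choose_path(os.getenv("ASA_MODE", "HYBRID"), env_flag("ASA_ENABLE_HEURISTICS"), ...).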

When LLM analysis runs

When heuristic_detect() returns None and the system is in HYBRID mode with a valid Gemini API key, detect_threat() constructs a structured prompt and calls the google-genai SDK:

```python
prompt = f"""
Analyze this security log and classify threat.
Return JSON:
{{
  "type": "...",
  "confidence": 0-1,
  "severity": "LOW|MEDIUM|HIGH",
  "reason": "..."
}}

Log: {signal}
"""
```

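```python
# Excerpt from detect_threat(); error handling and the fallback path are omitted.
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    for attempt in range(1, LLM_MAX_ATTEMPTS + 1):
        future = executor.submit(_generate_with_genai, prompt)
        response = future.result(timeout=5.0)  # raises TimeoutError after 5 s
```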

The call runs in a ThreadPoolExecutor with a 5-second timeout per attempt. LLM_MAX_ATTEMPTS defaults to 2 (configurable via ASA_LLM_MAX_ATTEMPTS). If all attempts fail or time out, the system returns a fallback result with detection_mode = "fallback" and confidence = 0.3.
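A self-contained version of the retry-with-timeout pattern is sketched below. It is an illustration, not SAW's implementation: generate_with_retries() and its return shape are hypothetical, and any callable can stand in for _generate_with_genai(). One design choice differs on purpose. A running Future cannot be cancelled, so in a shared single-worker pool a hung first call would keep the worker busy and the retry would sit in the queue until it, too, timed out. The sketch therefore uses a fresh pool per attempt and shuts it down with wait=False.

```python
import concurrent.futures
from typing import Callable


def generate_with_retries(generate: Callable[[str], str], prompt: str,
                          max_attempts: int = 2, timeout_s: float = 5.0) -> dict:
    """Hypothetical retry loop: one bounded wait per attempt, fallback if all fail."""
    for attempt in range(1, max_attempts + 1):
        # Fresh single-worker pool per attempt so a hung call cannot block the retry.
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(generate, prompt)
        try:
            return {"response": future.result(timeout=timeout_s), "llm_attempt": attempt}
        except concurrent.futures.TimeoutError:
            pass  # stop waiting; the running call itself cannot be cancelled
        except Exception:
            pass  # error raised inside generate(), e.g. an API failure
        finally:
            executor.shutdown(wait=False)  # do not block on a still-running call
    return {"detection_mode": "fallback", "confidence": 0.3}
```

With the default of two attempts, a single transient error is absorbed by the retry, and two consecutive failures produce the fallback result.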

LLM output validation

Raw LLM output passes through validate_llm_output() before being used:

```python
def validate_llm_output(obj: dict) -> dict:
    threat_type = obj.get("type", "Unknown")
    confidence = float(obj.get("confidence", 0.3))
    severity = obj.get("severity", "LOW")

    if threat_type not in ALLOWED_TYPES:
        threat_type = "Unknown"
    if severity not in ALLOWED_SEVERITY:
        severity = "LOW"

    confidence = max(0.0, min(confidence, 1.0))
    return {"type": threat_type, "confidence": confidence,
            "severity": severity, "reason": obj.get("reason", "LLM classification")}
```

ALLOWED_TYPES is ["SQL Injection", "XSS", "Brute Force", "Path Traversal", "Unknown", "None"]. Any type not in this list is replaced with "Unknown". Confidence is clamped to [0.0, 1.0].

A successful LLM classification sets detection_mode = "llm-assisted" and reason_source = "llm_untrusted". The response also carries prompt_version, model_name, and llm_attempt for traceability.
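validate_llm_output() expects an already-parsed dict, and as shown, float() raises ValueError if the model returns a non-numeric confidence such as "high". The sketch below is a hypothetical hardened variant that parses the raw model text first and treats unparseable values as defaults. parse_and_validate() is an illustrative name, and ALLOWED_SEVERITY is assumed to mirror the prompt's "LOW|MEDIUM|HIGH"; neither comes from SAW's source.

```python
import json

ALLOWED_TYPES = ["SQL Injection", "XSS", "Brute Force", "Path Traversal", "Unknown", "None"]
# Assumption: severity values mirror the prompt's "LOW|MEDIUM|HIGH".
ALLOWED_SEVERITY = ["LOW", "MEDIUM", "HIGH"]


def parse_and_validate(raw_text: str) -> dict:
    """Hypothetical wrapper: parse the model's text, then apply the rules above."""
    try:
        obj = json.loads(raw_text)
    except json.JSONDecodeError:
        obj = {}
    if not isinstance(obj, dict):
        obj = {}  # e.g. the model returned a JSON list or a bare string

    try:
        confidence = float(obj.get("confidence", 0.3))
    except (TypeError, ValueError):
        confidence = 0.3  # non-numeric value such as "high" or null

    threat_type = obj.get("type", "Unknown")
    if threat_type not in ALLOWED_TYPES:
        threat_type = "Unknown"
    severity = obj.get("severity", "LOW")
    if severity not in ALLOWED_SEVERITY:
        severity = "LOW"

    return {
        "type": threat_type,
        "confidence": max(0.0, min(confidence, 1.0)),  # clamp to [0.0, 1.0]
        "severity": severity,
        "reason": obj.get("reason", "LLM classification"),
    }
```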

Confidence calibration

After detection (deterministic or LLM), validate_schema() calls calibrate_confidence() to map raw confidence to a discrete bucket with a fixed calibrated value:

| Bucket | Calibrated value | When assigned |
|---|---|---|
| `VERY_HIGH` | `0.9` | Deterministic + `reason_source = "heuristic_rule"` + raw confidence ≥ 0.9 |
| `HIGH` | `0.78` | Deterministic + heuristic rule with raw < 0.9, or LLM-assisted with raw ≥ 0.8 |
| `MEDIUM` | `0.58` | LLM-assisted with 0.55 ≤ raw < 0.8 |
| `LOW` | `0.35` | LLM-assisted with raw < 0.55, or fallback mode |

```python
CONFIDENCE_BUCKETS = {
    "VERY_HIGH": 0.9,
    "HIGH": 0.78,
    "MEDIUM": 0.58,
    "LOW": 0.35,
}
```

The bucket drives the ADK advisory gate in RiskAgent: only LOW and MEDIUM buckets are eligible for an LLM override of the deterministic decision.

Confidence values in SAW represent relative rank, not probability. The field confidence_semantics is always set to "relative_rank_not_probability" in the response. Do not interpret 0.9 as “90% probability of a true threat.”
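The bucket rules in the calibration table translate into a short function. This is a hypothetical sketch: calibrate_sketch() is not SAW's calibrate_confidence(), and sending every result that is neither a heuristic match nor an LLM classification to LOW is an assumption drawn from the mode summary, not from the calibration code.

```python
CONFIDENCE_BUCKETS = {"VERY_HIGH": 0.9, "HIGH": 0.78, "MEDIUM": 0.58, "LOW": 0.35}


def calibrate_sketch(detection_mode: str, reason_source: str, raw: float) -> dict:
    """Map a raw confidence to a bucket and its fixed calibrated value."""
    if detection_mode == "deterministic" and reason_source == "heuristic_rule":
        bucket = "VERY_HIGH" if raw >= 0.9 else "HIGH"
    elif detection_mode == "llm-assisted":
        if raw >= 0.8:
            bucket = "HIGH"
        elif raw >= 0.55:
            bucket = "MEDIUM"
        else:
            bucket = "LOW"
    else:
        # Fallback results, and (by assumption) other non-heuristic results.
        bucket = "LOW"
    return {
        "confidence_bucket": bucket,
        "confidence": CONFIDENCE_BUCKETS[bucket],  # raw value is replaced
        "confidence_semantics": "relative_rank_not_probability",
    }
```

Note that the calibrated value replaces the raw score, so an LLM answer of 0.99 and one of 0.8 both come out as 0.78.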

ADK advisory: Gemini-backed decision review

Even after detect_threat() completes, RiskAgent can trigger a second, independent LLM call when the confidence bucket is LOW or MEDIUM. This call goes through the ASAAgent class, which wraps a Google ADK Runner with a CoordinatorAgent root agent:

```python
if context.classification.get("confidence_bucket") in {"LOW", "MEDIUM"} \
        or context.classification.get("detection_mode") == "fallback":
    decision["adk_review"] = await self._review_with_adk(context, decision)
```

The ADK agent receives the full incident context — analysis, classification, and the current deterministic decision — and is instructed to return a JSON object with recommended_decision, reason, and an optional follow_up_task. The system only applies the recommendation if recommended_decision is one of EXECUTE, OBSERVE, or IGNORE; any other output is ignored. You can disable the ADK advisory entirely by setting ASA_ENABLE_ADK_ADVISORY=false.
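The gate and the acceptance rule for the advisory can be expressed without any ADK dependency. In this hypothetical sketch, needs_adk_review() and apply_adk_review() are illustrative names, the "action" key that holds the final decision is an assumption, and the ADK call itself is left out.

```python
from typing import Any, Dict

ALLOWED_RECOMMENDATIONS = {"EXECUTE", "OBSERVE", "IGNORE"}


def needs_adk_review(classification: Dict[str, Any], advisory_enabled: bool = True) -> bool:
    """Gate: LOW/MEDIUM bucket or a fallback detection, unless advisories are off."""
    if not advisory_enabled:  # e.g. ASA_ENABLE_ADK_ADVISORY=false
        return False
    return (classification.get("confidence_bucket") in {"LOW", "MEDIUM"}
            or classification.get("detection_mode") == "fallback")


def apply_adk_review(decision: Dict[str, Any], review: Dict[str, Any]) -> Dict[str, Any]:
    """Record the review; adopt its recommendation only if it is an allowed value."""
    updated = dict(decision)
    updated["adk_review"] = review
    recommended = review.get("recommended_decision")
    if isinstance(recommended, str) and recommended in ALLOWED_RECOMMENDATIONS:
        updated["action"] = recommended  # hypothetical key for the final decision
    return updated
```

An out-of-vocabulary recommendation such as "ESCALATE" is still recorded under adk_review for auditing, but it leaves the deterministic decision unchanged.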

Response caching

ASAAgent caches every ADK response in memory to avoid redundant LLM calls for identical inputs:

```python
ADK_CACHE_TTL_SECONDS = int(os.getenv("ASA_ADK_CACHE_TTL_SECONDS", "120"))
```

The cache key is a hash of a deterministic JSON serialization of the prompt string and a cache_context dict (which includes surface, incident_id, and confidence_bucket), so identical inputs always map to the same entry. Cached responses are returned immediately with cache_hit: true. When an ADK call fails with a retry-after hint, the error result is cached for min(ADK_CACHE_TTL_SECONDS, retry_after_seconds) so that a rate-limited API is not hammered with retries. You can tune the TTL by setting ASA_ADK_CACHE_TTL_SECONDS in your .env.
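A minimal in-memory TTL cache with this behavior is sketched below. It is not ASAAgent's code: TTLResponseCache, make_key(), and the choice of SHA-256 over json.dumps(..., sort_keys=True) are assumptions that illustrate one way to get a deterministic key. The clock is injectable so expiry can be exercised without sleeping.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple


class TTLResponseCache:
    """Hypothetical in-memory cache mirroring the behavior described above."""

    def __init__(self, ttl_seconds: float = 120, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def make_key(prompt: str, cache_context: dict) -> str:
        # sort_keys makes the serialization independent of dict insertion order.
        payload = json.dumps({"prompt": prompt, "context": cache_context}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return {**value, "cache_hit": True}

    def put(self, key: str, value: dict, retry_after_seconds: Optional[float] = None) -> None:
        # Rate-limit errors are kept only until the retry-after window closes.
        ttl = self.ttl if retry_after_seconds is None else min(self.ttl, retry_after_seconds)
        self._store[key] = (self.clock() + ttl, value)
```

Pass retry_after_seconds only when storing a rate-limit error; successful responses use the full TTL.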

Mode summary

| Condition | Detection mode | `confidence_bucket` | ADK advisory |
|---|---|---|---|
| Heuristic match (confidence ≥ 0.9) | `deterministic` | `VERY_HIGH` | Skipped |
| Heuristic match (confidence < 0.9) | `deterministic` | `HIGH` | Skipped |
| No heuristic match, LLM raw ≥ 0.8 | `llm-assisted` | `HIGH` | Skipped |
| No heuristic match, 0.55 ≤ LLM raw < 0.8 | `llm-assisted` | `MEDIUM` | Eligible |
| No heuristic match, LLM raw < 0.55 | `llm-assisted` | `LOW` | Eligible |
| LLM failure or `ASA_MODE=SAFE` | `fallback` / `deterministic` | `LOW` | Eligible (if fallback) |
