How AGT Works: Policy Interception, Trust Scoring, and Audit

AGT enforces governance at the application middleware layer using deterministic interception: every agent action is evaluated against policy before execution, in under 0.1 ms. This is not a probabilistic filter or a model-layer safety prompt. It is code that runs in the same process as your agent framework and allows the action, denies it, or routes it to a human approver before the intent ever reaches the wire. For high-security environments, AGT composes with container or VM isolation for defense-in-depth, but application-layer interception on its own covers most production risk surfaces.

Full System Architecture

The following diagram shows the complete AGT component topology, from the policy check at the top to the framework adapters at the bottom:

╔══════════════════════════════════════════════════════════════════════════╗
║                    AGENT GOVERNANCE TOOLKIT  v4.0.0                     ║
║              pip install agent-governance-toolkit[full]                  ║
║                                                                         ║
║  Agent Action ──► POLICY CHECK ──► Allow / Deny    (< 0.1 ms)          ║
║                                                                         ║
║  ┌──────────────────────────┐     ┌──────────────────────────────┐      ║
║  │      AGENT OS ENGINE     │◄───►│          AGENTMESH           │      ║
║  │                          │     │                              │      ║
║  │  ● Policy Engine         │     │  ● Zero-Trust Identity       │      ║
║  │  ● Capability Model      │     │  ● Ed25519 / SPIFFE Certs    │      ║
║  │  ● Governance Gate       │     │  ● Trust Scoring (0-1000)    │      ║
║  │  ● GovernanceEventSink   │     │  ● Wire Protocol (A2A/MCP)   │      ║
║  │  ● Decision BOM          │     │  ● Delegation Chains         │      ║
║  └────────────┬─────────────┘     └───────────────┬──────────────┘      ║
║               │                                   │                     ║
║               ▼                                   ▼                     ║
║  ┌──────────────────────────┐     ┌──────────────────────────────┐      ║
║  │     AGENT RUNTIME        │     │         AGENT SRE            │      ║
║  │                          │     │                              │      ║
║  │  ● Execution Rings (0-3) │     │  ● SLO Engine + Error Budgets│      ║
║  │  ● Resource Limits       │     │  ● Replay & Chaos Testing    │      ║
║  │  ● Runtime Sandboxing    │     │  ● Progressive Delivery      │      ║
║  │  ● Termination Control   │     │  ● Circuit Breakers          │      ║
║  └──────────────────────────┘     └──────────────────────────────┘      ║
║                                                                         ║
║  ┌──────────────────────────┐     ┌──────────────────────────────┐      ║
║  │    AGENT HYPERVISOR      │     │      AGENT LIGHTNING         │      ║
║  │                          │     │                              │      ║
║  │  ● Execution Audit       │     │  ● RL Training Governance    │      ║
║  │  ● Delta Engine          │     │  ● Violation Penalties       │      ║
║  │  ● Commitment Anchoring  │     │  ● Reward Shaping            │      ║
║  │  ● Merkle Chain Logs     │     │  ● Training Checkpoints      │      ║
║  └──────────────────────────┘     └──────────────────────────────┘      ║
║                                                                         ║
║  ┌──────────────────────────┐     ┌──────────────────────────────┐      ║
║  │   AGENT MARKETPLACE      │     │   MCP SECURITY GATEWAY       │      ║
║  │                          │     │                              │      ║
║  │  ● Plugin Discovery      │     │  ● Tool-Call Policy Checks   │      ║
║  │  ● Signing & Verification│     │  ● Trust Verification        │      ║
║  │  ● Trust Scoring         │     │  ● Rate Limiting             │      ║
║  └──────────────────────────┘     └──────────────────────────────┘      ║
║                                                                         ║
║  ┌──────────────────────────────────────────────────────────────┐       ║
║  │              FRAMEWORK ADAPTERS                              │       ║
║  │  LangChain · CrewAI · AutoGen · OpenAI · ADK · smolagents   │       ║
║  └──────────────────────────────────────────────────────────────┘       ║
║                                                                         ║
╚══════════════════════════════════════════════════════════════════════════╝

Component Deep Dive

Agent OS Engine

The Policy Engine at the core of AGT. Evaluates every agent action against YAML, OPA/Rego, or Cedar rules before execution. Includes the Capability Model (what an agent is allowed to do), the Governance Gate (the hard stop in the execution path), the GovernanceEventSink (structured event emission), and the Decision Bill of Materials (tamper-evident record of every allow/deny decision).

AgentMesh

The zero-trust identity and routing layer. Issues each agent a cryptographic credential (Ed25519 key pair, SPIFFE certificate, or DID document), maintains a 0–1000 trust score per agent, and manages delegation chains for multi-agent systems. Wire protocol supports A2A, MCP, and IATP. When something goes wrong in a multi-agent system, AgentMesh tells you exactly which agent acted.

Agent Runtime

Execution sandboxing using four privilege rings (0–3), modeled after OS privilege levels. Ring 0 is the most privileged (system operations); Ring 3 is the least (untrusted plugins). Each ring has configurable resource limits, and actions that violate ring permissions raise a GovernanceDenied before execution. Includes saga orchestration for multi-step workflows and termination control.
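
To make the ring rule concrete, here is a minimal sketch of a ring check, assuming a lower ring number means more privilege as described above. The `REQUIRED_RING` table, the action names, and the `check_ring` function are hypothetical and are not AGT's API; only the `GovernanceDenied` name comes from the text.

```python
class GovernanceDenied(Exception):
    """Raised when an action is not permitted (name taken from the text above)."""


# Hypothetical mapping: the least-privileged ring that may still perform each action.
REQUIRED_RING = {
    "modify_policy": 0,  # system operation, Ring 0 only
    "write_file": 2,
    "read_file": 3,      # permitted even for untrusted plugins
}


def check_ring(agent_ring: int, action: str) -> None:
    """Raise GovernanceDenied if the agent's ring is too unprivileged for the action."""
    if agent_ring not in (0, 1, 2, 3):
        raise ValueError(f"unknown ring: {agent_ring}")
    # Assumption: unknown actions require Ring 0, so the check fails closed.
    required = REQUIRED_RING.get(action, 0)
    if agent_ring > required:
        raise GovernanceDenied(
            f"ring {agent_ring} may not perform {action!r} (requires ring <= {required})"
        )
```

Calling `check_ring(3, "read_file")` returns without error, while `check_ring(3, "write_file")` raises before anything executes.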

Agent SRE

Site reliability engineering for agents. Tracks SLOs (error rate, latency, compliance rate) and error budgets, provides deterministic replay for incident debugging, supports chaos engineering to validate governance holds under fault injection, and implements circuit breakers to stop runaway agents automatically.
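
The relationship between an SLO, its error budget, and a circuit breaker can be sketched as follows. This is an illustration under stated assumptions, not AGT's SLO Engine: the `ErrorBudgetBreaker` class, its field names, and the default thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudgetBreaker:
    """Opens once observed failures exceed what the SLO target allows."""

    slo_target: float = 0.99  # hypothetical: required fraction of successful actions
    min_samples: int = 20     # hypothetical: do not trip on very small samples
    total: int = 0
    failures: int = 0

    def record(self, ok: bool) -> None:
        """Record the outcome of one agent action."""
        self.total += 1
        if not ok:
            self.failures += 1

    @property
    def budget_remaining(self) -> float:
        """Failures the SLO still tolerates (negative once the budget is spent)."""
        return (1.0 - self.slo_target) * self.total - self.failures

    @property
    def is_open(self) -> bool:
        """True when the agent should be stopped."""
        return self.total >= self.min_samples and self.budget_remaining < 0
```

With `slo_target=0.9`, ten successes followed by one failure leave the breaker closed; a second failure spends the budget and opens it.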

Agent Hypervisor

Execution audit and commitment anchoring. Records every state transition using a delta engine (only the diff is stored), anchors commitments to a Merkle chain for tamper-evidence, and enforces a command denylist at the kernel level. The Merkle chain logs give auditors a cryptographic proof of the complete agent execution history.
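
The tamper-evidence property can be illustrated with a simple linear hash chain, in which each entry commits to the hash of the previous one. This is a simplification chosen for brevity; the actual Merkle chain format used by AGT is not described here, and the function names and record layout below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> dict:
    """Append a record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev": prev, "hash": _digest(prev, record)}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Because every hash depends on all earlier entries, changing one historical record invalidates that entry and everything after it.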

Agent Lightning

Governance for reinforcement learning training. Applies violation penalties to the reward signal when an agent proposes a policy-violating action during training — shaping the agent’s learned behavior away from harmful strategies before it ever sees production. Includes training checkpoint governance and reward shaping primitives.
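
As a rough illustration of a violation penalty, the sketch below subtracts a fixed amount from the reward for each policy-violating action proposed in an episode. The function name, the `violates` callback, and the penalty value are assumptions made for this example and do not reflect Agent Lightning's actual primitives.

```python
from typing import Callable, Iterable


def shaped_reward(
    base_reward: float,
    proposed_actions: Iterable[str],
    violates: Callable[[str], bool],
    penalty: float = 1.0,
) -> float:
    """Return the task reward minus a fixed penalty per policy-violating proposal."""
    violations = sum(1 for action in proposed_actions if violates(action))
    return base_reward - penalty * violations
```

A penalty that is large relative to the task reward makes a violating strategy unattractive even when it would otherwise complete the task.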

MCP Security Gateway

Tool-call-level security for the Model Context Protocol. Scans MCP tool definitions for tool poisoning, typosquatting, hidden instructions (invisible Unicode, homoglyphs), and rug-pull patterns. Applies policy checks and rate limiting to every tool invocation routed through an MCP server. Operates as a transparent proxy — no changes to your MCP server implementation required.
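
One of the checks named above, detecting invisible Unicode in a tool definition, can be approximated with the standard library. The sketch below flags Unicode format characters (general category `Cf`, which includes zero-width and bidirectional control characters) in a description and non-ASCII characters in a tool name. The function names and the tool dictionary layout are hypothetical, and the gateway's real heuristics are likely more extensive.

```python
import unicodedata


def find_hidden_characters(text: str) -> list:
    """Return (offset, character name) for each invisible format character."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def scan_tool(tool: dict) -> list:
    """Return human-readable findings for a tool with 'name' and 'description' keys."""
    findings = []
    # Assumption: tool names are expected to be plain ASCII identifiers,
    # so any other character is treated as a possible homoglyph.
    if not tool["name"].isascii():
        findings.append("non-ASCII character in tool name (possible homoglyph)")
    for offset, char_name in find_hidden_characters(tool["description"]):
        findings.append(f"invisible character {char_name} at description offset {offset}")
    return findings
```

A clean definition yields an empty list; a name containing a Cyrillic lookalike or a description containing a zero-width space yields one finding each.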

Agent Marketplace

Plugin governance and trust scoring. Manages the discovery, signing, verification, and trust rating of third-party agent plugins. Every plugin installed from the marketplace has a verified signature and a trust score. Plugins from unverified publishers are blocked by default.

The Execution Flow in Detail

When an agent calls a tool, the request travels through the following layers in order:

Agent ──► Policy Engine ──► Identity ──► Audit Log
            (YAML/OPA/Cedar)  (SPIFFE/DID/mTLS)  (Tamper-evident)
                 │                                      │
                 ├── Allowed ──► Tool executes           │
                 └── Denied  ──► GovernanceDenied        │
                                                        ▼
                                                 Decision Record

1. Policy Engine evaluates the action context (tool name, parameters, calling agent ID, timestamp) against all active rules. The first matching rule’s effect applies. If no rule matches, the default_action applies. The entire evaluation completes in under 0.1 ms.
2. Identity check verifies the calling agent’s cryptographic credential and current trust score. Actions from agents below the required trust tier for a given rule are denied.
3. Audit Log writes a structured decision record (allowed or denied, which rule matched, the full action context, and the policy document version) to an append-only log. The log is Merkle-chained for tamper-evidence.
4. Tool executes (if allowed) or GovernanceDenied is raised (if denied). The exception propagates up to the agent framework’s error handler.
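
The flow above can be sketched in a few lines. This is an illustration under stated assumptions rather than AGT's implementation: the `Rule` fields, the glob-style matching, and the `evaluate` signature are hypothetical, while `default_action` and `GovernanceDenied` are the names used in the text. Credential verification is omitted; only the trust-score comparison is shown.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


class GovernanceDenied(Exception):
    """Raised when an action is denied (exception name taken from the text)."""


@dataclass(frozen=True)
class Rule:
    name: str
    tool_pattern: str   # hypothetical field: glob matched against the tool name
    effect: str         # "allow" or "deny"
    min_trust: int = 0  # hypothetical field: trust score required by this rule


def evaluate(rules, default_action, action, audit_log, policy_version="v1"):
    """Apply the first matching rule, record the decision, then allow or raise."""
    matched = next(
        (r for r in rules if fnmatchcase(action["tool"], r.tool_pattern)), None
    )
    effect = matched.effect if matched else default_action
    # Identity step: agents below the rule's required trust are denied.
    if matched and effect == "allow" and action["trust_score"] < matched.min_trust:
        effect = "deny"
    # Audit step: a record is written for allowed and denied actions alike.
    audit_log.append({
        "effect": effect,
        "rule": matched.name if matched else None,
        "action": dict(action),
        "policy_version": policy_version,
    })
    if effect != "allow":
        raise GovernanceDenied(f"{action['tool']} denied for {action['agent_id']}")
    return effect
```

Rule order matters under first-match semantics, so narrow deny rules would normally be listed before broad allow rules, and a `default_action` of `"deny"` keeps unmatched tools blocked.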

Every layer is independent and optional. The vast majority of production deployments use the Policy Engine and Audit Log; the Identity, Runtime, SRE, Hypervisor, and Lightning layers are added incrementally as risk requirements grow.

Trust Score Algorithm

AgentMesh assigns every agent a trust score on a 0–1000 scale. The score governs which privilege tiers an agent can access and which policy rules apply based on trust level.

Score Range	Tier	Meaning
900–1000	Verified Partner	Cryptographically verified, long-term trusted
700–899	Trusted	Established track record, elevated privileges
500–699	Standard	Default for new agents with valid identity
300–499	Probationary	Limited privileges, under observation
0–299	Untrusted	Restricted to read-only or blocked entirely
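
The tier table translates directly into a lookup. The sketch below uses the score ranges and tier names from the table; the `TIER_FLOORS` constant and the `tier_for` function are hypothetical names chosen for this example.

```python
# (lower bound, tier name), ordered from highest tier to lowest.
TIER_FLOORS = [
    (900, "Verified Partner"),
    (700, "Trusted"),
    (500, "Standard"),
    (300, "Probationary"),
    (0, "Untrusted"),
]


def tier_for(score: int) -> str:
    """Map a 0-1000 trust score to its tier name."""
    if not 0 <= score <= 1000:
        raise ValueError(f"trust score out of range: {score}")
    for floor, name in TIER_FLOORS:
        if score >= floor:
            return name
    raise AssertionError("unreachable: the last floor is 0")
```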

New agents start at 500 (Standard tier). Scores change based on:

Policy compliance history — consistent rule adherence increases score
Successful task completions — verified, non-violating completions add positive weight
Trust boundary violations — any governance denial decreases score and may trigger probationary status

Score changes are logged in the audit trail with the reason for each delta. Full algorithm documentation lives in agent-governance-python/agent-mesh/docs/TRUST-SCORING.md.
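
The bookkeeping described above, a bounded score whose every change is logged with a reason, might look like the following. The actual weights are defined in TRUST-SCORING.md and are not reproduced here; the `apply_delta` function, the record fields, and any delta values passed to it are assumptions for illustration only.

```python
from datetime import datetime, timezone

MIN_SCORE, MAX_SCORE = 0, 1000
INITIAL_SCORE = 500  # new agents start in the Standard tier


def apply_delta(score: int, delta: int, reason: str, audit_trail: list) -> int:
    """Apply a score change, clamp it to 0-1000, and log the change with its reason."""
    new_score = max(MIN_SCORE, min(MAX_SCORE, score + delta))
    audit_trail.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "previous": score,
        "new": new_score,
        "delta": new_score - score,  # effective change after clamping
        "reason": reason,
    })
    return new_score
```

For example, `apply_delta(INITIAL_SCORE, 10, "task_completed", trail)` returns 510 and appends one record to `trail`; the value 10 is an arbitrary illustrative delta.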

Security Model

AGT enforces governance at the application middleware layer, not at the OS kernel level. The policy engine and the agent share the same process boundary — which is the same trust boundary used by every Python-based agent framework (LangChain, AutoGen, CrewAI, OpenAI Agents SDK). This is a deliberate design choice: it means AGT works without any special OS privileges, can be added to any existing agent in two lines, and integrates natively with all framework lifecycle hooks. The security model is honest about what this boundary provides and what it does not:

Enforcement Capability	Defense-in-Depth Composition
Intercepts and evaluates every agent action before execution	Add container isolation (Docker, gVisor, Kata) for OS-level separation
Enforces capability-based least-privilege policies	Add network policies for cross-agent communication control
Provides cryptographic agent identity (Ed25519)	Add external PKI for certificate lifecycle management
Maintains append-only audit logs with Merkle chains	Add external append-only sink (Azure Monitor, write-once storage) for tamper-evidence
Terminates non-compliant agents via signal system	Add OS-level `process.kill()` for isolated agent processes
Governance gate blocks actions before execution (fail-closed)	Add MCP Security Gateway for tool-call-level interception

AGT is not an OS-level sandbox. A compromised Python process could, in principle, bypass application-layer controls. For high-security deployments, combine AGT with container isolation.

Production recommendation: For high-security deployments, run each agent in a separate container with the AGT governance middleware inside. This gives you both application-level policy enforcement and OS-level isolation. See the Architecture: Security Boundaries documentation for detailed guidance.

Formal Specifications

Every major AGT component is backed by a formal specification written with RFC 2119 requirement keywords (MUST, SHOULD, MAY) and accompanied by conformance tests. The current suite covers 992 conformance tests across 9 specifications:

Specification	Scope	Tests
Agent OS Policy Engine	Policy evaluation, rule merging, fail-closed semantics	68
AgentMesh Identity and Trust	Credentials, trust scoring, delegation chains	135
Agent Hypervisor Execution Control	Privilege rings, saga orchestration, kill switch	80
AgentMesh Trust and Coordination	Peer trust negotiation, mesh-wide policy	62
Agent SRE Governance	SLOs, error budgets, chaos, circuit breakers	111
MCP Security Gateway	Tool poisoning, drift detection, hidden instructions	127
Agent Lightning Fast-Path	RL training governance, violation penalties	100
Framework Adapter Contract	10 adapter integrations, interceptor chain	152
Audit and Compliance	Merkle audit, compliance mapping, Decision BOM	157

Design rationale for architectural decisions is documented in 29 Architecture Decision Records.
