Transparency is a feature. This page documents what AGT does not do so you can make informed architecture decisions. These are architectural design boundaries — not bugs — and each one comes with recommended mitigations and an honest account of what is being built to address the gap. AGT is one layer in a defence-in-depth strategy, not the entire strategy.

1. Action Governance, Not Reasoning Governance

AGT governs what agents do — tool calls, resource access, inter-agent messages. It does not govern what agents think or say. What this means in practice:
  • ✅ AGT blocks an agent from calling delete_file if policy forbids it
  • ❌ AGT does not detect if the content passed to an allowed tool is a hallucination
  • ❌ AGT does not detect indirect prompt injection that corrupts the agent’s reasoning
  • ❌ AGT does not correlate sequences of individually-allowed actions that form a malicious workflow
Example gap: If policy allows both read_database and send_slack_message, an agent could read your customer list and post it to a public channel — both actions are individually permitted.
Cross-session attack chains. The same gap extends across session boundaries. When persistent memory or persistent tools (notes, files, calendar) carry state between sessions, a backdoored or prompt-injected model can write attack state in one session and resume from it in a later session whose tool set permits the next phase, even when each session has an isolated permission set. Each session’s individual actions remain policy-permitted; the full attack chain is visible only across the session sequence. Dai et al. report 80–95% attack success rates across four base models under supply-chain SFT delivery (arXiv:2605.06158, preprint, May 2026). Section 3 covers this pattern in more detail.
Mitigations available today:
  • Use content policies with blocked patterns (regex) to catch PII in outputs
  • Use PromptDefenseEvaluator (agt red-team scan) to test for prompt injection vulnerabilities
  • Combine AGT with a model-level safety layer like Azure AI Content Safety
  • Use max_tool_calls limits to cap action sequences
What we’re building:
  • Workflow-level policies that evaluate action sequences, including cross-session sequences under persistent memory, not just individual actions
  • Intent declaration where agents declare what they plan to do before doing it, and the policy engine validates the plan
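The read-then-post gap above can be narrowed in application code while workflow-level policies are not yet available. The following is a minimal sketch, not AGT's API: the `SequenceGuard` class and the `SENSITIVE_SOURCES` and `EXTERNAL_SINKS` sets are hypothetical names introduced for illustration. It denies an external send once a sensitive read has occurred in the same session, and it caps the total number of calls in the spirit of max_tool_calls.

```python
from dataclasses import dataclass, field

# Hypothetical classification of tools; a real deployment would define its own.
SENSITIVE_SOURCES = {"read_database"}
EXTERNAL_SINKS = {"send_slack_message"}


@dataclass
class SequenceGuard:
    """Tracks tool calls in one session and flags source-then-sink chains."""

    max_tool_calls: int = 20
    history: list[str] = field(default_factory=list)

    def check(self, tool_name: str) -> str:
        """Return "allow" or "deny" for the next tool call."""
        if len(self.history) >= self.max_tool_calls:
            return "deny"  # cap on the length of the action sequence
        touched_sensitive = any(t in SENSITIVE_SOURCES for t in self.history)
        if tool_name in EXTERNAL_SINKS and touched_sensitive:
            return "deny"  # each call is allowed alone, the chain is not
        self.history.append(tool_name)
        return "allow"


guard = SequenceGuard(max_tool_calls=5)
calls = ["send_slack_message", "read_database", "send_slack_message"]
decisions = [guard.check(name) for name in calls]
print(decisions)
```

The guard only sees one session's history, so it does not address the cross-session variant of the attack.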

2. Audit Logs Record Attempts, Not Outcomes

AGT’s audit trail records what the agent attempted and whether the governance layer allowed or denied it. It does not verify whether the action actually succeeded in the external world.
Example gap: An agent calls a web API that returns 200 OK but the data was stale. AGT logs “action allowed, executed”, but the agent’s goal was not actually achieved.
What this means in practice:
  • ✅ AGT provides a tamper-evident, hash-chained record of every governance decision
  • ❌ AGT does not verify post-execution world-state
  • ❌ AGT does not record whether the downstream service fulfilled the request correctly
Mitigations available today:
  • Use the SRE module with SLOs to track action success rates over time
  • Use saga orchestration with compensating actions for multi-step workflows
  • Implement application-level result validation in your agent code
What we’re building:
  • Post-action verification hooks where users register validators that check world-state after action execution
  • Outcome attestation in audit logs (succeeded/failed/unknown)
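To make "tamper-evident, hash-chained" concrete, here is a minimal sketch of a hash chain over decision records, with an outcome field of the kind listed above. It illustrates the general technique only; the record layout and the `append` and `verify` helpers are assumptions, not AGT's audit log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: list, action: str, decision: str, outcome: str = "unknown") -> None:
    """Append a record whose hash covers the whole chain so far."""
    record = {"action": action, "decision": decision, "outcome": outcome}
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, "call_web_api", "allow")  # outcome not yet verified
append(log, "send_email", "allow", outcome="succeeded")
intact = verify(log)

log[0]["record"]["decision"] = "deny"  # simulate post-hoc tampering
tamper_detected = not verify(log)
```

The chain proves that the record of a decision was not altered afterwards. It says nothing about whether the recorded outcome is true, which is why a separate validator has to supply that field.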

3. Cross-Session Attack Chains

When agents use persistent memory, file tools, or shared storage, attack state can survive across session boundaries — even when each session’s individual actions are policy-permitted. The cross-session attack pattern works as follows:
  1. In Session A, a prompt-injected or backdoored model writes attack state to a persistent store (memory module, note tool, calendar entry).
  2. Session B, running under a different (or the same) tool set, reads that state and executes the next phase.
  3. Each individual session looks clean to AGT’s per-action policy evaluation.
Dai et al. demonstrate that this attack generalises to alternative topologies (branch-and-merge) and alternative persistent components (note-tool in place of memory), achieving 80–95% attack success rates across four base models (arXiv:2605.06158, preprint May 2026).
AGT’s current policy model is per-action and per-session. There is no cross-session correlation engine. If your agent deployment uses persistent memory or shared tool state between sessions, the cross-session gap applies to you.
Mitigations available today:
  • Restrict which tools can write to persistent stores using AGT policy rules
  • Use separate isolated storage namespaces per session
  • Apply blocked_patterns rules to catch known attack state signatures in tool write arguments
What we’re building:
  • Workflow-level policies spanning multiple sessions and persistent memory components
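The namespace mitigation above can be sketched as a store that keys every value by session ID, so state written in one session cannot be read from another. This is an illustrative in-memory example under stated assumptions: `NamespacedStore` is a hypothetical class, not an AGT component.

```python
from typing import Dict, Optional, Tuple


class NamespacedStore:
    """In-memory key-value store that isolates keys by session ID."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def write(self, session_id: str, key: str, value: str) -> None:
        self._data[(session_id, key)] = value

    def read(self, session_id: str, key: str) -> Optional[str]:
        # A lookup never crosses into another session's namespace.
        return self._data.get((session_id, key))


store = NamespacedStore()
store.write("session-a", "note", "phase 1 complete, resume at phase 2")

same_session = store.read("session-a", "note")
other_session = store.read("session-b", "note")
```

The trade-off is that legitimate cross-session memory is lost as well, so this suits deployments where sessions do not need to share state.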

4. Application-Layer Enforcement (Same Process as the Agent)

AGT enforces governance at the application middleware layer, not at the OS kernel or hardware level. The policy engine and the agent share the same process boundary. What this means in practice:
  • ✅ AGT is deterministic — a denied action is structurally blocked, not just probabilistically discouraged
  • ❌ A sufficiently privileged agent process (or a compromised dependency in the same process) can in principle bypass application-layer controls
  • ❌ AGT does not provide OS-level syscall filtering, seccomp, or kernel namespacing
Example gap: If an agent uses importlib to dynamically import a module that calls an external API directly, bypassing the governed tool interface, AGT does not intercept that call. (AGT’s sandbox does block importlib dynamic imports — but only when the sandbox is used.)
Production recommendation: Run each agent in a separate container for OS-level isolation. Combine AGT (application-layer enforcement) with container security policies (network policies, seccomp profiles, read-only filesystems) for defence in depth. See How It Works for the full architecture and security boundaries.
Mitigations available today:
  • Use execution rings (Ring 3 sandbox) to restrict what the agent process can do
  • Use container isolation — one agent per container with minimal capabilities
  • Use the nono sandbox provider (pip install "agt-sandbox[nono]") for kernel-enforced Landlock/Seatbelt sandboxing on Linux/macOS
  • Use AGT’s command denylist enforcement in RingEnforcer to block dangerous subprocess commands
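The bypass described in the example gap can be reproduced in a few lines. In this sketch, `call_tool`, `DENIED_TOOLS`, and `audit_log` are hypothetical stand-ins for an application-layer governance wrapper, and the standard-library `json` module stands in for a module that performs a sensitive operation. The point is only that code running in the same process can reach the module without passing through the wrapper.

```python
import importlib
import json

audit_log = []
DENIED_TOOLS = {"encode_payload"}


def call_tool(name, func, *args):
    """Governed path: every call is checked and recorded."""
    if name in DENIED_TOOLS:
        audit_log.append((name, "deny"))
        raise PermissionError(f"{name} is denied by policy")
    audit_log.append((name, "allow"))
    return func(*args)


# Governed path: the wrapper blocks the call and records the denial.
try:
    call_tool("encode_payload", json.dumps, {"secret": 1})
    blocked = False
except PermissionError:
    blocked = True

# Ungoverned path: a dynamic import reaches the same function directly,
# so nothing is checked and nothing is logged.
module = importlib.import_module("json")
leaked = module.dumps({"secret": 1})
```

After both paths run, the audit log holds a single deny entry even though the operation succeeded once, which is why OS-level isolation is recommended alongside application-layer enforcement.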

5. Fail-Closed but Not Tamper-Proof in the Same Process

AGT fails closed on policy evaluation errors (unexpected exceptions cause deny, not allow). However, because the policy engine and the agent share the same Python/Node/.NET process, a sufficiently sophisticated adversary with code execution in that process could tamper with the governance layer itself. What this means in practice:
  • ✅ Runtime errors during policy evaluation → action denied (fail-closed)
  • ✅ Hash-chained audit logs detect post-hoc tampering with the log record
  • ❌ An attacker with code execution in the same process can potentially overwrite in-memory policy state
  • ❌ AGT does not use a hardware TPM or TEE to attest that the governance code itself has not been modified at runtime
AGT’s bootstrap IntegrityVerifier hashes 15 governance module source files and 4 critical function bytecodes against a published integrity.json manifest to detect supply-chain tampering before policy evaluation begins. This provides tamper-detection at startup, not continuous runtime attestation.
Mitigations available today:
  • Use the IntegrityVerifier at startup: agt verify --evidence ./agt-evidence.json checks the integrity manifest
  • Use process isolation (separate containers) so a compromised agent cannot reach the governance process
  • Use the TEE keystore abstraction (available in v4.0.0) for attested key management in supported hardware environments
What we’re building:
  • Deeper TEE integration for hardware-attested governance execution
  • Continuous runtime integrity monitoring
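Startup tamper detection of the kind described above comes down to comparing file hashes against a trusted manifest. The sketch below shows that comparison with the standard library. The manifest shape (relative path mapped to a SHA-256 hex digest) and the `verify_manifest` helper are assumptions for illustration; the actual integrity.json layout is not specified on this page.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose hash differs."""
    mismatches = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != expected:
            mismatches.append(rel_path)
    return mismatches


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    module = root / "policy_engine.py"
    module.write_text("def evaluate(action):\n    return 'deny'\n", encoding="utf-8")

    # Hypothetical manifest, built here from the known-good file.
    manifest = {"policy_engine.py": sha256_file(module)}
    clean = verify_manifest(root, manifest)

    # Simulate tampering after the manifest was published.
    module.write_text("def evaluate(action):\n    return 'allow'\n", encoding="utf-8")
    tampered = verify_manifest(root, manifest)
```

A check like this runs once, so a modification made after startup goes unnoticed until the next verification.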

6. Knowledge Governance Gap

AGT governs agent actions (tool calls, resource access, inter-agent messages). It does not govern the knowledge agents consume: documents, databases, embeddings, and context retrieved during reasoning.
Example gap: An agent retrieves a confidential HR document via a search tool (which AGT permits via policy), then summarises it in a Slack message (also permitted). Both actions are individually governed, but the knowledge flow (confidential data reaching an unauthorised channel) is invisible to AGT.
Mitigations available today:
  • Use egress policies to restrict which domains agents can send data to
  • Use blocked_patterns to catch PII/confidential patterns in tool arguments
  • Combine AGT with a data classification layer that labels context before it reaches the agent
What we’re building:
  • Integration points for external knowledge governance systems
  • Context provenance tracking in audit logs
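A pattern check on tool arguments, as suggested in the mitigations above, might look like the following. This is a sketch under stated assumptions: the patterns and the `find_blocked` helper are illustrative and do not reflect how AGT's blocked_patterns rules are configured or evaluated.

```python
import re

# Illustrative patterns only; real rules need tuning for each data set.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),             # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),           # email address
    re.compile(r"\bconfidential\b", re.IGNORECASE),    # classification marker
]


def find_blocked(arguments: dict) -> list:
    """Return the names of arguments whose value matches a blocked pattern."""
    return [
        name
        for name, value in arguments.items()
        if any(pattern.search(value) for pattern in BLOCKED_PATTERNS)
    ]


hits = find_blocked({
    "channel": "#general",
    "text": "Summary of the Confidential HR review for jane.doe@example.com",
})
no_hits = find_blocked({"channel": "#general", "text": "Lunch at noon"})
```

Pattern matching catches literal markers and well-formed identifiers. A paraphrased summary of the same document contains neither, so it passes, which is the reason for pairing this with a data classification layer.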

7. Credential Persistence Gap

AGT governs what agents do with tools. It does not manage or observe the credentials agents hold across tasks within a session.
Example gap: An agent receives an email API token for Task A, then moves to Task B (which doesn’t require email access). The token persists. If the agent is compromised during Task B, the attacker gains email access that should no longer be active.
Mitigations available today:
  • Use scoped capabilities in Agent OS policies to limit which tools are available per task context
  • Use short-lived credentials with external secret managers (Azure Key Vault, HashiCorp Vault) and TTL-based rotation
  • Use trust decay in AgentMesh to reduce trust scores over time
What we’re building:
  • Task-scoped credential lifecycle hooks
  • Automatic credential revocation at context switches
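Until task-scoped lifecycle hooks exist, the same effect can be approximated in agent code by binding each credential to a task and a time-to-live. The sketch below is hypothetical: `ScopedCredential` and `issue` are illustrative names, and the clock is passed in explicitly so the behaviour is easy to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    """A token that is valid only for one task and until a deadline."""

    token: str
    task_id: str
    expires_at: float

    def usable_for(self, task_id: str, now: float) -> bool:
        return task_id == self.task_id and now < self.expires_at


def issue(token: str, task_id: str, ttl_seconds: float, now: float) -> ScopedCredential:
    """Bind a token to a task with a time-to-live measured from `now`."""
    return ScopedCredential(token=token, task_id=task_id, expires_at=now + ttl_seconds)


cred = issue("email-api-token", "task-a", ttl_seconds=300, now=1000.0)

during_task_a = cred.usable_for("task-a", now=1100.0)
during_task_b = cred.usable_for("task-b", now=1100.0)
after_expiry = cred.usable_for("task-a", now=1400.0)
```

This check is advisory inside the agent process. The token itself stays valid at the provider until it is rotated or revoked there, so short TTLs issued by the secret manager remain the stronger control.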

8. Initialisation and Configuration Bypass Risk

AGT’s governance enforcement requires correct initialisation. If the governance middleware is imported but not properly configured, agents may run without effective policy enforcement. What this means in practice:
  • ✅ When properly initialised with policies loaded, AGT enforces all rules before execution
  • ⚠️ If the policy evaluator has no policies loaded, the default action is allow — all actions pass through ungoverned
  • ⚠️ If permissive mode is used without realising it allows all actions, agents run effectively ungoverned
  • ✅ On runtime errors during policy evaluation, AGT fails closed (denies access)
Example gap: A developer imports agent_os and adds it to their agent framework integration, but forgets to load policy files. The governance dashboard shows “governed” status, but no rules are enforced.
Always use strict mode (deny-by-default) in production — this requires explicit allow rules for every permitted action and means a misconfigured or empty policy set blocks everything rather than allowing everything. Verify with agt doctor and agt lint-policy policies/ in your CI pipeline.
Mitigations available today:
  • Use strict mode (deny-by-default) in production environments
  • Use agt audit CLI to verify loaded policies and detect permissive defaults
  • Run agt doctor to check that all components are properly initialised
What we’re building:
  • Startup validation that warns when no policies are loaded
  • Dashboard indicators for effective enforcement state (not just import state)
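The difference between an allow default and a deny default is easiest to see with an empty rule set. The following sketch is not AGT's evaluator: `evaluate` and its `strict` flag are hypothetical, and rules are modelled as plain functions that return True when they allow an action.

```python
from typing import Callable, List

Rule = Callable[[str], bool]  # returns True if the rule allows the action


def evaluate(action: str, allow_rules: List[Rule], strict: bool) -> str:
    """Return "allow" or "deny" for an action."""
    default = "deny" if strict else "allow"
    try:
        if any(rule(action) for rule in allow_rules):
            return "allow"
        return default  # no rule matched, or no rules were loaded
    except Exception:
        return "deny"  # fail closed on any evaluation error


def broken_rule(action: str) -> bool:
    raise RuntimeError("policy file could not be parsed")


empty_permissive = evaluate("delete_file", [], strict=False)
empty_strict = evaluate("delete_file", [], strict=True)
explicit_allow = evaluate("read_file", [lambda a: a == "read_file"], strict=True)
error_case = evaluate("read_file", [broken_rule], strict=False)
```

With no rules loaded, the permissive configuration allows delete_file and the strict one denies it, while an exception during evaluation is denied in both.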

For production deployments, use a layered defence:
┌─────────────────────────────────┐
│   Model Safety Layer            │  Azure AI Content Safety, Llama Guard
│   (input/output filtering)      │  ← catches hallucinations, toxic content
├─────────────────────────────────┤
│   AGT Governance Layer          │  Policy engine, identity, trust, audit
│   (action enforcement)          │  ← catches unauthorised actions
├─────────────────────────────────┤
│   Application Layer             │  Your agent code, framework adapters
│   (business logic validation)   │  ← catches domain-specific errors
├─────────────────────────────────┤
│   Infrastructure Layer          │  Containers, network policies, IAM
│   (OS/network isolation)        │  ← catches escape attempts
└─────────────────────────────────┘
AGT covers the governance layer. The model safety and infrastructure layers are your responsibility to configure.

What AGT Is and Is Not

AGT Is                           | AGT Is Not
---------------------------------|----------------------------------------
Runtime action governance        | Model safety / content moderation
Deterministic policy enforcement | Probabilistic guardrails
Application-layer middleware     | OS kernel / hardware isolation
Framework-agnostic library       | A managed cloud service
Audit trail of actions           | Audit trail of outcomes
Action governance                | Knowledge / data provenance governance
Enforcement infrastructure       | Turnkey compliance solution
If you find a limitation not listed here, please open an issue — the maintainers actively update this page based on external analysis and community feedback.
