PolicyEvaluator is the core governance engine of the Agent Governance Toolkit. It loads declarative YAML policies, evaluates every agent action against them, and returns a structured PolicyDecision before the action is executed. Every tool call, delegation request, or message send can be intercepted and checked deterministically, so a caller that honours the decision never executes a denied action. The evaluator sits at the centre of the governance pipeline:
Agent action ──► PolicyEvaluator.evaluate(context)
                       │
                       ├─ 1. YAML rules (priority-sorted)
                       ├─ 2. External backends (OPA / Cedar)
                       └─ 3. Default action (fail-closed deny)
                       │
                       ▼
                 PolicyDecision { allowed, action, reason, audit_entry }

Installation

pip install agent-governance-toolkit-core

Import

from agent_os.policies import (
    PolicyEvaluator,
    PolicyDocument,
    PolicyDecision,
    PolicyRule,
    PolicyCondition,
    PolicyDefaults,
    PolicyAction,
    PolicyOperator,
)

Constructor

PolicyEvaluator(policies=None, root_dir=None)
policies
list[PolicyDocument] | None
default:"None"
An optional list of already-loaded PolicyDocument objects to seed the evaluator. When None, the policy list starts empty and policies can be added with load_policies().
root_dir
str | Path | None
default:"None"
Optional root directory for folder-scoped policy discovery. When set and the evaluation context contains a path key, the evaluator walks governance.yaml files from the action path up to this root and merges them hierarchically. Leave None for flat policy evaluation (the common case).
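The upward walk described for root_dir can be pictured with the standalone sketch below. It does not use the toolkit: discover_governance_files is a hypothetical helper, and returning the root-most file first (so that nested folders can refine their parents) is an assumption about merge order, not documented behaviour.

```python
from pathlib import Path


def discover_governance_files(action_path: Path, root_dir: Path) -> list:
    """Collect governance.yaml files between root_dir and the action's folder.

    Hypothetical helper for illustration only; not part of the toolkit.
    """
    root = root_dir.resolve()
    current = action_path.resolve()
    if not current.is_dir():
        # A file path (existing or not) starts the walk at its parent folder.
        current = current.parent
    # Ignore paths that do not live under the root at all.
    if root not in (current, *current.parents):
        return []

    found = []
    while True:
        candidate = current / "governance.yaml"
        if candidate.is_file():
            found.append(candidate)
        if current == root:
            break
        current = current.parent
    # Assumption: parents come first so that deeper folders can override them.
    return list(reversed(found))
```

For an action at root/services/billing/handler.py, the sketch yields root/governance.yaml followed by root/services/billing/governance.yaml when both files exist.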

Methods

load_policies()

evaluator.load_policies(directory: str | Path) -> None
Loads every .yaml and .yml file in directory as a PolicyDocument and appends them to the evaluator’s internal policy list. Can be called multiple times to load from multiple directories — all rules across all documents are merged and evaluated together, sorted by priority (descending).
directory
str | Path
required
Path to a directory containing YAML policy files. Files are loaded in sorted order.
evaluator = PolicyEvaluator()
evaluator.load_policies("./policies/global/")
evaluator.load_policies("./policies/team-specific/")
# All rules from all directories are now active
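To make the merge concrete, here is a standalone sketch of combining rules from several documents and ordering them by descending priority. SketchRule and merge_rules are hypothetical names, and breaking ties across documents by load order is an assumption; the reference only states document order for equal priorities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRule:
    """Hypothetical stand-in for a policy rule; only the fields needed here."""
    name: str
    priority: int = 0


def merge_rules(documents):
    """Flatten rules from all documents, highest priority first."""
    merged = [rule for document in documents for rule in document]
    # sorted() is stable, so equal priorities keep their load order.
    return sorted(merged, key=lambda rule: rule.priority, reverse=True)


global_rules = [SketchRule("g-low", 10), SketchRule("g-high", 100)]
team_rules = [SketchRule("t-mid", 50), SketchRule("t-high", 100)]
ordered = [rule.name for rule in merge_rules([global_rules, team_rules])]
```

Here ordered places both priority-100 rules first, with the rule from the document loaded earlier ahead of the other.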

evaluate()

evaluator.evaluate(
    context: dict[str, Any],
    dynamic_context: dict[str, Any] | None = None,
) -> PolicyDecision
Evaluates all loaded policy rules against context. Rules are sorted by priority (highest first); the first matching rule determines the decision. If no YAML rule matches, registered external backends are consulted in registration order. If nothing matches, the default action from the first loaded policy is applied (or a global deny if no policies are loaded — fail-closed).
context
dict[str, Any]
required
A flat dictionary of action-level properties. Common keys include tool_name, token_count, confidence, and message. Keys must match the field values in your policy condition blocks.
context = {
    "tool_name": "execute_code",
    "token_count": 1500,
    "confidence": 0.92,
}
dynamic_context
dict[str, Any] | None
default:"None"
Optional runtime context for v1 dynamic conditions (time-window, day-of-week, token budget, cost budget). Existing callers that omit this argument are unaffected. Structure:
dynamic_context = {
    "budget": {
        "token_count": 3500,
        "cost": 0.05,
    }
}
Returns a PolicyDecision (see PolicyDecision Fields below).
Error behaviour: all exceptions raised inside the evaluator are caught and converted to a fail-closed PolicyDecision(allowed=False, action="deny"). The engine never raises.
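The first-match and fail-closed behaviour can be sketched without the toolkit. Everything below (SketchRule, SketchDecision, sketch_evaluate) is hypothetical and only mirrors the semantics described above; the real evaluator also consults external backends, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

ALLOWING_ACTIONS = {"allow", "audit"}  # "deny" and "block" are not allowed


@dataclass
class SketchRule:
    name: str
    matches: Callable[[dict], bool]  # predicate over the context
    action: str
    priority: int = 0
    message: str = ""


@dataclass
class SketchDecision:
    allowed: bool
    action: str
    matched_rule: Optional[str] = None
    reason: str = ""


def sketch_evaluate(rules, context: dict, default_action: str = "deny") -> SketchDecision:
    try:
        # Highest priority first; the first matching rule decides.
        for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
            if rule.matches(context):
                return SketchDecision(
                    rule.action in ALLOWING_ACTIONS, rule.action, rule.name, rule.message
                )
        # Nothing matched: fall back to the default action.
        return SketchDecision(
            default_action in ALLOWING_ACTIONS,
            default_action,
            None,
            "No rules matched; default action applied",
        )
    except Exception as exc:
        # Fail closed: any error during evaluation becomes a deny.
        return SketchDecision(False, "deny", None, f"Evaluation error: {exc}")


rules = [
    SketchRule("allow-safe-tools", lambda c: c.get("tool_name") in {"web_search"}, "allow", 70),
    SketchRule("block-code-execution", lambda c: c.get("tool_name") == "execute_code", "block", 100),
]
broken = [SketchRule("broken", lambda c: c["missing"] > 1, "allow", 1)]
```

The broken rule raises KeyError when evaluated, and the sketch turns that into a deny instead of propagating the exception.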

add_backend()

evaluator.add_backend(backend: Any) -> None
Registers an external policy backend. Backends are consulted in registration order only after all loaded YAML rules have been checked without a match. Each backend must implement evaluate(context) -> BackendDecision and expose a name property.
backend
Any
required
An ExternalPolicyBackend implementation such as OPABackend or CedarBackend from agent_os.policies.backends. When a backend’s evaluate() returns an error, the evaluator denies access immediately (fail-closed) without consulting the next backend.
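The backend contract (an evaluate(context) method plus a name property) and the fail-closed handling of backend errors can be illustrated with a standalone sketch. SketchBackendDecision and its fields are assumptions, not the toolkit's BackendDecision, and letting a backend return None to abstain is a hypothetical convention added only to show registration order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBackendDecision:
    """Hypothetical result type; the real BackendDecision may differ."""
    allowed: bool
    reason: str = ""
    error: Optional[str] = None


class AllowListBackend:
    """Duck-typed backend: exposes `name` and `evaluate(context)`."""

    def __init__(self, name, allowed_tools):
        self._name = name
        self._allowed_tools = set(allowed_tools)

    @property
    def name(self):
        return self._name

    def evaluate(self, context):
        if context.get("tool_name") in self._allowed_tools:
            return SketchBackendDecision(True, "tool on allow list")
        return None  # assumption: None means "no opinion"


class FailingBackend:
    name = "failing"

    def evaluate(self, context):
        return SketchBackendDecision(False, error="backend unreachable")


def consult_backends(backends, context):
    """Return (allowed, reason) from the first backend that answers, else None."""
    for backend in backends:
        result = backend.evaluate(context)
        if result is None:
            continue
        if result.error is not None:
            # Fail closed: deny at once and skip the remaining backends.
            return (False, f"{backend.name}: {result.error}")
        return (result.allowed, f"{backend.name}: {result.reason}")
    return None
```

With FailingBackend registered ahead of an allow-list backend, the sketch denies even a tool the second backend would have permitted.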

load_rego()

evaluator.load_rego(
    rego_path: str | None = None,
    rego_content: str | None = None,
    package: str = "agentos",
    mode: str = "local",
) -> OPABackend
Convenience method — creates and registers an OPABackend in a single call.
rego_path
str | None
default:"None"
Path to a .rego file.
rego_content
str | None
default:"None"
Inline Rego policy string (use instead of rego_path).
package
str
default:"agentos"
Rego package name for query construction.
mode
str
default:"local"
Evaluation mode: "local", "remote", or "builtin".
Returns the registered OPABackend instance.

load_cedar()

evaluator.load_cedar(
    policy_path: str | None = None,
    policy_content: str | None = None,
    entities: list[dict[str, Any]] | None = None,
    mode: str = "auto",
) -> CedarBackend
Convenience method — creates and registers a CedarBackend in a single call.
policy_path
str | None
default:"None"
Path to a .cedar policy file.
policy_content
str | None
default:"None"
Inline Cedar policy string.
entities
list[dict[str, Any]] | None
default:"None"
Cedar entities for authorization context.
mode
str
default:"auto"
Evaluation mode: "auto", "cedarpy", "cli", or "builtin".
Returns the registered CedarBackend instance.

PolicyDecision Fields

Every call to evaluate() returns a PolicyDecision (a Pydantic BaseModel).
allowed
bool
default:"True"
True if the action is permitted. False if it was denied or blocked.
matched_rule
str | None
default:"None"
Name of the policy rule that fired. None when the default action was applied or an external backend responded.
action
str
default:"allow"
The action taken: "allow", "deny", "audit", or "block".
reason
str
Human-readable explanation of the decision. Comes from the matched rule’s message field, the backend’s reason, or "No rules matched; default action applied".
audit_entry
dict
Structured audit data automatically attached to every decision.
metadata
dict
Structured adaptation hints populated by dynamic-context rules. Empty for standard static rules.

PolicyDocument

PolicyDocument is a Pydantic BaseModel that represents the top-level structure of a YAML policy file.
version
str
default:"1.0"
Schema version string.
name
str
default:"unnamed"
Human-readable policy identifier.
description
str
default:""
Free-text description of what the policy enforces.
rules
list[PolicyRule]
default:"[]"
Ordered list of PolicyRule objects. Evaluated by priority (descending).
defaults
PolicyDefaults
Default action and budget limits applied when no rule matches.
inherit
bool
default:"True"
Used in folder-scoped evaluation. Setting this to False stops the evaluator from loading parent governance.yaml files.
scope
str | None
default:"None"
Glob pattern — the policy only applies when the action path in context matches this pattern.
network_allowlist
list[str]
default:"[]"
Host patterns the sandbox may reach (e.g. "pypi.org", "*.github.com"). Combined with defaults.network_default to form the sandbox egress policy. Consumed by sandbox providers; ignored by the rule engine.
tool_allowlist
list[str]
default:"[]"
Tool names the agent may invoke. Enforced host-side by PolicyEvaluator before any sandbox call.

Class methods

# Load from YAML file (requires pyyaml)
doc = PolicyDocument.from_yaml("policies/production.yaml")

# Load from JSON file
doc = PolicyDocument.from_json("policies/production.json")

# Serialize back
doc.to_yaml("policies/production.yaml")
doc.to_json("policies/production.json")

PolicyRule Fields

name
str
required
Unique rule identifier within the policy document.
condition
PolicyCondition
required
The condition evaluated against the context dictionary.
action
PolicyAction
required
The action to take when the condition matches.
priority
int
default:"0"
Rules with higher values are evaluated first. Two rules with the same priority are evaluated in document order.
message
str
default:""
Human-readable explanation returned in PolicyDecision.reason.
dynamic_condition
DynamicCondition | None
default:"None"
Optional runtime condition (time window, day-of-week, token budget, cost budget). Evaluated alongside the static condition.
override
bool
default:"False"
If True, replaces a parent rule with the same name during folder-level policy merging.

PolicyCondition Fields

field
str
required
The key to look up in the evaluation context dictionary. Examples: "tool_name", "token_count", "confidence".
operator
PolicyOperator
required
The comparison operator. See PolicyOperator below.
value
Any
required
The right-hand side of the comparison. Type must be compatible with the operator (e.g., a list for in, a number for gt).

PolicyDefaults Fields

action
PolicyAction
default:"deny"
Fallback action when no rule matches. Defaults to deny (fail-closed). Set to allow explicitly to opt into a permissive posture.
max_tokens
int
default:"4096"
Maximum token count per request evaluated by the rule engine.
max_tool_calls
int
default:"10"
Maximum tool invocations per request evaluated by the rule engine.
confidence_threshold
float
default:"0.8"
Minimum confidence score [0.0–1.0] evaluated by the rule engine.
max_cpu
float | None
default:"None"
Sandbox CPU limit in vCPUs (e.g. 0.5, 1.0). None = provider default. Consumed by sandbox providers; ignored by the rule engine.
max_memory_mb
int | None
default:"None"
Sandbox memory limit in MiB. None = provider default. Consumed by sandbox providers.
timeout_seconds
int | None
default:"None"
Per-execute wall-clock cap in seconds. None = provider default.
network_default
str
default:"deny"
Default sandbox egress action when a host is not on network_allowlist. "deny" is fail-closed and is the default. Set to "allow" only for trusted dev/research workloads.
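How network_allowlist and network_default might combine into an egress decision is sketched below. This is not the sandbox providers' code: egress_allowed is a hypothetical helper, and treating the host patterns as shell-style globs matched case-insensitively is an assumption.

```python
from fnmatch import fnmatchcase


def egress_allowed(host, network_allowlist, network_default="deny"):
    """Return True if the sandbox may reach `host` under this sketch's rules."""
    host = host.lower()
    # Assumption: patterns such as "*.github.com" are shell-style globs.
    if any(fnmatchcase(host, pattern.lower()) for pattern in network_allowlist):
        return True
    # Hosts that are not listed fall back to the default (fail-closed "deny").
    return network_default == "allow"


allowlist = ["pypi.org", "*.github.com"]
```

Under glob matching, "*.github.com" covers api.github.com but not the bare github.com, which would need its own entry.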

PolicyAction Enum

| Value | allowed | Description |
| --- | --- | --- |
| ALLOW / "allow" | True | Permit the request. |
| DENY / "deny" | False | Reject the request. |
| AUDIT / "audit" | True | Permit but write an audit entry. |
| BLOCK / "block" | False | Hard block; the reason is surfaced to the caller. |

PolicyOperator Values

| Enum member | YAML string | Behaviour |
| --- | --- | --- |
| EQ | "eq" | Exact equality |
| NE | "ne" | Not equal |
| GT | "gt" | Greater than |
| LT | "lt" | Less than |
| GTE | "gte" | Greater than or equal |
| LTE | "lte" | Less than or equal |
| IN | "in" | Context value is in the target list |
| NOT_IN | "not_in" | Context value is not in the target list |
| CONTAINS | "contains" | Target is a substring of the context value |
| MATCHES | "matches" | Context value matches the regex in target |
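The operator semantics listed above can be expressed as a small standalone dispatch table. This is a sketch rather than the toolkit's implementation: treating a missing context field as a non-match and using re.search (not a full match) for "matches" are both assumptions.

```python
import operator
import re

# Each entry takes (context_value, target) and returns a bool.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "in": lambda ctx, target: ctx in target,
    "not_in": lambda ctx, target: ctx not in target,
    "contains": lambda ctx, target: target in ctx,
    # Assumption: regex may match anywhere in the context value.
    "matches": lambda ctx, target: re.search(target, ctx) is not None,
}


def condition_matches(condition: dict, context: dict) -> bool:
    """Evaluate one {field, operator, value} condition against a context."""
    field = condition["field"]
    if field not in context:
        return False  # assumption: a missing field never matches
    compare = OPERATORS[condition["operator"]]
    return compare(context[field], condition["value"])
```

For example, a condition with operator "gt" and value 2048 does not match a token_count of exactly 2048, whereas "gte" does.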

Exceptions

PolicyViolationError

from agent_os.exceptions import PolicyViolationError
Raised when a governance policy check fails. Extends PolicyError → AgentOSError → Exception.
error_code
str
"POLICY_VIOLATION" by default.
details
dict
Structured details including category, matched_rule, detail, scope, operation, tool_name, and all fields from the audit_entry.
timestamp
str
ISO 8601 UTC timestamp when the error was raised.
check_result
PolicyCheckResult | None
The underlying PolicyCheckResult if the error was created from one, otherwise None.

PolicyDeniedError

from agent_os.exceptions import PolicyDeniedError
Raised when a policy explicitly denies an action. error_code is "POLICY_DENIED".

Code Examples

Basic: Load policies and evaluate

from agent_os.policies import PolicyEvaluator

evaluator = PolicyEvaluator()
evaluator.load_policies("./policies/")  # loads all .yaml/.yml files

context = {
    "tool_name": "execute_code",
    "token_count": 500,
}
decision = evaluator.evaluate(context)

if decision.allowed:
    print(f"✅ Permitted by rule: {decision.matched_rule or 'default'}")
else:
    print(f"❌ Denied: {decision.reason}")
    print(f"   Rule: {decision.matched_rule}")

Programmatic policy construction

from agent_os.policies import (
    PolicyEvaluator, PolicyDocument, PolicyRule,
    PolicyCondition, PolicyAction, PolicyOperator, PolicyDefaults,
)

policy = PolicyDocument(
    name="production-safety",
    description="Blocks dangerous tools in production",
    rules=[
        PolicyRule(
            name="block-dangerous-tools",
            condition=PolicyCondition(
                field="tool_name",
                operator=PolicyOperator.IN,
                value=["execute_code", "run_shell", "delete_file"],
            ),
            action=PolicyAction.DENY,
            priority=100,
            message="Dangerous tool blocked in production",
        ),
        PolicyRule(
            name="audit-all-tool-calls",
            condition=PolicyCondition(
                field="tool_name",
                operator=PolicyOperator.NE,
                value="",
            ),
            action=PolicyAction.AUDIT,
            priority=10,
            message="All tool calls are audit-logged",
        ),
    ],
    defaults=PolicyDefaults(
        action=PolicyAction.ALLOW,
        max_tokens=2048,
        max_tool_calls=5,
        confidence_threshold=0.95,
    ),
)

evaluator = PolicyEvaluator(policies=[policy])

# Allowed tool
decision = evaluator.evaluate({"tool_name": "web_search"})
assert decision.allowed  # True — matches audit rule, action=audit is allowed

# Blocked tool
decision = evaluator.evaluate({"tool_name": "execute_code"})
assert not decision.allowed  # False — matched block-dangerous-tools
print(decision.reason)       # "Dangerous tool blocked in production"

Handling a deny decision

from agent_os.policies import PolicyEvaluator
from agent_os.exceptions import PolicyDeniedError

evaluator = PolicyEvaluator()
evaluator.load_policies("./policies/")

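def call_tool(tool_name: str, **kwargs):
    # Evaluate before executing; kwargs are merged into the evaluation context.
    decision = evaluator.evaluate({"tool_name": tool_name, **kwargs})

    if not decision.allowed:
        raise PolicyDeniedError(
            f"Tool '{tool_name}' denied: {decision.reason}",
            details=decision.audit_entry,
        )

    # Proceed with tool execution. run_tool is a placeholder for your own
    # tool dispatcher; it is not part of the toolkit.
    return run_tool(tool_name, **kwargs)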

Serialise to YAML for version control

from agent_os.policies import PolicyDocument

policy = PolicyDocument(name="my-policy", rules=[...])
policy.to_yaml("policies/my-policy.yaml")

# Load back
loaded = PolicyDocument.from_yaml("policies/my-policy.yaml")

Policy YAML Reference

version: "1.0"
name: production
description: Production safety policy

rules:
  - name: block-code-execution
    condition:
      field: tool_name
      operator: eq
      value: execute_code
    action: block
    priority: 100
    message: Code execution is blocked in production

  - name: token-limit
    condition:
      field: token_count
      operator: gt
      value: 2048
    action: deny
    priority: 90
    message: Token count exceeds production limit

  - name: allow-safe-tools
    condition:
      field: tool_name
      operator: in
      value: [web_search, read_file, summarize]
    action: allow
    priority: 70
    message: Tool is on the approved list

defaults:
  action: deny
  max_tokens: 2048
  max_tool_calls: 5
  confidence_threshold: 0.95

OPA Backend Integration

Register an OPA/Rego backend for policies that require external evaluation:
from agent_os.policies import PolicyEvaluator

evaluator = PolicyEvaluator()

# Method 1: convenience helper
backend = evaluator.load_rego(
    rego_path="policies/agent_policy.rego",
    package="agentos",
    mode="local",
)

# Method 2: manual registration
from agent_os.policies.backends import OPABackend

opa = OPABackend(
    rego_content="""
    package agentos

    default allow = false

    allow {
        input.tool_name == "web_search"
    }
    """,
    package="agentos",
    mode="local",
)
evaluator.add_backend(opa)

# YAML rules are evaluated first;
# OPA is only consulted when no YAML rule matches.
decision = evaluator.evaluate({"tool_name": "web_search"})
