PolicyEvaluator — Python Policy Evaluation Engine

PolicyEvaluator is the core governance engine of the Agent Governance Toolkit. It loads declarative YAML policies, evaluates every agent action against them, and returns a structured PolicyDecision before the action is executed. Every tool call, delegation request, or message send can be intercepted and checked deterministically, so a denied action can be stopped before it ever runs. The evaluator sits at the centre of the governance pipeline:

Agent action ──► PolicyEvaluator.evaluate(context)
                       │
                  YAML rules (priority-sorted)
                  External backends (OPA / Cedar)
                  Default action (fail-closed deny)
                       │
                 PolicyDecision { allowed, action, reason, audit_entry }

Installation

pip install agent-governance-toolkit-core

Import

from agent_os.policies import (
    PolicyEvaluator,
    PolicyDocument,
    PolicyDecision,
    PolicyRule,
    PolicyCondition,
    PolicyDefaults,
    PolicyAction,
    PolicyOperator,
)

Constructor

PolicyEvaluator(policies=None, root_dir=None)

policies

list[PolicyDocument] | None

default:"None"

An optional list of already-loaded PolicyDocument objects to seed the evaluator. When None, the policy list starts empty and policies can be added with load_policies().

root_dir

str | Path | None

default:"None"

Optional root directory for folder-scoped policy discovery. When set and the evaluation context contains a path key, the evaluator walks governance.yaml files from the action path up to this root and merges them hierarchically. Leave None for flat policy evaluation (the common case).
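
The upward walk described for root_dir can be pictured with a short standard-library sketch. It is only an illustration: the helper name discover_governance_files is hypothetical, the root-first ordering is an assumption, and the toolkit's actual merge logic (including the inherit flag) is not modelled.

```python
from pathlib import Path


def discover_governance_files(action_path, root_dir, filename="governance.yaml"):
    """Return governance files from root_dir down to the action's folder.

    Hypothetical helper: walks from the action path up to root_dir and
    collects each governance.yaml it finds, outermost (root) first.
    """
    root = Path(root_dir).resolve()
    current = Path(action_path).resolve()
    if not current.is_dir():
        current = current.parent  # start at the folder holding the file

    if current != root and root not in current.parents:
        return []  # the action path lies outside the governed tree

    found = []
    while True:
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
        if current == root:
            break
        current = current.parent
    return list(reversed(found))
```

Returning the files root-first makes it easy for a later merge step to let deeper folders refine what their parents declare.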

Methods

`load_policies()`

evaluator.load_policies(directory: str | Path) -> None

Loads every .yaml and .yml file in directory as a PolicyDocument and appends them to the evaluator’s internal policy list. Can be called multiple times to load from multiple directories — all rules across all documents are merged and evaluated together, sorted by priority (descending).
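
As a rough picture of the file selection described above, the following sketch lists the candidate files in a directory. The helper name find_policy_files is hypothetical, and two details are assumptions rather than documented behaviour: the scan is non-recursive, and files are visited in sorted name order.

```python
from pathlib import Path


def find_policy_files(directory):
    """List the .yaml and .yml files directly inside `directory`.

    Assumptions (not confirmed by the toolkit): the scan does not descend
    into subfolders, and results are returned in sorted name order.
    """
    folder = Path(directory)
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in {".yaml", ".yml"}
    )
```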

`evaluate()`

evaluator.evaluate(
    context: dict[str, Any],
    dynamic_context: dict[str, Any] | None = None,
) -> PolicyDecision

Evaluates all loaded policy rules against context. Rules are sorted by priority (highest first); the first matching rule determines the decision. If no YAML rule matches, registered external backends are consulted in registration order. If nothing matches, the default action from the first loaded policy is applied (or a global deny if no policies are loaded — fail-closed).

context

dict[str, Any]

required

A flat dictionary of action-level properties. Common keys include tool_name, token_count, confidence, and message. Keys must match the field values in your policy condition blocks.

context = {
    "tool_name": "execute_code",
    "token_count": 1500,
    "confidence": 0.92,
}

dynamic_context

dict[str, Any] | None

default:"None"

Optional runtime context for v1 dynamic conditions (time-window, day-of-week, token budget, cost budget). Existing callers that omit this argument are unaffected. Structure:

dynamic_context = {
    "budget": {
        "token_count": 3500,
        "cost": 0.05,
    }
}

Returns a PolicyDecision (see PolicyDecision Fields below).

Error behaviour: all exceptions raised inside the evaluator are caught and converted to a fail-closed PolicyDecision(allowed=False, action="deny"). The engine never raises.
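
The fail-closed error behaviour can be sketched in a few lines. This is not the evaluator's code: Decision is a stand-in for PolicyDecision with only the fields needed here, and evaluate_fail_closed, check_token_budget, and the wording of the reason string are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """Stand-in for PolicyDecision with just the fields used here."""
    allowed: bool
    action: str
    reason: str
    audit_entry: dict = field(default_factory=dict)


def evaluate_fail_closed(check, context):
    """Run `check(context)` and turn any exception into a deny decision."""
    try:
        return check(context)
    except Exception as exc:  # deliberately broad: callers never see a raise
        return Decision(
            allowed=False,
            action="deny",
            reason=f"Evaluation error: {exc}",  # assumed wording
            audit_entry={"error": True},
        )


def check_token_budget(context):
    # Raises KeyError when the caller forgets to supply token_count.
    within = context["token_count"] <= 4096
    return Decision(within, "allow" if within else "deny", "token budget check")
```

Calling evaluate_fail_closed(check_token_budget, {}) yields a deny decision with audit_entry["error"] set, rather than propagating the KeyError to the agent runtime.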

`add_backend()`

evaluator.add_backend(backend: Any) -> None

Registers an external policy backend. Backends are consulted in registration order only after all loaded YAML rules have been checked without a match. Each backend must implement evaluate(context) -> BackendDecision and expose a name property.

backend

Any

required

An ExternalPolicyBackend implementation such as OPABackend or CedarBackend from agent_os.policies.backends. When a backend’s evaluate() returns an error, the evaluator denies access immediately (fail-closed) without consulting the next backend.
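
The shape a custom backend needs, an evaluate(context) method plus a name property, can be illustrated with a toy class. Everything here is an assumption made for illustration: SketchBackendDecision is a stand-in whose fields may not match the real BackendDecision, and BusinessHoursBackend and the "hour" context key are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBackendDecision:
    """Hypothetical stand-in for BackendDecision; real fields may differ."""
    allowed: bool
    reason: str = ""
    error: Optional[str] = None


class BusinessHoursBackend:
    """Toy backend exposing the two members the text requires."""

    @property
    def name(self) -> str:
        return "business-hours"

    def evaluate(self, context: dict) -> SketchBackendDecision:
        hour = context.get("hour")
        if not isinstance(hour, int):
            # Reporting an error leads the evaluator to deny (fail-closed).
            return SketchBackendDecision(False, error="missing 'hour' in context")
        inside = 9 <= hour < 17
        return SketchBackendDecision(
            inside, reason="inside hours" if inside else "outside hours"
        )
```

A real backend would return the toolkit's own BackendDecision type before being passed to evaluator.add_backend(); the stand-in above only shows the duck-typed contract.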

`load_rego()`

evaluator.load_rego(
    rego_path: str | None = None,
    rego_content: str | None = None,
    package: str = "agentos",
    mode: str = "local",
) -> OPABackend

Convenience method — creates and registers an OPABackend in a single call.

rego_path

str | None

default:"None"

Path to a .rego file.

rego_content

str | None

default:"None"

Inline Rego policy string (use instead of rego_path).

package

str

default:"agentos"

Rego package name for query construction.

mode

str

default:"local"

Evaluation mode: "local", "remote", or "builtin".

Returns the registered OPABackend instance.

`load_cedar()`

evaluator.load_cedar(
    policy_path: str | None = None,
    policy_content: str | None = None,
    entities: list[dict[str, Any]] | None = None,
    mode: str = "auto",
) -> CedarBackend

Convenience method — creates and registers a CedarBackend in a single call.

policy_path

str | None

default:"None"

Path to a .cedar policy file.

policy_content

str | None

default:"None"

Inline Cedar policy string.

entities

list[dict[str, Any]] | None

default:"None"

Cedar entities for authorization context.

mode

str

default:"auto"

Evaluation mode: "auto", "cedarpy", "cli", or "builtin".

Returns the registered CedarBackend instance.

PolicyDecision Fields

Every call to evaluate() returns a PolicyDecision (a Pydantic BaseModel).

allowed

bool

default:"True"

True if the action is permitted. False if it was denied or blocked.

matched_rule

str | None

default:"None"

Name of the policy rule that fired. None when the default action was applied or an external backend responded.

action

str

default:"allow"

The action taken: "allow", "deny", "audit", or "block".

reason

str

Human-readable explanation of the decision. Comes from the matched rule’s message field, the backend’s reason, or "No rules matched; default action applied".

audit_entry

dict

Structured audit data automatically attached to every decision.

audit_entry keys

policy

str | None

Name of the policy document that contained the matching rule.

rule

str | None

Name of the matched rule (same as matched_rule).

action

str

The action value ("allow", "deny", etc.).

context_snapshot

dict

A deep copy of the context dict passed to evaluate().

timestamp

str

ISO 8601 UTC timestamp of the evaluation.

error

bool

True only when a fail-closed error decision was returned.

backend

str

Present when an external backend answered. Value is the backend’s name property, prefixed with "external:".

evaluation_ms

float

Backend evaluation latency in milliseconds (external backends only).

metadata

dict

Structured adaptation hints populated by dynamic-context rules. Empty for standard static rules.

metadata keys

backoff_seconds

int

Suggested retry delay when a budget condition triggers a deny.

blocked_tools

list[str]

Tools specifically blocked by the matched rule.

retry_after

int

Unix timestamp after which a retry is appropriate.
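
A caller might turn the metadata hints into a wait time along these lines. The helper seconds_until_retry is hypothetical, and so is the precedence rule: this sketch assumes retry_after wins over backoff_seconds when both are present.

```python
import time


def seconds_until_retry(metadata, now=None):
    """Pick a wait time from a decision's metadata dict, or None if absent.

    Assumption: `retry_after` (absolute Unix time) takes precedence over
    `backoff_seconds` (relative delay) when both keys are present.
    """
    now = time.time() if now is None else now
    if "retry_after" in metadata:
        return max(0.0, float(metadata["retry_after"] - now))
    if "backoff_seconds" in metadata:
        return float(metadata["backoff_seconds"])
    return None
```

Passing now explicitly keeps the helper deterministic in tests; in production the default of time.time() applies.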

PolicyDocument

PolicyDocument is a Pydantic BaseModel that represents the top-level structure of a YAML policy file.

version

str

default:"1.0"

Schema version string.

name

str

default:"unnamed"

Human-readable policy identifier.

description

str

default:""

Free-text description of what the policy enforces.

rules

list[PolicyRule]

default:"[]"

Ordered list of PolicyRule objects. Evaluated by priority (descending).

defaults

PolicyDefaults

Default action and budget limits applied when no rule matches.

inherit

bool

default:"True"

When using folder-scoped evaluation, setting False stops loading parent governance.yaml files.

scope

str | None

default:"None"

Glob pattern — the policy only applies when the action path in context matches this pattern.

network_allowlist

list[str]

default:"[]"

Host patterns the sandbox may reach (e.g. "pypi.org", "*.github.com"). Combined with defaults.network_default to form the sandbox egress policy. Consumed by sandbox providers; ignored by the rule engine.

tool_allowlist

list[str]

default:"[]"

Tool names the agent may invoke. Enforced host-side by PolicyEvaluator before any sandbox call.

Class methods

# Load from YAML file (requires pyyaml)
doc = PolicyDocument.from_yaml("policies/production.yaml")

# Load from JSON file
doc = PolicyDocument.from_json("policies/production.json")

# Serialize back
doc.to_yaml("policies/production.yaml")
doc.to_json("policies/production.json")

PolicyRule Fields

name

str

required

Unique rule identifier within the policy document.

condition

PolicyCondition

required

The condition evaluated against the context dictionary.

action

PolicyAction

required

The action to take when the condition matches.

priority

int

default:"0"

Rules with higher values are evaluated first. Two rules with the same priority are evaluated in document order.

message

str

default:""

Human-readable explanation returned in PolicyDecision.reason.

dynamic_condition

DynamicCondition | None

default:"None"

Optional runtime condition (time window, day-of-week, token budget, cost budget). Evaluated alongside the static condition.

override

bool

default:"False"

If True, replaces a parent rule with the same name during folder-level policy merging.
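
The tie-breaking rule for priority (equal priorities keep document order) is what a stable sort produces. The sketch below uses a hypothetical Rule tuple rather than PolicyRule and makes no claim about how the evaluator sorts internally.

```python
from collections import namedtuple

# Hypothetical, minimal stand-in for PolicyRule.
Rule = namedtuple("Rule", "name priority")

document_order = [
    Rule("audit-reads", 10),
    Rule("block-shell", 100),
    Rule("audit-writes", 10),
    Rule("block-delete", 100),
]

# sorted() is stable, even with reverse=True, so rules that share a
# priority keep the order in which they appear in the document.
evaluation_order = sorted(
    document_order, key=lambda rule: rule.priority, reverse=True
)
names = [rule.name for rule in evaluation_order]
```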

PolicyCondition Fields

field

str

required

The key to look up in the evaluation context dictionary. Examples: "tool_name", "token_count", "confidence".

operator

PolicyOperator

required

The comparison operator. See PolicyOperator below.

value

Any

required

The right-hand side of the comparison. Type must be compatible with the operator (e.g., a list for in, a number for gt).

PolicyDefaults Fields

action

PolicyAction

default:"deny"

Fallback action when no rule matches. Defaults to deny (fail-closed). Set to allow explicitly to opt into a permissive posture.

max_tokens

int

default:"4096"

Maximum token count per request evaluated by the rule engine.

max_tool_calls

int

default:"10"

Maximum tool invocations per request evaluated by the rule engine.

confidence_threshold

float

default:"0.8"

Minimum confidence score [0.0–1.0] evaluated by the rule engine.

max_cpu

float | None

default:"None"

Sandbox CPU limit in vCPUs (e.g. 0.5, 1.0). None = provider default. Consumed by sandbox providers; ignored by the rule engine.

max_memory_mb

int | None

default:"None"

Sandbox memory limit in MiB. None = provider default. Consumed by sandbox providers.

timeout_seconds

int | None

default:"None"

Per-execute wall-clock cap in seconds. None = provider default.

network_default

str

default:"deny"

Default sandbox egress action when a host is not on network_allowlist. "deny" is fail-closed and is the default. Set to "allow" only for trusted dev/research workloads.
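
How network_allowlist and network_default combine into an egress decision can be sketched as follows. The helper egress_allowed is hypothetical, and the use of shell-style wildcard matching with case-insensitive hosts is an assumption; individual sandbox providers may match patterns differently.

```python
from fnmatch import fnmatchcase


def egress_allowed(host, network_allowlist, network_default="deny"):
    """Return True when the sandbox may reach `host`.

    Assumption: patterns use shell-style wildcards and hosts compare
    case-insensitively.
    """
    host = host.lower()
    if any(fnmatchcase(host, pattern.lower()) for pattern in network_allowlist):
        return True
    # Not on the allowlist: fall back to the policy-wide default.
    return network_default == "allow"
```

Under these assumptions "*.github.com" covers subdomains such as api.github.com but not the bare github.com, which would need its own entry.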

PolicyAction Enum

Value	`allowed`	Description
`ALLOW` / `"allow"`	`True`	Permit the request.
`DENY` / `"deny"`	`False`	Reject the request.
`AUDIT` / `"audit"`	`True`	Permit but write an audit entry.
`BLOCK` / `"block"`	`False`	Hard block; the reason is surfaced to the caller.
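
The mapping from action to the allowed flag in the table above can be written out directly. SketchAction is a stand-in defined here for illustration; whether the real PolicyAction is a string-valued Enum is an assumption.

```python
from enum import Enum


class SketchAction(str, Enum):
    """Stand-in mirroring the four values listed in the table."""
    ALLOW = "allow"
    DENY = "deny"
    AUDIT = "audit"
    BLOCK = "block"


# Per the table, only ALLOW and AUDIT permit the request.
PERMITTING = {SketchAction.ALLOW, SketchAction.AUDIT}


def is_allowed(action):
    """Accept either an enum member or its YAML string."""
    return SketchAction(action) in PERMITTING
```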

PolicyOperator Values

Enum member	YAML string	Behaviour
`EQ`	`"eq"`	Exact equality
`NE`	`"ne"`	Not equal
`GT`	`"gt"`	Greater than
`LT`	`"lt"`	Less than
`GTE`	`"gte"`	Greater than or equal
`LTE`	`"lte"`	Less than or equal
`IN`	`"in"`	Context value is in the target list
`NOT_IN`	`"not_in"`	Context value is not in the target list
`CONTAINS`	`"contains"`	Target is a substring of the context value
`MATCHES`	`"matches"`	Context value matches the regex in target
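
The operator semantics in the table can be approximated with the standard library. This is a sketch, not the engine's implementation: condition_matches is a hypothetical helper, "matches" is assumed to behave like re.search, and a missing context key is assumed never to match.

```python
import operator
import re

OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "in": lambda actual, target: actual in target,
    "not_in": lambda actual, target: actual not in target,
    # Target is a substring of the context value.
    "contains": lambda actual, target: target in actual,
    # Assumption: unanchored search rather than a full match.
    "matches": lambda actual, target: re.search(target, str(actual)) is not None,
}


def condition_matches(context, field, op, value):
    """Return True when context[field] <op> value holds.

    Assumption: a field that is absent from the context never matches.
    """
    if field not in context:
        return False
    return bool(OPERATORS[op](context[field], value))
```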

Exceptions

`PolicyViolationError`

from agent_os.exceptions import PolicyViolationError

Raised when a governance policy check fails. Extends PolicyError → AgentOSError → Exception.

error_code

str

"POLICY_VIOLATION" by default.

details

dict

Structured details including category, matched_rule, detail, scope, operation, tool_name, and all fields from the audit_entry.

timestamp

str

ISO 8601 UTC timestamp when the error was raised.

check_result

PolicyCheckResult | None

The underlying PolicyCheckResult if the error was created from one, otherwise None.

`PolicyDeniedError`

from agent_os.exceptions import PolicyDeniedError

Raised when a policy explicitly denies an action. error_code is "POLICY_DENIED".

Code Examples

Basic: Load policies and evaluate

from agent_os.policies import PolicyEvaluator

evaluator = PolicyEvaluator()
evaluator.load_policies("./policies/")  # loads all .yaml/.yml files

context = {
    "tool_name": "execute_code",
    "token_count": 500,
}
decision = evaluator.evaluate(context)

if decision.allowed:
    print(f"✅ Permitted by rule: {decision.matched_rule or 'default'}")
else:
    print(f"❌ Denied: {decision.reason}")
    print(f"   Rule: {decision.matched_rule}")

Programmatic policy construction

from agent_os.policies import (
    PolicyEvaluator, PolicyDocument, PolicyRule,
    PolicyCondition, PolicyAction, PolicyOperator, PolicyDefaults,
)

policy = PolicyDocument(
    name="production-safety",
    description="Blocks dangerous tools in production",
    rules=[
        PolicyRule(
            name="block-dangerous-tools",
            condition=PolicyCondition(
                field="tool_name",
                operator=PolicyOperator.IN,
                value=["execute_code", "run_shell", "delete_file"],
            ),
            action=PolicyAction.DENY,
            priority=100,
            message="Dangerous tool blocked in production",
        ),
        PolicyRule(
            name="audit-all-tool-calls",
            condition=PolicyCondition(
                field="tool_name",
                operator=PolicyOperator.NE,
                value="",
            ),
            action=PolicyAction.AUDIT,
            priority=10,
            message="All tool calls are audit-logged",
        ),
    ],
    defaults=PolicyDefaults(
        action=PolicyAction.ALLOW,
        max_tokens=2048,
        max_tool_calls=5,
        confidence_threshold=0.95,
    ),
)

evaluator = PolicyEvaluator(policies=[policy])

# Allowed tool
decision = evaluator.evaluate({"tool_name": "web_search"})
assert decision.allowed  # True — matches audit rule, action=audit is allowed

# Blocked tool
decision = evaluator.evaluate({"tool_name": "execute_code"})
assert not decision.allowed  # False — matched block-dangerous-tools
print(decision.reason)       # "Dangerous tool blocked in production"

Handling a deny decision

from agent_os.policies import PolicyEvaluator
from agent_os.exceptions import PolicyDeniedError

evaluator = PolicyEvaluator()
evaluator.load_policies("./policies/")

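def run_tool(tool_name: str, **kwargs):
    # Hypothetical placeholder: substitute your real tool dispatcher here.
    raise NotImplementedError(f"No executor registered for {tool_name!r}")


def call_tool(tool_name: str, **kwargs):
    # Evaluate before executing so a denied call never reaches the tool.
    decision = evaluator.evaluate({"tool_name": tool_name, **kwargs})

    if not decision.allowed:
        raise PolicyDeniedError(
            f"Tool '{tool_name}' denied: {decision.reason}",
            details=decision.audit_entry,
        )

    # Proceed with tool execution
    return run_tool(tool_name, **kwargs)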

Serialise to YAML for version control

from agent_os.policies import PolicyDocument

policy = PolicyDocument(name="my-policy", rules=[...])  # rules elided
policy.to_yaml("policies/my-policy.yaml")

# Load back
loaded = PolicyDocument.from_yaml("policies/my-policy.yaml")

Policy YAML Reference

version: "1.0"
name: production
description: Production safety policy

rules:
  - name: block-code-execution
    condition:
      field: tool_name
      operator: eq
      value: execute_code
    action: block
    priority: 100
    message: Code execution is blocked in production

  - name: token-limit
    condition:
      field: token_count
      operator: gt
      value: 2048
    action: deny
    priority: 90
    message: Token count exceeds production limit

  - name: allow-safe-tools
    condition:
      field: tool_name
      operator: in
      value: [web_search, read_file, summarize]
    action: allow
    priority: 70
    message: Tool is on the approved list

defaults:
  action: deny
  max_tokens: 2048
  max_tool_calls: 5
  confidence_threshold: 0.95
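
To see how priority ordering resolves the sample policy above, the three rules can be traced by hand with a small stand-alone sketch. It copies the rules into plain tuples and does not use the toolkit; the trace helper is hypothetical, and treating a missing context key as "no match" is an assumption.

```python
import operator

# The three rules from the YAML above, reduced to plain tuples:
# (name, field, comparison, value, action, priority)
RULES = [
    ("block-code-execution", "tool_name", operator.eq, "execute_code", "block", 100),
    ("token-limit", "token_count", operator.gt, 2048, "deny", 90),
    ("allow-safe-tools", "tool_name", lambda actual, target: actual in target,
     ["web_search", "read_file", "summarize"], "allow", 70),
]
DEFAULT_ACTION = "deny"  # defaults.action in the YAML


def trace(context):
    """Return (matched_rule, action) using highest-priority-first matching."""
    ordered = sorted(RULES, key=lambda rule: rule[5], reverse=True)
    for name, field, compare, value, action, _priority in ordered:
        if field in context and compare(context[field], value):
            return name, action
    return None, DEFAULT_ACTION
```

Under this reading, an approved tool such as web_search is still denied when token_count is 3000, because token-limit (priority 90) is checked before allow-safe-tools (priority 70).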

OPA Backend Integration

from agent_os.policies import PolicyEvaluator

evaluator = PolicyEvaluator()

# Method 1: convenience helper
backend = evaluator.load_rego(
    rego_path="policies/agent_policy.rego",
    package="agentos",
    mode="local",
)

# Method 2: manual registration
from agent_os.policies.backends import OPABackend

opa = OPABackend(
    rego_content="""
    package agentos

    default allow = false

    allow {
        input.tool_name == "web_search"
    }
    """,
    package="agentos",
    mode="local",
)
evaluator.add_backend(opa)

# YAML rules are evaluated first;
# OPA is only consulted when no YAML rule matches.
decision = evaluator.evaluate({"tool_name": "web_search"})
