Execution Sandboxing with Four Privilege Rings

AI agents that can read files, call APIs, and execute code need strict runtime boundaries. Without sandboxing, a misbehaving agent can exfiltrate data, corrupt state, exhaust resources, or cascade failures across a multi-agent workflow. The Agent Runtime (pip install agentmesh-runtime) addresses this with a hardware-inspired four-ring privilege model. Just as an operating system uses CPU privilege rings to separate user processes from kernel operations, the Agent Hypervisor assigns each agent to an execution ring based on its trust score and enforces resource boundaries at each level.

┌─────────────────────────────────────────────────┐
│             Execution Ring Model                │
│  Ring 0 (Root) → Ring 3 (Sandbox)               │
├─────────────────────────────────────────────────┤
│           Capability Guards                     │
│  Per-agent tool allow/deny lists                │
├─────────────────────────────────────────────────┤
│          Saga Orchestration                     │
│  Multi-step transactions with auto-rollback     │
├─────────────────────────────────────────────────┤
│          Session Isolation                      │
│  VFS namespacing, vector clocks, intent locks   │
├─────────────────────────────────────────────────┤
│          Emergency Controls                     │
│  Kill switch, rate limiting, breach detection   │
└─────────────────────────────────────────────────┘

The Four-Ring Model

The hypervisor implements exactly four execution rings. Ring ordering is numerical: lower value = higher privilege. Ring 0 is the most privileged; Ring 3 is the default for unknown or untrusted agents.

Ring	Name	Effective Score	Description
Ring 0	Root	N/A (SRE Witness required)	Hypervisor configuration and penalty operations. System-only; never assigned by score alone.
Ring 1	Privileged	> 0.95 + consensus	Non-reversible actions with full resource access (write, deploy, delete).
Ring 2	Standard	> 0.60	Reversible actions with scoped resource access (write files, call APIs).
Ring 3	Sandbox	Default	Read-only and research actions with minimal resource access. No network, no writes.

Ring 0 is never accessible to agents through the standard API. Ring 0 operations require explicit SRE Witness attestation through an out-of-band mechanism. This prevents any compromised agent from escalating to hypervisor-level privileges.

Ring Assignment from Effective Score

Ring assignment from an agent’s effective score (eff_score) follows strict threshold rules:

from hypervisor.models import ExecutionRing

# eff_score > 0.95 AND has_consensus=True → Ring 1 (Privileged)
ring = ExecutionRing.from_eff_score(eff_score=0.98, has_consensus=True)
assert ring == ExecutionRing.RING_1_PRIVILEGED

# eff_score > 0.60 → Ring 2 (Standard)
ring = ExecutionRing.from_eff_score(eff_score=0.75)
assert ring == ExecutionRing.RING_2_STANDARD

# Otherwise → Ring 3 (Sandbox, default)
ring = ExecutionRing.from_eff_score(eff_score=0.40)
assert ring == ExecutionRing.RING_3_SANDBOX
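
To make the threshold rules concrete, here is a minimal standalone sketch. It is not the library's implementation: `Ring` and `ring_from_score` are hypothetical names, and the sketch assumes the comparisons are strict, as written in the table (> 0.95, > 0.60).

```python
from enum import IntEnum


class Ring(IntEnum):
    """Hypothetical stand-in for ExecutionRing; lower value = higher privilege."""
    ROOT = 0
    PRIVILEGED = 1
    STANDARD = 2
    SANDBOX = 3


def ring_from_score(eff_score: float, has_consensus: bool = False) -> Ring:
    """Map an effective score to a ring using the documented thresholds."""
    if eff_score > 0.95 and has_consensus:
        return Ring.PRIVILEGED
    if eff_score > 0.60:
        return Ring.STANDARD
    # Ring 0 is never reachable by score, so Sandbox is the only fallback.
    return Ring.SANDBOX
```

Under this reading, a score of 0.98 without consensus lands in Ring 2 rather than Ring 1, and a score of exactly 0.60 stays in Ring 3.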

Action Classification

Every action is automatically classified to a required ring based on its reversibility, is_read_only, and is_admin flags:

Condition	Required Ring
`is_admin == true`	Ring 0 (Root)
`reversibility == NONE` and `is_read_only == false`	Ring 1 (Privileged)
`is_read_only == true`	Ring 3 (Sandbox)
Otherwise (reversible, not read-only)	Ring 2 (Standard)

from hypervisor.models import ExecutionRing, ReversibilityLevel
from hypervisor.rings.classifier import ActionClassifier

classifier = ActionClassifier()

result = classifier.classify_action_id("file.read")
assert result.ring == ExecutionRing.RING_3_SANDBOX
assert result.reversibility == ReversibilityLevel.REVERSIBLE

result = classifier.classify_action_id("deploy.k8s")
assert result.ring == ExecutionRing.RING_1_PRIVILEGED
assert result.reversibility == ReversibilityLevel.NON_REVERSIBLE

# Override classification for custom actions
classifier.set_override("my_custom.action", ring=ExecutionRing.RING_2_STANDARD, risk_weight=0.5)
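
The classification table can be read as a short decision function. The sketch below is illustrative only: `ActionFlags` and `required_ring` are hypothetical names, and it assumes the rows are evaluated top to bottom, so `is_admin` takes precedence over the other flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionFlags:
    """Hypothetical container for the three flags named in the table."""
    is_admin: bool = False
    is_read_only: bool = False
    reversible: bool = True


def required_ring(flags: ActionFlags) -> int:
    """Return the required ring number (0-3) for an action."""
    if flags.is_admin:
        return 0  # Root
    if not flags.reversible and not flags.is_read_only:
        return 1  # Privileged: non-reversible side effects
    if flags.is_read_only:
        return 3  # Sandbox: read-only work
    return 2  # Standard: reversible, not read-only
```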

Resource Limits Per Ring

Each ring has associated ResourceConstraints that govern what the agent can access:

Ring	Network	Filesystem Scope	Subprocess	Max Concurrent Tools
Ring 0 (Root)	Yes	`full`	Yes	32
Ring 1 (Privileged)	Yes	`full`	Yes	16
Ring 2 (Standard)	Yes (allowlist)	`scoped`	Yes	8
Ring 3 (Sandbox)	No	`none`	No	2

Filesystem scope semantics:

Scope	Meaning
`none`	No filesystem access
`session`	Agent’s own session directory only
`scoped`	Agent’s directories plus explicitly granted paths
`full`	Unrestricted filesystem access
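
One way to picture the scope semantics is a path check like the following. This is a sketch under stated assumptions, not the hypervisor's enforcement code: `path_allowed` is a hypothetical helper, "agent's directories" is approximated by a single session directory, and symlinks are not considered.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Iterable


def path_allowed(
    scope: str, path: str, session_dir: str, granted: Iterable[str] = ()
) -> bool:
    """Decide whether `path` is reachable under a filesystem scope."""
    if scope == "none":
        return False
    if scope == "full":
        return True
    if scope not in ("session", "scoped"):
        raise ValueError(f"unknown scope: {scope}")

    # Normalize first so ".." segments cannot escape an allowed root.
    target = PurePosixPath(posixpath.normpath(path))
    roots = [PurePosixPath(session_dir)]
    if scope == "scoped":
        roots += [PurePosixPath(g) for g in granted]
    return any(target == root or root in target.parents for root in roots)
```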

Rate limits per ring (token bucket algorithm):

Ring	Rate (req/s)	Burst
Ring 0 (Root)	100.0	200.0
Ring 1 (Privileged)	50.0	100.0
Ring 2 (Standard)	20.0	40.0
Ring 3 (Sandbox)	5.0	10.0
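
For readers unfamiliar with token buckets, the sketch below shows how the rate and burst columns interact. It is a generic token bucket, not the library's `AgentRateLimiter`; `TokenBucket` and `RING_LIMITS` are hypothetical names, and it assumes a bucket starts full with the burst value as its capacity. Time is passed in explicitly to keep the example deterministic; a real limiter would typically read `time.monotonic()`.

```python
from dataclasses import dataclass

# (rate in tokens/s, burst capacity) per ring number, copied from the table above.
RING_LIMITS = {0: (100.0, 200.0), 1: (50.0, 100.0), 2: (20.0, 40.0), 3: (5.0, 10.0)}


@dataclass
class TokenBucket:
    rate: float      # tokens added per second
    capacity: float  # maximum tokens (burst)
    tokens: float    # tokens currently available
    last: float      # timestamp of the last refill

    @classmethod
    def for_ring(cls, ring: int, now: float) -> "TokenBucket":
        rate, burst = RING_LIMITS[ring]
        return cls(rate=rate, capacity=burst, tokens=burst, last=now)

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket.for_ring(3, now=0.0)
burst_results = [bucket.try_consume(now=0.0) for _ in range(11)]
after_one_second = [bucket.try_consume(now=1.0) for _ in range(6)]
```

In this run a Ring 3 agent gets 10 immediate requests, is denied the 11th, and after one second has earned back 5 more.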

Privilege Elevation

Agents can request temporary access to a higher ring. Elevations are time-bounded, trust-gated, and automatically expire.

from hypervisor.models import ExecutionRing
from hypervisor.rings.elevation import RingElevationManager

manager = RingElevationManager()

# Request elevation from Ring 2 → Ring 1
elevation = manager.request_elevation(
    agent_did="did:example:agent-42",
    session_id="session-001",
    current_ring=ExecutionRing.RING_2_STANDARD,
    target_ring=ExecutionRing.RING_1_PRIVILEGED,
    ttl_seconds=300,          # 5-minute window (max: 3600s)
    reason="Deploying approved release v2.1.0",
    attestation="signed-approval-token-from-sre",
    trust_score=0.92,
)

if elevation.is_active:
    effective = manager.get_effective_ring(
        agent_did="did:example:agent-42",
        session_id="session-001",
        base_ring=ExecutionRing.RING_2_STANDARD,
    )
    assert effective == ExecutionRing.RING_1_PRIVILEGED

# Revoke early if needed
manager.revoke_elevation(elevation.elevation_id)

Trust thresholds for elevation:

Target Ring	Required Trust Score
Ring 1 (Privileged)	≥ 0.85
Ring 2 (Standard)	≥ 0.50

Denial reasons:

Reason	Condition
`ring_0_forbidden`	Target is Ring 0 — never allowed via standard API
`insufficient_trust`	Trust score below threshold
`no_sponsorship`	Ring 1 elevation without SRE attestation
`duplicate_elevation`	Agent already has an active elevation in this session
`invalid_target`	Target ring is same or lower privilege than current
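
The thresholds and denial reasons above can be combined into a single validation function. The sketch below is hypothetical: `denial_reason` is not a library function, rings are plain integers, and the order in which the checks run is an assumption, since the tables do not specify which reason wins when several apply.

```python
from typing import Optional

# Required trust score per target ring, from the thresholds table.
TRUST_THRESHOLDS = {1: 0.85, 2: 0.50}


def denial_reason(
    current_ring: int,
    target_ring: int,
    trust_score: float,
    has_attestation: bool,
    has_active_elevation: bool,
) -> Optional[str]:
    """Return a denial reason string, or None if the elevation may proceed."""
    if target_ring == 0:
        return "ring_0_forbidden"
    if target_ring >= current_ring:  # same or lower privilege
        return "invalid_target"
    if has_active_elevation:
        return "duplicate_elevation"
    if trust_score < TRUST_THRESHOLDS[target_ring]:
        return "insufficient_trust"
    if target_ring == 1 and not has_attestation:
        return "no_sponsorship"
    return None
```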

Saga Orchestration

Multi-step agent workflows risk leaving the system in a broken half-finished state when a step fails. The Saga Orchestrator wraps multi-step workflows in distributed transactions with automatic compensation (rollback) in reverse order.

How Sagas Work

Step 1: Create PR          ──→  Compensate: Close PR
Step 2: Run tests          ──→  Compensate: Cancel test run
Step 3: Deploy to staging  ──→  Compensate: Rollback deployment
Step 4: Notify team        ──→  Compensate: Send failure notice

If Step 3 fails:
  → Compensate Step 2 (cancel tests)
  → Compensate Step 1 (close PR)
  → Saga state: COMPENSATING → FAILED

Creating a Saga

from hypervisor.saga.orchestrator import SagaOrchestrator

orchestrator = SagaOrchestrator()

# Create a new saga
saga = orchestrator.create_saga(session_id="session-deploy-42")

# Add steps with execute and undo APIs
orchestrator.add_step(
    saga_id=saga.saga_id,
    action_id="pr.create",
    agent_did="did:example:dev-agent",
    execute_api="/api/pr/create",
    undo_api="/api/pr/close",     # compensation action
    timeout_seconds=60,
    max_retries=2,
)

orchestrator.add_step(
    saga_id=saga.saga_id,
    action_id="deploy.staging",
    agent_did="did:example:deploy-agent",
    execute_api="/api/deploy/staging",
    undo_api="/api/deploy/rollback",
    timeout_seconds=600,
)

Step and Saga State Machines

Each step transitions through a well-defined state machine:

PENDING → EXECUTING → COMMITTED
                   ↘ FAILED → COMPENSATING → COMPENSATED
                                          ↘ COMPENSATION_FAILED
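
A state machine like this is often enforced with a transition table. The sketch below encodes only the edges drawn in the diagram above; `STEP_TRANSITIONS` and `advance` are hypothetical names, and the library may allow additional transitions that the diagram does not show.

```python
# Allowed next states for each step state, taken from the diagram.
STEP_TRANSITIONS = {
    "PENDING": {"EXECUTING"},
    "EXECUTING": {"COMMITTED", "FAILED"},
    "FAILED": {"COMPENSATING"},
    "COMPENSATING": {"COMPENSATED", "COMPENSATION_FAILED"},
    "COMMITTED": set(),
    "COMPENSATED": set(),
    "COMPENSATION_FAILED": set(),
}


def advance(state: str, new_state: str) -> str:
    """Return new_state if the transition is allowed, else raise ValueError."""
    if new_state not in STEP_TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```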

Saga-level states:

State	Meaning
`RUNNING`	Steps are being executed sequentially
`COMPENSATING`	A step failed; compensation is running in reverse
`COMPLETED`	All steps committed successfully
`FAILED`	All compensation finished (or some compensation failed)
`ESCALATED`	Compensation itself failed; human intervention required

Declarative Sagas with the DSL

from hypervisor.saga.dsl import SagaDSLParser

saga_yaml = """
saga:
  id: deploy-pipeline
  steps:
    - id: create-pr
      action_id: pr.create
      agent: did:example:dev-agent
      execute_api: /api/pr/create
      undo_api: /api/pr/close
      timeout: 60
      retries: 2

    - id: run-tests
      action_id: tests.run
      agent: did:example:ci-agent
      execute_api: /api/tests/run
      undo_api: /api/tests/cancel
      timeout: 300
      depends_on: [create-pr]

    - id: deploy-staging
      action_id: deploy.staging
      agent: did:example:deploy-agent
      execute_api: /api/deploy/staging
      undo_api: /api/deploy/rollback
      timeout: 600
      depends_on: [run-tests]
      checkpoint_goal: "Staging deployment matches PR diff"
"""

parser = SagaDSLParser()
definition = parser.parse(saga_yaml)

Saga defaults: max_retries=2, retry_delay_seconds=1.0 (linear backoff), step_timeout_seconds=300.
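
As a rough illustration of what linear backoff means for these defaults, the sketch below computes the wait before each retry. It assumes "linear" means the delay grows by `retry_delay_seconds` on each attempt; the library may define it differently, and `retry_delays` is a hypothetical helper.

```python
from typing import List


def retry_delays(max_retries: int = 2, retry_delay_seconds: float = 1.0) -> List[float]:
    """Delay before retry 1, 2, ... under a linearly growing backoff."""
    return [retry_delay_seconds * attempt for attempt in range(1, max_retries + 1)]
```

With the defaults, this gives waits of 1.0 and 2.0 seconds before the first and second retry.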

Kill Switch

The KillSwitch terminates an agent immediately and triggers saga compensation for any in-flight work. It is the last resort for stopping a misbehaving agent.

from hypervisor.security.kill_switch import KillSwitch, KillReason, KillResult

kill_switch = KillSwitch()

result: KillResult = kill_switch.kill(
    agent_did="did:example:rogue-agent",
    session_id="session-001",
    reason=KillReason.BEHAVIORAL_DRIFT,
    details="Agent started accessing files outside its namespace",
)

print(f"Kill ID:               {result.kill_id}")
print(f"Compensation triggered: {result.compensation_triggered}")
print(f"Handoffs succeeded:    {result.handoff_success_count}")
print(f"Terminated:            {result.terminated}")

Kill reasons:

Reason	When to Use
`BEHAVIORAL_DRIFT`	Agent deviates from expected behavior patterns
`RATE_LIMIT`	Agent exceeded its rate limit repeatedly
`RING_BREACH`	Agent attempted actions above its ring level
`MANUAL`	Human operator triggered the kill
`QUARANTINE_TIMEOUT`	Agent was quarantined and did not recover
`SESSION_TIMEOUT`	Session exceeded its `max_duration_seconds`

Graceful Shutdown with Step Handoff

# Register a substitute to take over in-flight work
kill_switch.register_substitute(
    session_id="session-001",
    agent_did="did:example:backup-agent",
)

result = kill_switch.kill(
    agent_did="did:example:primary-agent",
    session_id="session-001",
    reason=KillReason.MANUAL,
    details="Planned maintenance rotation",
)

# Check handoff results
for handoff in result.handoffs:
    print(f"Step {handoff.step_id}: {handoff.status}")
    # HandoffStatus: PENDING, HANDED_OFF, FAILED, COMPENSATED

If no substitute is registered or the callback times out (5-second timeout), terminated is set to false and the kill result is still recorded.

Resource Limits

Termination Control Signals

An agent’s session can be bounded by several configurable limits:

from hypervisor.models import SessionConfig, ConsistencyMode

config = SessionConfig(
    consistency_mode=ConsistencyMode.STRONG,
    max_participants=5,
    max_duration_seconds=3600,   # session hard stop after 1 hour
    min_eff_score=0.60,          # minimum trust to join the session
    enable_audit=True,
    enable_blockchain_commitment=False,
)

Rate Limiting

Per-agent rate limits use a token bucket algorithm. When the bucket is exhausted, RateLimitExceeded is raised:

from hypervisor.models import ExecutionRing
from hypervisor.security.rate_limiter import AgentRateLimiter, RateLimitExceeded

# One limiter covers all rings; per-ring limits are configured automatically.
limiter = AgentRateLimiter()

# try_check() returns False instead of raising when limit is exceeded.
allowed = limiter.try_check(
    agent_did="did:example:new-agent",
    session_id="session-001",
    ring=ExecutionRing.RING_3_SANDBOX,
)
if not allowed:
    print("Rate limited — request denied")

# check() raises RateLimitExceeded on exhaustion.
try:
    limiter.check(
        agent_did="did:example:new-agent",
        session_id="session-001",
        ring=ExecutionRing.RING_3_SANDBOX,
    )
except RateLimitExceeded as exc:
    print(f"Rate limit exceeded: {exc}")

# Update the bucket when the agent's ring changes (e.g., after promotion).
limiter.update_ring(
    agent_did="did:example:new-agent",
    session_id="session-001",
    new_ring=ExecutionRing.RING_2_STANDARD,
)

Session Isolation

Each session uses a SessionVFS that provides per-agent isolated file views. Agents cannot read or write each other’s files without explicit cross-session access grants.

from hypervisor.session.sso import SessionVFS, VFSPermissionError

vfs = SessionVFS()

# Agent A writes — only Agent A can see it
vfs.write(path="/workspace/plan.md", agent_did="did:agent-a", value="# My Plan")

# Agent B cannot read Agent A's file
try:
    vfs.read(path="/workspace/plan.md", agent_did="did:agent-b")
except VFSPermissionError:
    print("Access denied: Agent B cannot read Agent A's namespace")

# Agent B writes to the same path — gets its own isolated copy
vfs.write(path="/workspace/plan.md", agent_did="did:agent-b", value="# Different Plan")

assert vfs.read("/workspace/plan.md", "did:agent-a") == "# My Plan"
assert vfs.read("/workspace/plan.md", "did:agent-b") == "# Different Plan"

Isolation levels:

Level	Description	Cost
`SNAPSHOT`	Complete isolation — agent sees only its own session state	Low
`READ_COMMITTED`	Can read from granted sessions; writes to own only	Medium
`SERIALIZABLE`	Full causal ordering with vector clocks and intent locks	High
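
The SERIALIZABLE level relies on vector clocks for causal ordering. The following is a generic vector clock sketch rather than the hypervisor's implementation; the function names and the use of plain dictionaries keyed by agent DID are assumptions made for illustration.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, agent: str) -> Clock:
    """Return a copy of the clock with the agent's counter incremented."""
    updated = dict(clock)
    updated[agent] = updated.get(agent, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, applied when one agent observes another's state."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a is causally before b (a <= b everywhere, a < b somewhere)."""
    keys = a.keys() | b.keys()
    return all(a.get(k, 0) <= b.get(k, 0) for k in keys) and any(
        a.get(k, 0) < b.get(k, 0) for k in keys
    )


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither clock is causally before the other."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    return a_ahead and b_ahead


a1 = tick({}, "did:agent-a")              # Agent A writes
b1 = tick({}, "did:agent-b")              # Agent B writes independently
b2 = tick(merge(a1, b1), "did:agent-b")   # Agent B reads A's state, then writes
```

The two independent writes are concurrent and would need conflict handling, while `b2` is causally after `a1`.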

Breach Detection Pipeline

Wire breach detection and rate limiting together for automated enforcement:

from hypervisor.models import ExecutionRing
from hypervisor.rings.breach_detector import RingBreachDetector, BreachSeverity
from hypervisor.security.kill_switch import KillSwitch, KillReason
from hypervisor.security.rate_limiter import AgentRateLimiter

detector = RingBreachDetector()
kill_switch = KillSwitch()
limiter = AgentRateLimiter()  # per-ring token buckets, as in Rate Limiting above

async def on_agent_action(
    agent_did: str, session_id: str, action_id: str, ring: ExecutionRing
):
    """Enforcement pipeline for every agent action."""

    # Layer 1: Rate limit check (try_check returns False when the bucket is empty)
    if not limiter.try_check(agent_did=agent_did, session_id=session_id, ring=ring):
        kill_switch.kill(
            agent_did=agent_did,
            session_id=session_id,
            reason=KillReason.RATE_LIMIT,
        )
        return

    # Layer 2: Ring enforcement
    # Breach severity: 1-ring gap → WARNING, 2-ring gap → HIGH, 3-ring gap → CRITICAL
    # CRITICAL severity triggers immediate kill

    # Layer 3: Capability guard check (handled by governance middleware)

    # Layer 4: Saga step execution (handled by orchestrator)

Ring demotion is automatic: when an agent’s effective score drops below the threshold for its current ring, should_demote() returns True and the agent is moved down to the ring matching its current score.
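
The severity mapping and the demotion rule can be sketched as two small functions. These are hypothetical helpers, not the `RingBreachDetector` API: rings are plain integers, the gap is the agent's ring number minus the ring the action requires, and the demotion sketch covers Rings 1 to 3 only, reusing the assignment thresholds from earlier in this page.

```python
from typing import Optional


def breach_severity(agent_ring: int, required_ring: int) -> Optional[str]:
    """Map the ring gap of an attempted action to a severity label."""
    gap = agent_ring - required_ring
    if gap <= 0:
        return None  # the agent is privileged enough; no breach
    return {1: "WARNING", 2: "HIGH"}.get(gap, "CRITICAL")


def demoted_ring(current_ring: int, eff_score: float, has_consensus: bool = False) -> int:
    """Return the ring after demotion; never promotes above current_ring."""
    if eff_score > 0.95 and has_consensus:
        earned = 1
    elif eff_score > 0.60:
        earned = 2
    else:
        earned = 3
    return max(current_ring, earned)
```

For example, a Ring 3 agent attempting a Ring 0 action has a 3-ring gap and would be classified CRITICAL, while a Ring 2 agent whose score falls to 0.55 would be moved to Ring 3.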
