AI Tool Guardrails: Context-Aware Policy Enforcement

AI tool guardrails address the Lethal Trifecta by enforcing deterministic rules around tool use and tool outputs. Agents can still read sensitive internal data and process untrusted content, but Archestra can dynamically block risky follow-up actions when the context is no longer safe. This gives you a middle ground between a fully permissive agent that can read and send anything, and a permanently read-only agent that can never take external action — the same agent can operate normally in safe contexts and become more restricted only when context or tool output requires it.

How Context Flows Through Guardrails

Every tool call passes through a context evaluation before it runs, and every tool result is classified before it re-enters the agent loop. The flowchart below shows the full decision path.

Context sensitivity is cumulative. Once a tool result marks the context as sensitive, that state persists for all subsequent tool call evaluations within the same conversation — including calls made by subagents that inherit the parent’s context.
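The cumulative behavior described above can be pictured as a one-way ratchet. The following is a minimal sketch, not Archestra's actual data model: the `ConversationContext` class, its fields, and the `for_subagent` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationContext:
    """Hypothetical per-conversation trust state (illustrative only)."""
    sensitive: bool = False
    reasons: list = field(default_factory=list)

    def record_result(self, tool_name: str, classification: str) -> None:
        # Sensitivity only ratchets up: a later "safe" result does not clear it.
        if classification == "sensitive":
            self.sensitive = True
            self.reasons.append(tool_name)

    def for_subagent(self) -> "ConversationContext":
        # A subagent starts from the parent's state, not from a fresh one.
        return ConversationContext(self.sensitive, list(self.reasons))


ctx = ConversationContext()
ctx.record_result("read_email", "sensitive")
ctx.record_result("list_files", "safe")  # does not reset the flag
child = ctx.for_subagent()
print(ctx.sensitive, child.sensitive)  # True True
```

Copying the state into the subagent is a design assumption made for this sketch. Whether results seen by a subagent also flow back to the parent is not covered here.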

Tool Discovery

Archestra discovers tools in two ways, and both feed the same guardrails configuration view.

LLM Proxy Discovery

When requests flow through the LLM Proxy, Archestra records the tool definitions included in those requests. Any tool your agents declare in their system prompts is automatically surfaced here.

MCP Tool Discovery

When tools belong to MCP servers managed by the Archestra MCP Orchestrator, Archestra already knows those tool definitions and surfaces them in the same guardrails view — no additional configuration needed.

This gives you one control plane for tools discovered from live agent traffic and tools hosted by MCP infrastructure that Archestra orchestrates directly.

Tool Call Policies

Tool call policies control whether a tool may run in the current context. Archestra evaluates the policy against the actual arguments submitted to the tool call, not just the tool name.

A tool call policy resolves to one of four actions: Allow Always, Allow in Safe Context, Require Approval, or Block Always.

Allow Always

The tool can run even when the current context is marked sensitive or untrusted. Use this for safe internal read paths where the tool itself cannot cause harm regardless of what the context contains.
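To make the interaction between an action and the current context concrete, here is a rough sketch of how the four actions might resolve. The identifiers `allow_in_safe_context` and `block_always`, the outcome strings, and the `enforce` function are assumptions for illustration. The behavior shown for actions other than Allow Always is inferred from their names, not taken from Archestra's implementation.

```python
def enforce(action: str, context_is_sensitive: bool) -> str:
    """Map a policy action plus the current trust state to an outcome."""
    if action == "allow_always":
        return "run"  # runs even in a sensitive or untrusted context
    if action == "allow_in_safe_context":
        return "deny" if context_is_sensitive else "run"
    if action == "require_approval":
        return "await_approval"  # assumed: a human decides before the call runs
    if action == "block_always":
        return "deny"
    raise ValueError(f"unknown action: {action}")


print(enforce("allow_in_safe_context", context_is_sensitive=False))  # run
print(enforce("allow_in_safe_context", context_is_sensitive=True))   # deny
print(enforce("allow_always", context_is_sensitive=True))            # run
```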

Example: Scoping `send_email` by Recipient Domain

A `send_email` tool may be acceptable for internal recipients but must require approval for external ones. The policy inspects the actual `to` argument before the call runs:

{
  "tool": "send_email",
  "conditions": [
    {
      "field": "to[*]",
      "operator": "all_match",
      "pattern": "@mycompany\\.com$",
      "action": "allow_always"
    },
    {
      "field": "to[*]",
      "operator": "any_not_match",
      "pattern": "@mycompany\\.com$",
      "action": "require_approval"
    }
  ]
}

This makes the policy decision depend on the actual attempted action, not just on the name of the tool.
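As an illustration of argument-level evaluation, the sketch below interprets the policy above in plain Python. It is not Archestra's evaluator. The `decide` function, the handling of the `to[*]` path, the treatment of an empty recipient list, and the `allow_in_safe_context` fallback identifier are all assumptions.

```python
import re

POLICY = {
    "tool": "send_email",
    "conditions": [
        {"field": "to[*]", "operator": "all_match",
         "pattern": r"@mycompany\.com$", "action": "allow_always"},
        {"field": "to[*]", "operator": "any_not_match",
         "pattern": r"@mycompany\.com$", "action": "require_approval"},
    ],
}


def values_for(field: str, args: dict) -> list:
    # Only the "name[*]" form used in the example is handled here.
    key = field[:-3] if field.endswith("[*]") else field
    value = args.get(key, [])
    return value if isinstance(value, list) else [value]


def condition_matches(cond: dict, args: dict) -> bool:
    values = values_for(cond["field"], args)
    hits = [re.search(cond["pattern"], str(v)) is not None for v in values]
    if cond["operator"] == "all_match":
        return bool(hits) and all(hits)  # assumed: an empty list never matches
    if cond["operator"] == "any_not_match":
        return any(not hit for hit in hits)
    raise ValueError(f"unsupported operator: {cond['operator']}")


def decide(policy: dict, args: dict, default: str = "allow_in_safe_context") -> str:
    for cond in policy["conditions"]:
        if condition_matches(cond, args):
            return cond["action"]
    return default


print(decide(POLICY, {"to": ["ana@mycompany.com"]}))                   # allow_always
print(decide(POLICY, {"to": ["ana@mycompany.com", "x@example.org"]}))  # require_approval
```

The same tool name produces two different outcomes here because only the recipient list differs.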

Tool Result Policies

Tool result policies control how tool output is treated after a tool runs. Use them when the tool itself may be safe to call, but the returned data could be sensitive, adversarial, or prompt-injectable.

Safe

The result is considered safe and continues through the agent loop normally. Context state is unchanged.

Sensitive

The result is treated as sensitive or risky context for later decisions. Subsequent tool calls whose policies require a safe context will be denied.

Dual LLM

The result is routed through the Dual LLM Agent before it is returned to the main agent. The main agent only sees a constrained, safe summary — never the raw output.

Blocked

The result is blocked entirely and never enters the agent’s context. The tool call appears to have returned nothing.

Example: Classifying Emails by Sender Domain

A read_email tool may be safe to call, but returned messages from external senders could contain prompt injections. The policy inspects the result and classifies it:

{
  "tool": "read_email",
  "result_conditions": [
    {
      "field": "emails[*].from",
      "operator": "all_match",
      "pattern": "@mycompany\\.com$",
      "action": "safe"
    },
    {
      "field": "emails[*].from",
      "operator": "any_not_match",
      "pattern": "@mycompany\\.com$",
      "action": "sensitive"
    }
  ]
}

This lets the agent continue normally when it is only reading internal mail, while automatically tightening later tool use after reading email from outside your company.
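The classification step can be sketched the same way. The function below is a hypothetical stand-in for the result policy above. The shape of the result (`emails` as a list of objects with a `from` field) follows the field path in the example, while the function name and the conservative default for an empty result are assumptions.

```python
import re

INTERNAL_SENDER = re.compile(r"@mycompany\.com$")


def classify_read_email(result: dict, default: str = "sensitive") -> str:
    """Return "safe" only when every sender is internal."""
    senders = [str(email.get("from", "")) for email in result.get("emails", [])]
    if not senders:
        return default  # no condition matches, so the fallback action applies
    if all(INTERNAL_SENDER.search(sender) for sender in senders):
        return "safe"
    return "sensitive"  # at least one sender is outside the company domain


internal_only = {"emails": [{"from": "ana@mycompany.com"}]}
mixed = {"emails": [{"from": "ana@mycompany.com"}, {"from": "promo@example.org"}]}
print(classify_read_email(internal_only))  # safe
print(classify_read_email(mixed))          # sensitive
```

The pattern is anchored to the end of the string, so a sender formatted with a display name such as `Ana <ana@mycompany.com>` would not match it and would be classified as sensitive.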

Policy Evaluation Order

Archestra evaluates policies in the order they are defined. The first matching condition determines the action taken. Policies can also be scoped to specific agents — for example, allowing an internal support agent to use send_email for @mycompany.com recipients while keeping the same tool blocked for a broader research agent.
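First-match evaluation means rule order changes the outcome. The sketch below shows this with agent-scoped rules. The `Rule` type, the agent names `support-agent` and `research-agent`, and the `block_always` identifier are hypothetical and only serve to illustrate the ordering behavior.

```python
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    action: str
    matches: Callable[[dict], bool]
    agent: Optional[str] = None  # None means the rule applies to every agent


def first_match(rules: list, agent: str, args: dict, default: str) -> str:
    for rule in rules:
        if rule.agent not in (None, agent):
            continue  # rule is scoped to a different agent
        if rule.matches(args):
            return rule.action  # the first matching rule wins
    return default


def internal_only(args: dict) -> bool:
    recipients = args.get("to", [])
    return bool(recipients) and all(r.endswith("@mycompany.com") for r in recipients)


RULES = [
    Rule("allow_always", internal_only, agent="support-agent"),
    Rule("block_always", lambda args: True),  # catch-all for everything else
]

call = {"to": ["ana@mycompany.com"]}
print(first_match(RULES, "support-agent", call, "block_always"))   # allow_always
print(first_match(RULES, "research-agent", call, "block_always"))  # block_always
# Reversing the order lets the catch-all shadow the scoped rule.
print(first_match(list(reversed(RULES)), "support-agent", call, "block_always"))  # block_always
```

Because a broad rule placed first shadows everything after it, narrow rules are usually listed before broad ones.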

Subagent delegation does not reset trust state. If a parent agent delegates to a subagent after the conversation has already become sensitive, the subagent inherits that unsafe context and the same tool call restrictions continue to apply.

Deterministic vs. LLM Guardrails

Many platforms use probabilistic LLM guardrails that ask a model to decide whether content or actions are allowed. Archestra’s AI tool guardrails are different:

Property	Archestra Guardrails	LLM Guardrails
Decision source	Stored policies	Model judgment at runtime
Determinism	✅ Fully deterministic	❌ Probabilistic
Context-aware	✅ Evaluates live context	❌ Usually stateless
Auditable	✅ Inspect exact policies applied	❌ Opaque model output
Composable	✅ Combine with Dual LLM	Limited

Use probabilistic LLM guardrails when you want fuzzy classification or moderation. Use deterministic AI tool guardrails when you need predictable enforcement against data exfiltration and unsafe tool chaining.

Load Tools When Needed

When an agent or MCP Gateway uses "Load tools when needed", the initial MCP `tools/list` response only includes `search_tools` and `run_tool`. Tool call policies are still evaluated against the tool that actually runs. If `run_tool` is asked to execute `send_email`, Archestra evaluates the `send_email` policies with the submitted `tool_args`, current trust state, and policy context. Input conditions, team conditions, untrusted-context rules, and approval-required rules all work the same way as for a direct `send_email` tool call: the indirection through `run_tool` does not bypass guardrails.
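One way to picture this is a resolution step that unwraps `run_tool` before any policy lookup. This is a sketch under assumptions: the `tool_name` key in the `run_tool` payload and the `resolve_effective_call` helper are hypothetical, and only `tool_args` is named in the text above.

```python
def resolve_effective_call(tool_name: str, arguments: dict) -> tuple:
    """Return the (tool, args) pair that policies should be evaluated against."""
    if tool_name == "run_tool":
        # Assumed payload shape: {"tool_name": "...", "tool_args": {...}}
        return arguments["tool_name"], arguments.get("tool_args", {})
    return tool_name, arguments


direct = resolve_effective_call("send_email", {"to": ["x@example.org"]})
indirect = resolve_effective_call(
    "run_tool",
    {"tool_name": "send_email", "tool_args": {"to": ["x@example.org"]}},
)
print(direct == indirect)  # True
```

Since both paths resolve to the same tool name and arguments, any policy keyed on `send_email` sees identical input either way.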

Policy Configuration Agent

Archestra includes a built-in Tool Policy Configuration Agent that analyzes tool metadata and proposes default tool call policies and tool result policies automatically, so you do not start from a blank screen for every new tool.

How the Policy Configuration Agent Works

When triggered, the subagent sends each tool’s name, description, MCP server name, parameter schema, and tool annotations to an LLM. The LLM returns structured recommendations for both tool call policies and tool result policies, along with reasoning that is stored for auditability.

The agent can run in two ways:

Automatically on tool discovery. When enabled, newly discovered tools get default policies without manual review first.
Manually on demand. Trigger it for specific tools when you want Archestra to propose defaults for an existing tool set.

In both cases, tools that already have custom policies with conditions are preserved — only default policies are overwritten.
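The preservation rule can be sketched as a merge that skips any tool whose existing policy has conditions. The dictionary layout (`call_default`, `result_default`, `conditions`, `reasoning`) and the `apply_suggestions` function are hypothetical. Archestra's stored format may differ.

```python
def apply_suggestions(existing: dict, suggestions: dict) -> dict:
    """Overwrite default-only policies and keep policies that have conditions."""
    merged = dict(existing)
    for tool, suggested in suggestions.items():
        current = existing.get(tool)
        if current and current.get("conditions"):
            continue  # custom policy with conditions: leave untouched
        merged[tool] = {
            "call_default": suggested["call_default"],
            "result_default": suggested["result_default"],
            "conditions": [],
            "reasoning": suggested["reasoning"],  # kept for auditability
        }
    return merged


existing = {
    "send_email": {"call_default": "allow_in_safe_context", "result_default": "safe",
                   "conditions": [{"field": "to[*]"}]},
    "web_search": {"call_default": "allow_always", "result_default": "safe",
                   "conditions": []},
}
suggestions = {
    "send_email": {"call_default": "block_always", "result_default": "safe",
                   "reasoning": "can send data externally"},
    "web_search": {"call_default": "allow_always", "result_default": "dual_llm",
                   "reasoning": "reads untrusted web content"},
}
merged = apply_suggestions(existing, suggestions)
print(merged["send_email"]["call_default"])    # allow_in_safe_context
print(merged["web_search"]["result_default"])  # dual_llm
```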

The Policy Configuration Agent can recommend the Dual LLM action automatically for tools that read from untrusted sources such as web search, email readers, and document processors. Review its suggestions in the guardrails UI before enabling automatic application in production.

Configuring Policies in the UI

Open the Guardrails View

Navigate to LLM Proxy → Tool Guardrails in the Archestra dashboard. All discovered tools appear here, grouped by discovery source (LLM Proxy traffic or MCP server).

Select a Tool

Click any tool to open its policy editor. You will see separate sections for Tool Call Policies and Tool Result Policies.

Add a Condition

For each policy, add one or more conditions that inspect arguments (for call policies) or returned fields (for result policies). Select the action to take when the condition matches.

Set the Default Action

Define a fallback action that applies when no conditions match. This is typically Allow in Safe Context for call policies and Safe or Sensitive for result policies.

Save and Test

Save the policy. The next agent request that uses this tool will be evaluated against the new rules. Use the audit log to confirm the expected action was taken.
