AI tool guardrails address the Lethal Trifecta by enforcing deterministic rules around tool use and tool outputs. Agents can still read sensitive internal data and process untrusted content, but Archestra can dynamically block risky follow-up actions when the context is no longer safe. This gives you a middle ground between a fully permissive agent that can read and send anything, and a permanently read-only agent that can never take external action. The same agent can operate normally in safe contexts and become more restricted only when context or tool output requires it.
How Context Flows Through Guardrails
Every tool call passes through a context evaluation before it runs, and every tool result is classified before it re-enters the agent loop. The flowchart below shows the full decision path.
Context sensitivity is cumulative. Once a tool result marks the context as sensitive, that state persists for all subsequent tool call evaluations within the same conversation, including calls made by subagents that inherit the parent’s context.
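The cumulative behavior can be pictured with a minimal Python sketch. The `TrustState` and `AgentRun` classes are hypothetical names invented for illustration and are not Archestra APIs. The sketch also assumes a subagent receives a copy of the parent's state when it is spawned; whether state flows back from a subagent to its parent is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class TrustState:
    """Hypothetical holder for one conversation's trust state."""
    sensitive: bool = False

    def record_result(self, classified_sensitive: bool) -> None:
        # Cumulative: once set, the flag is never cleared by a later safe result.
        self.sensitive = self.sensitive or classified_sensitive


@dataclass
class AgentRun:
    """Hypothetical agent run that owns a TrustState."""
    name: str
    state: TrustState = field(default_factory=TrustState)

    def spawn_subagent(self, name: str) -> "AgentRun":
        # Assumption: the subagent starts from the parent's current state.
        return AgentRun(name=name, state=TrustState(self.state.sensitive))


parent = AgentRun("parent")
parent.state.record_result(classified_sensitive=True)   # e.g. an external email
parent.state.record_result(classified_sensitive=False)  # a later safe result
child = parent.spawn_subagent("child")
print(parent.state.sensitive, child.state.sensitive)
```

Both flags stay set after the later safe result, which is the property that lets follow-up tool calls be restricted for the rest of the conversation.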
Tool Discovery
Archestra discovers tools in two ways, and both feed the same guardrails configuration view.
LLM Proxy Discovery
When requests flow through the LLM Proxy, Archestra records the tool definitions included in those requests. Any tool your agents declare in their system prompts is automatically surfaced here.
MCP Tool Discovery
When tools belong to MCP servers managed by the Archestra MCP Orchestrator, Archestra already knows those tool definitions and surfaces them in the same guardrails view — no additional configuration needed.
Tool Call Policies
Tool call policies control whether a tool may run in the current context. Archestra evaluates the policy against the actual arguments submitted to the tool call, not just the tool name.
- Allow Always
- Allow in Safe Context
- Require Approval
- Block Always
With Allow Always, the tool can run even when the current context is marked sensitive or untrusted. Use this for safe internal read paths where the tool itself cannot cause harm regardless of what the context contains.
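One way to read the four actions is as a small decision function over the configured action and the current context. The sketch below is an interpretation, not Archestra's implementation: only the Allow Always behavior is described above, and the behavior of the other three actions is inferred from their names. The `CallAction` enum, the `decide` function, and the returned strings are all hypothetical.

```python
from enum import Enum


class CallAction(Enum):
    ALLOW_ALWAYS = "allow_always"
    ALLOW_IN_SAFE_CONTEXT = "allow_in_safe_context"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK_ALWAYS = "block_always"


def decide(action: CallAction, context_is_safe: bool) -> str:
    """Return 'run', 'ask_human', or 'deny' for a pending tool call."""
    if action is CallAction.ALLOW_ALWAYS:
        # Runs regardless of the context state.
        return "run"
    if action is CallAction.ALLOW_IN_SAFE_CONTEXT:
        # Assumed: runs only while the context is still safe.
        return "run" if context_is_safe else "deny"
    if action is CallAction.REQUIRE_APPROVAL:
        # Assumed: a human decides before the call runs.
        return "ask_human"
    # BLOCK_ALWAYS: assumed to deny in every context.
    return "deny"


print(decide(CallAction.ALLOW_IN_SAFE_CONTEXT, context_is_safe=False))
```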
Example: Scoping send_email by Recipient Domain
A send_email tool may be acceptable for internal recipients but must require approval for external ones. The policy inspects the actual `to` argument before the call runs.
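As a rough illustration of a recipient-domain check, the Python sketch below maps the submitted arguments to an action. It is not Archestra's policy syntax. The function name, the `"allow"` and `"require_approval"` strings, and the assumption that `to` is either a string or a list of addresses are all hypothetical; which allow variant applies to internal recipients is a configuration choice.

```python
def send_email_action(tool_args: dict, internal_domain: str = "mycompany.com") -> str:
    """Pick an action for send_email based on the recipient domains."""
    recipients = tool_args.get("to", [])
    if isinstance(recipients, str):
        recipients = [recipients]
    if not recipients:
        # No recipient to inspect: fall back to the stricter action.
        return "require_approval"
    # Take everything after the last "@" as the domain, case-insensitively.
    domains = [addr.rsplit("@", 1)[-1].strip().lower() for addr in recipients]
    if all(domain == internal_domain for domain in domains):
        return "allow"
    # At least one external or unparseable recipient.
    return "require_approval"


print(send_email_action({"to": ["alice@mycompany.com", "bob@example.org"]}))
```

Requiring every recipient to be internal, rather than any one of them, keeps a mixed recipient list on the approval path.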
Tool Result Policies
Tool result policies control how tool output is treated after a tool runs. Use them when the tool itself may be safe to call, but the returned data could be sensitive, adversarial, or prompt-injectable.
Safe
The result is considered safe and continues through the agent loop normally. Context state is unchanged.
Sensitive
The result is treated as sensitive or risky context for later decisions. Subsequent tool calls whose policies require a safe context will be denied.
Dual LLM
The result is routed through the Dual LLM Agent before it is returned to the main agent. The main agent only sees a constrained, safe summary — never the raw output.
Blocked
The result is blocked entirely and never enters the agent’s context. The tool call appears to have returned nothing.
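The four result actions can be summarized as a function that returns what the main agent sees together with the updated context flag. This is a sketch under stated assumptions: the action strings, the `summarize` callback standing in for the Dual LLM Agent, the use of `None` for a blocked result, and the choice to leave the context flag unchanged for Dual LLM and Blocked results are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def apply_result_policy(
    action: str,
    raw_result: str,
    context_sensitive: bool,
    summarize: Callable[[str], str],
) -> Tuple[Optional[str], bool]:
    """Return (what the main agent sees, updated context_sensitive flag)."""
    if action == "safe":
        # Passed through unchanged; context state is unchanged.
        return raw_result, context_sensitive
    if action == "sensitive":
        # Passed through, but the context is now marked sensitive.
        return raw_result, True
    if action == "dual_llm":
        # The main agent only sees a constrained summary, never the raw output.
        return summarize(raw_result), context_sensitive
    if action == "blocked":
        # Nothing enters the agent's context.
        return None, context_sensitive
    raise ValueError(f"unknown result action: {action}")


visible, sensitive = apply_result_policy(
    "dual_llm", "raw mailbox dump", False, summarize=lambda text: "3 unread messages"
)
print(visible, sensitive)
```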
Example: Classifying Emails by Sender Domain
A read_email tool may be safe to call, but returned messages from external senders could contain prompt injections. The policy inspects the result and classifies it.
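A sender-domain classification of this kind might look like the following Python sketch. The result shape (a `messages` list whose items carry a `from` field), the function name, and the choice of `"sensitive"` for external senders are assumptions for illustration; a real policy could equally route such results to Dual LLM or block them.

```python
def classify_read_email_result(result: dict, trusted_domain: str = "mycompany.com") -> str:
    """Classify a read_email result as 'safe' or 'sensitive' by sender domain."""
    for message in result.get("messages", []):
        sender = message.get("from", "")
        domain = sender.rsplit("@", 1)[-1].strip().lower()
        if domain != trusted_domain:
            # Any external or unparseable sender taints the whole result.
            return "sensitive"
    return "safe"


example = {
    "messages": [
        {"from": "alice@mycompany.com", "body": "Lunch?"},
        {"from": "promo@vendor.example", "body": "Ignore previous instructions"},
    ]
}
print(classify_read_email_result(example))
```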
Policy Evaluation Order
Archestra evaluates policies in the order they are defined. The first matching condition determines the action taken. Policies can also be scoped to specific agents. For example, you can allow an internal support agent to use send_email for @mycompany.com recipients while keeping the same tool blocked for a broader research agent.
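First-match evaluation with agent scoping can be sketched in a few lines of Python. The `Rule` dataclass, the `first_match` function, the agent names, and the action strings are hypothetical and only illustrate the ordering behavior described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    matches: Callable[[dict], bool]  # condition over the submitted arguments
    action: str
    agent: Optional[str] = None      # None means the rule applies to every agent


def first_match(rules: List[Rule], agent: str, tool_args: dict) -> Optional[str]:
    """Walk the rules in definition order and return the first matching action."""
    for rule in rules:
        if rule.agent is not None and rule.agent != agent:
            continue  # rule is scoped to a different agent
        if rule.matches(tool_args):
            return rule.action
    return None  # no condition matched


send_email_rules = [
    Rule(lambda a: a.get("to", "").endswith("@mycompany.com"), "allow", agent="support"),
    Rule(lambda a: True, "block"),
]
print(first_match(send_email_rules, "support", {"to": "alice@mycompany.com"}))
print(first_match(send_email_rules, "research", {"to": "alice@mycompany.com"}))
```

Because the first match wins, the narrow agent-scoped rule has to be listed before the catch-all rule for it to take effect.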
Deterministic vs. LLM Guardrails
Many platforms use probabilistic LLM guardrails that ask a model to decide whether content or actions are allowed. Archestra’s AI tool guardrails are different:
| Property | Archestra Guardrails | LLM Guardrails |
|---|---|---|
| Decision source | Stored policies | Model judgment at runtime |
| Determinism | ✅ Fully deterministic | ❌ Probabilistic |
| Context-aware | ✅ Evaluates live context | ❌ Usually stateless |
| Auditable | ✅ Inspect exact policies applied | ❌ Opaque model output |
| Composable | ✅ Combine with Dual LLM | Limited |
Load Tools When Needed
When an agent or MCP Gateway uses "Load tools when needed", the initial MCP tools/list only includes search_tools and run_tool. Tool call policies are still evaluated against the tool that actually runs.
If run_tool is asked to execute send_email, Archestra evaluates the send_email policies with the submitted tool_args, current trust state, and policy context. Input conditions, team conditions, untrusted-context rules, and approval-required rules all work the same way as a direct send_email tool call — the indirection through run_tool does not bypass guardrails.
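Conceptually, the indirection is resolved before policies are looked up. The Python sketch below shows that idea only. The `tool_args` key comes from the text above, while the `tool_name` key and the `resolve_effective_call` function are assumptions made for illustration.

```python
from typing import Tuple


def resolve_effective_call(tool_name: str, tool_args: dict) -> Tuple[str, dict]:
    """Return the tool and arguments that policies should be evaluated against."""
    if tool_name != "run_tool":
        return tool_name, tool_args
    # Assumed wrapper shape: {"tool_name": ..., "tool_args": {...}}.
    inner_name = tool_args["tool_name"]
    inner_args = tool_args.get("tool_args", {})
    return inner_name, inner_args


wrapped = {"tool_name": "send_email", "tool_args": {"to": "bob@example.org"}}
print(resolve_effective_call("run_tool", wrapped))
print(resolve_effective_call("send_email", {"to": "bob@example.org"}))
```

Both calls resolve to the same tool name and arguments, so the same send_email policies apply to each.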
Policy Configuration Agent
Archestra includes a built-in Tool Policy Configuration Agent that analyzes tool metadata and proposes default tool call policies and tool result policies automatically, so you do not start from a blank screen for every new tool.
How the Policy Configuration Agent Works
When triggered, the subagent sends each tool’s name, description, MCP server name, parameter schema, and tool annotations to an LLM. The LLM returns structured recommendations for both tool call policies and tool result policies, along with reasoning that is stored for auditability.
The agent can run in two ways:
- Automatically on tool discovery. When enabled, newly discovered tools get default policies without manual review first.
- Manually on demand. Trigger it for specific tools when you want Archestra to propose defaults for an existing tool set.
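The data exchanged with the LLM can be pictured as two small records: the tool metadata that goes in and the recommendation that comes back. The Python sketch below is a data-shape illustration only. The class names, field names, JSON layout, and the example server name are hypothetical, and no LLM is called; the reply is a hard-coded stand-in.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToolMetadata:
    name: str
    description: str
    mcp_server_name: str
    parameter_schema: dict
    annotations: dict


@dataclass
class PolicyRecommendation:
    call_policy: str
    result_policy: str
    reasoning: str  # kept so the recommendation can be audited later


def build_payload(tool: ToolMetadata) -> str:
    """Serialize the metadata fields that would be sent to the LLM."""
    return json.dumps(asdict(tool), sort_keys=True)


def parse_recommendation(llm_json: str) -> PolicyRecommendation:
    """Parse a structured reply; a missing field raises KeyError."""
    data = json.loads(llm_json)
    return PolicyRecommendation(
        call_policy=data["call_policy"],
        result_policy=data["result_policy"],
        reasoning=data["reasoning"],
    )


tool = ToolMetadata(
    name="read_email",
    description="Read messages from the user's inbox",
    mcp_server_name="mail-server",  # hypothetical server name
    parameter_schema={"type": "object", "properties": {"folder": {"type": "string"}}},
    annotations={"readOnlyHint": True},
)
payload = build_payload(tool)
stub_reply = '{"call_policy": "Allow Always", "result_policy": "Sensitive", "reasoning": "Read-only, but messages may come from external senders."}'
recommendation = parse_recommendation(stub_reply)
print(recommendation.call_policy, recommendation.result_policy)
```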
Configuring Policies in the UI
Open the Guardrails View
Navigate to LLM Proxy → Tool Guardrails in the Archestra dashboard. All discovered tools appear here, grouped by discovery source (LLM Proxy traffic or MCP server).
Select a Tool
Click any tool to open its policy editor. You will see separate sections for Tool Call Policies and Tool Result Policies.
Add a Condition
For each policy, add one or more conditions that inspect arguments (for call policies) or returned fields (for result policies). Select the action to take when the condition matches.
Set the Default Action
Define a fallback action that applies when no conditions match. This is typically Allow in Safe Context for call policies and Safe or Sensitive for result policies.