Dual LLM Agent: Quarantining Untrusted Tool Output

The Dual LLM Agent is a built-in security workflow for tools that return untrusted content. It is one of the strategies Archestra uses to reduce Lethal Trifecta risk. Instead of letting the main agent read raw output from sources like web pages, email, or user-generated files, Archestra routes that output through two built-in agents with different responsibilities. For a deeper explanation of the security pattern itself, see the Dual LLM overview.

Why Raw Tool Output Is Dangerous

When an agent reads content from an external or user-controlled source — a webpage, an email inbox, a shared document — that content may contain adversarial instructions crafted to hijack the agent’s behavior. Because LLMs process all tokens as a single stream, they have no inherent mechanism to distinguish between your system prompt, your user’s request, and injected instructions hidden inside fetched data. The Dual LLM pattern solves this by ensuring the raw, potentially poisoned content never reaches the agent that has tool access.

How It Works

The workflow uses two isolated agents with asymmetric capabilities:

Dual LLM Main Agent

Sees the user request and a Q&A transcript. Has access to tools. Never sees raw tool output. Asks constrained multiple-choice questions and synthesizes a safe summary from the answers it receives.

Dual LLM Quarantine Agent

Sees the raw tool output. Has no tool access whatsoever. Can only respond by picking from a constrained set of multiple-choice options provided by the main agent — it cannot issue tool calls or send free-form text back.

The separation limits prompt injection risk because untrusted text never reaches the agent that holds the tools. Even if the quarantine agent’s model were fully compromised by an injection, it has no mechanism to act on that — it can only return an index number.

Interaction Flow

The main agent never receives free-form text from the quarantine agent. It only receives integer indices corresponding to options it defined itself. This makes the channel structurally safe — an attacker cannot craft a response that the quarantine agent could use to influence the main agent.

When Dual LLM Runs

Dual LLM activates when a tool’s tool result policy is set to Dual LLM. The most common scenarios are:

Web Search & Scraping

Any tool that fetches or summarizes live web content where the page author could embed adversarial instructions.

Email Readers

Tools like read_email or list_messages where the email body is controlled by external, potentially malicious senders.

File & Document Readers

Tools that return user-uploaded or third-party documents where the document content is not trusted.

External API Responses

Any external source where the exact raw text is unsafe to pass to a tool-capable agent but a safe summary is still useful.

Configuration

Dual LLM is configured as a tool result policy action in the AI Tool Guardrails settings. You do not need to modify your agent prompt or tool definitions.

Open Tool Guardrails

Navigate to LLM Proxy → Tool Guardrails and select the tool whose results you want to quarantine.

Add a Tool Result Policy

In the Tool Result Policies section, add a new policy. You can apply Dual LLM unconditionally or conditionally — for example, only when the email sender is from outside your domain.

Set Action to Dual LLM

Select Dual LLM as the action for the matching condition. Save the policy.

Verify in the Audit Log

After the next agent run that triggers this tool, confirm in the audit log that the result was routed through the quarantine agent rather than passed directly to the main agent context.

The Tool Policy Configuration Agent can recommend the Dual LLM action automatically for tools that read from untrusted sources. Trigger it on newly discovered tools to get a head start on policy configuration.

Example: Conditional Dual LLM for Email

Apply Dual LLM only when emails from outside your domain are returned, and mark purely internal results as safe:

{
  "tool": "read_email",
  "result_conditions": [
    {
      "field": "emails[*].from",
      "operator": "all_match",
      "pattern": "@mycompany\\.com$",
      "action": "safe"
    },
    {
      "field": "emails[*].from",
      "operator": "any_not_match",
      "pattern": "@mycompany\\.com$",
      "action": "dual_llm"
    }
  ]
}

Limitations

Dual LLM is a strong mitigation, not an absolute guarantee. Keep the following constraints in mind when designing your security model.

Information is lossy by design

The quarantine agent can only return multiple-choice answers, not free-form summaries. If your use case requires the agent to produce rich, verbatim output from external content, Dual LLM will prevent that — it is designed to constrain, not preserve.

The quarantine model must be trusted

The quarantine agent’s underlying LLM model is still a probabilistic system. While the constrained output channel eliminates most injection vectors, the quarantine model itself must be a trustworthy, production-grade model. Do not route Dual LLM through an untrusted or fine-tuned model that could be manipulated at the model level.

Dual LLM does not replace tool call policies

Dual LLM quarantines tool results. It does not prevent an agent from calling a dangerous tool in the first place. Use tool call policies alongside Dual LLM to control which tools can run and under what context conditions.

Latency overhead

Each Dual LLM evaluation requires multiple round-trips between the main agent and quarantine agent. For latency-sensitive workflows, benchmark the overhead against your acceptable response time before enabling Dual LLM on high-frequency tools.

Get Started

MCP

Agents

LLM Proxy

Security

Administration

Integrations

Contributing

Dual LLM Agent: Quarantining Untrusted Tool Output

Why Raw Tool Output Is Dangerous

How It Works

Dual LLM Main Agent

Dual LLM Quarantine Agent

Interaction Flow

When Dual LLM Runs

Web Search & Scraping

Email Readers

File & Document Readers

External API Responses

Configuration

Example: Conditional Dual LLM for Email

Limitations

Build docs developers (and LLMs) love

Get Started

MCP

Agents

LLM Proxy

Security

Administration

Integrations

Contributing

Documentation Index

​Why Raw Tool Output Is Dangerous

​How It Works

Dual LLM Main Agent

Dual LLM Quarantine Agent

​Interaction Flow

​When Dual LLM Runs

Web Search & Scraping

Email Readers

File & Document Readers

External API Responses

​Configuration

​Example: Conditional Dual LLM for Email

​Limitations

Build docs developers (and LLMs) love

Why Raw Tool Output Is Dangerous

How It Works

Interaction Flow

When Dual LLM Runs

Configuration

Example: Conditional Dual LLM for Email

Limitations