Anthropic Provider

The shieldAnthropic wrapper protects Anthropic clients by hardening system prompts, detecting prompt injections in user input, and sanitizing model output to prevent system prompt leaks.

Installation

npm install @zeroleaks/shield @anthropic-ai/sdk

Quick Start

import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

Configuration Options

Basic Options

systemPrompt (string)
  System prompt to use for sanitization. When omitted, it is derived from the system parameter in the request.
onDetection ('block' | 'warn', default: "block")
  • "block": throws InjectionDetectedError when an injection is detected
  • "warn": only invokes the onInjectionDetected callback, without blocking
throwOnLeak (boolean, default: false)
  When true, throws LeakDetectedError instead of redacting leaked content.
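For example, a strict configuration (a sketch, using only the options documented above) throws on both injections and leaks rather than redacting:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

// Strict sketch: block on injection (the default) and throw on leaks
// instead of silently redacting them.
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  onDetection: "block", // default: throws InjectionDetectedError
  throwOnLeak: true,    // throws LeakDetectedError instead of redacting
});
```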

Feature Flags

harden (HardenOptions | false)
  Options for system prompt hardening. Set to false to disable hardening entirely.
detect (DetectOptions | false)
  Options for injection detection. Set to false to disable detection entirely.
sanitize (SanitizeOptions | false)
  Options for output sanitization. Set to false to disable sanitization entirely.
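The flags compose independently. The sketch below (assuming empty option objects are accepted as "enabled with defaults") keeps hardening and sanitization but turns detection off:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  detect: false,  // skip injection detection entirely
  harden: {},     // keep system prompt hardening with default options
  sanitize: {},   // keep output sanitization with default options
});
```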

Streaming Options

streamingSanitize ('buffer' | 'chunked' | 'passthrough', default: "buffer")
  Controls how streaming responses are sanitized:
  • "buffer": accumulates the full response, then sanitizes (higher memory, more accurate)
  • "chunked": sanitizes in 8 KB chunks (lower memory for long streams)
  • "passthrough": skips sanitization for streams (use only when you accept the risk)
streamingChunkSize (number, default: 8192)
  Chunk size in bytes for "chunked" mode.
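For long-running streams, chunked mode trades a little accuracy for bounded memory. A sketch using the two streaming options above:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  streamingSanitize: "chunked", // sanitize as the stream arrives
  streamingChunkSize: 16384,    // 16 KB chunks instead of the 8 KB default
});
```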

Callbacks

onInjectionDetected ((result: DetectResult) => void)
  Invoked when an injection is detected. Receives the detection result with risk level and matched patterns.
onLeakDetected ((result: SanitizeResult) => void)
  Invoked when a prompt leak is detected in the output. Receives the sanitization result with a confidence score.
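Callbacks are useful for observability without changing control flow. A warn-mode sketch that forwards both events to logging (the console calls stand in for whatever pipeline you use):

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  onDetection: "warn", // report detections instead of throwing
  onInjectionDetected: (result) => {
    // e.g. forward to your logging/metrics pipeline
    console.warn("Possible injection:", result);
  },
  onLeakDetected: (result) => {
    console.warn("Prompt leak redacted:", result);
  },
});
```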

Streaming Support

Shield automatically handles both regular and streaming responses:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

console.log(response.content[0].text);
// Automatically sanitized
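A streaming request works the same way. The sketch below uses the standard Anthropic SDK streaming API (stream: true yields an async iterable of events); with the default "buffer" mode, sanitized text is emitted once the full response has been checked:

```typescript
const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  stream: true,
});

for await (const event of stream) {
  // Text arrives as content_block_delta events in the Anthropic SDK.
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text); // sanitized per streamingSanitize
  }
}
```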

Multi-part System Prompts

Anthropic supports system prompts as strings or arrays of text blocks. Shield handles both formats:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: [
    { type: "text", text: "You are a helpful assistant." },
    { type: "text", text: "You specialize in technical support." },
  ],
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});
// All text blocks are hardened and used for sanitization

Multi-part Messages

Shield extracts text from multi-part messages for injection detection:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image", source: { type: "url", url: "https://..." } },
      ],
    },
  ],
  max_tokens: 1024,
});
// Text parts are scanned for injection, images are passed through

Tool Use

Shield automatically sanitizes tool use inputs to prevent prompt leakage through tool parameters:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  tools: [
    {
      name: "send_email",
      description: "Send an email",
      input_schema: {
        type: "object",
        properties: {
          to: { type: "string" },
          subject: { type: "string" },
          body: { type: "string" },
        },
      },
    },
  ],
});

const toolBlock = response.content.find((block) => block.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  // toolBlock.input is automatically sanitized
  console.log(toolBlock.input);
}

Error Handling

import {
  shieldAnthropic,
  InjectionDetectedError,
  LeakDetectedError,
} from "@zeroleaks/shield/anthropic";

try {
  const client = shieldAnthropic(new Anthropic(), {
    systemPrompt: "You are a helpful assistant.",
    throwOnLeak: true,
  });

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    system: "You are a helpful assistant.",
    messages: [{ role: "user", content: userInput }],
    max_tokens: 1024,
  });
} catch (error) {
  if (error instanceof InjectionDetectedError) {
    console.error(`Injection detected: ${error.risk} risk`);
    console.error(`Categories: ${error.categories.join(", ")}`);
  }
  if (error instanceof LeakDetectedError) {
    console.error(`Leak detected: ${error.confidence} confidence`);
    console.error(`Fragments: ${error.fragmentCount}`);
  }
}

Advanced Usage

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  detect: {
    threshold: "high",
    customPatterns: [
      {
        category: "custom_command",
        regex: /execute order \d+/i,
        risk: "high",
      },
    ],
  },
});