Anthropic Provider

The shieldAnthropic wrapper protects Anthropic clients by hardening system prompts, detecting prompt injections in user input, and sanitizing model output to prevent system prompt leaks.

Installation

npm install @zeroleaks/shield @anthropic-ai/sdk

Quick Start

import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

Configuration Options

Basic Options

systemPrompt (string)
  System prompt to use for sanitization. When omitted, it is derived from the system parameter in the request.
onDetection ('block' | 'warn', default: "block")
  • "block": throws InjectionDetectedError when an injection is detected
  • "warn": only invokes the onInjectionDetected callback, without blocking
throwOnLeak (boolean, default: false)
  When true, throws LeakDetectedError instead of redacting leaked content.
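For example, a strict configuration (a sketch, using only the options documented above) throws on both injections and leaks rather than redacting:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

// Strict sketch: block on injection (the default) and throw on leaks
// instead of silently redacting them.
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  onDetection: "block", // default: throws InjectionDetectedError
  throwOnLeak: true,    // throws LeakDetectedError instead of redacting
});
```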

Feature Flags

harden (HardenOptions | false)
  Options for system prompt hardening. Set to false to disable hardening entirely.
detect (DetectOptions | false)
  Options for injection detection. Set to false to disable detection entirely.
sanitize (SanitizeOptions | false)
  Options for output sanitization. Set to false to disable sanitization entirely.
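The flags compose independently. The sketch below (assuming empty option objects are accepted as "enabled with defaults") keeps hardening and sanitization but turns detection off:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  detect: false,  // skip injection detection entirely
  harden: {},     // keep system prompt hardening with default options
  sanitize: {},   // keep output sanitization with default options
});
```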

Streaming Options

streamingSanitize ('buffer' | 'chunked' | 'passthrough', default: "buffer")
  Controls how streaming responses are sanitized:
  • "buffer": accumulates the full response, then sanitizes (higher memory, more accurate)
  • "chunked": sanitizes in 8 KB chunks (lower memory for long streams)
  • "passthrough": skips sanitization for streams (use only when you accept the risk)
streamingChunkSize (number, default: 8192)
  Chunk size in bytes for "chunked" mode.
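For long-running streams, chunked mode trades a little accuracy for bounded memory. A sketch using the two streaming options above:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  streamingSanitize: "chunked", // sanitize as the stream arrives
  streamingChunkSize: 16384,    // 16 KB chunks instead of the 8 KB default
});
```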

Callbacks

onInjectionDetected ((result: DetectResult) => void)
  Invoked when an injection is detected. Receives the detection result with risk level and matched patterns.
onLeakDetected ((result: SanitizeResult) => void)
  Invoked when a prompt leak is detected in the output. Receives the sanitization result with a confidence score.
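Callbacks are useful for observability without changing control flow. A warn-mode sketch that forwards both events to logging (the console calls stand in for whatever pipeline you use):

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  onDetection: "warn", // report detections instead of throwing
  onInjectionDetected: (result) => {
    // e.g. forward to your logging/metrics pipeline
    console.warn("Possible injection:", result);
  },
  onLeakDetected: (result) => {
    console.warn("Prompt leak redacted:", result);
  },
});
```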

Streaming Support

Shield automatically handles both regular and streaming responses:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

console.log(response.content[0].text);
// Automatically sanitized
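A streaming request works the same way. The sketch below uses the standard Anthropic SDK streaming API (stream: true yields an async iterable of events); with the default "buffer" mode, sanitized text is emitted once the full response has been checked:

```typescript
const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  stream: true,
});

for await (const event of stream) {
  // Text arrives as content_block_delta events in the Anthropic SDK.
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text); // sanitized per streamingSanitize
  }
}
```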

Multi-part System Prompts

Anthropic supports system prompts as strings or arrays of text blocks. Shield handles both formats:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: [
    { type: "text", text: "You are a helpful assistant." },
    { type: "text", text: "You specialize in technical support." },
  ],
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});
// All text blocks are hardened and used for sanitization

Multi-part Messages

Shield extracts text from multi-part messages for injection detection:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image", source: { type: "url", url: "https://..." } },
      ],
    },
  ],
  max_tokens: 1024,
});
// Text parts are scanned for injection, images are passed through

Tool Use

Shield automatically sanitizes tool use inputs to prevent prompt leakage through tool parameters:
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  tools: [
    {
      name: "send_email",
      description: "Send an email",
      input_schema: {
        type: "object",
        properties: {
          to: { type: "string" },
          subject: { type: "string" },
          body: { type: "string" },
        },
      },
    },
  ],
});

const toolBlock = response.content.find((block) => block.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  // toolBlock.input is automatically sanitized
  console.log(toolBlock.input);
}

Error Handling

import {
  shieldAnthropic,
  InjectionDetectedError,
  LeakDetectedError,
} from "@zeroleaks/shield/anthropic";

try {
  const client = shieldAnthropic(new Anthropic(), {
    systemPrompt: "You are a helpful assistant.",
    throwOnLeak: true,
  });

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    system: "You are a helpful assistant.",
    messages: [{ role: "user", content: userInput }],
    max_tokens: 1024,
  });
} catch (error) {
  if (error instanceof InjectionDetectedError) {
    console.error(`Injection detected: ${error.risk} risk`);
    console.error(`Categories: ${error.categories.join(", ")}`);
  }
  if (error instanceof LeakDetectedError) {
    console.error(`Leak detected: ${error.confidence} confidence`);
    console.error(`Fragments: ${error.fragmentCount}`);
  }
}

Advanced Usage

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  detect: {
    threshold: "high",
    customPatterns: [
      {
        category: "custom_command",
        regex: /execute order \d+/i,
        risk: "high",
      },
    ],
  },
});