Function Signature

function shieldAnthropic<T extends {
  messages: { create: (...args: unknown[]) => unknown };
}>(client: T, options?: ShieldAnthropicOptions): T
Wraps an Anthropic client instance with Shield protection. Returns a wrapped client with the same API surface that automatically hardens system prompts, detects injections in user input, and sanitizes model output.

Parameters

client
Anthropic
required
An instance of the Anthropic SDK client (from the @anthropic-ai/sdk package, version 0.20.0 or later)
options
ShieldAnthropicOptions
Configuration options for Shield protection

ShieldAnthropicOptions

systemPrompt
string
System prompt used for sanitization. When omitted, Shield automatically derives it from the system parameter in your request.
harden
HardenOptions | false
Default: {}
Options for prompt hardening. Set to false to disable hardening. See harden() for available options.
detect
DetectOptions | false
Default: {}
Options for injection detection. Set to false to disable detection. See detect() for available options.
sanitize
SanitizeOptions | false
Default: {}
Options for output sanitization. Set to false to disable sanitization. See sanitize() for available options.
streamingSanitize
'buffer' | 'chunked' | 'passthrough'
Default: "buffer"
Streaming sanitization strategy:
  • "buffer": Accumulate the full stream, then sanitize (higher memory, more accurate)
  • "chunked": Process in chunks of streamingChunkSize bytes, 8KB by default (lower memory for long streams)
  • "passthrough": Skip sanitization entirely (use when you accept the risk)
streamingChunkSize
number
Default: 8192
Chunk size in bytes for "chunked" mode. Only applies when streamingSanitize is set to "chunked".
onDetection
'block' | 'warn'
Default: "block"
Behavior when injection is detected:
  • "block": Throw InjectionDetectedError (request fails)
  • "warn": Invoke the onInjectionDetected callback without throwing (request continues)
throwOnLeak
boolean
Default: false
When true, throw LeakDetectedError instead of redacting leaked content. Use for strict security policies where any leak should abort the request.
onInjectionDetected
(result: DetectResult) => void
Callback invoked when an injection is detected. Receives the full DetectResult with risk level and matched patterns.
onLeakDetected
(result: SanitizeResult) => void
Callback invoked when a prompt leak is detected in the output. Receives the full SanitizeResult with confidence score and leaked fragments.
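To make the "buffer" vs. "chunked" trade-off concrete, here is a minimal, self-contained sketch of what a chunked strategy can look like. This is a hypothetical illustration, not Shield's actual implementation: `createChunkedSanitizer` and its stand-in `Sanitizer` type are invented for this example.

```typescript
type Sanitizer = (text: string) => string;

// Hypothetical sketch of the "chunked" strategy: accumulate streamed
// deltas and run the sanitizer each time a full chunk is buffered,
// so memory stays bounded by chunkSize rather than the whole stream.
function createChunkedSanitizer(sanitize: Sanitizer, chunkSize = 8192) {
  let buffer = "";
  return {
    // Feed one streamed delta; returns any sanitized chunks ready to emit.
    push(delta: string): string[] {
      buffer += delta;
      const out: string[] = [];
      while (buffer.length >= chunkSize) {
        out.push(sanitize(buffer.slice(0, chunkSize)));
        buffer = buffer.slice(chunkSize);
      }
      return out;
    },
    // Flush and sanitize whatever remains when the stream ends.
    flush(): string {
      const rest = buffer;
      buffer = "";
      return rest ? sanitize(rest) : "";
    },
  };
}
```

The "buffer" strategy is the degenerate case of this sketch with an unbounded chunk size: nothing is emitted until the stream completes, which lets the sanitizer see leaked fragments that would otherwise straddle a chunk boundary.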

Return Type

Returns the same client type T with Shield protection applied. All methods work identically to the original client.
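The shape of the return type follows from the interception pattern that any such wrapper uses. The sketch below is hypothetical (it is not Shield's source): a generic `intercept` helper that preserves the wrapped method's signature while running a pre-check on arguments and a post-transform on the result.

```typescript
// Hypothetical interception sketch: the wrapped function keeps the
// original argument and return types, so callers see an identical API.
function intercept<A extends unknown[], R>(
  fn: (...args: A) => R,
  before: (args: A) => void, // e.g. injection detection on user input
  after: (result: R) => R,   // e.g. output sanitization
): (...args: A) => R {
  return (...args: A) => {
    before(args);
    return after(fn(...args));
  };
}
```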

Examples

Basic Usage

import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

Streaming with Chunked Sanitization

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  streamingSanitize: "chunked", // Sanitize output in chunks as it streams
  streamingChunkSize: 4096, // Use 4KB chunks instead of the 8KB default
});

const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  stream: true,
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

Custom Detection Callbacks

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  onDetection: "warn", // Don't throw, just log
  onInjectionDetected: (result) => {
    console.warn(`Injection detected: ${result.risk} risk`);
    console.warn(`Matched patterns: ${result.matches.map(m => m.category).join(", ")}`);
  },
  onLeakDetected: (result) => {
    console.warn(`Leak detected with ${result.confidence} confidence`);
    console.warn(`Fragments: ${result.fragments.length}`);
  },
});

Strict Mode (Throw on Any Leak)

import { InjectionDetectedError, LeakDetectedError } from "@zeroleaks/shield";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent.",
  throwOnLeak: true, // Abort request on any leak
});

try {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    system: "You are a support agent.",
    messages: [{ role: "user", content: userInput }],
    max_tokens: 1024,
  });
} catch (error) {
  if (error instanceof InjectionDetectedError) {
    console.error(`Injection: ${error.risk} risk, categories: ${error.categories}`);
  }
  if (error instanceof LeakDetectedError) {
    console.error(`Leak: ${error.confidence} confidence, ${error.fragmentCount} fragments`);
  }
}

Notes

  • Multi-part system prompts: Anthropic supports system as string | Array<{ type: string; text: string }>. Shield extracts text from all blocks for hardening and sanitization.
  • Multi-part messages: Message content can be string | Array<{ type: string; text: string }>. Shield extracts text from all parts for injection detection.
  • Tool use: Shield automatically sanitizes the input object in tool use blocks to prevent leaks in structured outputs.
  • Auto-derived system prompt: When systemPrompt is not provided, Shield extracts it from the system parameter in your request.
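
The multi-part extraction described in the notes above can be pictured with a short sketch. This is an illustration of the union type's shape, not Shield's actual code; `extractText` is a hypothetical helper.

```typescript
type TextBlock = { type: string; text: string };

// Collapse a string | TextBlock[] union (as used by Anthropic's `system`
// parameter and message `content`) into plain text for analysis.
function extractText(value: string | TextBlock[]): string {
  if (typeof value === "string") return value;
  return value
    .filter((block) => typeof block.text === "string")
    .map((block) => block.text)
    .join("\n");
}
```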
