Function Signature

function shieldAnthropic<T extends {
  messages: { create: (...args: unknown[]) => unknown };
}>(client: T, options?: ShieldAnthropicOptions): T
Wraps an Anthropic client instance with Shield protection. Returns a wrapped client with the same API surface that automatically hardens system prompts, detects injections in user input, and sanitizes model output.

Parameters

client
Anthropic
required
An instance of the Anthropic SDK client (from the @anthropic-ai/sdk package, version 0.20.0 or later)
options
ShieldAnthropicOptions
Configuration options for Shield protection

ShieldAnthropicOptions

systemPrompt
string
System prompt used for sanitization. When omitted, Shield automatically derives it from the system parameter in your request.
harden
HardenOptions | false
Default: {}
Options for prompt hardening. Set to false to disable hardening. See harden() for available options.
detect
DetectOptions | false
Default: {}
Options for injection detection. Set to false to disable detection. See detect() for available options.
sanitize
SanitizeOptions | false
Default: {}
Options for output sanitization. Set to false to disable sanitization. See sanitize() for available options.
streamingSanitize
'buffer' | 'chunked' | 'passthrough'
Default: "buffer"
Streaming sanitization strategy:
  • "buffer": Accumulate the full stream, then sanitize (higher memory, more accurate)
  • "chunked": Process in chunks of streamingChunkSize bytes, 8KB by default (lower memory for long streams)
  • "passthrough": Skip sanitization entirely (use when you accept the risk)
streamingChunkSize
number
Default: 8192
Chunk size in bytes for "chunked" mode. Only applies when streamingSanitize is set to "chunked".
onDetection
'block' | 'warn'
Default: "block"
Behavior when injection is detected:
  • "block": Throw InjectionDetectedError (request fails)
  • "warn": Invoke the onInjectionDetected callback without throwing (request continues)
throwOnLeak
boolean
Default: false
When true, throw LeakDetectedError instead of redacting leaked content. Use for strict security policies where any leak should abort the request.
onInjectionDetected
(result: DetectResult) => void
Callback invoked when an injection is detected. Receives the full DetectResult with risk level and matched patterns.
onLeakDetected
(result: SanitizeResult) => void
Callback invoked when a prompt leak is detected in the output. Receives the full SanitizeResult with confidence score and leaked fragments.
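To make the "buffer" vs. "chunked" trade-off concrete, here is a minimal, self-contained sketch of what a chunked strategy can look like. This is a hypothetical illustration, not Shield's actual implementation: `createChunkedSanitizer` and its stand-in `Sanitizer` type are invented for this example.

```typescript
type Sanitizer = (text: string) => string;

// Hypothetical sketch of the "chunked" strategy: accumulate streamed
// deltas and run the sanitizer each time a full chunk is buffered,
// so memory stays bounded by chunkSize rather than the whole stream.
function createChunkedSanitizer(sanitize: Sanitizer, chunkSize = 8192) {
  let buffer = "";
  return {
    // Feed one streamed delta; returns any sanitized chunks ready to emit.
    push(delta: string): string[] {
      buffer += delta;
      const out: string[] = [];
      while (buffer.length >= chunkSize) {
        out.push(sanitize(buffer.slice(0, chunkSize)));
        buffer = buffer.slice(chunkSize);
      }
      return out;
    },
    // Flush and sanitize whatever remains when the stream ends.
    flush(): string {
      const rest = buffer;
      buffer = "";
      return rest ? sanitize(rest) : "";
    },
  };
}
```

The "buffer" strategy is the degenerate case of this sketch with an unbounded chunk size: nothing is emitted until the stream completes, which lets the sanitizer see leaked fragments that would otherwise straddle a chunk boundary.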

Return Type

Returns the same client type T with Shield protection applied. All methods work identically to the original client.
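The shape of the return type follows from the interception pattern that any such wrapper uses. The sketch below is hypothetical (it is not Shield's source): a generic `intercept` helper that preserves the wrapped method's signature while running a pre-check on arguments and a post-transform on the result.

```typescript
// Hypothetical interception sketch: the wrapped function keeps the
// original argument and return types, so callers see an identical API.
function intercept<A extends unknown[], R>(
  fn: (...args: A) => R,
  before: (args: A) => void, // e.g. injection detection on user input
  after: (result: R) => R,   // e.g. output sanitization
): (...args: A) => R {
  return (...args: A) => {
    before(args);
    return after(fn(...args));
  };
}
```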

Examples

Basic Usage

import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

Streaming with Chunked Sanitization

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  streamingSanitize: "chunked", // Sanitize output in chunks as it streams
  streamingChunkSize: 4096, // Use 4KB chunks instead of the 8KB default
});

const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  stream: true,
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

Custom Detection Callbacks

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  onDetection: "warn", // Don't throw, just log
  onInjectionDetected: (result) => {
    console.warn(`Injection detected: ${result.risk} risk`);
    console.warn(`Matched patterns: ${result.matches.map(m => m.category).join(", ")}`);
  },
  onLeakDetected: (result) => {
    console.warn(`Leak detected with ${result.confidence} confidence`);
    console.warn(`Fragments: ${result.fragments.length}`);
  },
});

Strict Mode (Throw on Any Leak)

import { InjectionDetectedError, LeakDetectedError } from "@zeroleaks/shield";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent.",
  throwOnLeak: true, // Abort request on any leak
});

try {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    system: "You are a support agent.",
    messages: [{ role: "user", content: userInput }],
    max_tokens: 1024,
  });
} catch (error) {
  if (error instanceof InjectionDetectedError) {
    console.error(`Injection: ${error.risk} risk, categories: ${error.categories}`);
  }
  if (error instanceof LeakDetectedError) {
    console.error(`Leak: ${error.confidence} confidence, ${error.fragmentCount} fragments`);
  }
}

Notes

  • Multi-part system prompts: Anthropic supports system as string | Array<{ type: string; text: string }>. Shield extracts text from all blocks for hardening and sanitization.
  • Multi-part messages: Message content can be string | Array<{ type: string; text: string }>. Shield extracts text from all parts for injection detection.
  • Tool use: Shield automatically sanitizes the input object in tool use blocks to prevent leaks in structured outputs.
  • Auto-derived system prompt: When systemPrompt is not provided, Shield extracts it from the system parameter in your request.
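
The multi-part extraction described in the notes above can be pictured with a short sketch. This is an illustration of the union type's shape, not Shield's actual code; `extractText` is a hypothetical helper.

```typescript
type TextBlock = { type: string; text: string };

// Collapse a string | TextBlock[] union (as used by Anthropic's `system`
// parameter and message `content`) into plain text for analysis.
function extractText(value: string | TextBlock[]): string {
  if (typeof value === "string") return value;
  return value
    .filter((block) => typeof block.text === "string")
    .map((block) => block.text)
    .join("\n");
}
```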
