# Anthropic Provider

The `shieldAnthropic` wrapper protects Anthropic clients by hardening system prompts, detecting prompt injection in user input, and sanitizing model output to prevent system prompt leaks.
## Installation

```bash
npm install @zeroleaks/shield @anthropic-ai/sdk
```
## Quick Start

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

// Requests go through hardening, injection detection, and output
// sanitization automatically.
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});
```
## Configuration Options

### Basic Options

**systemPrompt**
Type: `string`
The system prompt used for sanitization. When omitted, it is derived from the `system` parameter of the request.

**onDetection**
Type: `'block' | 'warn'`
Default: `"block"`
- `"block"`: throws `InjectionDetectedError` when an injection is detected
- `"warn"`: only invokes the `onInjectionDetected` callback, without blocking

**throwOnLeak**
Type: `boolean`
When `true`, throws `LeakDetectedError` instead of redacting leaked content.
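For monitoring without blocking, `"warn"` mode pairs naturally with the `onInjectionDetected` callback. A minimal configuration sketch, using only option names shown elsewhere on this page:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

// "warn" mode: requests proceed even when an injection is detected;
// the callback fires so the attempt can be logged or metered.
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
  onDetection: "warn",
  onInjectionDetected: (result) => {
    console.warn(`Possible injection (risk: ${result.risk})`);
  },
});
```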
### Feature Flags

- `harden`: options for system prompt hardening. Set to `false` to disable hardening entirely.
- `detect`: options for injection detection. Set to `false` to disable detection entirely.
- Options for output sanitization. Set to `false` to disable sanitization entirely.
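As a sketch, a protection layer can be switched off by passing `false` for its flag (here hardening is disabled while detection stays on; flag names follow the `harden`/`detect` options used in Advanced Usage):

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

// Keep injection detection and output sanitization, but skip
// system prompt hardening entirely.
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  harden: false,
});
```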
### Streaming Options

**streamingSanitize**
Type: `'buffer' | 'chunked' | 'passthrough'`
Default: `"buffer"`
Controls how streaming responses are sanitized:
- `"buffer"`: accumulates the full response, then sanitizes it (higher memory, more accurate)
- `"chunked"`: sanitizes in 8 KB chunks (lower memory for long streams)
- `"passthrough"`: skips sanitization for streams (use only when you accept the risk)

**streamingChunkSize**
Type: `number`
Chunk size in bytes for `"chunked"` mode.
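To make the chunked mode concrete, here is an illustrative sketch of splitting streamed text into fixed-size pieces. This is not the library's implementation (which operates on bytes and runs its sanitizer per chunk); `chunkText` is a hypothetical helper:

```typescript
// Split accumulated stream text into fixed-size chunks, simplified to
// character counts rather than bytes for clarity.
function chunkText(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// A 20-character stream with an 8-character chunk size yields two full
// chunks and one 4-character remainder.
const parts = chunkText("a".repeat(20), 8);
console.log(parts.map((p) => p.length)); // [ 8, 8, 4 ]
```

Each chunk can then be sanitized and flushed independently, which is why memory stays bounded for long streams at some cost in accuracy near chunk boundaries.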
### Callbacks

**onInjectionDetected**
Type: `(result: DetectResult) => void`
Invoked when an injection is detected. Receives a detection result with the risk level and matched patterns.

**onLeakDetected**
Type: `(result: SanitizeResult) => void`
Invoked when a prompt leak is detected in the output. Receives a sanitization result with a confidence score.
## Streaming Support

Shield automatically handles both regular and streaming responses:
### Regular Response

```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

console.log(response.content[0].text);
// Automatically sanitized
```

### Streaming Response

```typescript
const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  stream: true,
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta?.text) {
    process.stdout.write(event.delta.text);
    // Automatically sanitized chunks
  }
}
```

### Chunked Sanitization

```typescript
// For very long streams, use chunked mode to limit memory
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  streamingSanitize: "chunked",
  streamingChunkSize: 8192, // 8 KB chunks
});

const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  stream: true,
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta?.text) {
    process.stdout.write(event.delta.text);
  }
}
```
## Multi-part System Prompts

Anthropic accepts system prompts either as a string or as an array of text blocks. Shield handles both formats:

```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: [
    { type: "text", text: "You are a helpful assistant." },
    { type: "text", text: "You specialize in technical support." },
  ],
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});
// All text blocks are hardened and used for sanitization
```
## Multi-part Messages

Shield extracts the text from multi-part messages for injection detection:

```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image", source: { type: "url", url: "https://..." } },
      ],
    },
  ],
  max_tokens: 1024,
});
// Text parts are scanned for injection; images are passed through
```
## Tool Use

Shield automatically sanitizes tool use inputs to prevent prompt leakage through tool parameters:

```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
  tools: [
    {
      name: "send_email",
      description: "Send an email",
      input_schema: {
        type: "object",
        properties: {
          to: { type: "string" },
          subject: { type: "string" },
          body: { type: "string" },
        },
      },
    },
  ],
});

const toolBlock = response.content.find((block) => block.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  // toolBlock.input is automatically sanitized
  console.log(toolBlock.input);
}
```
## Error Handling

```typescript
import Anthropic from "@anthropic-ai/sdk";
import {
  shieldAnthropic,
  InjectionDetectedError,
  LeakDetectedError,
} from "@zeroleaks/shield/anthropic";

try {
  const client = shieldAnthropic(new Anthropic(), {
    systemPrompt: "You are a helpful assistant.",
    throwOnLeak: true,
  });

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    system: "You are a helpful assistant.",
    messages: [{ role: "user", content: userInput }],
    max_tokens: 1024,
  });
} catch (error) {
  if (error instanceof InjectionDetectedError) {
    console.error(`Injection detected: ${error.risk} risk`);
    console.error(`Categories: ${error.categories.join(", ")}`);
  }
  if (error instanceof LeakDetectedError) {
    console.error(`Leak detected: ${error.confidence} confidence`);
    console.error(`Fragments: ${error.fragmentCount}`);
  }
}
```
## Advanced Usage

Custom detection patterns:

```typescript
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  detect: {
    threshold: "high",
    customPatterns: [
      {
        category: "custom_command",
        regex: /execute order \d+/i,
        risk: "high",
      },
    ],
  },
});
```

Custom hardening rules:

```typescript
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a financial advisor.",
  harden: {
    customRules: [
      "Never share specific investment recommendations.",
      "Always include risk disclaimers.",
    ],
    position: "prepend",
  },
});
```

Logging detections via callbacks:

```typescript
const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a helpful assistant.",
  onInjectionDetected: (result) => {
    console.warn(`Injection attempt blocked:`, {
      risk: result.risk,
      categories: result.matches.map((m) => m.category),
      timestamp: new Date().toISOString(),
    });
  },
  onLeakDetected: (result) => {
    console.warn(`Prompt leak detected:`, {
      confidence: result.confidence,
      fragmentCount: result.fragments.length,
      timestamp: new Date().toISOString(),
    });
  },
});
```