Core Types
HardenOptions
Configuration options for theharden() function.
Skip persona-binding rule that prevents role hijacking
Skip anti-extraction rules that prevent prompt extraction attempts
Additional custom security rules to inject into the system prompt
Where to add security rules in the prompt (beginning or end)
DetectOptions
Configuration options for thedetect() and detectAsync() functions.
Minimum risk level to flag as detected. Only injections at or above this risk level will be flagged.
Custom detection patterns to add to the built-in pattern library. Each pattern requires:
category: Name for the injection categoryregex: Regular expression to matchrisk: Risk level (“low”, “medium”, “high”, or “critical”)
Categories to skip during detection. Use
["social_engineering"] to allow phrases like “for research purposes only” in legitimate contexts.Whitelist phrases (case-insensitive). If input contains one of these phrases, detection is suppressed. Use sparingly for known-benign strings.
Optional async verifier function. When detection fires, this function is called with the input and initial result. Return
{ detected: false } to override the detection (e.g., after LLM verification), or null to keep the original result. Only available with detectAsync().Maximum input length to scan. Input beyond this length will be truncated.
DetectResult
Return value fromdetect() and detectAsync() functions.
Whether a prompt injection was detected
Highest risk level among all detected patterns. “none” if no injection was detected.
Array of all matched injection patterns. Each match includes:
category: The injection category (e.g., “instruction_override”, “role_hijack”)pattern: Truncated regex pattern that matched (first 60 characters)confidence: Confidence score for the match (0-1)
SanitizeOptions
Configuration options for thesanitize() and sanitizeObject() functions.
N-gram window size for matching leaked prompt fragments. Larger values reduce false positives but may miss shorter leaks.
Confidence threshold (0-1) for classifying output as leaked. Lower values are more sensitive but may produce false positives.
Jaccard word overlap threshold for detecting paraphrased leaks. Higher values require more overlap to flag as leaked.
Replacement text for redacted leaked content
When
true, only detect leaks without redacting them. The sanitized field will match the original output.SanitizeResult
Return value from thesanitize() function.
Whether a system prompt leak was detected in the output
Confidence score (0-1) for the leak detection. Higher values indicate stronger evidence of leakage.
Array of detected leaked fragments from the system prompt
Output with leaked fragments redacted (or original output if
detectOnly: true)Provider Types
Configuration options for provider wrapper functions. All providers share a common set of options.ShieldOpenAIOptions
Options forshieldOpenAI().
System prompt for sanitization. When omitted, automatically derived from the first system message in the request.
Options for hardening system prompts. Set to
false to disable hardening.Options for injection detection. Set to
false to disable detection.Options for output sanitization. Set to
false to disable sanitization.Sanitization strategy for streaming responses:
"buffer": Buffer full response then sanitize (highest accuracy)"chunked": Sanitize in 8KB chunks (lower memory for long streams)"passthrough": Skip sanitization (fastest but no leak protection)
Chunk size in bytes for
"chunked" streaming modeBehavior when injection is detected:
"block": ThrowInjectionDetectedError(recommended)"warn": Only invokeonInjectionDetectedcallback without blocking
When
true, throw LeakDetectedError instead of redacting leaked contentCallback invoked when injection is detected. Receives the full
DetectResult.Callback invoked when a leak is detected in model output. Receives the full
SanitizeResult.ShieldAnthropicOptions
Options forshieldAnthropic(). Identical to ShieldOpenAIOptions.
ShieldGroqOptions
Options forshieldGroq(). Identical to ShieldOpenAIOptions.
ShieldAISdkOptions
Options forshieldMiddleware() and shieldLanguageModelMiddleware().