Syntax
Description
The prompt-firewall command analyzes text prompts sent to AI agents for potential prompt injection or jailbreak attempts. This helps detect malicious prompts that try to:
- Override system instructions
- Extract sensitive information
- Bypass safety restrictions
- Execute unauthorized commands
- Inject malicious instructions
It is useful for:
- Monitoring AI agent interactions
- Detecting social engineering attempts
- Protecting against prompt injection attacks
- Auditing user-provided prompts
Options
Path to a file containing the prompt text. If not provided, the command reads from stdin.
Detection Patterns
The firewall detects:
- System Override: Attempts to ignore or replace system instructions
- Role Manipulation: Attempts to change the AI’s role or identity
- Instruction Injection: Hidden instructions in user content
- Data Exfiltration: Attempts to extract sensitive information
- Jailbreak Patterns: Common jailbreak techniques
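To make the categories above concrete, here is a minimal sketch of how category-based matching could work. It is not the implementation used by prompt-firewall: the regular expressions, the `PATTERNS` table, and the `classify` function are all hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Hypothetical patterns for illustration only. The actual rule set used by
# prompt-firewall is not documented here and is likely more extensive.
PATTERNS = {
    "system_override": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above|system)\b"
        r".{0,20}\b(instructions?|prompts?|rules?)\b",
        re.I,
    ),
    "role_manipulation": re.compile(r"\b(you are now|pretend to be|act as)\b", re.I),
    "instruction_injection": re.compile(
        r"<!--.*?-->|\[\s*(system|hidden)\s*\]", re.I | re.S
    ),
    "data_exfiltration": re.compile(
        r"\b(reveal|print|show|send)\b.{0,40}"
        r"\b(system prompt|api keys?|passwords?|secrets?|credentials)\b",
        re.I,
    ),
    "jailbreak": re.compile(r"\b(DAN mode|developer mode|do anything now)\b", re.I),
}


def classify(prompt: str) -> List[str]:
    """Return the names of all categories whose pattern matches the prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]
```

Simple keyword patterns like these produce both false positives and false negatives, which is one reason to treat such checks as a single layer among several.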
Examples
Analyze Prompt from Stdin
Analyze Prompt from File
Check Multiple Prompts
Integration with AI Agent Workflow
Exit Codes
- 0: No injection patterns detected
- 1: Potential injection detected (warning level)
- 2: High-confidence injection detected (block recommended)
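A calling program can act on these exit codes. The sketch below pipes a prompt to the command on stdin and maps the documented codes to a decision. The `check_prompt` helper and the `DECISIONS` names are hypothetical, and the command line is supplied by the caller because the exact invocation syntax is not assumed here. Blocking on any undocumented exit code is a fail-closed choice made for this example, not documented behavior.

```python
import subprocess
from typing import Sequence

# Exit-code meanings as documented above; the decision labels are illustrative.
DECISIONS = {0: "allow", 1: "warn", 2: "block"}


def check_prompt(prompt: str, command: Sequence[str]) -> str:
    """Pipe the prompt to `command` on stdin and map its exit code to a decision.

    `command` is the caller-supplied argument list that invokes
    prompt-firewall; its exact syntax is an assumption left to the caller.
    """
    result = subprocess.run(
        list(command), input=prompt, text=True, capture_output=True
    )
    # Assumption: any exit code outside 0-2 is treated as a failure and blocked.
    return DECISIONS.get(result.returncode, "block")
```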
Detection Examples
System Override
Role Manipulation
Data Exfiltration
Use Cases
AI Agent Protection
Monitor all prompts sent to AI coding agents.
User Input Validation
Validate user-provided instructions.
Limitations
This is a defense-in-depth measure and should be combined with other security controls.
Related Commands
- validate-agent - Validate agent scripts
- scan-security - Scan code for security issues
- lockdown - Enable strict enforcement mode