Beyond basic prompt injection, LLMs and the infrastructure around them are vulnerable to a range of attacks — from sophisticated jailbreak chains to outright remote code execution by loading a malicious model file.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/HackTricks-wiki/hacktricks/llms.txt
Use this file to discover all available pages before exploring further.
Jailbreak techniques
Token confusion (WAF bypass)
LLM safety WAFs operate on tokenised representations of text. Because tokenisation is not the same as word splitting, a WAF trained on token sequences can be bypassed by inputs that tokenise differently but carry the same semantic meaning to the downstream LLM.ass causes the tokeniser to split assignore differently, making the WAF miss the trigger while the LLM still understands the intent.
Autocomplete prefix seeding
In IDE autocomplete contexts, code-focused models continue whatever text the user has started:Multi-step context injection
Some agentic systems reread the full conversation history before each response. An attacker who controls browsing output can append instructions that appear to be the model’s own prior content:Model RCE — loading malicious checkpoints
Machine learning models are frequently shared as files that use Python’spickle serialisation. Loading a pickle file executes arbitrary Python code.
Creating a malicious PyTorch checkpoint
Affected frameworks
| Framework | Vector | CVE |
|---|---|---|
PyTorch torch.load | Pickle in .pt/.ckpt/.pth | CVE-2025-32434 |
| TorchServe | SSRF + malicious model download | CVE-2023-43654 |
| NVIDIA Merlin Transformers4Rec | torch.load without weights_only | CVE-2025-23298 |
| TensorFlow/Keras | yaml.unsafe_load, Lambda layers | CVE-2021-37678, CVE-2024-3660 |
| Scikit-learn | joblib.load pickle | CVE-2020-13092 |
| GGML/GGUF | Heap overflows in parser | CVE-2024-25664–25668 |
| InvokeAI | /api/v2/models/install pickle | CVE-2024-12029 |
Hydra metadata → RCE (even with safetensors)
hydra.utils.instantiate() imports and calls any dotted _target_ found in model metadata. Attackers can supply this in .nemo, config.json, or the __metadata__ field of a .safetensors file — no pickle required.
Mitigations for model loading
- Prefer Safetensors or ONNX over pickle-based formats when possible
- Enforce model provenance with checksums or GPG signatures
- Sandbox deserialization with seccomp/AppArmor; run as non-root with no network egress
- Monitor for unexpected child processes spawned during model loading
Path traversal via model archives
Many model formats use.zip/.tar. Malformed archive entries can escape the extraction directory:
MCP (Model Context Protocol) security
MCP is a protocol that connects LLM agents to external tools and data sources. Attack surface includes:- Tool poisoning: a malicious MCP server returns tool descriptions that contain injected instructions to the LLM
- Privilege escalation: an agent with file-read access and a vulnerable MCP connection may be tricked into exfiltrating data to an attacker-controlled server
- Cross-MCP injection: instructions from one MCP tool affect the agent’s behaviour in another tool context
AI-assisted fuzzing
LLMs improve traditional coverage-guided fuzzing in several ways:- Structured input formats (JSON, XML, protobuf) where random mutation rarely produces valid inputs
- Protocol fuzzing where the LLM understands state machines from documentation
- Crash triage where the LLM categorises crashes by root cause