Resource list for prompt injection attacks on ML models
Top 87.1% on sourcepulse
This repository serves as a curated collection of resources for understanding and mitigating prompt injection vulnerabilities in machine learning models, particularly those employing prompt-based learning. It targets AI researchers, security engineers, and developers working with LLMs, offering a centralized hub for articles, tutorials, research papers, and tools to combat this emerging threat.
How It Works
Prompt injection exploits the inability of ML models to differentiate between user-provided data and system instructions. Attackers craft malicious inputs that trick the model into executing unintended commands, potentially leading to data exfiltration, unauthorized actions, or behavioral manipulation. This collection provides insights into various attack vectors, including direct and indirect injection, and highlights techniques for detection and defense.
Quick Start & Requirements
Garak
(Python 3.x) for LLM vulnerability scanning, Token Turbulenz
(Python 3.x) for prompt injection fuzzing.Gandalf
(requires interaction with a specific LLM setup), Promptalanche
(scenario-based).Highlighted Details
Garak
for automated LLM vulnerability scanning.Gandalf
, Promptalanche
) for hands-on learning.Maintenance & Community
Learn Prompting
Discord server for community discussion.Licensing & Compatibility
Limitations & Caveats
1 year ago
Inactive