Benchmark for prompt injection attacks and defenses in LLM apps
Top 100.0% on sourcepulse
This repository provides an open-source toolkit for benchmarking prompt injection attacks and defenses in Large Language Model (LLM) integrated applications. It enables researchers and developers to implement, evaluate, and extend various attack strategies, defense mechanisms, and LLM models, facilitating a deeper understanding and mitigation of security vulnerabilities.
How It Works
The toolkit employs a modular design, allowing users to create and configure target tasks, LLM models, and attack strategies. It supports a range of prompt injection attack types (e.g., naive, escape, ignore) and defense mechanisms, including paraphrasing, retokenization, data isolation, instructional prevention, sandwich prevention, perplexity-based detection, LLM-based detection, and known-answer detection. Users can define custom configurations for supported LLMs and tasks to simulate real-world scenarios.
Quick Start & Requirements
pip install OpenPromptInjection
(assuming the package is published, otherwise clone and install from source)../configs/model_configs/
.Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Currently, the toolkit primarily supports Google's PaLM2 model, with plans to add support for other models like Meta's Llama and OpenAI's GPT. The DataSentinel detector requires downloading a separate checkpoint.
2 weeks ago
1 day