Open-Prompt-Injection by liu00222

Benchmark for prompt injection attacks and defenses in LLM apps

Created 2 years ago

372 stars

Top 76.1% on SourcePulse

Project Summary

This repository provides an open-source toolkit for benchmarking prompt injection attacks and defenses in Large Language Model (LLM) integrated applications. It enables researchers and developers to implement, evaluate, and extend various attack strategies, defense mechanisms, and LLM models, facilitating a deeper understanding and mitigation of security vulnerabilities.

How It Works

The toolkit employs a modular design, allowing users to create and configure target tasks, LLM models, and attack strategies. It supports a range of prompt injection attack types (e.g., naive, escape, ignore) and defense mechanisms, including paraphrasing, retokenization, data isolation, instructional prevention, sandwich prevention, perplexity-based detection, LLM-based detection, and known-answer detection. Users can define custom configurations for supported LLMs and tasks to simulate real-world scenarios.

Quick Start & Requirements

Install: pip install OpenPromptInjection (assuming the package is published, otherwise clone and install from source).
Prerequisites: Python 3.9.0, SciPy, NumPy, PyTorch, tqdm, Datasets, Rouge 1.0.1, google-generativeai.
API Keys: Requires API keys for supported LLMs (e.g., PaLM2). Configuration is done via JSON files in ./configs/model_configs/.
Demo: Example usage snippets are provided in the README for querying models and evaluating combined attacks.
Official Docs: Refer to the repository structure and provided code examples for detailed usage.

Highlighted Details

Implements and benchmarks various prompt injection attack and defense strategies.
Supports multiple defense mechanisms, including perplexity-based and LLM-based detection.
Includes example code for evaluating attack success rates (ASV) on specific tasks like sentiment analysis.
Mentions DataSentinel, a game-theoretic detection method, with a link to a fine-tuned checkpoint.

Maintenance & Community

The project cites two papers: "Formalizing and Benchmarking Prompt Injection Attacks and Defenses" (USENIX Security Symposium, 2024) and "DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks" (IEEE Symposium on Security and Privacy, 2025), indicating academic backing.
No specific community links (Discord, Slack) or active development signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the academic citations and typical open-source practices for research tools, it is likely to be a permissive license, but this requires verification.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently, the toolkit primarily supports Google's PaLM2 model, with plans to add support for other models like Meta's Llama and OpenAI's GPT. The DataSentinel detector requires downloading a separate checkpoint.

Open-Prompt-Injection by liu00222

Explore Similar Projects

aegis by automorphic-ai

llm-security by dropbox

prompt-injection-defenses by tldrsec

awesome-prompt-injection by Joe-B-Security

ps-fuzz by prompt-security

awesome-gpt-security by cckuailong

promptmap by utkusen

rebuff by protectai

llm-security by greshake

awesome-llm-security by corca-ai

llm-guard by protectai

PurpleLlama by meta-llama