Open-Prompt-Injection  by liu00222

Benchmark for prompt injection attacks and defenses in LLM apps

created 1 year ago
250 stars

Top 100.0% on sourcepulse

GitHubView on GitHub
Project Summary

This repository provides an open-source toolkit for benchmarking prompt injection attacks and defenses in Large Language Model (LLM) integrated applications. It enables researchers and developers to implement, evaluate, and extend various attack strategies, defense mechanisms, and LLM models, facilitating a deeper understanding and mitigation of security vulnerabilities.

How It Works

The toolkit employs a modular design, allowing users to create and configure target tasks, LLM models, and attack strategies. It supports a range of prompt injection attack types (e.g., naive, escape, ignore) and defense mechanisms, including paraphrasing, retokenization, data isolation, instructional prevention, sandwich prevention, perplexity-based detection, LLM-based detection, and known-answer detection. Users can define custom configurations for supported LLMs and tasks to simulate real-world scenarios.

Quick Start & Requirements

  • Install: pip install OpenPromptInjection (assuming the package is published, otherwise clone and install from source).
  • Prerequisites: Python 3.9.0, SciPy, NumPy, PyTorch, tqdm, Datasets, Rouge 1.0.1, google-generativeai.
  • API Keys: Requires API keys for supported LLMs (e.g., PaLM2). Configuration is done via JSON files in ./configs/model_configs/.
  • Demo: Example usage snippets are provided in the README for querying models and evaluating combined attacks.
  • Official Docs: Refer to the repository structure and provided code examples for detailed usage.

Highlighted Details

  • Implements and benchmarks various prompt injection attack and defense strategies.
  • Supports multiple defense mechanisms, including perplexity-based and LLM-based detection.
  • Includes example code for evaluating attack success rates (ASV) on specific tasks like sentiment analysis.
  • Mentions DataSentinel, a game-theoretic detection method, with a link to a fine-tuned checkpoint.

Maintenance & Community

  • The project cites two papers: "Formalizing and Benchmarking Prompt Injection Attacks and Defenses" (USENIX Security Symposium, 2024) and "DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks" (IEEE Symposium on Security and Privacy, 2025), indicating academic backing.
  • No specific community links (Discord, Slack) or active development signals are present in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Given the academic citations and typical open-source practices for research tools, it is likely to be a permissive license, but this requires verification.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently, the toolkit primarily supports Google's PaLM2 model, with plans to add support for other models like Meta's Llama and OpenAI's GPT. The DataSentinel detector requires downloading a separate checkpoint.

Health Check
Last commit

2 weeks ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
2
Star History
62 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Michele Castata Michele Castata(President of Replit), and
2 more.

rebuff by protectai

0.4%
1k
SDK for LLM prompt injection detection
created 2 years ago
updated 1 year ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Carol Willing Carol Willing(Core Contributor to CPython, Jupyter), and
2 more.

llm-security by greshake

0.2%
2k
Research paper on indirect prompt injection attacks targeting app-integrated LLMs
created 2 years ago
updated 2 weeks ago
Feedback? Help us improve.