PromptInject  by agencyenterprise

Framework for LLM adversarial prompt attack analysis

created 2 years ago
399 stars

Top 73.5% on sourcepulse

GitHubView on GitHub
Project Summary

PromptInject is a framework for quantitatively analyzing the robustness of Large Language Models (LLMs) against adversarial prompt attacks. It targets researchers and developers working with LLMs in production, offering a method to identify and mitigate vulnerabilities like goal hijacking and prompt leaking. The framework provides a prosaic alignment approach for iterative adversarial prompt composition.

How It Works

PromptInject employs a mask-based iterative adversarial prompt composition technique. This method allows for the systematic construction of malicious prompts designed to manipulate LLM behavior. The framework focuses on two primary attack vectors: goal hijacking, where the LLM is steered to produce a specific target string (potentially containing malicious instructions), and prompt leaking, where the LLM is induced to reveal its original system prompt. This approach leverages the stochastic nature of LLMs to create long-tail risks.

Quick Start & Requirements

  • Install: pip install git+https://github.com/agencyenterprise/PromptInject
  • Usage: See notebooks/Example.ipynb
  • Requirements: Python, LLM API access (e.g., GPT-3).

Highlighted Details

  • Award-winning research: Best Paper Award at NeurIPS ML Safety Workshop 2022.
  • Focuses on practical attacks: Goal hijacking and prompt leaking.
  • Demonstrates exploitability of GPT-3 with simple handcrafted inputs.
  • Provides a quantitative analysis framework for LLM robustness.

Maintenance & Community

  • Contributions are welcomed via the issues tracker.
  • Contributing documentation is available.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for LLMs like GPT-3; compatibility with other models may require adaptation.

Limitations & Caveats

The README does not specify the exact license, which could impact commercial use. While focused on GPT-3, the effectiveness and implementation details for other LLMs are not detailed.

Health Check
Last commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
40 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Carol Willing Carol Willing(Core Contributor to CPython, Jupyter), and
2 more.

llm-security by greshake

0.2%
2k
Research paper on indirect prompt injection attacks targeting app-integrated LLMs
created 2 years ago
updated 2 weeks ago
Feedback? Help us improve.