PromptInject by agencyenterprise

Framework for LLM adversarial prompt attack analysis

Created 3 years ago

447 stars

Top 67.1% on SourcePulse

View on GitHub

1 Expert Loves This Project

Elvis Saravia

Founder of DAIR.AI

Project Summary

PromptInject is a framework for quantitatively analyzing the robustness of Large Language Models (LLMs) against adversarial prompt attacks. It targets researchers and developers working with LLMs in production, offering a method to identify and mitigate vulnerabilities like goal hijacking and prompt leaking. The framework provides a prosaic alignment approach for iterative adversarial prompt composition.

How It Works

PromptInject employs a mask-based iterative adversarial prompt composition technique. This method allows for the systematic construction of malicious prompts designed to manipulate LLM behavior. The framework focuses on two primary attack vectors: goal hijacking, where the LLM is steered to produce a specific target string (potentially containing malicious instructions), and prompt leaking, where the LLM is induced to reveal its original system prompt. This approach leverages the stochastic nature of LLMs to create long-tail risks.

Quick Start & Requirements

Install: pip install git+https://github.com/agencyenterprise/PromptInject
Usage: See notebooks/Example.ipynb
Requirements: Python, LLM API access (e.g., GPT-3).

Highlighted Details

Award-winning research: Best Paper Award at NeurIPS ML Safety Workshop 2022.
Focuses on practical attacks: Goal hijacking and prompt leaking.
Demonstrates exploitability of GPT-3 with simple handcrafted inputs.
Provides a quantitative analysis framework for LLM robustness.

Maintenance & Community

Contributions are welcomed via the issues tracker.
Contributing documentation is available.

Licensing & Compatibility

License: Not explicitly stated in the README.
Compatibility: Designed for LLMs like GPT-3; compatibility with other models may require adaptation.

Limitations & Caveats

The README does not specify the exact license, which could impact commercial use. While focused on GPT-3, the effectiveness and implementation details for other LLMs are not detailed.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

7 stars in the last 30 days