llm-security  by greshake

Research paper on indirect prompt injection attacks targeting app-integrated LLMs

Created 2 years ago
1,992 stars

Top 22.2% on SourcePulse

GitHubView on GitHub
Project Summary

This repository demonstrates novel "indirect prompt injection" attack vectors targeting application-integrated Large Language Models (LLMs). It provides proof-of-concept code for researchers and security professionals to understand and mitigate risks associated with LLMs interacting with external data sources and applications, such as code completion engines and chat interfaces.

How It Works

The project showcases how malicious content, often hidden in side-channels like markdown comments or retrieved data, can manipulate LLMs into executing unintended actions. This includes exfiltrating user data, spreading injections to other LLMs, achieving persistent compromise across sessions, and enabling remote control of LLM agents. The core mechanism involves exploiting the LLM's retrieval and execution capabilities when connected to external tools or data.

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run demos: python scenarios/main.py
  • Requires an OpenAI API key set as the OPENAI_API_KEY environment variable.
  • Demos cover GPT-3/LangChain, GPT-4, and code completion engines (requiring IDE integration).
  • Official paper: https://arxiv.org/abs/2302.12173

Highlighted Details

  • Demonstrates attacks on GPT-4 (Bing Chat), GPT-3, LangChain apps, and code completion engines like Copilot.
  • Shows prompt injections can be as powerful as arbitrary code execution.
  • Proof-of-concept for remote control, data exfiltration, persistent compromise, and cross-LLM injection spread.
  • Attacks can be delivered via seemingly innocuous content like website markdown or email messages.

Maintenance & Community

The project is associated with authors from major research institutions, indicating a strong academic backing. No specific community channels (Discord/Slack) or ongoing maintenance signals are provided in the README.

Licensing & Compatibility

The repository content is licensed under the terms of the arXiv.org license, which grants a perpetual, non-exclusive license. This generally permits academic use and redistribution, but specific commercial use or closed-source linking compatibility would require further review of the underlying research paper's copyright and any associated licenses for the code itself.

Limitations & Caveats

The demonstrations are powered by OpenAI's models and LangChain, implying reliance on these specific ecosystems. The README notes that attacks need to be tried in an IDE for code completion scenarios, and some methods may require further research for robustness in real-world applications.

Health Check
Last Commit

2 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
16 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Michele Castata Michele Castata(President of Replit), and
3 more.

rebuff by protectai

0.4%
1k
SDK for LLM prompt injection detection
Created 2 years ago
Updated 1 year ago
Starred by Dan Guido Dan Guido(Cofounder of Trail of Bits), Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), and
5 more.

PurpleLlama by meta-llama

0.6%
4k
LLM security toolkit for assessing/improving generative AI models
Created 1 year ago
Updated 1 day ago
Feedback? Help us improve.