llm-security by greshake

Research paper on indirect prompt injection attacks targeting app-integrated LLMs

Created 2 years ago

1,999 stars

Top 22.0% on SourcePulse

View on GitHub

5 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Carol Willing

Core Contributor to CPython, Jupyter

and 1 more!

Project Summary

This repository demonstrates novel "indirect prompt injection" attack vectors targeting application-integrated Large Language Models (LLMs). It provides proof-of-concept code for researchers and security professionals to understand and mitigate risks associated with LLMs interacting with external data sources and applications, such as code completion engines and chat interfaces.

How It Works

The project showcases how malicious content, often hidden in side-channels like markdown comments or retrieved data, can manipulate LLMs into executing unintended actions. This includes exfiltrating user data, spreading injections to other LLMs, achieving persistent compromise across sessions, and enabling remote control of LLM agents. The core mechanism involves exploiting the LLM's retrieval and execution capabilities when connected to external tools or data.

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt
Run demos: python scenarios/main.py
Requires an OpenAI API key set as the OPENAI_API_KEY environment variable.
Demos cover GPT-3/LangChain, GPT-4, and code completion engines (requiring IDE integration).
Official paper: https://arxiv.org/abs/2302.12173

Highlighted Details

Demonstrates attacks on GPT-4 (Bing Chat), GPT-3, LangChain apps, and code completion engines like Copilot.
Shows prompt injections can be as powerful as arbitrary code execution.
Proof-of-concept for remote control, data exfiltration, persistent compromise, and cross-LLM injection spread.
Attacks can be delivered via seemingly innocuous content like website markdown or email messages.

Maintenance & Community

The project is associated with authors from major research institutions, indicating a strong academic backing. No specific community channels (Discord/Slack) or ongoing maintenance signals are provided in the README.

Licensing & Compatibility

The repository content is licensed under the terms of the arXiv.org license, which grants a perpetual, non-exclusive license. This generally permits academic use and redistribution, but specific commercial use or closed-source linking compatibility would require further review of the underlying research paper's copyright and any associated licenses for the code itself.

Limitations & Caveats

The demonstrations are powered by OpenAI's models and LangChain, implying reliance on these specific ecosystems. The README notes that attacks need to be tried in an IDE for code completion scenarios, and some methods may require further research for robustness in real-world applications.

Health Check

Last Commit

3 months ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

10 stars in the last 30 days