llm-security  by dropbox

LLM security research code for prompt injection

Created 2 years ago
251 stars

Top 99.8% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

This repository from Dropbox offers research code and findings on sophisticated prompt injection attacks against Large Language Models (LLMs), particularly focusing on OpenAI's ChatGPT models. It details novel techniques employing repeated tokens and character sequences to bypass model instructions, induce unintended behaviors like hallucinations, and extract sensitive memorized training data. Intended for security researchers and developers, this project provides practical demonstrations to understand and build defenses against these emerging LLM vulnerabilities.

How It Works

The core methodology leverages the insertion of carefully crafted repeated sequences of tokens or characters within LLM prompts. These repetitions, especially multi-token sequences, exploit vulnerabilities to circumvent prompt template instructions and content constraints. This can lead to significant model destabilization, manifesting as jailbreaks, irrelevant responses (hallucinations), or, critically, the leakage of proprietary training data via divergence attacks. The research systematically quantifies the "blackout" effect, measuring how many repetitions are needed for a model to ignore earlier prompt instructions.

Quick Start & Requirements

To utilize this repository, clone it and navigate to the src directory. A Python 3 environment is necessary. Users must configure their OpenAI API key by setting the OPENAI_API_KEY environment variable. The provided demonstration scripts (repeated-tokens.py, question-with-context.py, repeated-sequences.py) can then be executed, allowing selection of specific OpenAI models (e.g., gpt-3.5-turbo, gpt-4) and operational modes.

Highlighted Details

  • Multi-token sequence repetitions have proven effective for prompt injection and data extraction, bypassing OpenAI's filters that previously mitigated single-token repetition attacks.
  • The repeated-tokens.py script provides concrete examples of triggering divergence attacks to exfiltrate memorized training data from models like GPT-3.5 and GPT-4.
  • repeated-sequences.py offers quantitative analysis of the "blackout" effect, detailing how various control and space-character sequences impact LLM context retention and instruction following.
  • The research suggests that statistical analysis of character frequencies, rather than solely relying on specific sequence detection, may represent a more robust strategy for identifying and preventing these types of prompt injection attacks.

Maintenance & Community

This repository appears to be a static research artifact, with no explicit mention of ongoing maintenance, community forums (like Discord or Slack), or a public roadmap. It acknowledges contributions from internal and external Dropbox collaborators.

Licensing & Compatibility

The project is licensed under the permissive Apache License, Version 2.0. This license permits broad use, modification, and distribution, including for commercial purposes, provided the terms of the license are adhered to.

Limitations & Caveats

The effectiveness of the demonstrated attacks is contingent upon the current state of OpenAI's filtering mechanisms, which have evolved and may continue to change, potentially rendering some techniques obsolete. The success rate can also vary significantly based on the specific LLM version, the complexity of the prompt, and the precise nature of the repeated sequences employed.

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Michele Castata Michele Castata(President of Replit), and
3 more.

rebuff by protectai

0.1%
1k
SDK for LLM prompt injection detection
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein Elie Bursztein(Cybersecurity Lead at Google DeepMind), and
3 more.

llm-guard by protectai

0.9%
2k
Security toolkit for LLM interactions
Created 2 years ago
Updated 3 weeks ago
Starred by Dan Guido Dan Guido(Cofounder of Trail of Bits), Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), and
5 more.

PurpleLlama by meta-llama

0.3%
4k
LLM security toolkit for assessing/improving generative AI models
Created 2 years ago
Updated 2 days ago
Feedback? Help us improve.