Dataset for LLM jailbreak research (CCS'24 paper)
Top 15.3% on sourcepulse
This repository provides a large-scale dataset of 15,140 "in-the-wild" prompts collected from various online platforms, 1,405 of which are identified jailbreak prompts. It is intended for researchers studying the security and robustness of Large Language Models (LLMs) against adversarial inputs. The dataset enables analysis of real-world prompt-engineering techniques used to bypass LLM safety guidelines.
How It Works
The project collects and categorizes prompts from diverse sources like Reddit, Discord, and public datasets. It identifies and isolates jailbreak prompts, which are designed to elicit harmful or restricted content from LLMs. The dataset is structured to facilitate research into the characteristics and effectiveness of these adversarial prompts.
Quick Start & Requirements
from datasets import load_dataset
dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'jailbreak_2023_12_25', split='train')
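As a quick sanity check after loading, you can print the split size and a sample record. This is a minimal sketch; the prompt field is the one referenced in the caveats below, and other column names may vary by configuration.

print(dataset.num_rows)
print(dataset[0]['prompt'][:200])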
Highlighted Details
15,140 in-the-wild prompts, 1,405 of them identified as jailbreak prompts, gathered from Reddit, Discord, and public prompt datasets; released alongside the ACM CCS 2024 paper and distributed via Hugging Face in dated configurations (e.g., jailbreak_2023_12_25).
Maintenance & Community
The project is associated with the ACM CCS 2024 paper ""Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models". Further community interaction details are not specified in the README.
Licensing & Compatibility
The jailbreak_llms repository is licensed under the MIT license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The dataset contains examples of harmful language, and reader discretion is advised. The project is intended for research purposes only, and misuse is strictly prohibited. Preprocessing the prompt field to remove duplicates is recommended if using the dataset for model training.
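A minimal deduplication sketch, assuming the prompt text lives in the prompt field as noted above:

from datasets import load_dataset

# Load the same snapshot used in the Quick Start section.
dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'jailbreak_2023_12_25', split='train')

# Keep only the first occurrence of each unique prompt string.
seen = set()

def first_occurrence(example):
    text = example['prompt']
    if text in seen:
        return False
    seen.add(text)
    return True

deduped = dataset.filter(first_occurrence)
print(f"{len(dataset)} rows -> {len(deduped)} unique prompts")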