jailbreak_llms  by verazuo

Dataset for LLM jailbreak research (CCS'24 paper)

Created 2 years ago
3,346 stars

Top 14.4% on SourcePulse

Project Summary

This repository provides a large-scale dataset of 15,140 "in-the-wild" prompts collected from various online platforms, including 1,405 identified jailbreak prompts. It is intended for researchers studying the security and robustness of Large Language Models (LLMs) against adversarial inputs. The dataset enables analysis of real-world prompt engineering techniques used to bypass LLM safety guidelines.

How It Works

The project collects and categorizes prompts from diverse sources like Reddit, Discord, and public datasets. It identifies and isolates jailbreak prompts, which are designed to elicit harmful or restricted content from LLMs. The dataset is structured to facilitate research into the characteristics and effectiveness of these adversarial prompts.
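As a rough illustration of the "identify and isolate" step described above, the sketch below splits labeled records into jailbreak and regular prompts and tallies the jailbreak ones per source. The field names `platform`, `prompt`, and `jailbreak`, and the placeholder rows, are assumptions made for this example; they are not taken from the README and may not match the dataset's actual schema.

```python
from collections import Counter

# Hypothetical records. The field names and values are illustrative
# assumptions, not the dataset's confirmed schema or contents.
records = [
    {"platform": "reddit", "prompt": "Summarize this article.", "jailbreak": False},
    {"platform": "discord", "prompt": "<placeholder jailbreak prompt A>", "jailbreak": True},
    {"platform": "reddit", "prompt": "<placeholder jailbreak prompt B>", "jailbreak": True},
    {"platform": "public_dataset", "prompt": "Translate this to French.", "jailbreak": False},
]


def split_by_label(rows):
    """Return (jailbreak_rows, regular_rows) based on the boolean label."""
    jailbreak_rows = [r for r in rows if r["jailbreak"]]
    regular_rows = [r for r in rows if not r["jailbreak"]]
    return jailbreak_rows, regular_rows


jailbreak_rows, regular_rows = split_by_label(records)

# Count isolated jailbreak prompts per source platform.
per_platform = Counter(r["platform"] for r in jailbreak_rows)

# Fraction of all collected prompts that carry the jailbreak label.
share = len(jailbreak_rows) / len(records)

print(dict(per_platform))
print(f"jailbreak share: {share:.1%}")
```

Applied to the figures reported in the summary, the same ratio would be 1,405 / 15,140, or roughly 9.3% of collected prompts.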

Quick Start & Requirements

Highlighted Details

  • Described by the authors as the largest collection of in-the-wild jailbreak prompts to date (1,405 prompts).
  • Includes a curated question set of 390 questions across 13 forbidden scenarios.
  • Data spans from December 2022 to December 2023.
  • The authors responsibly disclosed their findings to LLM vendors.

Maintenance & Community

The project accompanies the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models. The README does not specify any further community channels.

Licensing & Compatibility

The jailbreak_llms repository is licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The dataset contains examples of harmful language, and reader discretion is advised. The project is intended for research purposes only, and misuse is strictly prohibited. Preprocessing the prompt field to remove duplicates is recommended if using the dataset for model training.
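The deduplication recommended above could be sketched as follows. This is a minimal example under stated assumptions: rows are dictionaries with a `prompt` key, and two prompts count as duplicates when they match after lowercasing and whitespace collapsing. That normalization rule is a choice made for this sketch; the README only recommends removing duplicates and does not say how.

```python
import re


def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", prompt).strip().lower()


def deduplicate(rows, field="prompt"):
    """Keep the first row seen for each normalized value of `field`."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize(row[field])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows; real prompts would be loaded from the dataset files.
rows = [
    {"prompt": "Example prompt one."},
    {"prompt": "example   prompt one. "},
    {"prompt": "Example prompt two."},
]

unique_rows = deduplicate(rows)
print(len(rows), "->", len(unique_rows))  # prints "3 -> 2"
```

Exact-match deduplication after light normalization is the most conservative option; near-duplicate detection (for example, by edit distance or embedding similarity) would remove more rows but needs a threshold chosen for the task.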

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 48 stars in the last 30 days
