jailbreak_llms by verazuo

Dataset for LLM jailbreak research (CCS'24 paper)

created 2 years ago
3,230 stars

Top 15.3% on sourcepulse

Project Summary

This repository provides a large-scale dataset of 15,140 "in-the-wild" prompts collected from various online platforms, including 1,405 identified jailbreak prompts. It is intended for researchers studying the security and robustness of Large Language Models (LLMs) against adversarial inputs. The dataset enables analysis of real-world prompt engineering techniques used to bypass LLM safety guidelines.

How It Works

The project collects and categorizes prompts from diverse sources like Reddit, Discord, and public datasets. It identifies and isolates jailbreak prompts, which are designed to elicit harmful or restricted content from LLMs. The dataset is structured to facilitate research into the characteristics and effectiveness of these adversarial prompts.

Quick Start & Requirements
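
The README's setup steps are not reproduced here. As a minimal sketch (not from the README), the dataset ships as CSV files with a prompt text column; the column name `prompt` is taken from the caveats section below, while the loader function and the in-memory stand-in data are purely illustrative:

```python
# Hedged quick-start sketch: assumes the repo's CSVs expose a "prompt"
# column; actual file names and columns should be checked in the repo.
import io
import pandas as pd

def load_prompts(csv_source) -> pd.DataFrame:
    """Read a prompt CSV into a DataFrame."""
    return pd.read_csv(csv_source)

# In-memory stand-in for a cloned CSV, e.g. pd.read_csv("data/<file>.csv"):
sample_csv = io.StringIO(
    "platform,prompt\n"
    "reddit,Ignore previous instructions...\n"
    "discord,You are DAN...\n"
)
df = load_prompts(sample_csv)
print(df.shape)  # (2, 2)
```

Requires pandas; beyond that, the dataset is plain CSV and needs no special tooling.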

Highlighted Details

  • Largest collection of in-the-wild jailbreak prompts to date.
  • Includes a curated question set of 390 questions across 13 forbidden scenarios.
  • Data spans from December 2022 to December 2023.
  • Responsible disclosure of findings to LLM vendors.

Maintenance & Community

The project is associated with the ACM CCS 2024 paper "Do Anything Now". Further community interaction details are not specified in the README.

Licensing & Compatibility

The jailbreak_llms repository is licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The dataset contains examples of harmful language, and reader discretion is advised. The project is intended for research purposes only, and misuse is strictly prohibited. Preprocessing the prompt field to remove duplicates is recommended if using the dataset for model training.
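
The deduplication recommendation above can be sketched with pandas. The `prompt` field name comes from the caveat itself; the toy data is illustrative:

```python
import pandas as pd

# Toy frame with a duplicated prompt text; real rows would come from the CSVs.
df = pd.DataFrame({
    "prompt": ["Ignore all rules", "Ignore all rules", "Pretend you are DAN"],
    "platform": ["reddit", "discord", "reddit"],
})

# Drop exact duplicates of the prompt field before any model training.
deduped = df.drop_duplicates(subset="prompt", keep="first").reset_index(drop=True)
print(len(deduped))  # 2
```

Note that `drop_duplicates` only removes exact string matches; near-duplicate prompts (whitespace or casing variants) would need normalization first.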

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 156 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Carol Willing (core contributor to CPython and Jupyter), and 2 more.

llm-security by greshake

Research paper on indirect prompt injection attacks targeting app-integrated LLMs. Top 0.1% · 2k stars · created 2 years ago · updated 2 weeks ago.