EasyJailbreak by EasyJailbreak

Python framework for LLM adversarial jailbreak prompt generation

Created 1 year ago

803 stars

Top 44.0% on SourcePulse

Project Summary

EasyJailbreak is a Python framework for researchers and developers to generate adversarial jailbreak prompts for Large Language Models (LLMs). It decomposes the jailbreaking process into modular steps, enabling systematic experimentation and the creation of custom attack strategies.

How It Works

The framework operates in a loop: it selects initial prompts (seeds), mutates them using various techniques (e.g., rephrasing, character manipulation), applies constraints, and then attacks a target LLM. The responses are evaluated to score the attack's effectiveness, feeding back into the selection process for iterative improvement. This modular design allows users to mix and match components like selectors, mutators, constraints, and evaluators to build tailored jailbreaking recipes.
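The loop described above can be sketched in a few lines of plain Python. This is only an illustration of the select, mutate, constrain, attack, evaluate cycle, not EasyJailbreak's actual API: every name here (Candidate, select, mutate, constrain, toy_target, evaluate, run_loop) is hypothetical. The "target" is a harmless stand-in function rather than a real LLM, and the score is an arbitrary toy metric.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A prompt plus the score its last response received (hypothetical type)."""
    prompt: str
    score: float = 0.0


def select(pool: List[Candidate], k: int) -> List[Candidate]:
    """Selector: keep the k highest-scoring candidates."""
    return sorted(pool, key=lambda c: c.score, reverse=True)[:k]


def mutate(candidate: Candidate, rng: random.Random) -> Candidate:
    """Mutator: apply one randomly chosen, benign string transformation."""
    ops = [str.upper, str.title, lambda s: s + " please"]
    return Candidate(rng.choice(ops)(candidate.prompt))


def constrain(candidates: List[Candidate], max_len: int) -> List[Candidate]:
    """Constraint: drop candidates whose prompt exceeds max_len characters."""
    return [c for c in candidates if len(c.prompt) <= max_len]


def toy_target(prompt: str) -> str:
    """Stand-in for the target model (assumption: a real run would call an LLM)."""
    return prompt[::-1]


def evaluate(response: str) -> float:
    """Evaluator: toy metric, the fraction of uppercase characters in the response."""
    return sum(ch.isupper() for ch in response) / max(len(response), 1)


def run_loop(seeds: List[str], iterations: int = 3, k: int = 2,
             max_len: int = 40, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    pool = [Candidate(s) for s in seeds]
    for c in pool:
        c.score = evaluate(toy_target(c.prompt))
    for _ in range(iterations):
        parents = select(pool, k)                      # select
        children = [mutate(p, rng) for p in parents]   # mutate
        children = constrain(children, max_len)        # constrain
        for c in children:                             # attack + evaluate
            c.score = evaluate(toy_target(c.prompt))
        pool.extend(children)                          # feed scores back
    return select(pool, 1)[0]


if __name__ == "__main__":
    print(run_loop(["hello world", "test prompt"]))
```

Each stage is a separate function, so swapping in a different selector or evaluator does not touch the loop itself, which is the mix-and-match property the summary attributes to the framework.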

Quick Start & Requirements

Install via pip: pip install easyjailbreak
For development (adding components): git clone https://github.com/EasyJailbreak/EasyJailbreak.git && cd EasyJailbreak && pip install -e .
Requires Python >= 3.9.
Supports Hugging Face and OpenAI models.
Official documentation: https://github.com/EasyJailbreak/EasyJailbreak/blob/main/docs/README.md
Available datasets: https://huggingface.co/datasets/Lemhf14/EasyJailbreak_Datasets
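Before running anything, it can help to confirm that the interpreter meets the stated Python requirement and that the package is installed. The following is a minimal sketch using only the standard library. It assumes the installed distribution name is "easyjailbreak", matching the pip command above. The helper name check_environment is hypothetical and not part of the framework.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)          # from the stated requirement: Python >= 3.9
DIST_NAME = "easyjailbreak"  # assumption: distribution name matches the pip command


def check_environment(dist_name: str = DIST_NAME) -> dict:
    """Report whether Python is new enough and which package version is installed."""
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None  # package is not installed in this environment
    return {"python_ok": python_ok, "installed_version": installed}


if __name__ == "__main__":
    print(check_environment())
```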

Highlighted Details

Implements 12 distinct jailbreak "recipes" (attack strategies) such as GPTFuzz, AutoDAN, and GCG.
Offers a comprehensive suite of modular components: Selectors, Mutators, Constraints, and Evaluators.
Supports various LLM integrations, including Hugging Face models (e.g., Vicuna, Llama-2) and OpenAI's API (e.g., GPT-4).
Reports detailed experimental results from the accompanying paper, covering 10 LLMs and 11 attack recipes.

Maintenance & Community

Primarily developed by Weikang Zhou and collaborators.
Paper available on arXiv: https://arxiv.org/abs/2403.12171

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use.

Limitations & Caveats

Requires API keys for OpenAI models.
Some recipes may have specific dependencies or limitations as detailed in the documentation.
The framework targets LLM security research and assumes a working understanding of LLM internals and attack methodologies.

Health Check

Last Commit: 9 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

36 stars in the last 30 days

Explore Similar Projects

llm-adaptive-attacks by tml-epfl

Research paper on jailbreaking safety-aligned LLMs via adaptive attacks

Created 1 year ago

Updated 11 months ago

Prompt-Hacking-Resources by PromptLabs

Curated resources for AI red teaming and prompt hacking

Created 9 months ago

Updated 8 months ago

GA by General-Analysis

Jailbreak recipe book for safer AI models

Created 11 months ago

Updated 7 months ago

AutoDAN by SheltonLiu-N

Jailbreak attack research paper for aligned LLMs

Created 2 years ago

Updated 11 months ago

jailbreakbench by JailbreakBench

Robustness benchmark for jailbreaking LLMs (NeurIPS 2024)

Created 2 years ago

Updated 9 months ago

GPTFuzz by sherdencooper

Red-teaming tool for LLMs using auto-generated jailbreak prompts

Created 2 years ago

Updated 1 year ago

finite-monkey-engine by BradMoonUESTC

AI engine for smart contract audit

Created 1 year ago

Updated 5 days ago

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind).

ps-fuzz by prompt-security

GenAI security tool to harden system prompts against LLM attacks

Created 2 years ago

Updated 3 months ago

bon-jailbreaking by jplhughes

Code for jailbreaking LLMs using a Best-of-N approach

Created 1 year ago

Updated 11 months ago

Awesome-Jailbreak-on-LLMs by yueliu1999

Collection of jailbreak methods on LLMs

Created 1 year ago

Updated 5 days ago

FuzzyAI by cyberark

LLM fuzzer for automated jailbreak detection

Created 1 year ago

Updated 1 month ago

Awesome_GPT_Super_Prompting by CyberAlbSecOP

Collection of resources for GPT prompt security, jailbreaks, and leaks

Created 1 year ago

Updated 2 months ago
