Python framework for LLM adversarial jailbreak prompt generation
Top 50.2% on sourcepulse
EasyJailbreak is a Python framework for researchers and developers to generate adversarial jailbreak prompts for Large Language Models (LLMs). It decomposes the jailbreaking process into modular steps, enabling systematic experimentation and the creation of custom attack strategies.
How It Works
The framework operates in a loop: it selects initial prompts (seeds), mutates them using various techniques (e.g., rephrasing, character manipulation), applies constraints, and then attacks a target LLM. The responses are evaluated to score the attack's effectiveness, feeding back into the selection process for iterative improvement. This modular design allows users to mix and match components like selectors, mutators, constraints, and evaluators to build tailored jailbreaking recipes.
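To make the loop concrete, the sketch below shows one possible realization of that select → mutate → constrain → attack → evaluate cycle. It is purely illustrative: the Instance class and the selector, mutators, constraints, and evaluator callables are hypothetical stand-ins, not EasyJailbreak's actual classes or interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    prompt: str
    score: float = 0.0
    history: list = field(default_factory=list)

def jailbreak_loop(seeds, selector, mutators, constraints, target_llm, evaluator, rounds=10):
    """Hypothetical version of the iterative attack loop described above."""
    pool = [Instance(prompt=s) for s in seeds]
    for _ in range(rounds):
        # 1. Select promising prompts from the current pool.
        for inst in selector(pool):
            # 2. Mutate each selected prompt (rephrasing, character tricks, ...).
            for mutate in mutators:
                variant = Instance(prompt=mutate(inst.prompt),
                                   history=inst.history + [inst.prompt])
                # 3. Discard variants that violate any constraint.
                if not all(check(variant.prompt) for check in constraints):
                    continue
                # 4. Attack the target model and 5. score its response.
                response = target_llm(variant.prompt)
                variant.score = evaluator(variant.prompt, response)
                # Scored variants feed back into selection in the next round.
                pool.append(variant)
    return max(pool, key=lambda i: i.score)
```

Swapping in a different selector, mutator set, or evaluator changes the attack recipe without touching the surrounding loop, which is the point of the modular design.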
Quick Start & Requirements
Install from PyPI: pip install easyjailbreak
Or install an editable copy from source: git clone https://github.com/EasyJailbreak/EasyJailbreak.git && cd EasyJailbreak && pip install -e .
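After installation, an attack is typically composed from an attack model, a target model, an evaluation model, and a dataset, then run through a built-in recipe. The snippet below is a sketch modeled on the project's documented recipe pattern; the module paths, class names, and parameters (PAIR_chao_2023, OpenaiModel, jailbreak_datasets, save_path) are assumptions to verify against the current repository README.

```python
# Sketch of running a built-in recipe (PAIR); names below are assumed, not verified.
from easyjailbreak.attacker.PAIR_chao_2023 import PAIR
from easyjailbreak.datasets import JailbreakDataset
from easyjailbreak.models.openai_model import OpenaiModel

# Attack, target, and evaluation models (an OpenAI API key is assumed here).
attack_model = OpenaiModel(model_name='gpt-4', api_keys='<YOUR_API_KEY>')
target_model = OpenaiModel(model_name='gpt-3.5-turbo', api_keys='<YOUR_API_KEY>')
eval_model = OpenaiModel(model_name='gpt-4', api_keys='<YOUR_API_KEY>')

# Load a harmful-behavior benchmark shipped with the framework.
dataset = JailbreakDataset('AdvBench')

# Compose the recipe from its components and run the attack.
attacker = PAIR(attack_model=attack_model,
                target_model=target_model,
                eval_model=eval_model,
                jailbreak_datasets=dataset)
attacker.attack(save_path='PAIR_AdvBench_result.jsonl')  # output path is illustrative
```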
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats