Jailbreak recipe book for safer AI models
This repository provides a framework for stress-testing AI models by executing a curated collection of advanced jailbreaking techniques. It's designed for AI safety researchers, security engineers, and developers focused on identifying and mitigating vulnerabilities in large language models. The primary benefit is a streamlined, single-line execution of complex attack strategies.
How It Works
The framework integrates multiple state-of-the-art jailbreaking methods, including TAP, GCG, Crescendo, and AutoDAN variants. It allows users to execute these techniques against AI models using either pre-defined datasets like HarmBench or custom lists of harmful prompts. The system manages API key configurations and saves detailed logs and success metrics for analysis.
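As an illustrative sketch of invoking one of the integrated techniques (the import path, class name, and arguments below are assumptions for illustration, not the repository's documented API):

from generalanalysis.jailbreaks import TAP  # assumed import path and class name

# Configure the technique with attacker and target models; API keys are
# assumed to be picked up from the framework's key configuration.
attack = TAP(target_model="gpt-4o", attacker_model="gpt-4o-mini")

# Run against a custom list of harmful prompts instead of a dataset like
# HarmBench; logs and success metrics are described as being saved for analysis.
results = attack.run(prompts=["placeholder prompt 1", "placeholder prompt 2"])
print(results)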
Quick Start & Requirements
Clone the repository, then install the package in editable mode:
pip install -e .
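The framework manages API key configuration for the model providers it calls. A minimal pre-run check, assuming keys are supplied through environment variables (the variable names below are assumptions, not documented settings):

import os

# Hypothetical sketch: provider keys are assumed to come from environment
# variables; the exact variable names are assumptions.
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"Set {var} before running an attack")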
Maintenance & Community
Contributions are welcomed via pull requests or issues. Contact: info@generalanalysis.com.
Licensing & Compatibility
Dual-licensed: GPLv3 for open-source use (modifications must remain open-source) and a separate paid commercial license for closed-source or proprietary applications.
Limitations & Caveats
The project is intended for research purposes only and requires responsible use. The GPLv3 license imposes copyleft obligations on derivative works.