GA  by General-Analysis

Jailbreak recipe book for safer AI models

created 6 months ago
514 stars

Top 61.7% on sourcepulse

Project Summary

This repository provides a framework for stress-testing AI models by executing a curated collection of advanced jailbreaking techniques. It is designed for AI safety researchers, security engineers, and developers focused on identifying and mitigating vulnerabilities in large language models. Its main convenience is that each complex attack strategy can be launched with a single line.

How It Works

The framework integrates multiple state-of-the-art jailbreaking methods, including TAP, GCG, Crescendo, and AutoDAN variants. It allows users to execute these techniques against AI models using either pre-defined datasets like HarmBench or custom lists of harmful prompts. The system manages API key configurations and saves detailed logs and success metrics for analysis.
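The success metrics mentioned above can be pictured as a per-method success rate. The following is a minimal sketch of that bookkeeping only. The `RunRecord` type and its fields are hypothetical, chosen for illustration, and do not reflect the repository's actual log schema.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-prompt result; NOT the repository's real log format."""
    method: str       # e.g. "TAP" or "Crescendo"
    prompt_id: str    # identifier of the evaluated prompt
    succeeded: bool   # whether an evaluator flagged the attempt as successful


def success_rate_by_method(records):
    """Return {method: fraction of that method's records marked succeeded}."""
    totals = {}
    wins = {}
    for record in records:
        totals[record.method] = totals.get(record.method, 0) + 1
        wins[record.method] = wins.get(record.method, 0) + int(record.succeeded)
    return {method: wins[method] / totals[method] for method in totals}


# Made-up records, for illustration only.
records = [
    RunRecord("TAP", "p1", True),
    RunRecord("TAP", "p2", False),
    RunRecord("Crescendo", "p1", True),
]
rates = success_rate_by_method(records)
```

Comparing these rates across methods and target models is one way a safety team might decide which weaknesses to address first.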

Quick Start & Requirements

  • Install: clone the repository, then run pip install -e . inside it.
  • Prerequisites: API keys for OpenAI, Anthropic, or Together.
  • Documentation: docs.generalanalysis.com
  • Demo Notebooks: Available for Tree of Attacks (TAP) and AutoDAN-Turbo.
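Before a run, it can help to confirm which provider keys are actually set. The sketch below assumes the keys live in the environment variables `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `TOGETHER_API_KEY`, which are the defaults read by the providers' official SDKs. Whether this framework reads the same variables is an assumption, so check its documentation. The helper name `configured_providers` is hypothetical.

```python
import os

# Assumed variable names: the defaults used by each provider's official SDK.
# The framework itself may expect different names or a config file.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Together": "TOGETHER_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose key variable is set and non-blank."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No provider API keys found in the environment.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.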

Highlighted Details

  • Supports multiple advanced jailbreak methods: TAP, GCG, Crescendo, AutoDAN, AutoDAN-Turbo.
  • Enables execution with custom prompts or standard datasets like HarmBench.
  • Saves detailed results, logs, and success metrics for each run.
  • Claims high effectiveness against models such as GPT-4o and Claude 3.7 Sonnet.

Maintenance & Community

Contributions are welcomed via pull requests or issues. Contact: info@generalanalysis.com.

Licensing & Compatibility

Dual licensed: GPLv3 for open-source use (requiring modifications to remain open-source) and a separate paid commercial license for closed-source or proprietary applications.

Limitations & Caveats

The project is intended for research purposes only and requires responsible use. Under the GPLv3 option, distributed derivative works must also be released under GPLv3. Closed-source use requires the commercial license.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 153 stars in the last 90 days
