GA by General-Analysis

Jailbreak recipe book for safer AI models

Created 11 months ago

546 stars

Top 58.4% on SourcePulse

Project Summary

This repository provides a framework for stress-testing AI models by running a curated collection of advanced jailbreaking techniques against them. It is designed for AI safety researchers, security engineers, and developers who need to identify and mitigate vulnerabilities in large language models. Its main benefit is that each complex attack strategy can be launched in a single line.

How It Works

The framework integrates multiple state-of-the-art jailbreaking methods, including TAP, GCG, Crescendo, and AutoDAN variants. It allows users to execute these techniques against AI models using either pre-defined datasets like HarmBench or custom lists of harmful prompts. The system manages API key configurations and saves detailed logs and success metrics for analysis.
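
As a rough illustration of the kind of success metric such a run might report, the sketch below computes an attack success rate over a list of per-prompt results. The record layout (`AttackResult` with a `prompt_id` and a boolean `jailbroken` flag) and the function name `attack_success_rate` are hypothetical and not taken from the repository, whose actual log format may differ.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    """Hypothetical per-prompt record; not the repository's actual log schema."""
    prompt_id: str
    jailbroken: bool  # True if an evaluator judged the attack successful


def attack_success_rate(results: list[AttackResult]) -> float:
    """Return the fraction of prompts for which the attack succeeded."""
    if not results:
        return 0.0
    return sum(r.jailbroken for r in results) / len(results)


results = [
    AttackResult("p1", True),
    AttackResult("p2", False),
    AttackResult("p3", True),
    AttackResult("p4", False),
]
print(f"ASR: {attack_success_rate(results):.0%}")  # prints "ASR: 50%"
```

Returning 0.0 for an empty run is a design choice made here to avoid a division by zero; a real pipeline might prefer to raise an error instead.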

Quick Start & Requirements

Install: clone the repository, then run pip install -e . inside the cloned directory.
Prerequisites: API keys for OpenAI, Anthropic, or Together.
Documentation: docs.generalanalysis.com
Demo Notebooks: Available for Tree of Attacks (TAP) and AutoDAN-Turbo.
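
Before a first run it can help to confirm that at least one provider key is available. The sketch below assumes the keys are supplied through the environment variables `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `TOGETHER_API_KEY`, which are the names the providers' own SDKs conventionally read. The source does not say how this framework expects keys to be configured, so treat the variable names and the helper `configured_providers` as assumptions and check the project documentation.

```python
import os

# Assumed variable names (provider SDK conventions), not confirmed by the source.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Together": "TOGETHER_API_KEY",
}


def configured_providers(env=os.environ) -> list[str]:
    """Return the providers whose API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Keys found for:", ", ".join(found))
    else:
        print("No provider API keys found in the environment.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.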

Highlighted Details

Supports multiple advanced jailbreak methods: TAP, GCG, Crescendo, AutoDAN, AutoDAN-Turbo.
Enables execution with custom prompts or standard datasets like HarmBench.
Saves detailed results, logs, and success metrics for each run.
Claims high effectiveness against models such as GPT-4o and Claude 3.7 Sonnet.

Maintenance & Community

Contributions are welcomed via pull requests or issues. Contact: info@generalanalysis.com.

Licensing & Compatibility

Dual-licensed: GPLv3 for open-source use (distributed modifications must remain under the GPLv3) and a separate paid commercial license for closed-source or proprietary applications.

Limitations & Caveats

The project is intended for research purposes only and requires responsible use. The GPLv3 imposes copyleft obligations on distributed derivative works, which may rule out closed-source use without the commercial license.

Health Check

Last Commit: 7 months ago
Responsiveness: Inactive

Star History

24 stars in the last 30 days