GA by General-Analysis

Jailbreak recipe book for safer AI models

Created 1 year ago
591 stars

Top 54.9% on SourcePulse

Project Summary

This repository provides a framework for stress-testing AI models with a curated collection of advanced jailbreaking techniques. It is designed for AI safety researchers, security engineers, and developers who need to identify and mitigate vulnerabilities in large language models. Its main benefit is that each complex attack strategy can be run with a single line.

How It Works

The framework integrates multiple state-of-the-art jailbreaking methods, including TAP, GCG, Crescendo, and AutoDAN variants. It allows users to execute these techniques against AI models using either pre-defined datasets like HarmBench or custom lists of harmful prompts. The system manages API key configurations and saves detailed logs and success metrics for analysis.
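As a rough illustration of the kind of success metric a run might report, the sketch below aggregates per-method success rates from a list of result records. The record layout (`method`, `success`) and the function name are hypothetical, chosen only for this example; they are not the framework's actual log format, and the sample values are toy data rather than real results.

```python
from collections import defaultdict


def success_rate_by_method(records):
    """Return {method: fraction of records marked successful}.

    Each record is assumed (hypothetically) to be a dict with a
    "method" string and a boolean "success" flag.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["method"]] += 1
        if rec["success"]:
            hits[rec["method"]] += 1
    return {method: hits[method] / totals[method] for method in totals}


# Toy records for illustration only; not measured results.
records = [
    {"method": "TAP", "success": True},
    {"method": "TAP", "success": False},
    {"method": "Crescendo", "success": True},
]
rates = success_rate_by_method(records)
```

Grouping by method keeps the comparison between techniques explicit, which is usually what a reviewer wants when deciding where a model's defenses need the most work.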

Quick Start & Requirements

  • Install: clone the repository, then run pip install -e . from its root directory.
  • Prerequisites: API keys for OpenAI, Anthropic, or Together.
  • Documentation: docs.generalanalysis.com
  • Demo Notebooks: Available for Tree of Attacks (TAP) and AutoDAN-Turbo.
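Before a first run, it can help to confirm which provider keys are actually present. The sketch below checks the environment for the variable names conventionally read by the OpenAI, Anthropic, and Together SDKs. Whether this framework reads the same variable names is an assumption, and `configured_providers` is a hypothetical helper, not part of the project.

```python
import os

# Conventional variable names used by the providers' own SDKs.
# Assumption: the framework may expect different names or a config file.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Together": "TOGETHER_API_KEY",
}


def configured_providers(env=None):
    """List the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching real credentials.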

Highlighted Details

  • Supports multiple advanced jailbreak methods: TAP, GCG, Crescendo, AutoDAN, AutoDAN-Turbo.
  • Enables execution with custom prompts or standard datasets like HarmBench.
  • Saves detailed results, logs, and success metrics for each run.
  • Claims high effectiveness against models such as GPT-4o and Claude 3.7 Sonnet.

Maintenance & Community

Contributions are welcomed via pull requests or issues. Contact: info@generalanalysis.com.

Licensing & Compatibility

Dual-licensed: GPLv3 for open-source use (requiring modifications to remain open-source) and a separate paid commercial license for closed-source or proprietary applications.

Limitations & Caveats

The project is intended for research purposes only and requires responsible use. The GPLv3 license imposes copyleft obligations on distributed derivative works, so closed-source use requires the commercial license.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days
