nanoGCG by GraySwanAI

PyTorch implementation of the Greedy Coordinate Gradient (GCG) algorithm

Created 1 year ago
284 stars

Top 92.1% on SourcePulse

Project Summary

This repository provides nanoGCG, a fast and lightweight PyTorch implementation of the Greedy Coordinate Gradient (GCG) algorithm. It lets users optimize adversarial strings against causal Hugging Face language models and adds several extensions to the original algorithm for better performance and flexibility.

How It Works

nanoGCG implements the GCG algorithm, which optimizes an adversarial string toward eliciting a desired target output: at each step it uses gradients with respect to the one-hot token encoding to propose promising single-token swaps, evaluates the candidates, and greedily keeps the best one. The implementation supports several enhancements over the original algorithm, including multi-position token swapping, a historical attack buffer, the mellowmax loss function, and probe sampling. Probe sampling accelerates optimization by using a smaller draft model to pre-filter candidate prompts, which can yield significant speedups.
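The greedy-coordinate skeleton can be sketched with a toy stand-in. Real GCG uses gradients through the one-hot token embeddings to propose top-k swaps and scores candidates with the language model's loss; here both are replaced with hypothetical simplifications (an exhaustive swap search and a Hamming-distance loss), so this is an illustration of the search structure, not the nanoGCG API.

```python
# Toy sketch of the greedy-coordinate idea behind GCG (hypothetical
# simplification, not nanoGCG's implementation): repeatedly try swapping
# one token at a time and keep the swap that most reduces a loss.

def hamming_loss(candidate, target):
    """Stand-in loss: number of positions where candidate differs from target."""
    return sum(c != t for c, t in zip(candidate, target))

def greedy_coordinate_search(start, target, vocab, max_steps=100):
    """Greedily swap one token per step to minimize the loss.

    Real GCG would use gradients to shortlist promising (position, token)
    swaps instead of this exhaustive scan.
    """
    current = list(start)
    for _ in range(max_steps):
        best = (hamming_loss(current, target), None, None)
        for pos in range(len(current)):      # each coordinate...
            for tok in vocab:                # ...try each replacement token
                trial = current[:]
                trial[pos] = tok
                loss = hamming_loss(trial, target)
                if loss < best[0]:
                    best = (loss, pos, tok)
        if best[1] is None:                  # no improving swap: converged
            break
        current[best[1]] = best[2]
    return current

vocab = ["a", "b", "c"]
result = greedy_coordinate_search(["a", "a", "a"], ["b", "c", "a"], vocab)
```

Each outer iteration commits exactly one swap, which is what makes the search "coordinate-wise": the loss only ever needs to be compared across single-token perturbations of the current string.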

Quick Start & Requirements

  • Install via pip: pip install nanogcg
  • Requires PyTorch and Hugging Face Transformers.
  • GPU with CUDA is highly recommended for performance.
  • Example usage and detailed configuration options are available in the README.

Highlighted Details

  • Supports advanced GCG modifications: multi-position token swapping, historical attack buffer, mellowmax loss, and probe sampling.
  • Probe sampling can achieve up to 2.9x speedup in testing.
  • Allows flexible placement of adversarial strings within conversation histories.
  • Configurable parameters for fine-tuning the GCG process.
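The probe-sampling idea in the list above can be illustrated with a toy sketch: a cheap draft scorer ranks all candidates, and only the most promising fraction is passed to the expensive full scorer. The scorers and the `keep_frac` parameter below are hypothetical stand-ins for the draft and target models; they are not nanoGCG's actual interfaces.

```python
# Toy illustration of probe sampling (hypothetical stand-in, not the
# nanoGCG API): pre-filter candidates with a cheap, noisy draft scorer,
# then spend the expensive full scorer only on the survivors.
import random

def probe_sample(candidates, draft_score, full_score, keep_frac=0.25):
    """Rank all candidates by the cheap draft scorer, then run the
    expensive full scorer only on the top keep_frac of them."""
    ranked = sorted(candidates, key=draft_score)
    survivors = ranked[: max(1, int(len(ranked) * keep_frac))]
    return min(survivors, key=full_score)

# Toy scorers: the draft is a noisy but cheap proxy for the full loss,
# which has its minimum at x = 7. Seeding per candidate keeps it deterministic.
full = lambda x: (x - 7) ** 2
draft = lambda x: (x - 7) ** 2 + random.Random(x).uniform(-5.0, 5.0)

best = probe_sample(range(100), draft, full)
```

The speedup comes from calling the expensive scorer on only a fraction of candidates; the quality of the result depends on how well the draft scorer correlates with the full one, which mirrors the caveat about draft-model choice noted below.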

Maintenance & Community

The project is maintained by GraySwanAI and builds on the foundational GCG research. The README does not document community channels or contribution guidelines.

Licensing & Compatibility

Licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While the implementation is lightweight, running GCG attacks is computationally intensive: larger models and longer optimization runs demand significant GPU resources. The effectiveness of probe sampling also depends on the choice of draft model.

Health Check
Last Commit

4 months ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
14 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft

0.1%
3k
LLM evaluation framework
Created 2 years ago
Updated 1 month ago
Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

textgrad by zou-group

0.7%
3k
Autograd engine for textual gradients, enabling LLM-driven optimization
Created 1 year ago
Updated 1 month ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

0.1%
2k
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

llm-attacks by llm-attacks

0.2%
4k
Attack framework for aligned LLMs, based on a research paper
Created 2 years ago
Updated 1 year ago