nanoGCG by GraySwanAI

PyTorch implementation of the Greedy Coordinate Gradient (GCG) algorithm

Created 1 year ago
284 stars

Top 92.1% on SourcePulse

Project Summary

This repository provides nanoGCG, a fast and lightweight PyTorch implementation of the Greedy Coordinate Gradient (GCG) algorithm. It lets users optimize adversarial strings against causal Hugging Face language models and adds several extensions to the original algorithm for better performance and flexibility.

How It Works

nanoGCG implements the GCG algorithm, which optimizes an adversarial string toward eliciting a desired target output: at each step it uses gradients with respect to the one-hot token encoding to propose promising single-token swaps, evaluates the candidates, and greedily keeps the best one. The implementation supports several enhancements over the original algorithm, including multi-position token swapping, a historical attack buffer, the mellowmax loss function, and probe sampling. Probe sampling accelerates optimization by using a smaller draft model to pre-filter candidate prompts, which can yield significant speedups.
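The greedy-coordinate skeleton can be sketched with a toy stand-in. Real GCG uses gradients through the one-hot token embeddings to propose top-k swaps and scores candidates with the language model's loss; here both are replaced with hypothetical simplifications (an exhaustive swap search and a Hamming-distance loss), so this is an illustration of the search structure, not the nanoGCG API.

```python
# Toy sketch of the greedy-coordinate idea behind GCG (hypothetical
# simplification, not nanoGCG's implementation): repeatedly try swapping
# one token at a time and keep the swap that most reduces a loss.

def hamming_loss(candidate, target):
    """Stand-in loss: number of positions where candidate differs from target."""
    return sum(c != t for c, t in zip(candidate, target))

def greedy_coordinate_search(start, target, vocab, max_steps=100):
    """Greedily swap one token per step to minimize the loss.

    Real GCG would use gradients to shortlist promising (position, token)
    swaps instead of this exhaustive scan.
    """
    current = list(start)
    for _ in range(max_steps):
        best = (hamming_loss(current, target), None, None)
        for pos in range(len(current)):      # each coordinate...
            for tok in vocab:                # ...try each replacement token
                trial = current[:]
                trial[pos] = tok
                loss = hamming_loss(trial, target)
                if loss < best[0]:
                    best = (loss, pos, tok)
        if best[1] is None:                  # no improving swap: converged
            break
        current[best[1]] = best[2]
    return current

vocab = ["a", "b", "c"]
result = greedy_coordinate_search(["a", "a", "a"], ["b", "c", "a"], vocab)
```

Each outer iteration commits exactly one swap, which is what makes the search "coordinate-wise": the loss only ever needs to be compared across single-token perturbations of the current string.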

Quick Start & Requirements

  • Install via pip: pip install nanogcg
  • Requires PyTorch and Hugging Face Transformers.
  • GPU with CUDA is highly recommended for performance.
  • Example usage and detailed configuration options are available in the README.

Highlighted Details

  • Supports advanced GCG modifications: multi-position token swapping, historical attack buffer, mellowmax loss, and probe sampling.
  • Probe sampling can achieve up to 2.9x speedup in testing.
  • Allows flexible placement of adversarial strings within conversation histories.
  • Configurable parameters for fine-tuning the GCG process.
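The probe-sampling idea in the list above can be illustrated with a toy sketch: a cheap draft scorer ranks all candidates, and only the most promising fraction is passed to the expensive full scorer. The scorers and the `keep_frac` parameter below are hypothetical stand-ins for the draft and target models; they are not nanoGCG's actual interfaces.

```python
# Toy illustration of probe sampling (hypothetical stand-in, not the
# nanoGCG API): pre-filter candidates with a cheap, noisy draft scorer,
# then spend the expensive full scorer only on the survivors.
import random

def probe_sample(candidates, draft_score, full_score, keep_frac=0.25):
    """Rank all candidates by the cheap draft scorer, then run the
    expensive full scorer only on the top keep_frac of them."""
    ranked = sorted(candidates, key=draft_score)
    survivors = ranked[: max(1, int(len(ranked) * keep_frac))]
    return min(survivors, key=full_score)

# Toy scorers: the draft is a noisy but cheap proxy for the full loss,
# which has its minimum at x = 7. Seeding per candidate keeps it deterministic.
full = lambda x: (x - 7) ** 2
draft = lambda x: (x - 7) ** 2 + random.Random(x).uniform(-5.0, 5.0)

best = probe_sample(range(100), draft, full)
```

The speedup comes from calling the expensive scorer on only a fraction of candidates; the quality of the result depends on how well the draft scorer correlates with the full one, which mirrors the caveat about draft-model choice noted below.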

Maintenance & Community

The project is maintained by GraySwanAI and builds on the foundational GCG research. The README does not document community channels or contribution guidelines.

Licensing & Compatibility

Licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While the implementation is lightweight, running GCG attacks is computationally intensive: larger models and longer optimization runs demand significant GPU resources. The effectiveness of probe sampling also depends on the choice of draft model.

Health Check
Last Commit

4 months ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
14 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft

0.1%
3k
LLM evaluation framework
Created 2 years ago
Updated 1 month ago
Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

textgrad by zou-group

0.7%
3k
Autograd engine for textual gradients, enabling LLM-driven optimization
Created 1 year ago
Updated 1 month ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

0.1%
2k
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

llm-attacks by llm-attacks

0.2%
4k
Attack framework for aligned LLMs, based on a research paper
Created 2 years ago
Updated 1 year ago