remove-refusals-with-transformers by Sumandora

Refusal removal via HF Transformers

Created 1 year ago
1,186 stars

Top 32.7% on SourcePulse

Project Summary

This project provides a proof-of-concept implementation for removing refusal behavior from Large Language Models (LLMs) using only Hugging Face Transformers. It targets researchers and developers working on LLM safety and alignment, and shows that refusal mechanisms can be bypassed without specialized libraries such as TransformerLens.

How It Works

The approach modifies the model's internal states to steer it away from refusal responses. Because it relies only on the Hugging Face Transformers library, it should work with any model the library supports, provided the model's layers are accessible via model.model.layers. The model internals are manipulated directly, with no separate interpretability framework in between.
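The summary does not spell out the computation, so the following is only a toy illustration of one common way this kind of steering is formulated: estimate a direction as the difference between the mean activations of two prompt sets, then project that direction out of a hidden state. The function names, the difference-of-means choice, and the tiny hand-written vectors are all assumptions made for illustration. This is not the project's code, and nothing here touches a real model.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def unit_direction(set_a, set_b):
    """Normalized difference between the mean of set_a and the mean of set_b."""
    diff = [a - b for a, b in zip(mean_vector(set_a), mean_vector(set_b))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two sets have identical means; no direction to extract")
    return [x / norm for x in diff]

def project_out(hidden, direction):
    """Remove the component of `hidden` that lies along the unit vector `direction`."""
    dot = sum(h * d for h, d in zip(hidden, direction))
    return [h - dot * d for h, d in zip(hidden, direction)]

# Hypothetical 3-dimensional "activations" for two prompt sets (made-up numbers).
acts_a = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
acts_b = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = unit_direction(acts_a, acts_b)
steered = project_out([3.0, 5.0, 1.0], direction)
print(direction, steered)
```

After the projection, the hidden state has no component left along the estimated direction, while its other components are unchanged.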

Quick Start & Requirements

  • Install via pip install transformers.
  • Requires Python and a compatible Hugging Face Transformers model.
  • Tested on RTX 2060 6GB, suggesting suitability for models under 3B parameters, though larger models may also work.
  • Configuration is done within compute_refusal_dir.py and inference.py.
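As a rough sanity check on the 6GB figure, weight memory can be estimated as parameter count times bytes per parameter. The helper name and the precisions listed are assumptions: the summary does not say which precision or quantization the scripts use, and the estimate ignores activations and framework overhead.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30

# Hypothetical precisions for a 3B-parameter model.
for bits in (16, 8, 4):
    print(f"3B params at {bits}-bit: {weight_memory_gib(3e9, bits):.2f} GiB")
```

Under these assumptions, 16-bit weights for 3B parameters come to roughly 5.6 GiB on their own, which leaves little headroom on a 6GB card. Larger models would likely need lower precision or offloading.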

Highlighted Details

  • Pure Hugging Face Transformers implementation, maximizing model compatibility.
  • Proof-of-concept for refusal removal without TransformerLens.
  • Tested with models under 3B parameters on a 6GB GPU; compatibility with larger models is claimed but not extensively tested.

Maintenance & Community

No specific community channels, roadmap, or notable contributors are mentioned in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The implementation is a crude proof-of-concept and may not work with all models, particularly those with custom layer implementations (e.g., some Qwen variants). Compatibility with models larger than 3B parameters is not extensively tested.
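One way to check up front whether a model fits the stated requirement is to test for the model.model.layers attribute path before running anything else. The sketch below uses a hypothetical helper, get_decoder_layers, and stand-in objects instead of real models so that it runs without downloading weights. A real check would pass a loaded Transformers model in place of the stand-ins.

```python
from types import SimpleNamespace

def get_decoder_layers(model):
    """Return model.model.layers, or raise if the model does not expose that path."""
    inner = getattr(model, "model", None)
    layers = getattr(inner, "layers", None)
    if layers is None:
        raise AttributeError(
            f"{type(model).__name__} does not expose model.model.layers"
        )
    return layers

# Stand-ins for illustration only: one with the expected layout, one without.
fake_supported = SimpleNamespace(model=SimpleNamespace(layers=["layer0", "layer1"]))
fake_unsupported = SimpleNamespace(transformer=SimpleNamespace(h=["block0"]))

print(len(get_decoder_layers(fake_supported)))
try:
    get_decoder_layers(fake_unsupported)
except AttributeError as exc:
    print(f"unsupported layout: {exc}")
```

Passing this check only confirms the attribute path exists. Models with custom layer implementations may still behave differently once their internals are modified.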

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 59 stars in the last 30 days