repeng by vgel

Python library for training and applying representation-engineering control vectors

created 1 year ago
618 stars

Top 54.2% on sourcepulse

View on GitHub
Project Summary

repeng is a Python library designed for representation engineering, enabling users to train and apply control vectors to large language models (LLMs) to steer their behavior. It targets researchers and developers looking to modify LLM outputs with minimal computational cost, offering a fast method to imbue models with specific stylistic or behavioral traits.

How It Works

repeng implements a method for training "control vectors" that are applied to an LLM's hidden states during inference, not to its weights. The library wraps Hugging Face transformers models, allowing the learned vectors to be injected into chosen layers. Training involves creating a dataset of paired, contrasting statements, reading the model's activations for each pair, and deriving a per-layer steering vector (via PCA over the activation differences) that, when added to those layers' activations, shifts the model's output toward the desired persona or style. The main advantage is speed: training reportedly takes under a minute.
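A minimal sketch of that workflow, based on the project's README (the model name and persona prompts are illustrative, and exact signatures may differ between versions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from repeng import ControlModel, ControlVector, DatasetEntry

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # illustrative choice
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token_id = 0  # repeng's examples set a pad token for batching

base = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16 if device == "cuda" else torch.float32
).to(device)

# Wrap the model so learned vectors can be injected into a range of layers.
model = ControlModel(base, list(range(-5, -18, -1)))

# Dataset of paired, contrasting statements; real datasets use many
# template/suffix combinations rather than two hand-written entries.
dataset = [
    DatasetEntry(
        positive="[INST] Act extremely happy. [/INST] I feel",
        negative="[INST] Act extremely sad. [/INST] I feel",
    ),
    # ...more contrasting pairs...
]

# Reads activations for each pair and derives the control vector;
# this is the step that reportedly takes under a minute.
control_vector = ControlVector.train(model, tokenizer, dataset)
```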

Quick Start & Requirements

  • Install via pip: pip install repeng
  • Requires PyTorch and Hugging Face transformers.
  • Example notebooks may require accelerate: %pip install accelerate
  • Trained vectors can be exported to GGUF for use with quantized models in llama.cpp (see the sketch after this list).
  • Official documentation and examples are available in the notebooks folder and a linked blog post.
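A sketch of the GGUF path: the export_gguf method and llama.cpp's --control-vector flag below are taken from the project's examples and llama.cpp's options respectively, and should be verified against your installed versions.

```python
# Assumes `control_vector` was trained as in the earlier sketch.
control_vector.export_gguf("happy_vector.gguf")

# Then, with a (possibly quantized) model in llama.cpp:
#   llama-cli -m model-q4_k_m.gguf --control-vector happy_vector.gguf \
#       -p "Tell me about your day."
```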

Highlighted Details

  • Enables training of control vectors in under a minute.
  • Supports exporting trained vectors to GGUF format for use with llama.cpp.
  • Demonstrates steering model output towards specific personas (e.g., psychedelic vs. sober), as in the sketch below.
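Applying a trained vector, continuing the earlier sketch (the prompt and coefficient values are illustrative):

```python
# A positive coefficient steers toward the "positive" persona, a negative
# one toward its contrast; larger magnitudes steer harder.
input_ids = tokenizer(
    "[INST] Tell me about your day. [/INST]", return_tensors="pt"
).to(model.device)

for strength in (-2.0, 1.0, 2.0):
    model.set_control(control_vector, strength)
    output = model.generate(**input_ids, max_new_tokens=60)
    print(strength, tokenizer.decode(output.squeeze()))

model.reset()  # remove the control vector when done
```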

Maintenance & Community

The project is maintained by Theia Vogel. A CHANGELOG is available for version history.

Licensing & Compatibility

The code derives from andyzoujm/representation-engineering (MIT license). The project itself does not explicitly state a license; the upstream MIT license covers only the derived portions, so users should confirm licensing in the repository before redistributing or depending on the code.

Limitations & Caveats

Vector training does not currently work with Mixture-of-Experts (MoE) models like Mixtral. The library is in active development, and some example notebooks require manual installation of dependencies.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 39 stars in the last 90 days
