ErisForge by Tsadoq

LLM modification library for controlled behavior alteration

Created 1 year ago
254 stars

Top 99.1% on SourcePulse

Project Summary

Understanding or modifying the internal behavior of Large Language Models (LLMs) for specific research or application needs is difficult. ErisForge is a Python library that directly manipulates a model's internal layers, letting researchers and developers systematically ablate or augment model responses. The resulting modified models support controlled experimentation and analysis, particularly in studies of model safety and behavior.

How It Works

ErisForge enables targeted modifications to LLMs by applying transformations to their internal decoder layers. It offers specialized classes like AblationDecoderLayer and AdditionDecoderLayer to systematically remove or enhance specific functionalities within the model's architecture. The library supports the definition of custom "behavior directions" for precise control over the nature of these alterations. Additionally, it includes an ExpressionRefusalScorer to quantitatively assess the presence of refusal phrases in model outputs, aiding in the analysis of safety-related behaviors.
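
The idea of ablating or adding a "behavior direction" can be illustrated with plain vector arithmetic. The sketch below is not ErisForge's implementation: the function names (normalize, ablate, add_direction) and the scale parameter are hypothetical, and a real hidden state would be a torch tensor rather than a Python list. It assumes that ablation means projecting out the component of a hidden state along a unit direction, and that addition means shifting the hidden state along that direction.

```python
import math


def normalize(direction):
    """Return `direction` scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in direction]


def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    unit = normalize(direction)
    coeff = sum(h * u for h, u in zip(hidden, unit))  # projection length
    return [h - coeff * u for h, u in zip(hidden, unit)]


def add_direction(hidden, direction, scale=1.0):
    """Shift `hidden` along `direction` by `scale` (hypothetical parameter)."""
    unit = normalize(direction)
    return [h + scale * u for h, u in zip(hidden, unit)]


if __name__ == "__main__":
    hidden_state = [3.0, 4.0]
    behavior_direction = [1.0, 0.0]
    print(ablate(hidden_state, behavior_direction))
    print(add_direction(hidden_state, behavior_direction, scale=1.5))
```

After ablation the hidden state has no remaining component along the chosen direction, which is the property a decoder-layer wrapper would rely on if it applied this operation to every token's activations.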

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/tsadoq/erisforge.git), navigate to the directory (cd erisforge), and install dependencies (pip install -r requirements.txt). Alternatively, install directly via pip: pip install erisforge.
  • Prerequisites: Requires Python, torch, and the transformers library. Usage involves loading models and tokenizers, typically from the Hugging Face Hub.
  • Links: Example usage snippets are provided in the README, with a reference to a notebook for a more comprehensive demonstration of model layer transformation.
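
Before running any of the README examples, it can help to confirm that the listed prerequisites are importable. The following is a small sketch using only the Python standard library; the helper name missing_packages is hypothetical and is not part of ErisForge.

```python
import importlib.util

# Packages named in the prerequisites above.
REQUIRED = ("torch", "transformers")


def missing_packages(names=REQUIRED):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```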

Highlighted Details

  • Directly modifies internal layers of LLMs for altered response behaviors.
  • Features AblationDecoderLayer and AdditionDecoderLayer for systematic modification.
  • Includes ExpressionRefusalScorer to measure model refusal expressions.
  • Supports custom behavior directions for fine-grained control.
  • Transformed models can be saved locally or pushed to the Hugging Face Hub.
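
To make the idea of measuring refusal expressions concrete, here is a minimal phrase-matching sketch. It is not the ExpressionRefusalScorer API: the phrase list and the function names (contains_refusal, refusal_rate) are assumptions chosen for illustration, and the actual scorer may use different phrases and scoring.

```python
# Hypothetical refusal phrases; the library's own list may differ.
REFUSAL_PHRASES = ("i cannot", "i can't", "i'm sorry", "as an ai")


def contains_refusal(text, phrases=REFUSAL_PHRASES):
    """Return True if `text` contains any refusal phrase (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


def refusal_rate(responses, phrases=REFUSAL_PHRASES):
    """Return the fraction of responses that contain a refusal phrase."""
    if not responses:
        return 0.0
    hits = sum(contains_refusal(r, phrases) for r in responses)
    return hits / len(responses)


if __name__ == "__main__":
    outputs = ["I'm sorry, but I cannot help with that.", "Sure, here it is."]
    print(refusal_rate(outputs))
```

Comparing this rate on the same prompts before and after a layer transformation gives a rough, quantitative view of how the modification changed refusal behavior.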

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap. Contributions are encouraged through standard open-source practices like forking and submitting pull requests.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license generally permits broad use, including commercial applications and integration into closed-source projects, though users should consult the full license text. The library is built upon and compatible with the Hugging Face Transformers ecosystem.

Limitations & Caveats

This library is explicitly provided for research and development purposes only. The author assumes no responsibility for any specific applications or uses of ErisForge. Its functionality depends on the model architectures supported by the Hugging Face Transformers library.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days
