ErisForge by Tsadoq
LLM modification library for controlled behavior alteration
Large language models (LLMs) are complex systems, and understanding their internal workings or modifying their behavior for specific research or application needs can be challenging. ErisForge addresses this with a Python library that directly manipulates the internal layers of LLMs. This allows researchers and developers to systematically ablate or augment model responses, creating modified versions for controlled experimentation and analysis, which is particularly useful for studying model safety and behavior.
How It Works
ErisForge enables targeted modifications to LLMs by applying transformations to their internal decoder layers. It offers specialized classes like AblationDecoderLayer and AdditionDecoderLayer to systematically remove or enhance specific functionalities within the model's architecture. The library supports the definition of custom "behavior directions" for precise control over the nature of these alterations. Additionally, it includes an ExpressionRefusalScorer to quantitatively assess the presence of refusal phrases in model outputs, aiding in the analysis of safety-related behaviors.
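The ideas in this section can be illustrated with a minimal, dependency-free sketch. It is not ErisForge's API: every function name below is hypothetical, and the difference-of-means construction of a "behavior direction" is an assumption, since the summary does not say how the library computes one. The sketch shows ablation as removing a hidden state's component along a unit direction, addition as shifting the hidden state along it, and a simple phrase-matching score in the spirit of a refusal scorer.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def behavior_direction(with_behavior, without_behavior):
    """Assumed construction: unit vector from the 'without' mean to the 'with' mean."""
    mean_with = mean_vector(with_behavior)
    mean_without = mean_vector(without_behavior)
    diff = [a - b for a, b in zip(mean_with, mean_without)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two activation sets have identical means")
    return [x / norm for x in diff]

def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along the unit `direction`."""
    projection = sum(h * d for h, d in zip(hidden, direction))
    return [h - projection * d for h, d in zip(hidden, direction)]

def add_direction(hidden, direction, scale=1.0):
    """Shift `hidden` along `direction` by `scale`."""
    return [h + scale * d for h, d in zip(hidden, direction)]

# Illustrative phrase list, not taken from the library.
REFUSAL_PHRASES = ("i cannot", "i can't", "i'm sorry", "i am unable")

def refusal_score(text, phrases=REFUSAL_PHRASES):
    """Fraction of the listed phrases that occur in `text` (case-insensitive)."""
    lowered = text.lower()
    return sum(phrase in lowered for phrase in phrases) / len(phrases)
```

In a real model the same arithmetic would be applied to the tensors flowing through each decoder layer rather than to Python lists; the lists here only keep the example runnable without torch.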
Quick Start & Requirements
Installation: clone the repository (git clone https://github.com/tsadoq/erisforge.git), navigate to the directory (cd erisforge), and install dependencies (pip install -r requirements.txt). Alternatively, install directly via pip: pip install erisforge.
Requirements include torch and the transformers library.
Usage involves loading models and tokenizers, typically from the Hugging Face Hub.
Highlighted Details
AblationDecoderLayer and AdditionDecoderLayer for systematic modification.
ExpressionRefusalScorer to measure model refusal expressions.
Maintenance & Community
The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap. Contributions are encouraged through standard open-source practices like forking and submitting pull requests.
Licensing & Compatibility
Limitations & Caveats
This library is explicitly provided for research and development purposes only. The author assumes no responsibility for any specific applications or uses of ErisForge. Its functionality is dependent on the underlying models and architecture supported by the Hugging Face Transformers library.