llm_steer by Mihaiii

LLM output steering via activation engineering

Created 2 years ago
264 stars

Top 96.8% on SourcePulse

Project Summary

A Python module for steering Large Language Model (LLM) outputs towards specific topics or enhancing response capabilities using activation engineering. It allows users to inject "steering vectors" into model layers, offering a practical method to influence LLM behavior beyond traditional prompt engineering, particularly for users of HuggingFace's transformers library.

How It Works

The core mechanism involves modifying LLM activations by adding user-defined steering vectors to specific layers. Each vector is associated with a target text and a coefficient (positive or negative), directly influencing the model's internal state to guide its output. This approach aims for more precise control over LLM responses, potentially improving accuracy on complex tasks or enforcing specific stylistic traits.
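The additive update described above can be sketched with a PyTorch forward hook. Everything here is a toy stand-in: an `nn.Linear` plays the role of a transformer layer and a fixed tensor plays the role of a steering vector (which llm_steer derives from a target text), but the mechanism of adding `coeff * vector` to a layer's output is the same.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for one transformer layer; llm_steer hooks a real
# decoder layer of a HuggingFace model in the same fashion.
layer = nn.Linear(8, 8)

# A fixed "steering vector" with a small positive coefficient.
steering_vector = torch.ones(8)
coeff = 0.4

def steer_hook(module, inputs, output):
    # Shift the layer's activations toward the steering direction.
    return output + coeff * steering_vector

handle = layer.register_forward_hook(steer_hook)

x = torch.zeros(1, 8)
steered = layer(x)   # output with the steering vector applied
handle.remove()
plain = layer(x)     # output with the hook removed

# The hook added exactly coeff * steering_vector to the output.
print(torch.allclose(steered - plain, coeff * steering_vector))
```

Removing the hook restores the model's default behavior, which is how steering can be toggled on and off without reloading weights.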

Quick Start & Requirements

  • Install: pip install llm_steer
  • Prerequisites: Requires HuggingFace's transformers library. Tested on LLaMA, Mistral, Phi, and StableLM architectures. Note: Not compatible with GGUF models.
  • Demo: An interactive Google Colab notebook demonstrates usage: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb
  • Parameter Tuning: Optimal parameters (layer index, coefficient) are determined via trial and error; starting with small coefficients is recommended.
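Based on the README and the Colab demo, basic usage looks roughly like the sketch below. The `Steer` wrapper and its `add`/`reset_all` methods follow the project's documented API, but the model name, layer index, coefficient, and target text are illustrative only; check the repository for the exact current signatures.

```python
# Usage sketch adapted from the llm_steer README -- treat names and
# parameters as assumptions to verify against the repo.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llm_steer import Steer

model_name = "stabilityai/stablelm-2-zephyr-1_6b"  # any supported architecture
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

steered_model = Steer(model, tokenizer)

# Add a steering vector derived from the text "logical" to layer 20.
# Start with a small coefficient and tune by trial and error.
steered_model.add(layer_idx=20, coeff=0.4, text="logical")

# ... generate with the model as usual via transformers ...

# Remove all steering vectors to restore default behavior.
steered_model.reset_all()
```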

Highlighted Details

  • Enables direct manipulation of LLM activations for targeted output steering.
  • Supports complex configurations: multiple vectors per layer, same vector across layers, and negative coefficients.
  • Potential applications include enhancing role-play characteristics or mitigating undesirable default responses.
  • Complements system prompt engineering by offering a lower-level control mechanism.
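The configurations listed above follow from the update being a simple linear addition: vectors on the same layer sum, and a negative coefficient steers away from its associated text. A minimal sketch, with plain NumPy arrays standing in for a layer's hidden state and for text-derived vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)    # a layer's activation vector (toy stand-in)
v_style = rng.normal(size=8)   # vector for a desired trait
v_refuse = rng.normal(size=8)  # vector for an unwanted behavior

# Multiple vectors on one layer simply sum; the negative coefficient
# steers *away* from the behavior its text encodes.
steered = hidden + 0.4 * v_style + (-0.3) * v_refuse

# Because the update is linear, each contribution can be checked
# independently of the others.
delta = steered - hidden
print(np.allclose(delta, 0.4 * v_style - 0.3 * v_refuse))
```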

Maintenance & Community

The README does not provide details on specific maintainers, community channels (e.g., Discord, Slack), or a public roadmap.

Licensing & Compatibility

The README does not specify a software license. Compatibility is restricted to LLMs supported by HuggingFace's transformers library.

Limitations & Caveats

Experimental "advanced usage" parameters can lead to nonsensical outputs. Achieving desired results often requires significant trial and error with coefficient values and layer selections. Poorly tuned vectors may cause the LLM to output gibberish.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
