llm_steer by Mihaiii

LLM output steering via activation engineering

Created 2 years ago
264 stars

Top 96.8% on SourcePulse

Project Summary

A Python module for steering Large Language Model (LLM) outputs towards specific topics or enhancing response capabilities using activation engineering. It allows users to inject "steering vectors" into model layers, offering a practical method to influence LLM behavior beyond traditional prompt engineering, particularly for users of HuggingFace's transformers library.

How It Works

The core mechanism involves modifying LLM activations by adding user-defined steering vectors to specific layers. Each vector is associated with a target text and a coefficient (positive or negative), directly influencing the model's internal state to guide its output. This approach aims for more precise control over LLM responses, potentially improving accuracy on complex tasks or enforcing specific stylistic traits.
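The additive update described above can be sketched with a PyTorch forward hook. Everything here is a toy stand-in: an `nn.Linear` plays the role of a transformer layer and a fixed tensor plays the role of a steering vector (which llm_steer derives from a target text), but the mechanism of adding `coeff * vector` to a layer's output is the same.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for one transformer layer; llm_steer hooks a real
# decoder layer of a HuggingFace model in the same fashion.
layer = nn.Linear(8, 8)

# A fixed "steering vector" with a small positive coefficient.
steering_vector = torch.ones(8)
coeff = 0.4

def steer_hook(module, inputs, output):
    # Shift the layer's activations toward the steering direction.
    return output + coeff * steering_vector

handle = layer.register_forward_hook(steer_hook)

x = torch.zeros(1, 8)
steered = layer(x)   # output with the steering vector applied
handle.remove()
plain = layer(x)     # output with the hook removed

# The hook added exactly coeff * steering_vector to the output.
print(torch.allclose(steered - plain, coeff * steering_vector))
```

Removing the hook restores the model's default behavior, which is how steering can be toggled on and off without reloading weights.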

Quick Start & Requirements

  • Install: pip install llm_steer
  • Prerequisites: Requires HuggingFace's transformers library. Tested on LLaMA, Mistral, Phi, and StableLM architectures. Note: Not compatible with GGUF models.
  • Demo: An interactive Google Colab notebook demonstrates usage: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb
  • Parameter Tuning: Optimal parameters (layer index, coefficient) are determined via trial and error; starting with small coefficients is recommended.
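Based on the README and the Colab demo, basic usage looks roughly like the sketch below. The `Steer` wrapper and its `add`/`reset_all` methods follow the project's documented API, but the model name, layer index, coefficient, and target text are illustrative only; check the repository for the exact current signatures.

```python
# Usage sketch adapted from the llm_steer README -- treat names and
# parameters as assumptions to verify against the repo.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llm_steer import Steer

model_name = "stabilityai/stablelm-2-zephyr-1_6b"  # any supported architecture
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

steered_model = Steer(model, tokenizer)

# Add a steering vector derived from the text "logical" to layer 20.
# Start with a small coefficient and tune by trial and error.
steered_model.add(layer_idx=20, coeff=0.4, text="logical")

# ... generate with the model as usual via transformers ...

# Remove all steering vectors to restore default behavior.
steered_model.reset_all()
```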

Highlighted Details

  • Enables direct manipulation of LLM activations for targeted output steering.
  • Supports complex configurations: multiple vectors per layer, same vector across layers, and negative coefficients.
  • Potential applications include enhancing role-play characteristics or mitigating undesirable default responses.
  • Complements system prompt engineering by offering a lower-level control mechanism.
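The configurations listed above follow from the update being a simple linear addition: vectors on the same layer sum, and a negative coefficient steers away from its associated text. A minimal sketch, with plain NumPy arrays standing in for a layer's hidden state and for text-derived vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)    # a layer's activation vector (toy stand-in)
v_style = rng.normal(size=8)   # vector for a desired trait
v_refuse = rng.normal(size=8)  # vector for an unwanted behavior

# Multiple vectors on one layer simply sum; the negative coefficient
# steers *away* from the behavior its text encodes.
steered = hidden + 0.4 * v_style + (-0.3) * v_refuse

# Because the update is linear, each contribution can be checked
# independently of the others.
delta = steered - hidden
print(np.allclose(delta, 0.4 * v_style - 0.3 * v_refuse))
```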

Maintenance & Community

The README does not provide details on specific maintainers, community channels (e.g., Discord, Slack), or a public roadmap.

Licensing & Compatibility

The README does not specify a software license. Compatibility is restricted to LLMs supported by HuggingFace's transformers library.

Limitations & Caveats

Experimental "advanced usage" parameters can lead to nonsensical outputs. Achieving desired results often requires significant trial and error with coefficient values and layer selections. Poorly tuned vectors may cause the LLM to output gibberish.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
