Code for the research paper on Inference-Time Intervention (ITI) for LLM truthfulness
This repository provides code and pre-trained models for "Inference-Time Intervention" (ITI), a technique for eliciting truthful answers from Large Language Models (LLMs). It targets researchers and practitioners interested in improving LLM truthfulness and understanding internal model representations, offering a minimally invasive, data-efficient alternative to RLHF.
How It Works
ITI operates by shifting model activations during inference, targeting a small number of attention heads. The shift is applied along directions learned from a small set of labeled examples, steering the model's output toward truthful completions. The approach builds on the finding that LLMs may internally represent truthfulness even while generating falsehoods.
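To make this concrete, below is a minimal PyTorch sketch of the core mechanism: a forward hook that adds a learned per-head direction to a layer's attention output at every forward pass. Everything here (NUM_HEADS, HEAD_DIM, ALPHA, the randomly initialized direction, the Linear stand-in for an attention layer) is an illustrative assumption rather than the repository's actual code; the repo implements this via baukit or pyvene hooks on real LLM layers.

```python
# Minimal sketch of the ITI mechanism. All names and values here are
# hypothetical stand-ins, not this repository's API.
import torch
import torch.nn as nn

torch.manual_seed(0)
NUM_HEADS, HEAD_DIM = 8, 16
HIDDEN = NUM_HEADS * HEAD_DIM
ALPHA = 15.0  # intervention strength; a tuned hyperparameter

# One unit "truthful direction" per head, learned offline from linear
# probes on a small labeled set; heads left alone keep a zero vector.
directions = torch.zeros(NUM_HEADS, HEAD_DIM)
directions[2] = torch.nn.functional.normalize(torch.randn(HEAD_DIM), dim=0)

def iti_hook(module, inputs, output):
    """Shift selected heads' activations along their learned directions.

    In the paper the shift is also scaled by the std of activations along
    each direction; that scale is folded into ALPHA in this sketch.
    """
    b, s, _ = output.shape
    heads = output.view(b, s, NUM_HEADS, HEAD_DIM)
    heads = heads + ALPHA * directions  # broadcasts over batch and sequence
    return heads.reshape(b, s, HIDDEN)

# Stand-in for one transformer layer's concatenated attention-head output;
# in a real LLM the hook would be registered on that submodule instead.
attn_out = nn.Linear(HIDDEN, HIDDEN)
attn_out.register_forward_hook(iti_hook)

x = torch.randn(1, 4, HIDDEN)   # (batch, seq, hidden)
shifted = attn_out(x)           # hook applies the shift on every forward pass
```

Because the shift is a fixed additive bias on a few heads, it adds essentially no inference overhead and leaves the rest of the model untouched, which is what makes the method minimally invasive.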
Quick Start & Requirements
Create the environment with conda env create -f environment.yaml and activate it with conda activate iti. Requires baukit or pyvene.
Highlighted Details
Uses pyvene for broader model compatibility, with legacy baukit scripts available.
Maintenance & Community
Recent maintenance has centered on the migration from the legacy baukit tooling.
Licensing & Compatibility
Limitations & Caveats