honest_llama by likenneth

Code for research paper on inference-time intervention (ITI) for LLM truthfulness

Created 2 years ago
548 stars

Top 58.3% on SourcePulse

Project Summary

This repository provides code and pre-trained models for "Inference-Time Intervention" (ITI), a technique for eliciting truthful answers from large language models (LLMs). It targets researchers and practitioners interested in improving LLM truthfulness and understanding internal model representations, and it offers a minimally invasive, data-efficient alternative to RLHF.

How It Works

ITI shifts model activations during inference, targeting only a small number of attention heads. The shift follows directions learned from a small set of labeled examples and steers the model's output toward truthful answers. The approach builds on evidence that LLMs may internally represent whether a statement is true even when they produce falsehoods.
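The direction-finding step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the repository's code: it expects per-head activations already collected into an array, uses the mass-mean shift (mean truthful activation minus mean untruthful activation) as each head's direction, one of the direction choices considered in the ITI paper, and ranks heads with a hypothetical thresholding score in place of the trained linear probes the paper uses. The function name `find_truthful_directions` and the synthetic data are illustrative only.

```python
import numpy as np


def find_truthful_directions(acts, labels, top_k):
    """Hypothetical sketch of ITI-style direction finding.

    acts:   array (n_examples, n_heads, head_dim), one activation per head
    labels: array (n_examples,), 1 = truthful, 0 = untruthful
    top_k:  number of heads to keep for intervention
    Returns {head_index: (unit_direction, sigma)}.
    """
    labels = np.asarray(labels).astype(bool)
    n_heads = acts.shape[1]
    scores = np.empty(n_heads)
    directions = []
    for h in range(n_heads):
        x = acts[:, h, :]
        # Mass-mean shift: truthful class mean minus untruthful class mean.
        d = x[labels].mean(axis=0) - x[~labels].mean(axis=0)
        d = d / np.linalg.norm(d)
        directions.append(d)
        # Assumed stand-in for a probe: accuracy of thresholding the
        # projection onto d at the midpoint between the two class means.
        proj = x @ d
        threshold = 0.5 * (proj[labels].mean() + proj[~labels].mean())
        scores[h] = np.mean((proj > threshold) == labels)
    top_heads = np.argsort(scores)[::-1][:top_k]
    result = {}
    for h in top_heads:
        d = directions[h]
        # sigma: standard deviation of the activations projected onto d.
        sigma = float(np.std(acts[:, h, :] @ d))
        result[int(h)] = (d, sigma)
    return result


# Synthetic demo: only head 2 carries a planted "truthful" signal.
rng = np.random.default_rng(0)
n, n_heads, head_dim = 200, 4, 8
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, n_heads, head_dim))
acts[:, 2, 0] += 3.0 * labels
heads = find_truthful_directions(acts, labels, top_k=1)
print(list(heads))
```

On this synthetic data the planted head should be the one selected; with real models the activations would come from hooks on each layer's attention heads.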

Highlighted Details

  • Improves the truthfulness of Alpaca (an instruction-tuned LLaMA) on TruthfulQA from 32.5% to 65.1%.
  • Exposes an adjustable intervention strength to trade off truthfulness against helpfulness.
  • Data-efficient, requiring only hundreds of examples to locate truthful directions.
  • Updated to use pyvene for broader model compatibility, with legacy baukit scripts available.
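The adjustable strength can be illustrated with a hypothetical `apply_iti` helper. It assumes each selected head already has a unit direction and a standard deviation `sigma` of activations along that direction, and it shifts the head's output by `alpha * sigma * direction`; `alpha` is the strength knob. Applying the same shift at every token position is a simplifying assumption of this sketch, not a description of the repository's hooks.

```python
import numpy as np


def apply_iti(head_outputs, interventions, alpha):
    """Shift selected heads' outputs along their truthful directions.

    head_outputs:  array (seq_len, n_heads, head_dim)
    interventions: {head_index: (unit_direction, sigma)}
    alpha:         intervention strength; 0 leaves the outputs unchanged
    """
    shifted = head_outputs.copy()  # do not mutate the caller's array
    for h, (direction, sigma) in interventions.items():
        # Same shift at every position (assumption made for simplicity).
        shifted[:, h, :] += alpha * sigma * direction
    return shifted


# Sweep the strength: the shift along the direction grows as alpha * sigma.
rng = np.random.default_rng(0)
outputs = rng.normal(size=(5, 4, 8))
direction = np.zeros(8)
direction[0] = 1.0
interventions = {2: (direction, 0.5)}
for alpha in (0.0, 5.0, 15.0):
    shifted = apply_iti(outputs, interventions, alpha)
    delta = (shifted[:, 2, :] - outputs[:, 2, :]) @ direction
    print(alpha, delta.mean())
```

Larger `alpha` pushes the selected heads further along the truthful direction; the trade-off described above comes from how far this can go before the model's answers become less informative.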

Maintenance & Community

  • Project led by Kenneth Li and Oam Patel.
  • Recent updates (Aug 2024) include LLaMA-3 replication results and ITI-baked models on Hugging Face.
  • The original code builds on the user-friendly llama implementation and uses baukit.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying models (LLaMA, Alpaca, Vicuna) have their own licenses, which may restrict commercial use.

Limitations & Caveats

  • The ITI-baked models may behave slightly differently from the original ITI method because their activation shifts are hardcoded into the model.
  • Evaluating on TruthfulQA requires significant setup, including an OpenAI API key and fine-tuning the judge models used for scoring.
  • Large models like LLaMA-70B may require multi-GPU setups.
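One way a constant per-head shift can be hardcoded into a model, assumed here rather than confirmed by this summary, is to fold it into the bias of the attention output projection (called `o_proj` here, a hypothetical name). Because that projection is linear, `W_o @ (h + s) + b` equals `W_o @ h + (b + W_o @ s)`. A bias applies at every token position, so any position-dependence in the original intervention cannot be reproduced this way, which is one plausible source of small differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, head_dim = 4, 8
d_model = n_heads * head_dim

# Hypothetical output projection: y = W_o @ concat(head outputs) + b
W_o = rng.normal(size=(d_model, d_model))
b = np.zeros(d_model)

# Constant per-head shift (alpha * sigma * direction); only head 2 is shifted.
shift = np.zeros((n_heads, head_dim))
shift[2] = 0.75 * rng.normal(size=head_dim)

heads = rng.normal(size=(n_heads, head_dim))  # one token's head outputs

# Runtime intervention: shift the head outputs, then project.
y_runtime = W_o @ (heads + shift).reshape(-1) + b

# Baked-in version: fold W_o @ shift into the bias once.
b_baked = b + W_o @ shift.reshape(-1)
y_baked = W_o @ heads.reshape(-1) + b_baked

print(np.allclose(y_runtime, y_baked))  # → True
```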

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days
