honest_llama by likenneth

Code for the research paper on Inference-Time Intervention (ITI) for LLM truthfulness

created 2 years ago
539 stars

Top 59.7% on sourcepulse

Project Summary

This repository provides code and pre-trained models for "Inference-Time Intervention" (ITI), a technique to elicit truthful answers from Large Language Models (LLMs). It targets researchers and practitioners interested in improving LLM truthfulness and understanding internal model representations, offering a minimally invasive and data-efficient method compared to RLHF.

How It Works

ITI operates by shifting model activations during inference, targeting a small number of attention heads. The intervention follows directions learned from a small set of examples, with the aim of aligning the model's output with truthful information. The approach builds on the observation that LLMs may internally represent truthfulness even when producing falsehoods.
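
The paper describes this shift as adding a scaled truthful direction to each selected head's activation at every decoding step (roughly, activation += alpha * sigma * direction, where sigma is the direction's activation standard deviation and alpha the intervention strength). The sketch below illustrates that operation in plain PyTorch; it is not the repository's pyvene-based implementation, and all names, shapes, and the alpha value are illustrative assumptions.

```python
# Conceptual sketch of an ITI-style activation shift (not the repository's API).
# ASSUMPTIONS: `head_out` is one layer's per-head attention output with shape
# (batch, seq_len, num_heads, head_dim); `directions` maps selected head indices
# to unit-norm truthful directions learned offline; `stds` holds each direction's
# activation standard deviation; `alpha` is the intervention strength.
import torch

def apply_iti_shift(head_out, directions, stds, alpha=15.0):
    """Shift selected attention heads along their learned truthful directions."""
    shifted = head_out.clone()
    for head_idx, direction in directions.items():
        # Broadcast the (head_dim,) direction over batch and sequence positions.
        shifted[:, :, head_idx, :] += alpha * stds[head_idx] * direction
    return shifted

# Toy usage: intervene on 2 of 32 heads with random placeholder directions.
batch, seq, n_heads, head_dim = 1, 8, 32, 128
head_out = torch.randn(batch, seq, n_heads, head_dim)
directions = {3: torch.nn.functional.normalize(torch.randn(head_dim), dim=0),
              17: torch.nn.functional.normalize(torch.randn(head_dim), dim=0)}
stds = {3: 1.0, 17: 1.0}
shifted_out = apply_iti_shift(head_out, directions, stds, alpha=15.0)
```

In the full method the shift is applied at every layer that contains selected heads and is repeated for each generated token; here a single tensor stands in for one layer's attention output.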

Quick Start & Requirements
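
As a hedged starting point, the ITI-baked checkpoints mentioned under Maintenance & Community can reportedly be loaded with standard Hugging Face transformers. The checkpoint id below is an assumption based on the maintainer's HuggingFace uploads, so confirm the exact name there, and note that a recent transformers release may be needed for the baked-in attention biases. For the full pipeline (collecting activations, probing, intervening via pyvene, and TruthfulQA evaluation), follow the repository's own scripts and requirements.

```python
# Hedged quick-start sketch: load an ITI-baked checkpoint with Hugging Face transformers.
# ASSUMPTION: the model id below is illustrative; verify the exact checkpoint name
# on the maintainer's HuggingFace page before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "likenneth/honest_llama2_chat_7B"  # assumed id; verify on HuggingFace

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; expects a GPU with enough memory
).to("cuda")

prompt = "Q: What happens if you crack your knuckles a lot?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```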

Highlighted Details

  • Improves Alpaca's truthfulness from 32.5% to 65.1% on TruthfulQA.
  • Offers a tunable trade-off between truthfulness and helpfulness via the intervention strength.
  • Data-efficient: only hundreds of examples are needed to locate truthful directions (a probing sketch follows this list).
  • Updated to use pyvene for broader model compatibility, with legacy baukit scripts still available.
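
As a rough illustration of how those truthful directions might be located, the sketch below trains a linear probe per attention head on a few hundred labelled activations, ranks heads by validation accuracy, and takes the difference between truthful and untruthful class means as each head's direction (the paper's "mass mean shift"). Function names, shapes, and the number of selected heads are illustrative assumptions, not the repository's code.

```python
# Hedged sketch: locating truthful directions with per-head linear probes.
# ASSUMPTIONS: `activations` has shape (n_examples, num_heads, head_dim), collected
# from a few hundred labelled QA examples; `labels` marks answers as truthful (1) or not (0).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def find_truthful_directions(activations, labels, top_k=8):
    n_examples, num_heads, head_dim = activations.shape
    scores, directions = [], []
    for h in range(num_heads):
        X = activations[:, h, :]
        X_tr, X_va, y_tr, y_va = train_test_split(X, labels, test_size=0.2, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(probe.score(X_va, y_va))  # validation accuracy per head
        # "Mass mean shift": difference between truthful and untruthful activation means.
        direction = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
        directions.append(direction / np.linalg.norm(direction))
    top_heads = np.argsort(scores)[::-1][:top_k]  # keep the best-probed heads
    return top_heads, [directions[h] for h in top_heads]

# Toy usage with random data standing in for real head activations.
acts = np.random.randn(600, 32, 128)
labs = np.random.randint(0, 2, size=600)
heads, dirs = find_truthful_directions(acts, labs, top_k=8)
```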

Maintenance & Community

  • Project led by Kenneth Li and Oam Patel.
  • Recent updates (Aug 2024) include replication results for LLaMA-3 and ITI-baked models on HuggingFace.
  • Code is based on the user-friendly llama codebase and originally utilized baukit; current scripts use pyvene.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying models (LLaMA, Alpaca, Vicuna) have their own licenses, which may restrict commercial use.

Limitations & Caveats

  • The ITI-baked models may behave slightly differently from the original ITI method because the activation shifts are hard-coded into the weights.
  • Evaluating with TruthfulQA requires significant setup, including OpenAI API keys and model fine-tuning.
  • Large models such as LLaMA-70B may require multi-GPU setups (a sharding sketch follows this list).
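
For the multi-GPU point above, a common pattern (not specific to this repository) is to let Hugging Face accelerate place layers across the available GPUs automatically. A hedged sketch, with a placeholder model id:

```python
# Hedged sketch: shard a large checkpoint across several GPUs with transformers + accelerate.
# ASSUMPTION: the model id is a placeholder (and is gated on HuggingFace); substitute
# the checkpoint you are actually evaluating.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"  # placeholder; requires access approval

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory per GPU
    device_map="auto",          # requires `accelerate`; splits layers across GPUs
)
```
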
Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 21 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.

alpaca-lora by tloen

  • LoRA fine-tuning for LLaMA
  • 19k stars
  • created 2 years ago, updated 1 year ago