Code for the research paper on Inference-Time Intervention (ITI) for LLM truthfulness
This repository provides code and pre-trained models for "Inference-Time Intervention" (ITI), a technique for eliciting truthful answers from Large Language Models (LLMs). It targets researchers and practitioners interested in improving LLM truthfulness and understanding internal model representations, offering a minimally invasive, data-efficient alternative to RLHF.
How It Works
ITI operates by shifting model activations during inference, targeting a small number of attention heads. The shift is applied along directions learned from a small set of labeled examples, steering the model's output toward truthful completions. The approach builds on the finding that LLMs may internally represent truthfulness even while generating falsehoods.
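To make this concrete, below is a minimal PyTorch sketch of the core mechanism: a forward hook that adds a learned per-head direction to a layer's attention output at every forward pass. Everything here (NUM_HEADS, HEAD_DIM, ALPHA, the randomly initialized direction, the Linear stand-in for an attention layer) is an illustrative assumption rather than the repository's actual code; the repo implements this via baukit or pyvene hooks on real LLM layers.

```python
# Minimal sketch of the ITI mechanism. All names and values here are
# hypothetical stand-ins, not this repository's API.
import torch
import torch.nn as nn

torch.manual_seed(0)
NUM_HEADS, HEAD_DIM = 8, 16
HIDDEN = NUM_HEADS * HEAD_DIM
ALPHA = 15.0  # intervention strength; a tuned hyperparameter

# One unit "truthful direction" per head, learned offline from linear
# probes on a small labeled set; heads left alone keep a zero vector.
directions = torch.zeros(NUM_HEADS, HEAD_DIM)
directions[2] = torch.nn.functional.normalize(torch.randn(HEAD_DIM), dim=0)

def iti_hook(module, inputs, output):
    """Shift selected heads' activations along their learned directions.

    In the paper the shift is also scaled by the std of activations along
    each direction; that scale is folded into ALPHA in this sketch.
    """
    b, s, _ = output.shape
    heads = output.view(b, s, NUM_HEADS, HEAD_DIM)
    heads = heads + ALPHA * directions  # broadcasts over batch and sequence
    return heads.reshape(b, s, HIDDEN)

# Stand-in for one transformer layer's concatenated attention-head output;
# in a real LLM the hook would be registered on that submodule instead.
attn_out = nn.Linear(HIDDEN, HIDDEN)
attn_out.register_forward_hook(iti_hook)

x = torch.randn(1, 4, HIDDEN)   # (batch, seq, hidden)
shifted = attn_out(x)           # hook applies the shift on every forward pass
```

Because the shift is a fixed additive bias on a few heads, it adds essentially no inference overhead and leaves the rest of the model untouched, which is what makes the method minimally invasive.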
Quick Start & Requirements
Create the environment with conda env create -f environment.yaml and activate it with conda activate iti. Requires baukit or pyvene.
Highlighted Details
Uses pyvene for broader model compatibility, with legacy baukit scripts available.
Maintenance & Community
Recent maintenance has centered on the migration from the legacy baukit tooling.
Licensing & Compatibility
Limitations & Caveats