Awesome-LLM-Interpretability by cooperleong00

Curated list of LLM interpretability resources

Created 1 year ago
269 stars

Top 95.5% on SourcePulse

Project Summary

This repository is a curated list of resources on Large Language Model (LLM) interpretability, aimed at researchers and engineers who want to understand the internal workings of LLMs. It collects tutorials, libraries, surveys, papers, and blog posts on mechanistic interpretability and other interpretability techniques.

How It Works

The project categorizes LLM interpretability research by topic, technique, and application. It covers a wide range of methods, including probing, causal intervention, sparse autoencoders, and visualization techniques. The goal is to provide a structured and accessible entry point into this complex field, highlighting key papers and tools that advance the understanding of LLM behavior.

Quick Start & Requirements

This is a curated list, not a runnable library. Users will need to follow links to specific tools or papers for installation and execution.

Highlighted Details

  • Extensive coverage of mechanistic interpretability, including tutorials from prominent researchers like Neel Nanda and Callum McDougall.
  • A comprehensive list of libraries such as TransformerLens, CircuitsVis, and pyreft for practical interpretability work.
  • Numerous survey papers covering various aspects of LLM interpretability, from sparse autoencoders to causal interpretability.
  • Detailed categorization of research by specific LLM components (Attention, MLP/FFN, Neurons) and abilities (Reasoning, In-context Learning, Factual Knowledge).

Maintenance & Community

The list is actively curated, with recent additions reflecting ongoing research trends. Links to relevant forums such as the AI Alignment Forum and LessWrong are provided for community engagement.

Licensing & Compatibility

The repository itself is a list of links and does not have a specific license. Individual linked resources will have their own licenses.

Limitations & Caveats

As a curated list, the repository does not provide direct functionality. Users must navigate to external resources, which may have varying levels of documentation, support, and licensing. The sheer volume of information can be overwhelming for newcomers.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

transformer-debugger by openai

Tool for language model behavior investigation

  • Top 0.1% on SourcePulse, 4k stars
  • Created 1 year ago, updated 1 year ago
  • Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more

TransformerLens by TransformerLensOrg

Library for mechanistic interpretability research on GPT-style language models

  • Top 1.0% on SourcePulse, 3k stars
  • Created 3 years ago, updated 1 day ago
  • Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Neel Nanda (Research Scientist at Google DeepMind), and 1 more