Curated list of LLM interpretability resources
This repository is a curated list of resources for Large Language Model (LLM) interpretability, aimed at researchers and engineers who want to understand the internal workings of LLMs. It collects tutorials, libraries, surveys, papers, and blog posts on mechanistic interpretability and related interpretability techniques.
How It Works
The project categorizes LLM interpretability research by topic, technique, and application. It covers a wide range of methods, including probing, causal intervention, sparse autoencoders, and visualization techniques. The goal is to provide a structured and accessible entry point into this complex field, highlighting key papers and tools that advance the understanding of LLM behavior.
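To make the probing category concrete, here is a minimal sketch of a linear probe: a logistic regression classifier trained on cached hidden states to test whether a layer linearly encodes some property of the input. The data shapes and random placeholders below are illustrative assumptions, not taken from any specific resource in the list.

```python
# Minimal linear-probe sketch (placeholder data): fit a logistic regression
# on hidden states extracted from one layer of an LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder for real activations of shape (n_examples, hidden_dim)
# and a binary annotation per example (e.g. "sentence is negated or not").
rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 768))
labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Accuracy well above chance suggests the property is linearly decodable
# from this layer; with random placeholder data it will hover around 0.5.
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```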
Quick Start & Requirements
This is a curated list, not a runnable library. Users will need to follow links to specific tools or papers for installation and execution.
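As an example of what following one of those links might look like, the sketch below uses TransformerLens (installable via `pip install transformer_lens`), a library commonly referenced in LLM interpretability lists; whether this particular repository links it is an assumption here. The snippet only shows loading a small model and caching its activations, which is the starting point for probing, patching, and SAE work.

```python
# Sketch: load a small model with TransformerLens and cache activations.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache(
    "Interpretability starts with caching activations."
)
# `cache` maps hook names to activation tensors, e.g. the residual stream
# after block 0; these tensors are the raw material for later analysis.
print(logits.shape)
print(cache["blocks.0.hook_resid_post"].shape)
```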
Maintenance & Community
The list is actively curated, with recent additions reflecting ongoing research trends. Links to relevant forums such as the AI Alignment Forum and LessWrong are provided for community engagement.
Licensing & Compatibility
The repository itself is a list of links and does not have a specific license. Individual linked resources will have their own licenses.
Limitations & Caveats
As a curated list, the repository does not provide direct functionality. Users must navigate to external resources, which may have varying levels of documentation, support, and licensing. The sheer volume of information can be overwhelming for newcomers.