Awesome-LLM-Interpretability by cooperleong00

Curated list of LLM interpretability resources

created 1 year ago
261 stars

Project Summary

This repository is a curated list of resources on Large Language Model (LLM) interpretability, aimed at researchers and engineers who want to understand the internal workings of LLMs. It collects tutorials, libraries, surveys, papers, and blog posts on mechanistic interpretability and other interpretability techniques.

How It Works

The project organizes LLM interpretability research by topic, technique, and application. It covers methods such as probing, causal intervention, sparse autoencoders, and visualization techniques. The goal is to offer a structured, accessible entry point into the field by highlighting key papers and tools for understanding LLM behavior.

Quick Start & Requirements

This is a curated list, not a runnable library. Users will need to follow links to specific tools or papers for installation and execution.

Highlighted Details

  • Extensive coverage of mechanistic interpretability, including tutorials from prominent researchers such as Neel Nanda and Callum McDougall.
  • A comprehensive list of libraries such as TransformerLens, CircuitsVis, and pyreft for practical interpretability work.
  • Numerous survey papers covering various aspects of LLM interpretability, from sparse autoencoders to causal interpretability.
  • Detailed categorization of research by specific LLM components (Attention, MLP/FFN, Neurons) and abilities (Reasoning, In-context Learning, Factual Knowledge).

Maintenance & Community

The list is curated over time, with additions reflecting ongoing research trends. It also links to community forums such as the AI Alignment Forum and LessWrong.

Licensing & Compatibility

The repository itself is a list of links and does not have a specific license. Individual linked resources will have their own licenses.

Limitations & Caveats

As a curated list, the repository does not provide direct functionality. Users must navigate to external resources, which may have varying levels of documentation, support, and licensing. The sheer volume of information can be overwhelming for newcomers.

Health Check
Last commit: 4 months ago
Responsiveness: 1 week
Pull requests (30d): 0
Issues (30d): 0
Star history: 29 stars in the last 90 days
