Resource list for LLM interpretability research
This repository is a curated collection of resources on interpretability for Large Language Models (LLMs). It gives beginners a structured entry point and helps researchers keep up with recent work, covering libraries, tutorials, papers, and tools.
How It Works
The project functions as a living bibliography that categorizes and links to external resources. It covers foundational libraries for mechanistic interpretability, such as TransformerLens and nnsight, alongside tools for analyzing Sparse Autoencoders (SAEs) and other interpretability techniques. Resources are grouped into sections so that readers can navigate the different facets of LLM interpretability research.
Quick Start & Requirements
This repository is a curated list of resources, not a runnable software package, so users need to explore the linked libraries and tools individually. Many linked libraries (e.g., TransformerLens, nnsight, shap) require a Python environment and a deep learning framework such as PyTorch or JAX. Some advanced analyses may also need GPU acceleration and specific model checkpoints.
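Since the list itself installs nothing, it can help to check which of the linked libraries are already available before following a tutorial. The sketch below is illustrative only and uses the Python standard library. The import names (`transformer_lens`, `nnsight`, `shap`, `torch`, `jax`) are assumptions based on each project's usual package name, and `check_environment` is a hypothetical helper, not something the repository provides.

```python
from importlib.util import find_spec

# Display name -> import name. The import names are assumptions based on
# each project's usual package name; adjust them if a project differs.
CANDIDATES = {
    "TransformerLens": "transformer_lens",
    "nnsight": "nnsight",
    "shap": "shap",
    "PyTorch": "torch",
    "JAX": "jax",
}


def check_environment(candidates=CANDIDATES):
    """Return {display_name: bool} telling whether each module can be located.

    find_spec only looks the module up; it does not import it, so the check
    stays cheap even for heavy frameworks.
    """
    return {
        name: find_spec(module) is not None
        for name, module in candidates.items()
    }


if __name__ == "__main__":
    for name, available in check_environment().items():
        status = "installed" if available else "missing"
        print(f"{name:16s} {status}")
```

A library reported as missing still has to be installed by following that project's own instructions. This check also says nothing about GPU availability or model checkpoints.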
Highlighted Details
Maintenance & Community
The repository is maintained by ruizheliUOA and welcomes contributions via issues. The maintainer's contact information is provided, and the list links to relevant forums such as the AI Alignment Forum and LessWrong.
Licensing & Compatibility
The repository itself is a collection of links and does not have a specific license. The licenses of the linked libraries and tools vary, and users should consult the respective projects for compatibility and usage terms.
Limitations & Caveats
As a curated list, the repository is only as valuable as the quality and maintenance of the external resources it links to. Given the rapid pace of LLM research, the collection may need frequent updates to remain comprehensive.