Awesome-Interpretability-in-Large-Language-Models by ruizheliUOA

Resource list for LLM interpretability research

created 1 year ago
366 stars

Top 78.1% on sourcepulse

View on GitHub
Project Summary

This repository serves as a comprehensive, curated collection of resources for the rapidly growing field of interpretability in Large Language Models (LLMs). It aims to provide beginners with a structured entry point and researchers with a way to stay abreast of the latest advancements, covering libraries, tutorials, papers, and tools.

How It Works

The project functions as a living bibliography, categorizing and linking to a wide range of resources. It covers foundational libraries such as TransformerLens and nnsight for mechanistic interpretability, alongside tools for analyzing Sparse Autoencoders (SAEs) and other interpretability techniques. The collection is organized into logical sections, making it easy to navigate the different facets of LLM interpretability research.
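
As an illustration of the mechanistic-interpretability workflow these libraries support (a minimal sketch, not part of the repository itself; the prompt and layer index are arbitrary examples), TransformerLens lets you run a model and cache every intermediate activation for inspection:

    from transformer_lens import HookedTransformer

    # Load GPT-2 small with hooks attached to every internal activation.
    model = HookedTransformer.from_pretrained("gpt2")

    # Run an example prompt and cache all intermediate activations alongside the logits.
    logits, cache = model.run_with_cache("Interpretability lets us look inside the model.")

    # Layer-0 attention patterns: shape [batch, n_heads, query_pos, key_pos].
    attn_patterns = cache["pattern", 0]
    print(attn_patterns.shape)

The cached activations are the raw material for techniques covered in the list, such as attention-pattern analysis, activation patching, and training SAEs.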

Quick Start & Requirements

This repository is a curated list of resources, not a runnable software package, so users will need to explore the linked libraries and tools individually. Many of the linked libraries (e.g., TransformerLens, nnsight, shap) require a Python environment and a specific deep learning framework such as PyTorch or JAX. Some advanced analyses also require GPU acceleration and specific model checkpoints.
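
To give a concrete sense of the setup involved (a hedged sketch, not part of the repository; the model name is only an example and exact pipeline arguments may vary with the installed transformers version), attribution with shap on a Hugging Face sentiment pipeline looks roughly like this:

    import shap
    import transformers

    # Any text-classification pipeline works; this sentiment model is just an example.
    classifier = transformers.pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
        return_all_scores=True,  # give SHAP a score for every class
    )

    # shap.Explainer wraps the pipeline and selects a text masker automatically.
    explainer = shap.Explainer(classifier)
    shap_values = explainer(["This reading list made the field much easier to navigate."])

    # Token-level attributions for the POSITIVE class (renders as HTML in a notebook).
    shap.plots.text(shap_values[0, :, "POSITIVE"])

A small example like this runs on CPU; the heavier mechanistic-interpretability workflows linked in the list generally benefit from a GPU.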

Highlighted Details

  • Extensive categorization of resources including libraries, blogs, tutorials, forums, tools, programs, and papers.
  • Detailed lists of papers categorized by topic (Survey, Position, Interpretable Analysis, SAEs, Vision LLMs, Benchmarking, Enhancing Interpretability).
  • Links to numerous libraries and tools specifically designed for LLM interpretability, such as TransformerLens, nnsight, and SAE Lens.
  • Includes resources for understanding foundational concepts like attention mechanisms and transformer internals.

Maintenance & Community

The repository is maintained by ruizheliUOA and welcomes contributions via issues. Contact information for the maintainer is provided. Links to relevant forums like AI Alignment Forum and LessWrong are included.

Licensing & Compatibility

The repository itself is a collection of links and does not have a specific license. The licenses of the linked libraries and tools vary, and users should consult the respective projects for compatibility and usage terms.

Limitations & Caveats

As a curated list, the repository's value is dependent on the quality and maintenance of the linked external resources. The rapid pace of LLM research means the collection may require frequent updates to remain comprehensive.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 90 days
