Awesome-Interpretability-in-Large-Language-Models by ruizheliUOA

Resource list for LLM interpretability research

created 1 year ago
366 stars

Top 78.1% on sourcepulse

View on GitHub
Project Summary

This repository serves as a comprehensive, curated collection of resources for the rapidly growing field of interpretability in Large Language Models (LLMs). It aims to provide beginners with a structured entry point and researchers with a way to stay abreast of the latest advancements, covering libraries, tutorials, papers, and tools.

How It Works

The project functions as a living bibliography, categorizing and linking to a wide range of resources. It covers foundational libraries such as TransformerLens and nnsight for mechanistic interpretability, alongside tools for analyzing Sparse Autoencoders (SAEs) and other interpretability techniques. The collection is organized into logical sections, making it easy to navigate the different facets of LLM interpretability research.
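
As an illustration of the mechanistic-interpretability workflow these libraries support (a minimal sketch, not part of the repository itself; the prompt and layer index are arbitrary examples), TransformerLens lets you run a model and cache every intermediate activation for inspection:

    from transformer_lens import HookedTransformer

    # Load GPT-2 small with hooks attached to every internal activation.
    model = HookedTransformer.from_pretrained("gpt2")

    # Run an example prompt and cache all intermediate activations alongside the logits.
    logits, cache = model.run_with_cache("Interpretability lets us look inside the model.")

    # Layer-0 attention patterns: shape [batch, n_heads, query_pos, key_pos].
    attn_patterns = cache["pattern", 0]
    print(attn_patterns.shape)

The cached activations are the raw material for techniques covered in the list, such as attention-pattern analysis, activation patching, and training SAEs.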

Quick Start & Requirements

This repository is a curated list of resources, not a runnable software package, so users will need to explore the linked libraries and tools individually. Many of the linked libraries (e.g., TransformerLens, nnsight, shap) require a Python environment and a specific deep learning framework such as PyTorch or JAX. Some advanced analyses also require GPU acceleration and specific model checkpoints.
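
To give a concrete sense of the setup involved (a hedged sketch, not part of the repository; the model name is only an example and exact pipeline arguments may vary with the installed transformers version), attribution with shap on a Hugging Face sentiment pipeline looks roughly like this:

    import shap
    import transformers

    # Any text-classification pipeline works; this sentiment model is just an example.
    classifier = transformers.pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
        return_all_scores=True,  # give SHAP a score for every class
    )

    # shap.Explainer wraps the pipeline and selects a text masker automatically.
    explainer = shap.Explainer(classifier)
    shap_values = explainer(["This reading list made the field much easier to navigate."])

    # Token-level attributions for the POSITIVE class (renders as HTML in a notebook).
    shap.plots.text(shap_values[0, :, "POSITIVE"])

A small example like this runs on CPU; the heavier mechanistic-interpretability workflows linked in the list generally benefit from a GPU.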

Highlighted Details

  • Extensive categorization of resources including libraries, blogs, tutorials, forums, tools, programs, and papers.
  • Detailed lists of papers categorized by topic (Survey, Position, Interpretable Analysis, SAEs, Vision LLMs, Benchmarking, Enhancing Interpretability).
  • Links to numerous libraries and tools specifically designed for LLM interpretability, such as TransformerLens, nnsight, and SAE Lens.
  • Includes resources for understanding foundational concepts like attention mechanisms and transformer internals.

Maintenance & Community

The repository is maintained by ruizheliUOA and welcomes contributions via issues. Contact information for the maintainer is provided. Links to relevant forums like AI Alignment Forum and LessWrong are included.

Licensing & Compatibility

The repository itself is a collection of links and does not have a specific license. The licenses of the linked libraries and tools vary, and users should consult the respective projects for compatibility and usage terms.

Limitations & Caveats

As a curated list, the repository's value is dependent on the quality and maintenance of the linked external resources. The rapid pace of LLM research means the collection may require frequent updates to remain comprehensive.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 90 days
