LLM interpretability resources
This repository is a curated list of resources for Large Language Model (LLM) interpretability, targeting researchers and engineers seeking to understand and debug LLM behavior. It provides a comprehensive overview of tools, academic papers, articles, and community groups dedicated to this field, aiming to foster transparency and explainability in AI.
How It Works
The collection is organized into four categories: Tools, Papers, Articles, and Groups. Tools include platforms such as the Learning Interpretability Tool (LIT), Pythia, and TransformerLens, which support visualization, attribution, and mechanistic analysis. Papers cover theoretical advances and empirical studies on topics such as neuron behavior, attention mechanisms, and bias detection. Articles offer discussions and tutorials, while Groups highlight key research communities such as PAIR, Alignment Lab AI, Nous Research, and EleutherAI.
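As a minimal sketch of the kind of workflow these tools enable, the following example caches activations with TransformerLens, one of the listed libraries (it assumes the transformer_lens package is installed, a GPT-2 checkpoint can be downloaded, and uses an illustrative prompt):

```python
# Sketch: cache activations for mechanistic analysis with TransformerLens.
# Assumes `pip install transformer_lens` and network access to fetch GPT-2 weights.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")      # load a small reference model
tokens = model.to_tokens("Interpretability is useful") # tokenize an illustrative prompt
logits, cache = model.run_with_cache(tokens)           # forward pass, caching all activations

# Inspect the attention pattern of layer 0: shape [batch, head, query_pos, key_pos].
attn_pattern = cache["blocks.0.attn.hook_pattern"]
print(attn_pattern.shape)
```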
Quick Start & Requirements
This is a curated list, not a software package. No installation or execution is required. Resources linked within may have their own requirements.
Maintenance & Community
The list is actively curated, with contributions encouraged via pull requests. Links to contributing guidelines and a code of conduct are provided. Community engagement is facilitated through various listed research groups and their associated platforms.
Licensing & Compatibility
The repository itself does not declare a specific license. Individual resources linked within have their own licenses, which may vary. Users should verify licensing for any tools or papers they intend to use, especially for commercial applications.
Limitations & Caveats
Because this is a curated list, its value depends on the quality and relevance of the linked resources. The rapidly evolving nature of LLM interpretability means the list may require frequent updates to remain comprehensive.