Curated list of LLM interpretability resources
This repository is a curated list of resources for Large Language Model (LLM) interpretability, aimed at researchers and engineers who want to understand the internal workings of LLMs. It collects tutorials, libraries, surveys, papers, and blog posts on mechanistic interpretability and related interpretability techniques.
How It Works
The project categorizes LLM interpretability research by topic, technique, and application. It covers a wide range of methods, including probing, causal intervention, sparse autoencoders, and visualization techniques. The goal is to provide a structured and accessible entry point into this complex field, highlighting key papers and tools that advance the understanding of LLM behavior.
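To make the probing category concrete, here is a minimal sketch of a linear probe: a logistic regression classifier trained on cached hidden states to test whether a layer linearly encodes some property of the input. The data shapes and random placeholders below are illustrative assumptions, not taken from any specific resource in the list.

```python
# Minimal linear-probe sketch (placeholder data): fit a logistic regression
# on hidden states extracted from one layer of an LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder for real activations of shape (n_examples, hidden_dim)
# and a binary annotation per example (e.g. "sentence is negated or not").
rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 768))
labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Accuracy well above chance suggests the property is linearly decodable
# from this layer; with random placeholder data it will hover around 0.5.
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```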
Quick Start & Requirements
This is a curated list, not a runnable library. Users will need to follow links to specific tools or papers for installation and execution.
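As an example of what following one of those links might look like, the sketch below uses TransformerLens (installable via `pip install transformer_lens`), a library commonly referenced in LLM interpretability lists; whether this particular repository links it is an assumption here. The snippet only shows loading a small model and caching its activations, which is the starting point for probing, patching, and SAE work.

```python
# Sketch: load a small model with TransformerLens and cache activations.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache(
    "Interpretability starts with caching activations."
)
# `cache` maps hook names to activation tensors, e.g. the residual stream
# after block 0; these tensors are the raw material for later analysis.
print(logits.shape)
print(cache["blocks.0.hook_resid_post"].shape)
```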
Maintenance & Community
The list is actively curated, with recent additions reflecting ongoing research trends. Links to relevant forums such as the AI Alignment Forum and LessWrong are provided for community engagement.
Licensing & Compatibility
The repository itself is a list of links and does not have a specific license. Individual linked resources will have their own licenses.
Limitations & Caveats
As a curated list, the repository does not provide direct functionality. Users must navigate to external resources, which may have varying levels of documentation, support, and licensing. The sheer volume of information can be overwhelming for newcomers.