awesome-llm-understanding-mechanism  by zepingyu0512

Curated list of papers for LLM interpretability

created 1 year ago
527 stars

Top 60.8% on sourcepulse

Project Summary

This repository curates seminal research papers focused on understanding the internal mechanisms of Large Language Models (LLMs). It targets researchers and practitioners in AI interpretability, providing a structured overview of key findings and methodologies for dissecting LLM behavior. The benefit is a centralized, high-quality resource for advancing LLM transparency and trustworthiness.

How It Works

The repository functions as a curated bibliography, categorizing papers by their core research themes such as neuron-level analysis, circuit discovery, knowledge editing, and in-context learning. Papers are selected based on acceptance at top-tier AI conferences (ICML, NeurIPS, ICLR, ACL, EMNLP, NAACL) or affiliation with leading research institutions, ensuring a focus on rigorous and impactful contributions to LLM interpretability.

Quick Start & Requirements

This is a curated list of research papers and does not involve code execution. No installation or specific requirements are needed beyond a web browser to access the linked papers.

Highlighted Details

  • Comprehensive coverage of LLM interpretability topics including mechanistic interpretability, sparse autoencoders (SAE), logit lenses, and causal analysis.
  • Papers are tagged with relevant keywords (e.g., neuron, circuit, knowledge, fine-tune, hallucination) for easy filtering.
  • Includes links to influential blogs and other related repositories for further exploration.
  • Features papers from major research labs such as OpenAI, DeepMind, and Anthropic.

Maintenance & Community

The list is maintained by zepingyu0512 and appears to be actively updated with recent publications. Further community engagement or discussion channels are not explicitly mentioned.

Licensing & Compatibility

The repository does not specify a license. The linked papers are subject to their respective publishers' copyright and licensing terms.

Limitations & Caveats

This resource is a bibliography and does not provide code implementations or direct tools for LLM interpretability. The focus is solely on academic literature.

Health Check

Last commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0

Star History

79 stars in the last 90 days
