awesome-llm-understanding-mechanism by zepingyu0512

Curated list of papers for LLM interpretability

Created 1 year ago

603 stars


Project Summary

This repository curates seminal research papers on the internal mechanisms of Large Language Models (LLMs). It is aimed at researchers and practitioners in AI interpretability and gives a structured overview of the key findings and methods used to dissect LLM behavior. The result is a single, high-quality resource for work on LLM transparency and trustworthiness.

How It Works

The repository is a curated bibliography that groups papers by core research theme, such as neuron-level analysis, circuit discovery, knowledge editing, and in-context learning. Papers are included if they were accepted at a top-tier AI conference (ICML, NeurIPS, ICLR, ACL, EMNLP, NAACL) or come from a leading research institution, which keeps the list focused on rigorous, impactful contributions to LLM interpretability.

Quick Start & Requirements

This is a curated list of research papers and involves no code execution. Nothing needs to be installed; a web browser is enough to follow the links to the papers.

Highlighted Details

Broad coverage of LLM interpretability topics, including mechanistic interpretability, sparse autoencoders (SAEs), the logit lens, and causal analysis.
Papers are tagged with relevant keywords (e.g., neuron, circuit, knowledge, fine-tune, hallucination) for easy filtering.
Includes links to influential blogs and other related repositories for further exploration.
Features papers from major research labs such as OpenAI, DeepMind, and Anthropic.

Maintenance & Community

The list is maintained by zepingyu0512 and appears to be updated regularly with recent publications. No community or discussion channels are mentioned.

Licensing & Compatibility

The repository does not specify a license. The linked papers are subject to their respective publishers' copyright and licensing terms.

Limitations & Caveats

This resource is a bibliography and provides no code implementations or tools for LLM interpretability. Its focus is academic literature.

Explore Similar Projects

llm-interp-tau by mega002

awesome-huge-models by zhengzangw

awesome-agi-cocosci by SHI-Yu-Zhe

LLMPapers by SEU-COIN

AGI-Papers by gyunggyung

awesome-neuro-ai-papers by CYHSM

Awesome-Foundation-Models by uncbiag

ML-AI-Research-Papers---Solved by Coder-World04

awesome-llm-interpretability by JShollaj

awesome-ai-papers by aimerou

awesome_deep_learning_interpretability by oneTaken

TransformerLens by TransformerLensOrg