Curated list of papers for LLM interpretability
This repository curates seminal research papers on the internal mechanisms of Large Language Models (LLMs). It targets researchers and practitioners in AI interpretability, providing a structured overview of key findings and methodologies for dissecting LLM behavior, and serves as a centralized, high-quality resource for advancing LLM transparency and trustworthiness.
How It Works
The repository functions as a curated bibliography, categorizing papers by their core research themes such as neuron-level analysis, circuit discovery, knowledge editing, and in-context learning. Papers are selected based on acceptance at top-tier AI conferences (ICML, NeurIPS, ICLR, ACL, EMNLP, NAACL) or affiliation with leading research institutions, ensuring a focus on rigorous and impactful contributions to LLM interpretability.
Quick Start & Requirements
This is a curated list of research papers and does not involve code execution. No installation or specific requirements are needed beyond a web browser to access the linked papers.
Highlighted Details
Papers are tagged with topic keywords (e.g., neuron, circuit, knowledge, fine-tune, hallucination) for easy filtering.
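Because the tags are plain text in the README, filtering by topic takes only a few lines of scripting. The sketch below is a hypothetical helper, not part of the repository: it assumes each paper entry is a markdown bullet line carrying a bracketed tag such as [neuron], which may not match the list's actual formatting.

```python
import re
from pathlib import Path

def papers_with_tag(readme_path: str, tag: str) -> list[str]:
    """Return markdown bullet lines whose bracketed tag matches `tag`.

    Hypothetical sketch: assumes each paper is a bullet line ("- ...")
    carrying a tag like "[neuron]"; the real README may be laid out
    differently.
    """
    text = Path(readme_path).read_text(encoding="utf-8")
    pattern = re.compile(rf"\[{re.escape(tag)}\]", re.IGNORECASE)
    return [
        line for line in text.splitlines()
        if line.lstrip().startswith("-") and pattern.search(line)
    ]

if __name__ == "__main__":
    # Example: list all neuron-level papers from a local clone.
    for entry in papers_with_tag("README.md", "neuron"):
        print(entry)
```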
Maintenance & Community
The list is maintained by zepingyu0512 and appears to be actively updated with recent publications. No community discussion channels are explicitly mentioned.
Licensing & Compatibility
The repository does not specify a license. The linked papers are subject to their respective publishers' copyright and licensing terms.
Limitations & Caveats
This resource is a bibliography and does not provide code implementations or direct tools for LLM interpretability. The focus is solely on academic literature.