Survey of LLM attention head interpretability research
This repository provides a comprehensive survey and curated list of research papers focused on the interpretability of attention heads in Large Language Models (LLMs). It aims to demystify the "black box" of LLMs by categorizing and analyzing the specific functions and mechanisms of individual attention heads, offering valuable insights for researchers and engineers working on LLM interpretability and alignment.
How It Works
The project categorizes LLM attention head research using a novel four-stage framework inspired by human cognition: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. It systematically reviews experimental methodologies, identifies limitations, and proposes future research directions, providing a structured approach to understanding the complex interplay of attention heads in LLM reasoning processes.
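The repository itself contains no code, but the studies it categorizes all analyze the same raw object: per-head attention maps. As a hedged illustration (not drawn from the repository; the model choice and prompt are arbitrary assumptions), the sketch below extracts those maps from GPT-2 using the Hugging Face transformers library.

```python
# Minimal sketch: inspecting per-head attention weights, the kind of signal
# that attention-head interpretability studies examine. Not part of this
# repository; GPT-2 and the prompt are illustrative assumptions.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # Attention mass each head at this layer places on the first token,
    # viewed from the final query position.
    per_head = layer_attn[0, :, -1, 0]
    print(f"layer {layer_idx}: {per_head.tolist()}")
```

Interpretability work of the kind surveyed here typically starts from patterns in these per-head maps (for example, heads that consistently attend to repeated tokens or to syntactic positions) and then probes their causal role in the model's behavior.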
Quick Start & Requirements
This repository is a curated list of research papers and does not require installation or specific software. All listed papers are accessible via provided links, primarily to arXiv.
Maintenance & Community
The project is maintained by IAAR-Shanghai. Repository news notes the acceptance of the accompanying survey paper and its appearance on Hugging Face's Daily Papers list. A contribution guide is provided for community involvement.
Licensing & Compatibility
The repository does not specify a license of its own. The linked research papers are subject to their respective licensing terms; most are hosted on arXiv and freely accessible for academic use.
Limitations & Caveats
The repository is a curated list and does not provide executable code or models. The interpretability research itself is ongoing, and the functional roles of all attention heads are not yet fully understood.