Awesome-Attention-Heads by IAAR-Shanghai

Survey of LLM attention head interpretability research

Created 1 year ago
366 stars

Top 76.9% on SourcePulse

Project Summary

This repository provides a comprehensive survey and curated list of research papers focused on the interpretability of attention heads in Large Language Models (LLMs). It aims to demystify the "black box" of LLMs by categorizing and analyzing the specific functions and mechanisms of individual attention heads, offering valuable insights for researchers and engineers working on LLM interpretability and alignment.

How It Works

The project categorizes LLM attention head research using a novel four-stage framework inspired by human cognition: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. It systematically reviews experimental methodologies, identifies limitations, and proposes future research directions, providing a structured approach to understanding the complex interplay of attention heads in LLM reasoning processes.
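Although the repository ships no code, the head-level analyses it surveys generally start from raw per-head attention maps. The sketch below is illustrative only and is not taken from the repository or the survey: it uses the Hugging Face Transformers API with GPT-2 (model choice and the first-token diagnostic are assumptions) to extract the per-layer, per-head attention matrices that many of the reviewed experimental methodologies analyze.

```python
# Illustrative sketch (not part of this repository): extract per-head
# attention maps from GPT-2 with Hugging Face Transformers, the kind of
# raw signal that attention-head interpretability studies inspect.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # assumption: any decoder-only LM that exposes attentions works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # Mean attention each head pays to the first token -- a simple,
    # commonly used quick diagnostic for head behavior.
    to_first_token = layer_attn[0, :, :, 0].mean(dim=-1)
    print(f"layer {layer_idx}: per-head attention to first token = {to_first_token.tolist()}")
```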

Quick Start & Requirements

This repository is a curated list of research papers and does not require installation or specific software. All listed papers are accessible via provided links, primarily to arXiv.

Highlighted Details

  • Comprehensive survey paper available on arXiv (abs/2409.03752), accepted by Patterns (Cell Press).
  • Papers are organized by publication date, covering research from 2016 to early 2025.
  • Covers a wide range of attention head functions, including retrieval, reasoning, bias mitigation, and prompt injection detection.
  • Includes a contribution guide for adding new research.

Maintenance & Community

The project is actively maintained by IAAR-Shanghai. Recent news highlights paper acceptance and recognition on Hugging Face's Daily Paper List. A contribution guide is provided for community involvement.

Licensing & Compatibility

The repository itself does not specify a license. The linked research papers carry their own licensing terms, which are typically permissive for academic use.

Limitations & Caveats

The repository is a curated list and does not provide executable code or models. The interpretability research itself is ongoing, and the functional roles of all attention heads are not yet fully understood.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

transformer-debugger by openai
Tool for language model behavior investigation
Top 0.1% · 4k stars · Created 1 year ago · Updated 1 year ago