Awesome-Attention-Heads by IAAR-Shanghai

Survey of LLM attention head interpretability research

Created 1 year ago
366 stars

Top 76.9% on SourcePulse

Project Summary

This repository provides a comprehensive survey and curated list of research papers focused on the interpretability of attention heads in Large Language Models (LLMs). It aims to demystify the "black box" of LLMs by categorizing and analyzing the specific functions and mechanisms of individual attention heads, offering valuable insights for researchers and engineers working on LLM interpretability and alignment.

How It Works

The project categorizes LLM attention head research using a novel four-stage framework inspired by human cognition: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. It systematically reviews experimental methodologies, identifies limitations, and proposes future research directions, providing a structured approach to understanding the complex interplay of attention heads in LLM reasoning processes.
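Although the repository ships no code, the head-level analyses it surveys generally start from raw per-head attention maps. The sketch below is illustrative only and is not taken from the repository or the survey: it uses the Hugging Face Transformers API with GPT-2 (model choice and the first-token diagnostic are assumptions) to extract the per-layer, per-head attention matrices that many of the reviewed experimental methodologies analyze.

```python
# Illustrative sketch (not part of this repository): extract per-head
# attention maps from GPT-2 with Hugging Face Transformers, the kind of
# raw signal that attention-head interpretability studies inspect.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # assumption: any decoder-only LM that exposes attentions works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # Mean attention each head pays to the first token -- a simple,
    # commonly used quick diagnostic for head behavior.
    to_first_token = layer_attn[0, :, :, 0].mean(dim=-1)
    print(f"layer {layer_idx}: per-head attention to first token = {to_first_token.tolist()}")
```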

Quick Start & Requirements

This repository is a curated list of research papers and does not require installation or specific software. All listed papers are accessible via provided links, primarily to arXiv.

Highlighted Details

  • Comprehensive survey paper available on arXiv (abs/2409.03752), accepted by Patterns (Cell Press).
  • Papers are organized by publication date, covering research from 2016 to early 2025.
  • Covers a wide range of attention head functions, including retrieval, reasoning, bias mitigation, and prompt injection detection.
  • Includes a contribution guide for adding new research.

Maintenance & Community

The project is actively maintained by IAAR-Shanghai. Recent news highlights paper acceptance and recognition on Hugging Face's Daily Paper List. A contribution guide is provided for community involvement.

Licensing & Compatibility

The repository itself does not specify a license. The linked research papers carry their own licensing terms, which are typically permissive for academic use.

Limitations & Caveats

The repository is a curated list and does not provide executable code or models. The interpretability research itself is ongoing, and the functional roles of all attention heads are not yet fully understood.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

transformer-debugger by openai
Tool for language model behavior investigation
Top 0.1% · 4k stars · Created 1 year ago · Updated 1 year ago