automated-interpretability by openai

Code and datasets for automated interpretability research

Created 2 years ago · 1,037 stars
Top 36.2% on SourcePulse

Project Summary

This repository provides code and tools for automatically interpreting neurons in language models, aimed at researchers and practitioners who want to understand model behavior. It supports generating, simulating, and scoring natural-language explanations of individual neuron activations, enabling closer study of how large language models compute.

How It Works

The project explains neuron behavior by simulating a neuron's activations across input tokens and scoring how closely those simulations, derived from a candidate explanation, match the neuron's real activations. It relies on pre-computed datasets of GPT-2 XL and GPT-2 Small neuron activations and explanations hosted on Azure Blob Storage. The approach also analyzes connection weights and activation patterns to identify influential tokens and related neurons, providing a structured way to probe the function of specific model components.
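
The scoring step can be pictured as a correlation between real and simulated activations. Below is a minimal sketch of that idea in plain NumPy; the function names and the standalone score are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def score_explanation(real_activations: np.ndarray,
                      simulated_activations: np.ndarray) -> float:
    """Score an explanation by how well activations simulated from it
    correlate with the neuron's real per-token activations.
    A score near 1.0 means the explanation predicts the neuron well."""
    real = real_activations.astype(float)
    sim = simulated_activations.astype(float)
    # Correlation is undefined for constant vectors; treat as uninformative.
    if real.std() == 0.0 or sim.std() == 0.0:
        return 0.0
    return float(np.corrcoef(real, sim)[0, 1])

# Hypothetical usage: a `simulate` helper (not defined here) would prompt a
# model with the explanation and predict one activation per input token.
# simulated = simulate(explanation, tokens)
# print(score_explanation(real_activations, simulated))
```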

Quick Start & Requirements

  • Install: The README does not provide a direct installation command but points to separate READMEs for neuron-explainer and neuron-viewer.
  • Prerequisites: The datasets are hosted on Azure Blob Storage; individual records can typically be fetched over plain HTTPS (see the sketch after this list), though bulk access may be easier with Azure tooling and credentials.
  • Resources: The full activation and explanation datasets are large; budget for significant download bandwidth and storage.
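
As a concrete sketch, individual dataset records can be fetched over HTTPS. The URL pattern below mirrors the Azure paths referenced in the repository, but treat the exact layout as an assumption and verify it against the project's READMEs.

```python
import json
import urllib.request

# Assumed public HTTPS mapping of the repository's Azure Blob Storage paths;
# verify the exact dataset layout in the neuron-explainer README.
BASE = "https://openaipublic.blob.core.windows.net/neuron-explainer/data"

def fetch_neuron_record(layer: int, neuron: int) -> dict:
    """Download the collated activation record for one GPT-2 neuron."""
    url = f"{BASE}/collated-activations/{layer}/{neuron}.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

record = fetch_neuron_record(layer=0, neuron=0)
print(sorted(record.keys()))
```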

Highlighted Details

  • Provides public datasets for GPT-2 XL and GPT-2 Small neuron activations and explanations.
  • Includes tools for viewing neuron activations and explanations (neuron-viewer).
  • Defines methodologies for computing neuron-neuron and neuron-token connection strengths (see the sketch after this list).
  • Offers lists of "interesting" neurons identified by various criteria.
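
For the connection-strength methodology, a common weight-based formulation treats each MLP neuron as reading from and writing to the residual stream. The sketch below uses that convention with illustrative shapes; the repository's exact formulas (including any normalization) are defined in its own documentation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_mlp, vocab = 8, 16, 50257   # illustrative sizes (vocab matches GPT-2)

W_out_prev = rng.normal(size=(d_mlp, d_model))  # upstream neurons -> residual stream
W_in_next = rng.normal(size=(d_model, d_mlp))   # residual stream -> downstream neurons
W_embed = rng.normal(size=(vocab, d_model))     # token embedding matrix

# Neuron-neuron strength: alignment of an upstream neuron's write direction
# with a downstream neuron's read direction.
neuron_to_neuron = W_out_prev @ W_in_next       # (d_mlp, d_mlp)

# Neuron-token strength: how strongly each token's embedding excites a neuron.
token_to_neuron = W_embed @ W_in_next           # (vocab, d_mlp)

# E.g., the ten input tokens most positively connected to downstream neuron 3.
top_tokens = np.argsort(token_to_neuron[:, 3])[-10:]
print(top_tokens)
```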

Maintenance & Community

The project is associated with OpenAI. Specific community channels or active maintenance signals are not detailed in the provided README.

Licensing & Compatibility

The licensing information is not explicitly stated in the provided README.

Limitations & Caveats

A bug was discovered in the GELU implementation used for GPT-2 series inference, producing minor discrepancies in post-MLP activation values relative to the original implementation. In addition, the methodology used for GPT-2 Small differs from that used for GPT-2 XL, so results for the two models are not directly comparable.
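
The README does not detail the bug itself, but the scale of such discrepancies is easy to see: the two standard GELU variants (the exact erf form and the tanh approximation used by the original GPT-2 code) agree only to roughly 1e-3. The comparison below is purely illustrative context, not a claim about the actual bug.

```python
import numpy as np
from scipy.special import erf

def gelu_exact(x: np.ndarray) -> np.ndarray:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU, as used in the original GPT-2 code."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4.0, 4.0, 1001)
# Maximum pointwise difference is on the order of 1e-3: small, but enough
# to shift post-MLP activation values slightly.
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))
```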

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days

Explore Similar Projects

Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

transformer-debugger by openai

Top 0.1% · 4k stars
Tool for language model behavior investigation
Created 1 year ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Neel Nanda (Research Scientist at Google DeepMind), and 1 more.

TransformerLens by TransformerLensOrg

Top 1.0% · 3k stars
Library for mechanistic interpretability research on GPT-style language models
Created 3 years ago · Updated 1 day ago