Code and datasets for automated interpretability research
This repository provides code and tools for automatically interpreting neurons in language models, aimed at researchers and practitioners studying model behavior. It supports generating natural-language explanations of individual neurons, simulating neuron activations from those explanations, and scoring how well the simulations match the neuron's real behavior.
How It Works
The project implements a three-step methodology: generate an explanation of a neuron's behavior, simulate the neuron's activations across input tokens using only that explanation, and score the explanation by how closely the simulated activations track the real ones. It draws on pre-computed datasets of GPT-2 XL and GPT-2 Small neuron activations and explanations hosted on Azure Blob Storage. Supporting tools analyze connection weights and activation patterns to surface influential tokens and related neurons, giving a structured way to probe the function of specific model components.
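The scoring idea is simple to express in code: an explanation is good to the extent that activations simulated from it correlate with the neuron's real activations. The sketch below shows correlation scoring in isolation, assuming activations are already available as arrays; the repository's actual scorer aggregates over many text sequences, and the function name here is illustrative rather than the library's API.

```python
import numpy as np

def correlation_score(real_activations, simulated_activations) -> float:
    """Pearson correlation between a neuron's real activations and the
    activations a simulator predicted from the explanation alone.
    1.0 = the explanation predicts the neuron perfectly on this text;
    ~0.0 = no relationship."""
    real = np.asarray(real_activations, dtype=np.float64)
    simulated = np.asarray(simulated_activations, dtype=np.float64)
    return float(np.corrcoef(real, simulated)[0, 1])

# Toy example: the simulation roughly tracks the real neuron, so it scores high.
real = [0.0, 0.2, 0.0, 3.1, 0.1, 2.8]
simulated = [0.1, 0.0, 0.0, 2.5, 0.3, 3.0]
print(f"explanation score: {correlation_score(real, simulated):.3f}")
```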
Quick Start & Requirements
The repository is organized into two components, each with its own setup instructions: neuron-explainer (the Python library for generating, simulating, and scoring explanations) and neuron-viewer (a web app for browsing neurons).
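Because the datasets are public, individual records can be fetched over plain HTTPS without Azure tooling. The snippet below is a minimal sketch; the base URL and path scheme are assumptions inferred from the dataset's documented Azure Blob Storage location, so verify them against the repository README before relying on them.

```python
import json
import urllib.request

# Assumed base URL and path scheme for the public dataset; confirm in the README.
AZURE_BASE = "https://openaipublic.blob.core.windows.net/neuron-explainer/data"

def fetch_explanation_records(layer: int, neuron: int) -> list:
    """Download the JSONL explanation records for one GPT-2 XL neuron."""
    url = f"{AZURE_BASE}/explanations/{layer}/{neuron}.jsonl"
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    return [json.loads(line) for line in body.splitlines() if line.strip()]

records = fetch_explanation_records(layer=0, neuron=0)
print(len(records), "record(s); top-level keys:", sorted(records[0]))
```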
Highlighted Details
Pre-computed neuron activation and explanation datasets for GPT-2 XL and GPT-2 Small, served from public Azure Blob Storage.
An interactive viewer for inspecting individual neurons together with their explanations and activation records (neuron-viewer).
Maintenance & Community
The project is associated with OpenAI. The provided README does not describe community channels or ongoing maintenance activity.
Licensing & Compatibility
The licensing information is not explicitly stated in the provided README.
Limitations & Caveats
A bug was discovered in the GELU implementation used for GPT-2-series inference, producing small discrepancies in post-MLP activation values relative to the original implementation. The methodology used for GPT-2 Small also differs from that used for GPT-2 XL, so results for the two models are not directly comparable.
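The README excerpt does not pin down the exact nature of the GELU bug, but the two GELU variants in common use show how such small discrepancies arise: GPT-2 was trained with the tanh approximation, and swapping in the exact erf-based form shifts post-MLP activations slightly. The comparison below is illustrative context, not the repository's inference code.

```python
import math

def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation used by the original GPT-2 implementation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  "
          f"tanh={gelu_tanh(x):+.6f}  diff={gelu_exact(x) - gelu_tanh(x):+.2e}")
```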