automated-interpretability by openai

Code and datasets for automated interpretability research

created 2 years ago
1,026 stars

Top 37.2% on sourcepulse

View on GitHub
Project Summary

This repository provides code and tools for automatically interpreting neurons in language models, targeting researchers and practitioners who want to understand model behavior. It supports generating natural-language explanations of individual neuron behavior, simulating neuron activations from those explanations, and scoring how well the simulations match the real activations, giving structured insight into how large language models function.
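Conceptually the workflow has three stages, sketched below with placeholder names and signatures; these are illustrative stand-ins, not the repository's actual neuron-explainer API.

```python
# Placeholder skeleton of the explain -> simulate -> score workflow.
# Names, types, and signatures are illustrative assumptions, not the repo's API.
from dataclasses import dataclass


@dataclass
class ActivationRecord:
    tokens: list[str]         # tokens from one text excerpt
    activations: list[float]  # the target neuron's activation on each token


def generate_explanation(records: list[ActivationRecord]) -> str:
    """Stage 1: ask an explainer model to describe when the neuron fires,
    given top-activating excerpts and their per-token activations."""
    raise NotImplementedError


def simulate_activations(explanation: str, tokens: list[str]) -> list[float]:
    """Stage 2: ask a simulator model to predict per-token activations
    from the explanation alone (without seeing the real activations)."""
    raise NotImplementedError


def score_explanation(simulated: list[float], actual: list[float]) -> float:
    """Stage 3: compare simulated and real activations (see the scoring sketch
    in the next section)."""
    raise NotImplementedError
```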

How It Works

The project implements a methodology for explaining neuron behavior: an explainer model produces a natural-language explanation of a neuron, a simulator model then predicts the neuron's activations across input tokens from that explanation alone, and the explanation is scored by how closely the simulated activations match the real ones. It leverages pre-computed datasets of GPT-2 XL and GPT-2 Small neuron activations and explanations, hosted on Azure Blob Storage. The repository also analyzes connection weights and activation patterns to identify influential tokens and related neurons, providing a structured way to probe the function of specific model components.
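For intuition about the scoring step, the snippet below computes an illustrative explanation score as the correlation between simulated and real activations on the same tokens. It is a stand-in under that assumption, not the repository's actual scoring code.

```python
import numpy as np


def correlation_score(simulated: list[float], actual: list[float]) -> float:
    """Illustrative explanation score: correlation between the activations a
    simulator predicts from an explanation and the neuron's real activations."""
    sim = np.asarray(simulated, dtype=float)
    act = np.asarray(actual, dtype=float)
    if sim.std() == 0.0 or act.std() == 0.0:
        return 0.0  # degenerate case: constant activations carry no signal
    return float(np.corrcoef(sim, act)[0, 1])


# A good explanation lets the simulator reproduce the activation pattern closely.
actual = [0.0, 2.1, 0.1, 3.4, 0.0]
simulated = [0.2, 1.8, 0.0, 3.0, 0.1]
print(correlation_score(simulated, actual))  # ~0.99
```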

Quick Start & Requirements

  • Install: There is no single top-level installation command; the README points to the separate READMEs for the neuron-explainer and neuron-viewer subprojects.
  • Prerequisites: The datasets are hosted on Azure Blob Storage in a public container, so read access over HTTPS or with standard Azure tooling should be sufficient.
  • Resources: Requires downloading potentially large datasets of neuron activations and explanations (a minimal download sketch follows this list).
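As a sketch of read-only data access, the snippet below fetches one neuron's collated activation records over plain HTTPS. The base URL and path pattern are assumptions based on the project's dataset documentation; verify them against the current README before relying on them.

```python
# Hedged sketch: download one neuron's pre-computed activation data over HTTPS.
# The URL pattern below is an assumption drawn from the repo's dataset docs.
import json
import urllib.request

AZURE_BASE = "https://openaipublic.blob.core.windows.net/neuron-explainer/data"


def fetch_collated_activations(layer: int, neuron: int) -> dict:
    """Fetch collated activation records for one GPT-2 neuron (assumed path)."""
    url = f"{AZURE_BASE}/collated-activations/{layer}/{neuron}.json"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    record = fetch_collated_activations(layer=0, neuron=0)
    print(sorted(record)[:5] if isinstance(record, dict) else type(record))
```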

Highlighted Details

  • Provides public datasets for GPT-2 XL and GPT-2 Small neuron activations and explanations.
  • Includes tools for viewing neuron activations and explanations (neuron-viewer).
  • Defines methodologies for calculating neuron-neuron and neuron-token connection strengths (a weight-based sketch follows this list).
  • Offers lists of "interesting" neurons identified by various criteria.
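As a rough illustration of the weight-based idea behind neuron-neuron connection strengths, the snippet below scores an upstream-to-downstream MLP connection as the dot product of the upstream neuron's output-projection row with the downstream neuron's input-projection column through the residual stream. This is an assumed simplification for intuition (it ignores layer norm and attention), not the repository's exact definition.

```python
import numpy as np

# Assumed simplification of a weight-based neuron-neuron connection strength:
# treat the residual stream as the only path between the two MLP neurons.


def neuron_neuron_strength(
    w_out_upstream: np.ndarray,   # (d_model,) output-projection row, upstream neuron
    w_in_downstream: np.ndarray,  # (d_model,) input-projection column, downstream neuron
) -> float:
    """Dot product through the residual stream as a crude connection weight."""
    return float(w_out_upstream @ w_in_downstream)


# Toy example with a 4-dimensional residual stream.
rng = np.random.default_rng(0)
print(neuron_neuron_strength(rng.normal(size=4), rng.normal(size=4)))
```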

Maintenance & Community

The project is associated with OpenAI. Specific community channels or active maintenance signals are not detailed in the provided README.

Licensing & Compatibility

The licensing information is not explicitly stated in the provided README.

Limitations & Caveats

A bug in the GELU implementation used for GPT-2 series inference was discovered, leading to minor discrepancies in post-MLP activation values compared to the original implementation. The methodology for GPT-2 Small differs from GPT-2 XL, making direct comparison of results challenging.
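For context on why such discrepancies arise: the original GPT-2 code uses the tanh approximation of GELU, which differs slightly from the exact erf-based form. The comparison below is illustrative, not taken from the repository, and does not assume which variant the bug involved.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), using the Gaussian CDF via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, as used in the original GPT-2 code."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}  "
          f"diff={gelu_exact(x) - gelu_tanh(x):+.2e}")
```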

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days

Explore Similar Projects

Starred by Dominik Moritz (Professor at CMU; ML Researcher at Apple), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

ecco by jalammar

2k stars
Python library for interactive NLP model visualization in Jupyter notebooks
created 4 years ago
updated 11 months ago
Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

transformer-debugger by openai

4k stars
Tool for investigating language model behavior
created 1 year ago
updated 1 year ago