transformer-debugger by openai

Tool for language model behavior investigation

Created 1 year ago
4,101 stars

Top 12.0% on SourcePulse

Project Summary

The Transformer Debugger (TDB) is a tool from OpenAI's Superalignment team designed for investigating the behavior of small language models. It aids researchers and engineers in understanding "why" a model produces specific outputs by combining automated interpretability techniques with sparse autoencoders, enabling rapid, code-free exploration and intervention.

How It Works

TDB facilitates deep dives into model internals by identifying the components, such as neurons, attention heads, and autoencoder latents, that drive specific behaviors. It automatically generates explanations for why components activate and traces connections between them to reveal underlying circuits. This lets users pinpoint causal relationships between model parts and observable outputs, answering questions such as why a model predicted one token over another or why an attention head attended to a particular token.
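The core idea of "intervening in the forward pass" can be illustrated with a toy example. The sketch below is not TDB's API; it is a minimal, self-contained illustration of ablation, the kind of intervention the tool performs on real model components: zero out one component's activation and compare the output to the baseline to measure its causal contribution.

```python
# Illustrative sketch (NOT transformer-debugger's API): ablating one
# "neuron" in a toy forward pass to measure its effect on the output.

def forward(x, ablate_neuron=None):
    # Toy hidden layer: three "neurons", each a fixed weight times the input.
    weights = [0.5, -1.0, 2.0]
    hidden = [w * x for w in weights]
    if ablate_neuron is not None:
        hidden[ablate_neuron] = 0.0  # the intervention: zero the activation
    # Toy output: the sum of the hidden activations.
    return sum(hidden)

baseline = forward(3.0)                   # 1.5 - 3.0 + 6.0 = 4.5
ablated = forward(3.0, ablate_neuron=2)   # without neuron 2: 1.5 - 3.0 = -1.5
effect = baseline - ablated               # 6.0: neuron 2's causal contribution
```

In a real model the same comparison is done over actual neuron, attention-head, or autoencoder-latent activations, which is how TDB attributes behavior to specific components.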

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires Python and Node.js/npm.
  • Setup involves installing the activation server backend and neuron viewer frontend separately.
  • See official documentation for detailed setup and usage.
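The steps above can be sketched as a shell session. This is an assumed sequence based on the bullet points (clone, editable pip install, separate backend and frontend setup); exact commands, directory names, and flags may differ, so follow the repository's README for the authoritative instructions.

```shell
# Sketch only; consult the repository README for exact commands and flags.
git clone https://github.com/openai/transformer-debugger.git
cd transformer-debugger

# Install the Python package in editable mode (requires Python).
pip install -e .

# Start the activation server backend, then set up and run the
# React-based neuron viewer frontend (requires Node.js/npm).
# The specific entry points and ports are documented in the README.
```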

Highlighted Details

  • Integrates automated interpretability with sparse autoencoders.
  • Enables intervention in the forward pass to observe behavioral effects.
  • Provides a React-based Neuron viewer for exploring model components.
  • Includes an activation server for inference and data serving.

Maintenance & Community

  • Developed by OpenAI's Superalignment team.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The tool is primarily focused on "small language models" and its applicability to larger, more complex architectures is not detailed. The README also lacks explicit licensing information, which may impact commercial adoption.

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

automated-interpretability by openai

0%
1k
Code and datasets for automated interpretability research
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Neel Nanda (Research Scientist at Google DeepMind), and 1 more.

TransformerLens by TransformerLensOrg

0.6%
3k
Library for mechanistic interpretability research on GPT-style language models
Created 3 years ago
Updated 2 days ago