transformer-debugger by openai

Tool for language model behavior investigation

Created 1 year ago

4,110 stars

Top 11.9% on SourcePulse

9 Experts Love This Project

Anastasios Angelopoulos

Cofounder of LMArena

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Travis Fischer

Founder of Agentic

Shawn Wang

Editor of Latent Space

and 5 more!

Project Summary

The Transformer Debugger (TDB) is a tool from OpenAI's Superalignment team for investigating specific behaviors of small language models. It helps researchers and engineers understand why a model produces a particular output by combining automated interpretability techniques with sparse autoencoders, so that much of the exploration and intervention can happen before any code is written.

How It Works

TDB identifies the components (neurons, attention heads, and autoencoder latents) that contribute to a given behavior. It shows automatically generated explanations of what makes each component activate and traces connections between components to help reveal the underlying circuits. This lets users link individual model parts causally to observable outputs and answer questions such as why the model predicts one token rather than another for a prompt, or why an attention head attends to a particular token.
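To make the idea of linking a component to an output concrete, the following is a minimal sketch of zero-ablation on a hand-built toy network. It is not TDB's code or API: the function names, the network, and the weights are all hypothetical, and the "effect" shown here is simply the change in the logit difference between a target token and a distractor token when one hidden neuron is zeroed.

```python
# Toy illustration only: not TDB's code or API.
# All names and weights below are hypothetical.

def relu(x):
    return x if x > 0 else 0.0

def forward(inputs, w_in, w_out, ablate=None):
    """Run a one-hidden-layer network; optionally zero one hidden neuron."""
    hidden = [relu(sum(x * w for x, w in zip(inputs, row))) for row in w_in]
    if ablate is not None:
        hidden[ablate] = 0.0  # the intervention: zero-ablate one neuron
    # One logit per output token.
    return [sum(h * w for h, w in zip(hidden, row)) for row in w_out]

def ablation_effects(inputs, w_in, w_out, target, distractor):
    """For each hidden neuron, how much the (target - distractor) logit
    difference drops when that neuron is ablated."""
    base = forward(inputs, w_in, w_out)
    base_diff = base[target] - base[distractor]
    effects = []
    for i in range(len(w_in)):
        logits = forward(inputs, w_in, w_out, ablate=i)
        effects.append(base_diff - (logits[target] - logits[distractor]))
    return effects

W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 hidden neurons, 2 inputs each
W_OUT = [[2.0, 0.0, 0.5], [0.0, 1.0, 0.5]]    # 2 output tokens, 3 hidden weights each

effects = ablation_effects([1.0, 2.0], W_IN, W_OUT, target=0, distractor=1)
print(effects)  # [2.0, -2.0, 0.0]
```

In this toy case neuron 0 pushes toward the target token, neuron 1 pushes toward the distractor, and neuron 2 contributes equally to both, so ablating it leaves the difference unchanged.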

Quick Start & Requirements

Clone the repository, then run pip install -e . from the repository root to install the package in editable mode.
Requires Python and Node.js/npm.
Running the TDB app also requires setting up the activation server backend and the neuron viewer frontend, which are installed separately.
See the official documentation for detailed setup and usage.
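Because the setup spans both a Python backend and an npm-based frontend, a quick preflight check can save a failed install. The sketch below is not part of the project; the function name is hypothetical, and since this summary does not state minimum versions, it only checks that Python 3 is running and that node and npm are on the PATH.

```python
import shutil
import sys

def check_prerequisites(tools=("node", "npm")):
    """Return a mapping of requirement name -> whether it was found.

    Assumption: no minimum versions are checked, because the summary
    above does not state any.
    """
    found = {"python": sys.version_info >= (3,)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        found[tool] = shutil.which(tool) is not None
    return found

if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```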

Highlighted Details

Integrates automated interpretability with sparse autoencoders.
Enables intervention in the forward pass to observe behavioral effects.
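Provides a React-based neuron viewer for exploring individual model components (neurons, attention heads, and autoencoder latents).
Includes an activation server that runs inference on the model under study and serves the resulting data to the viewer.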
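For readers unfamiliar with the autoencoder latents mentioned above, here is a minimal sketch of what a sparse autoencoder computes, assuming the common ReLU encoder and linear decoder formulation. It is a generic illustration with hand-picked toy weights, not TDB's implementation, and every name in it is hypothetical.

```python
# Generic sparse-autoencoder forward pass on toy weights.
# Not TDB's implementation; names and values are hypothetical.

def encode(x, w_enc, b_enc):
    """Latent j = ReLU(w_enc[j] . x + b_enc[j])."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_enc, b_enc)
    ]

def decode(latents, w_dec, b_dec):
    """Reconstruction = b_dec + sum_j latents[j] * w_dec[j]."""
    out = list(b_dec)
    for f, direction in zip(latents, w_dec):
        for k, d in enumerate(direction):
            out[k] += f * d
    return out

# Four latents over a 2-dimensional activation: more latents than dimensions.
DIRECTIONS = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
W_ENC, B_ENC = DIRECTIONS, [0.0, 0.0, 0.0, 0.0]
W_DEC, B_DEC = DIRECTIONS, [0.0, 0.0]

activation = [3.0, -2.0]
latents = encode(activation, W_ENC, B_ENC)
reconstruction = decode(latents, W_DEC, B_DEC)
active = [j for j, f in enumerate(latents) if f > 0]
print(latents, reconstruction, active)
```

Only two of the four latents are nonzero for this input, yet the activation is reconstructed exactly. That sparsity is what makes individual latents candidates for inspection as separate components.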

Maintenance & Community

Developed by OpenAI's Superalignment team.
No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The tool targets small language models, and the README does not describe how well it applies to larger or more complex architectures. The missing license statement in the README may also hinder commercial adoption.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days