dictionary_learning  by saprmarks

Sparse autoencoder research code for neural network activations

created 1 year ago
322 stars

Top 85.5% on sourcepulse

Project Summary

This repository provides tools for training and evaluating sparse autoencoders (SAEs) on neural network activations, primarily for interpretability research. It targets researchers and practitioners working with large language models who want to understand and manipulate internal representations. The library offers a flexible framework for various SAE architectures and training protocols, along with pre-trained dictionaries for the Pythia-70m-deduped model.

How It Works

The library implements several SAE architectures (standard, Gated, TopK, BatchTopK, JumpReLU), each with a corresponding trainer. It uses an ActivationBuffer, built on the nnsight library, to collect activations from specified model submodules and serve them in batches. Training protocols include options for L1 regularization, neuron resampling, learning rate warmup/decay, and sparsity penalty warmup. Activations can optionally be normalized so that hyperparameters transfer better between settings.
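To make two of the ideas above concrete, here is a minimal sketch in plain Python of a TopK-style sparsifying activation, an L1 sparsity penalty, and a linear warmup of the penalty coefficient. It is an illustration under stated assumptions, not the library's implementation: the function names (`relu`, `top_k`, `l1_penalty`, `warmup_coeff`) are hypothetical, the linear warmup shape is an assumption, and real code would operate on PyTorch tensors rather than lists.

```python
from typing import List


def relu(x: List[float]) -> List[float]:
    """Clamp negative pre-activations to zero."""
    return [max(0.0, v) for v in x]


def top_k(x: List[float], k: int) -> List[float]:
    """Keep the k largest entries of x and zero out the rest."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices of the k largest values; ties are broken by original order.
    keep = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def l1_penalty(features: List[float]) -> float:
    """Sum of absolute feature activations (the L1 sparsity term)."""
    return sum(abs(v) for v in features)


def warmup_coeff(step: int, warmup_steps: int, final_coeff: float) -> float:
    """Ramp the sparsity coefficient linearly from 0 to final_coeff.

    The linear shape is an assumption made for this sketch.
    """
    if warmup_steps <= 0:
        return final_coeff
    return final_coeff * min(1.0, step / warmup_steps)


# Hypothetical encoder pre-activations for a single input.
pre_activations = [0.5, -1.0, 2.0, 0.1]
features = top_k(relu(pre_activations), k=2)
sparsity_loss = warmup_coeff(step=50, warmup_steps=100, final_coeff=0.1) * l1_penalty(features)
```

In this toy setup, TopK enforces sparsity directly by construction, while the L1 term encourages it through the loss; the warmup keeps the penalty small early in training so that features are less likely to be suppressed before they become useful.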

Quick Start & Requirements

  • Install via pip: pip install dictionary-learning
  • Requires Python and PyTorch. A CUDA-capable GPU is recommended for training.
  • Pre-trained dictionaries are available for download (~2.5 GB).
  • See the nnsight demo for integration details.

Highlighted Details

  • Supports multiple SAE architectures and training protocols.
  • Provides an ActivationBuffer for efficient data handling.
  • Includes a script for downloading pre-trained dictionaries for Pythia-70m-deduped.
  • Offers detailed evaluation metrics and benchmarks for pre-trained dictionaries.
  • Experimental features like MLP stretchers and entropy regularization are included.
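
The buffering idea behind an activation buffer can be sketched in plain Python as follows. This is a toy illustration, not the library's ActivationBuffer: the class name `ToyActivationBuffer`, its parameters, and the refill-when-half-empty rule are assumptions made for the example, and the "activations" are ordinary lists rather than tensors harvested from a model.

```python
import random
from typing import Iterable, Iterator, List


class ToyActivationBuffer:
    """Holds a pool of activation vectors and serves shuffled batches.

    Hypothetical sketch: the pool is refilled from `source` whenever it
    drops to half capacity or below, then reshuffled so that batches mix
    activations from different parts of the stream.
    """

    def __init__(self, source: Iterable[List[float]], capacity: int,
                 batch_size: int, seed: int = 0) -> None:
        self._source: Iterator[List[float]] = iter(source)
        self.capacity = capacity
        self.batch_size = batch_size
        self._rng = random.Random(seed)
        self._items: List[List[float]] = []
        self._exhausted = False

    def _refill(self) -> None:
        # Pull from the source until the pool is full or the source ends.
        while len(self._items) < self.capacity and not self._exhausted:
            try:
                self._items.append(next(self._source))
            except StopIteration:
                self._exhausted = True
        self._rng.shuffle(self._items)

    def __iter__(self) -> "ToyActivationBuffer":
        return self

    def __next__(self) -> List[List[float]]:
        if len(self._items) <= self.capacity // 2 and not self._exhausted:
            self._refill()
        if not self._items:
            raise StopIteration
        # Take up to batch_size vectors from the end of the shuffled pool.
        batch = self._items[-self.batch_size:]
        del self._items[-self.batch_size:]
        return batch


# Ten fake one-dimensional "activations" standing in for model outputs.
stream = [[float(i)] for i in range(10)]
batches = list(ToyActivationBuffer(stream, capacity=4, batch_size=2))
```

Each vector from the stream is served exactly once, and the final batch may be smaller than `batch_size` if the stream length is not a multiple of it.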

Maintenance & Community

  • Developed by Samuel Marks, Adam Karvonen, and Aaron Mueller.
  • Depends on the nnsight package, which is under active development and may introduce breaking changes.

Licensing & Compatibility

  • The repository is licensed under the MIT License.
  • The MIT License permits commercial use.

Limitations & Caveats

  • nnsight is under active development, potentially leading to breaking changes.
  • Limited support for converting SAEs from sae_lens (currently only JumpReLU).
  • Experimental features may be deprecated.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 44 stars in the last 90 days