Sparse autoencoder research code for neural network activations
This repository provides tools for training and evaluating sparse autoencoders (SAEs) on neural network activations, primarily for interpretability research. It targets researchers and practitioners working with large language models who want to understand and manipulate internal representations. The library offers a flexible framework for various SAE architectures and training protocols, along with pre-trained dictionaries for the Pythia-70m-deduped model.
How It Works
The library implements several SAE architectures (standard, Gated, TopK, BatchTopK, JumpReLU), each with a corresponding trainer. It uses an `ActivationBuffer` to efficiently collect and batch activations from specified model submodules via the `nnsight` library. Training protocols include options for L1 regularization, neuron resampling, learning rate warmup/decay, and sparsity penalty warmup. Activations can be normalized for better hyperparameter transfer.
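
For concreteness, here is a minimal training sketch along the lines of the repository's README example: an `nnsight` `LanguageModel` wraps the target model, an `ActivationBuffer` streams activations from one submodule, and `trainSAE` fits a dictionary on them. Exact argument and config-key names (e.g., `n_ctxs`, the trainer-config fields, any required step counts) vary across versions, so treat the names below as indicative rather than definitive.

```python
from nnsight import LanguageModel
from dictionary_learning import ActivationBuffer, AutoEncoder
from dictionary_learning.trainers import StandardTrainer
from dictionary_learning.training import trainSAE

model = LanguageModel("EleutherAI/pythia-70m-deduped", device_map="cuda:0")
submodule = model.gpt_neox.layers[1].mlp  # submodule to collect activations from
activation_dim = 512                      # output dimension of that submodule
dictionary_size = 16 * activation_dim     # number of dictionary features

# any iterator over strings works here; substitute a real text corpus
data = iter(["replace me with real training text"] * 100_000)

# buffers activations from the submodule and yields them in batches
buffer = ActivationBuffer(
    data,
    model,
    submodule,
    d_submodule=activation_dim,
    n_ctxs=30_000,   # number of contexts held in the buffer at once
    device="cuda:0",
)

# a standard SAE trained with an L1 sparsity penalty
trainer_cfg = {
    "trainer": StandardTrainer,
    "dict_class": AutoEncoder,
    "activation_dim": activation_dim,
    "dict_size": dictionary_size,
    "lr": 1e-3,
    "device": "cuda:0",
}

ae = trainSAE(data=buffer, trainer_configs=[trainer_cfg])
```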
Quick Start & Requirements
```bash
pip install dictionary-learning
```
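
Once installed, a dictionary behaves like an ordinary PyTorch module. The sketch below uses random tensors in place of real activations, and the commented `from_pretrained` line is a placeholder for loading the released Pythia-70m-deduped dictionaries (assuming that loader matches your installed version).

```python
import torch
from dictionary_learning import AutoEncoder

activation_dim = 512                   # dimension of the activations being autoencoded
dictionary_size = 16 * activation_dim  # number of features in the dictionary

ae = AutoEncoder(activation_dim, dictionary_size)
# or load released weights (placeholder path):
# ae = AutoEncoder.from_pretrained("path/to/dictionary/weights", device="cuda:0")

activations = torch.randn(64, activation_dim)  # stand-in for real model activations
features = ae.encode(activations)              # sparse feature activations
reconstruction = ae.decode(features)           # reconstructed activations

# or get both in one forward pass
reconstruction, features = ae(activations, output_features=True)
```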
Highlighted Details
- `ActivationBuffer` for efficient data handling (see the sketch below).
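
As an illustration of that buffering behavior, a sketch reusing the `buffer` from the training example above (and assuming the buffer's iterator interface, which may differ by version):

```python
# drawing from the buffer yields a batch of activations; the buffer
# re-runs the model over fresh text to refill itself as it drains
acts = next(buffer)
print(acts.shape)  # (batch_size, activation_dim)
```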
Maintenance & Community

- The `nnsight` package is under active development and may have breaking changes.

Licensing & Compatibility
Limitations & Caveats
- `nnsight` is under active development, potentially leading to breaking changes.
- `sae_lens` compatibility is limited (currently only JumpReLU).