Library for sparse autoencoders (SAEs) and transcoders on transformer activations
This library trains sparse autoencoders (SAEs) and transcoders on the activations of HuggingFace language models, following the "Scaling and evaluating sparse autoencoders" paper (Gao et al., 2024). It targets researchers and practitioners who want to understand and manipulate LLM internals, and it computes activations on the fly rather than caching them to disk, keeping storage overhead at zero and allowing training to scale to large models and datasets.
How It Works
The library uses a TopK activation function to enforce sparsity directly, rather than relying on an L1 penalty; the paper reports this as a Pareto improvement on the sparsity-versus-reconstruction trade-off. SAEs are trained on model activations, by default the residual stream, with support for custom hookpoints elsewhere in the model. Transcoders are also supported: rather than reconstructing a hookpoint's own activations, they learn a sparse mapping from a module's input activations to its output activations (for example, approximating an MLP block).
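As a rough illustration (not the library's actual implementation), a TopK SAE keeps only the k largest pre-activations per example and zeroes out the rest, so the sparsity level is exact by construction rather than encouraged through an L1 term. The sketch below uses plain PyTorch with hypothetical names and dimensions.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal TopK sparse autoencoder sketch (illustrative, not the library's code)."""

    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_sae)   # pre-activation latents
        self.decoder = nn.Linear(d_sae, d_model)   # reconstruction back to model space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encode, then keep only the k largest latents per example (exact sparsity).
        pre = self.encoder(x)
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre)
        # ReLU on the kept values is optional; most of the top-k values are positive anyway.
        latents.scatter_(-1, topk.indices, torch.relu(topk.values))
        return self.decoder(latents)

# Toy usage: reconstruct a batch of fake residual-stream activations.
sae = TopKSAE(d_model=768, d_sae=768 * 32, k=32)
acts = torch.randn(4, 768)
recon = sae(acts)
loss = (recon - acts).pow(2).mean()  # training would minimize this reconstruction MSE
```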
Quick Start & Requirements
Install with pip install eai-sparsify. The main dependencies are transformers, datasets, and torch; a CUDA-capable GPU is highly recommended for both training and inference.
Train from the command line with python -m sparsify <model_name> [dataset_name], or load a pretrained SAE programmatically with Sae.load_from_hub("EleutherAI/sae-llama-3-8b-32x").
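To make the loading line above concrete, here is a minimal sketch of programmatic use. It assumes the package is importable as sparsify, that Sae.load_from_hub accepts a hookpoint argument, and that the returned module exposes encode(); these names follow the repository's README but may drift between versions, so treat them as illustrative rather than definitive.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sparsify import Sae  # import path assumed; older releases used `from sae import Sae`

# Load a pretrained SAE for one residual-stream hookpoint (keyword argument assumed).
sae = Sae.load_from_hub("EleutherAI/sae-llama-3-8b-32x", hookpoint="layers.10")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

inputs = tokenizer("Sparse autoencoders decompose activations.", return_tensors="pt")
with torch.inference_mode():
    # hidden_states[0] is the embedding output; hidden_states[i] is the residual
    # stream after block i-1, so block index 10 corresponds to hidden_states[11].
    outputs = model(**inputs, output_hidden_states=True)
    resid = outputs.hidden_states[11]  # activations for hookpoint "layers.10" (offset assumed)

    # Encode into sparse latents; the exact return type (e.g., values plus indices)
    # depends on the library version.
    latents = sae.encode(resid)
```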
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Compatible with HuggingFace transformers models.
Limitations & Caveats
The library does not cache activations to disk, so each run recomputes them, which slows hyperparameter sweeps. Learning rates and latent counts cannot be set per hookpoint; a single global setting applies to all trained SAEs. Distributed training requires the number of GPUs to evenly divide the number of layers being trained (e.g., 32 layers can be split across 2, 4, 8, 16, or 32 GPUs).