Library for sparse autoencoders (SAEs) and transcoders on transformer activations
This library trains sparse autoencoders (SAEs) and transcoders on the activations of HuggingFace language models, following the "Scaling and evaluating sparse autoencoders" paper (Gao et al., 2024). It targets researchers and practitioners who want to understand and manipulate LLM internals, and it computes activations on the fly rather than caching them to disk, keeping storage overhead at zero and allowing training to scale to large models and datasets.
How It Works
The library uses a TopK activation function to enforce sparsity directly, rather than relying on an L1 penalty; the paper reports this as a Pareto improvement on the sparsity-versus-reconstruction trade-off. SAEs are trained on model activations, by default the residual stream, with support for custom hookpoints elsewhere in the model. Transcoders are also supported: rather than reconstructing a hookpoint's own activations, they learn a sparse mapping from a module's input activations to its output activations (for example, approximating an MLP block).
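As a rough illustration (not the library's actual implementation), a TopK SAE keeps only the k largest pre-activations per example and zeroes out the rest, so the sparsity level is exact by construction rather than encouraged through an L1 term. The sketch below uses plain PyTorch with hypothetical names and dimensions.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal TopK sparse autoencoder sketch (illustrative, not the library's code)."""

    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_sae)   # pre-activation latents
        self.decoder = nn.Linear(d_sae, d_model)   # reconstruction back to model space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encode, then keep only the k largest latents per example (exact sparsity).
        pre = self.encoder(x)
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre)
        # ReLU on the kept values is optional; most of the top-k values are positive anyway.
        latents.scatter_(-1, topk.indices, torch.relu(topk.values))
        return self.decoder(latents)

# Toy usage: reconstruct a batch of fake residual-stream activations.
sae = TopKSAE(d_model=768, d_sae=768 * 32, k=32)
acts = torch.randn(4, 768)
recon = sae(acts)
loss = (recon - acts).pow(2).mean()  # training would minimize this reconstruction MSE
```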
Quick Start & Requirements
Install with pip install eai-sparsify. The main dependencies are transformers, datasets, and torch; a CUDA-capable GPU is highly recommended for both training and inference.
Train from the command line with python -m sparsify <model_name> [dataset_name], or load a pretrained SAE programmatically with Sae.load_from_hub("EleutherAI/sae-llama-3-8b-32x").
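To make the loading line above concrete, here is a minimal sketch of programmatic use. It assumes the package is importable as sparsify, that Sae.load_from_hub accepts a hookpoint argument, and that the returned module exposes encode(); these names follow the repository's README but may drift between versions, so treat them as illustrative rather than definitive.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sparsify import Sae  # import path assumed; older releases used `from sae import Sae`

# Load a pretrained SAE for one residual-stream hookpoint (keyword argument assumed).
sae = Sae.load_from_hub("EleutherAI/sae-llama-3-8b-32x", hookpoint="layers.10")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

inputs = tokenizer("Sparse autoencoders decompose activations.", return_tensors="pt")
with torch.inference_mode():
    # hidden_states[0] is the embedding output; hidden_states[i] is the residual
    # stream after block i-1, so block index 10 corresponds to hidden_states[11].
    outputs = model(**inputs, output_hidden_states=True)
    resid = outputs.hidden_states[11]  # activations for hookpoint "layers.10" (offset assumed)

    # Encode into sparse latents; the exact return type (e.g., values plus indices)
    # depends on the library version.
    latents = sae.encode(resid)
```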
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Compatible with HuggingFace transformers models.
Limitations & Caveats
The library does not cache activations to disk, so each run recomputes them, which slows hyperparameter sweeps. Learning rates and latent counts cannot be set per hookpoint; a single global setting applies to all trained SAEs. Distributed training requires the number of GPUs to evenly divide the number of layers being trained (e.g., 32 layers can be split across 2, 4, 8, 16, or 32 GPUs).