sae_vis  by callummcdougall

Visualizations for sparse autoencoders

Created 2 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

This repository provides visualization tools for sparse autoencoders (SAEs), helping researchers and engineers analyze what these models have learned and how their features behave. It offers feature-centric and prompt-centric views that replicate visualizations from Anthropic's published research, making it useful for both interpretability work and SAE diagnostics.

How It Works

The library offers two primary visualization modes designed for dissecting SAE behavior. The feature-centric view allows users to inspect individual features, identifying specific tokens or sequences from a dataset that maximally activate them, providing insight into what each feature "detects." Conversely, the prompt-centric view analyzes custom prompts, revealing which features are most influential for a given input according to various metrics, such as activation magnitude or impact on token prediction. This dual approach provides complementary perspectives for understanding SAEs' representational space and functional roles.
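The difference between the two views can be illustrated with a small, library-independent sketch. It does not use the sae_vis API: the activation matrix, the token list, and both function names are hypothetical, and activation magnitude is the only ranking metric shown.

```python
from typing import List, Tuple

def top_activating_tokens(
    acts: List[List[float]], tokens: List[str], feature: int, k: int = 3
) -> List[Tuple[str, float]]:
    """Feature-centric view: the k tokens on which one feature fires most strongly."""
    scored = [(tokens[pos], row[feature]) for pos, row in enumerate(acts)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def top_features_for_token(
    acts: List[List[float]], position: int, k: int = 3
) -> List[Tuple[int, float]]:
    """Prompt-centric view: the k features with the largest activation at one position."""
    row = acts[position]
    ranked = sorted(range(len(row)), key=lambda f: row[f], reverse=True)
    return [(f, row[f]) for f in ranked[:k]]

# Hypothetical data: acts[position][feature] for a 4-token prompt and 3 features.
tokens = ["The", " cat", " sat", " down"]
acts = [
    [0.0, 0.2, 0.0],
    [1.5, 0.0, 0.1],
    [0.3, 0.9, 0.0],
    [0.0, 0.4, 2.0],
]

print(top_activating_tokens(acts, tokens, feature=0, k=2))
print(top_features_for_token(acts, position=3, k=2))
```

The first call asks "where does feature 0 fire?", the second asks "which features fire on the last token?". Both read the same activation matrix, which is why the two views complement each other.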

Quick Start & Requirements

Install from PyPI with pip install sae-vis. For development, the project uses Poetry for dependency management: after cloning the repository, run poetry install to set up the environment. The README lists no specific hardware prerequisites such as GPUs; a standard Python 3 environment is assumed. A demo Colab notebook is available, and its complete code is included in the repository for reproduction and experimentation. The documentation links to the PyPI package page and to the original Anthropic visualizations.
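As a quick sanity check after installation, the standard library can report whether the distribution is present. This is a minimal sketch using only importlib.metadata; the helper name installed_version is hypothetical, and the distribution name sae-vis is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

found = installed_version("sae-vis")
if found is None:
    print("sae-vis is not installed; try: pip install sae-vis")
else:
    print(f"sae-vis version: {found}")
```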

Highlighted Details

  • Directly replicates and extends visualization techniques pioneered in Anthropic's SAE research.
  • Supports distinct feature-centric and prompt-centric analysis perspectives, offering complementary views of SAE functionality.
  • Version 0.3.0 introduced a significant refactor, enhancing capabilities with support for OthelloGPT SAEs, linear probes (input/output space), attention output SAEs, and detailed token-level visualizations, including the change in correct-token probability upon feature ablation.
  • Designed for compatibility and integration with the sae-lens library, a related project.
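The "change in correct-token probability upon feature ablation" metric mentioned above can be sketched in a few lines. This is an illustration under a simplifying assumption, not the library's implementation: the feature is assumed to contribute to the logits linearly (activation times a fixed per-token effect vector), and all names and numbers are hypothetical. The sign convention (ablated minus original) is also an assumption.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ablation_prob_change(
    logits: List[float],
    feature_act: float,
    feature_logit_effect: List[float],
    correct_token: int,
) -> float:
    """P(correct token) after ablation minus P(correct token) before.

    Assumes the feature's direct effect on the logits is
    feature_act * feature_logit_effect, so ablation subtracts that term.
    """
    ablated = [l - feature_act * e for l, e in zip(logits, feature_logit_effect)]
    return softmax(ablated)[correct_token] - softmax(logits)[correct_token]

# Hypothetical 3-token vocabulary; the feature boosts token 0, the correct token.
logits = [2.0, 1.0, 0.0]
effect = [1.0, 0.0, 0.0]
delta = ablation_prob_change(logits, feature_act=1.0,
                             feature_logit_effect=effect, correct_token=0)
print(f"change in correct-token probability: {delta:+.3f}")
```

A negative value here means the feature was helping the model predict the correct token; a feature with zero activation produces no change.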

Maintenance & Community

The original author no longer actively maintains the project, having shifted focus to a new role, but remains open to community contributions via pull requests. For more extensive development, ongoing iteration, and a broader suite of tools for working with SAEs, the README explicitly recommends the SAELens library, which forks and builds upon this repository.

Licensing & Compatibility

The provided README text does not state which open-source license governs this repository. Potential adopters should confirm usage rights before relying on it, particularly for commercial applications, derivative works, or integration into closed-source projects.

Limitations & Caveats

The primary limitation is the lack of active maintenance: future updates, bug fixes, and feature enhancements are not guaranteed. Users are directed to the SAELens library for more current development and a more comprehensive feature set. Poetry-based dependency management may be a minor hurdle for contributors unfamiliar with the tool, although installing the published package requires only pip.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days
