Tool for neural network circuit discovery
This library provides tools for finding, visualizing, and intervening on neural network "circuits" using cross-layer MLP transcoders. It's designed for researchers and practitioners in mechanistic interpretability seeking to understand model behavior by tracing feature activations and their causal effects.
How It Works
The library implements a three-step process:
1. Attribution: computes the direct effect of input tokens, transcoder features, and error nodes on other features and on the output logits, using MLP transcoders.
2. Graph creation: prunes the attribution graph based on influence thresholds and converts it to JSON for visualization.
3. Visualization and intervention: hosts a local web server that displays the graph and lets users annotate features and perform interventions by setting transcoder features to specific values.
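The three steps can be illustrated end to end on a toy example. The sketch below is hypothetical and does not use circuit-tracer's API. It builds a small linear DAG of tokens, features, one error node and two logits, and computes direct effects as weight times source activation. It then scores each node's total influence on the probability-weighted logits using row-normalized absolute effects and B = (I - A)^-1 - I. This scoring rule is an assumption, not necessarily the library's exact pruning criterion. Finally, it keeps nodes until 80% of total influence is covered, serializes the result to a made-up JSON schema, and intervenes by pinning one feature to zero.

```python
import json

import numpy as np

# Hypothetical toy "replacement model" (a linear DAG); NOT circuit-tracer's API.
# Nodes in topological order: 2 input tokens, 3 features, 1 error node, 2 logits.
NAMES = ["tok0", "tok1", "feat0", "feat1", "feat2", "err0", "logitA", "logitB"]
N = len(NAMES)
SOURCES = {0: 1.0, 1: 1.0, 5: 0.2}  # tokens and the error node have fixed values
LOGITS = [6, 7]

# W[t, s] is the linear weight from source node s to target node t (strictly lower-triangular).
W = np.zeros((N, N))
W[2, 0], W[3, 1] = 0.9, 0.8
W[4, 2], W[4, 3] = 0.5, 0.4
W[6, 4], W[6, 5] = 1.2, 1.0
W[7, 3], W[7, 2] = 0.3, 0.05


def forward(W, clamp=None):
    """Compute activations in topological order; `clamp` pins nodes to fixed values."""
    clamp = clamp or {}
    x = np.zeros(N)
    for t in range(N):
        if t in clamp:
            x[t] = clamp[t]
        elif t in SOURCES:
            x[t] = SOURCES[t]
        else:
            x[t] = W[t] @ x  # only earlier nodes carry nonzero weights
    return x


def direct_effects(W, x):
    """Step 1: A[t, s] = W[t, s] * x[s], the direct (linear) effect of s on t."""
    return W * x[None, :]


def node_influence(A, logit_weights):
    """Total influence of each node on the weighted logits, summed over all paths
    of row-normalized absolute direct effects (assumed scoring rule)."""
    absA = np.abs(A)
    row_sums = absA.sum(axis=1, keepdims=True)
    A_norm = np.divide(absA, row_sums, out=np.zeros_like(absA), where=row_sums > 0)
    eye = np.eye(len(A))
    B = np.linalg.inv(eye - A_norm) - eye  # A_norm + A_norm^2 + ... (finite for a DAG)
    return logit_weights @ B


def prune(influence, threshold=0.8, always_keep=()):
    """Step 2: keep the most influential nodes until `threshold` of the total is covered."""
    kept, cum, total = set(always_keep), 0.0, influence.sum()
    for i in np.argsort(-influence):
        if influence[i] <= 0 or cum >= threshold * total:
            break
        kept.add(int(i))
        cum += influence[i]
    return kept


def to_json(A, kept):
    """Serialize the pruned graph (this schema is made up for illustration)."""
    ids = sorted(kept)
    nodes = [{"id": NAMES[i]} for i in ids]
    links = [{"source": NAMES[s], "target": NAMES[t], "weight": float(A[t, s])}
             for t in ids for s in ids if A[t, s] != 0]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


x = forward(W)
A = direct_effects(W, x)
logits = x[LOGITS]
z = np.exp(logits - logits.max())
weights = np.zeros(N)
weights[LOGITS] = z / z.sum()  # weight each logit by its softmax probability
kept = prune(node_influence(A, weights), always_keep=LOGITS)
graph_json = to_json(A, kept)

# Step 3: intervene by pinning feat0 to 0 and re-running the forward pass.
x_int = forward(W, clamp={2: 0.0})
print("logits before:", logits, "after:", x_int[LOGITS])
```

Because the toy graph is linear, the intervention simply propagates the clamped value to downstream nodes. In a real model, the downstream computation would instead be re-run through the network with the feature held fixed. With these weights the small error node falls below the 80% cutoff and is pruned.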
Quick Start & Requirements
Install with pip install . after cloning the repository, then run:
circuit-tracer attribute --prompt "..." --transcoder_set gemma --slug gemma-demo --graph_file_dir ./graph_files --server
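To generate graphs for several prompts, the quick-start command could be scripted. This is a minimal sketch, assuming circuit-tracer is installed and on PATH. The helpers `attribute_cmd` and `run_batch` and the numbered-slug scheme are hypothetical, and only the flags shown in the quick-start command are used. By default it only prints the commands (dry_run=True).

```python
import shlex
import subprocess


def attribute_cmd(prompt, slug, transcoder_set="gemma",
                  graph_file_dir="./graph_files", server=False):
    """Build the argv for `circuit-tracer attribute` using only the quick-start flags."""
    cmd = ["circuit-tracer", "attribute",
           "--prompt", prompt,
           "--transcoder_set", transcoder_set,
           "--slug", slug,
           "--graph_file_dir", graph_file_dir]
    if server:
        cmd.append("--server")  # also host the local visualization server
    return cmd


def run_batch(prompts, dry_run=True):
    """One graph per prompt, with hypothetical slugs gemma-demo-0, gemma-demo-1, ..."""
    for i, prompt in enumerate(prompts):
        cmd = attribute_cmd(prompt, slug=f"gemma-demo-{i}")
        if dry_run:
            print(shlex.join(cmd))  # show the exact shell command without running it
        else:
            subprocess.run(cmd, check=True)  # raise if the CLI exits with an error


if __name__ == "__main__":
    run_batch(["The capital of France is", "2 + 2 ="])
```

Building the argument list explicitly, rather than formatting one shell string, avoids quoting problems when prompts contain quotes or spaces.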