TransformerLens by TransformerLensOrg

Library for mechanistic interpretability research on GPT-style language models

2,414 stars · top 19.6% on sourcepulse · created 2 years ago

View on GitHub

Project Summary

TransformerLens is a Python library for mechanistic interpretability research on GPT-style language models. It lets researchers and practitioners reverse-engineer the internal algorithms these models have learned by exposing intermediate activations for inspection and manipulation, supporting detailed analysis of how LLMs compute their outputs.

How It Works

TransformerLens operates by allowing users to load various pre-trained transformer models and attach "hooks" to specific layers or components. These hooks can cache, modify, or replace activations as the model processes input. This fine-grained control over internal states enables techniques like activation patching and direct logit attribution, crucial for dissecting model computations and identifying the neural mechanisms responsible for specific behaviors.
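
A minimal sketch of this workflow using the public HookedTransformer API (the prompt, layer, and head indices below are arbitrary illustrations, not prescriptions from the library's docs):

```python
from transformer_lens import HookedTransformer, utils

# Load a pre-trained model; hook points are attached to every activation.
model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in the city of")

# Run a forward pass and cache every intermediate activation.
logits, cache = model.run_with_cache(tokens)

# A hook function receives the activation tensor and its hook point;
# the returned tensor replaces the original activation.
def zero_ablate_head(z, hook):
    # At hook_z, z has shape [batch, pos, head_index, d_head].
    z[:, :, 7, :] = 0.0  # knock out head 7 (an arbitrary choice)
    return z

# Re-run with the hook attached to layer 5's attention output.
ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(utils.get_act_name("z", 5), zero_ablate_head)],
)

# The final-position logit difference measures the head's causal effect.
print((logits[0, -1] - ablated_logits[0, -1]).abs().max().item())
```

This clean-run / intervened-run comparison is the core of activation patching: cache activations from one pass, then hook modified values into another.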

Quick Start & Requirements

  • Primary install: pip install transformer_lens
  • Requirements: Python, PyTorch. Supports loading over 50 open-source language models.
  • Resources: Can be run on a single GPU or even CPU for smaller models, with many tutorials available in Colab notebooks.
  • Links: Introduction to the Library, Demos
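
After installing, a quick sanity check is to load a small model and generate a short continuation (a sketch; gpt2 is among the supported checkpoints and is small enough to run on CPU):

```python
from transformer_lens import HookedTransformer

# gpt2 (124M parameters) loads comfortably on CPU.
model = HookedTransformer.from_pretrained("gpt2", device="cpu")
print(model.cfg.n_layers, model.cfg.n_heads, model.cfg.d_model)  # 12 12 768

# Generate a short continuation to confirm the model loaded correctly.
print(model.generate("Mechanistic interpretability is", max_new_tokens=10))
```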

Highlighted Details

  • Facilitates mechanistic interpretability research, with several papers published using the library.
  • Supports caching and editing of internal model activations via a hook system (see the sketch after this list).
  • Offers a wide range of tutorials and examples for learning interpretability techniques.
  • Integrates with various open-source LLMs, including GPT-2 variants.
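
As a sketch of what cached activations enable, the loop below is a logit-lens-style analysis closely related to the direct logit attribution technique mentioned above: it projects the residual stream after each layer through the final LayerNorm and unembedding to watch the prediction form. Applying ln_final to intermediate layers is a standard approximation, and the prompt and target token are illustrative choices:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "The Eiffel Tower is in the city of"
_, cache = model.run_with_cache(model.to_tokens(prompt))

# The single token we expect the model to predict (illustrative target).
target = model.to_single_token(" Paris")

# Project each layer's residual stream through the final LayerNorm and
# the unembedding to see how the target logit develops layer by layer.
for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer][:, -1:]           # final position only
    layer_logits = model.unembed(model.ln_final(resid))  # [1, 1, d_vocab]
    print(f"layer {layer:2d}: logit(' Paris') = {layer_logits[0, 0, target].item():.2f}")
```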

Maintenance & Community

  • Created by Neel Nanda, maintained by Bryce Meyer.
  • Active community on Slack for discussions and contributions: Slack Community

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Primarily focused on GPT-style architectures; support for other architectures may be limited.
  • The field of mechanistic interpretability is nascent, with ongoing development and potential for breaking changes.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 33
  • Issues (30d): 8
  • Star history: 307 stars in the last 90 days

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

Explore Similar Projects

SwissArmyTransformer by THUDM

  • Transformer library for flexible model development
  • 0.3% · 1k stars · created 3 years ago · updated 7 months ago
  • Starred by Dominik Moritz (Professor at CMU; ML Researcher at Apple), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

ecco by jalammar

  • Python library for interactive NLP model visualization in Jupyter notebooks
  • 0% · 2k stars · created 4 years ago · updated 11 months ago
  • Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

transformer-debugger by openai

  • Tool for language model behavior investigation
  • 0.1% · 4k stars · created 1 year ago · updated 1 year ago
  • Starred by Lilian Weng (Cofounder of Thinking Machines Lab), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 42 more.

transformers by huggingface

  • ML library for pretrained model inference and training
  • 0.2% · 148k stars · created 6 years ago · updated 22 hours ago