nnsight by ndif-team

SDK for interpreting/manipulating deep model internals

created 1 year ago
617 stars

Top 54.2% on sourcepulse

Project Summary

This package provides a Python API for interpreting and manipulating the internal states of deep learning models, particularly large language models. It targets researchers and developers who need to inspect, debug, or modify model behavior at the level of individual modules and activations, as in mechanistic interpretability work.

How It Works

nnsight operates by creating a computational graph of model operations within a tracing context. Users define interventions or data extraction points using proxy objects that represent model outputs or intermediate states. These proxies are then compiled into an executable graph, allowing for efficient execution and modification of model forward passes. This approach enables fine-grained control and observation without requiring direct modification of the underlying model code.
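
The deferred-execution idea described above can be illustrated with a toy tracer in plain Python. This is only a sketch of the general technique, not nnsight's implementation or API: the `Proxy` and `Graph` classes and their methods are hypothetical names invented for this example. Operations on a proxy are recorded as graph nodes, and nothing is computed until the graph is run with a concrete input.

```python
import operator


class Proxy:
    """Placeholder for a value that does not exist yet (hypothetical class)."""

    def __init__(self, graph, node_id):
        self.graph = graph
        self.node_id = node_id

    def __add__(self, other):
        # Record the addition as a new node instead of computing it.
        return self.graph.add(operator.add, (self, other))

    def __mul__(self, other):
        # Record the multiplication as a new node instead of computing it.
        return self.graph.add(operator.mul, (self, other))


class Graph:
    """Ordered list of recorded operations (hypothetical class)."""

    def __init__(self):
        self.nodes = []  # each node is (fn, args); fn is None for the input

    def add(self, fn, args):
        self.nodes.append((fn, args))
        return Proxy(self, len(self.nodes) - 1)

    def input(self):
        return self.add(None, ())

    def run(self, value):
        """Execute every recorded node in order and return all results."""
        results = []
        for fn, args in self.nodes:
            if fn is None:
                results.append(value)
            else:
                resolved = [
                    results[a.node_id] if isinstance(a, Proxy) else a
                    for a in args
                ]
                results.append(fn(*resolved))
        return results


graph = Graph()
x = graph.input()
y = x * 2 + 1            # tracing: two nodes are recorded, nothing is computed
values = graph.run(5)    # execution: the recorded graph runs on a real input
final = values[y.node_id]
print(final)             # 11
```

Because the graph is built before any computation happens, the same recorded operations can be re-run on different inputs, which is the property that makes tracing-based intervention possible.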

Quick Start & Requirements

  • Install via pip: pip install nnsight
  • Requires Python and PyTorch. A CUDA-capable GPU is recommended for performance.
  • Example usage and detailed documentation are available at nnsight.net.

Highlighted Details

  • Enables direct manipulation of model activations (e.g., adding noise).
  • Supports multi-token generation with per-token intervention.
  • Allows cross-prompt interventions by reusing computed states.
  • Facilitates ad-hoc module application and chaining.
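
The first and third items above (adding noise to activations, and reusing states computed on one prompt while running another) can be sketched on a toy model in plain Python. This is a hedged illustration of the concepts only and does not use nnsight: `run_model`, the three toy layers, and the `interventions` mapping are all hypothetical and chosen for this example.

```python
import random


def run_model(layers, inputs, interventions=None):
    """Run layers in order; optionally rewrite a layer's output.

    interventions maps a layer index to a function that takes that layer's
    activation (a list of floats) and returns the replacement activation.
    """
    interventions = interventions or {}
    activations = []
    h = list(inputs)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in interventions:
            h = interventions[i](h)
        activations.append(list(h))  # keep a copy of every intermediate state
    return h, activations


def double(h):
    return [2.0 * v for v in h]


def shift(h):
    return [v + 1.0 for v in h]


def total(h):
    return [sum(h)]


layers = [double, shift, total]

# Clean run: [1, 2] -> [2, 4] -> [3, 5] -> [8]
clean_out, clean_acts = run_model(layers, [1.0, 2.0])

# Intervention 1: add Gaussian noise to the output of layer 0.
rng = random.Random(0)


def add_noise(h):
    return [v + rng.gauss(0.0, 0.1) for v in h]


noisy_out, _ = run_model(layers, [1.0, 2.0], {0: add_noise})

# Intervention 2: cross-prompt patching. Capture layer 0's state on a
# "source" input, then substitute it while running the original input.
source_out, source_acts = run_model(layers, [10.0, 20.0])


def patch_from_source(h):
    return list(source_acts[0])


patched_out, _ = run_model(layers, [1.0, 2.0], {0: patch_from_source})

print(clean_out, patched_out)  # [8.0] [62.0]
```

In the patched run, everything downstream of layer 0 sees the source input's state, so the final output matches the source run even though the model was called on the original input.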

Maintenance & Community

The project is associated with the ndif-team and has a published paper. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The license is not explicitly stated in the README. Compatibility with commercial or closed-source projects would depend on the specific license terms.

Limitations & Caveats

The README focuses on demonstrating capabilities with GPT-2. Support for other model architectures or frameworks may vary. The library is relatively new, and extensive community support or long-term maintenance guarantees are not detailed.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 14
  • Issues (30d): 50

Star History

  • 65 stars in the last 90 days
