inseq by inseq-team

Interpretability toolkit for sequence generation models

Created 4 years ago
438 stars

Top 68.2% on SourcePulse

Project Summary

Inseq is a Python toolkit for post-hoc interpretability analysis of sequence generation models, aimed at NLP researchers and practitioners. It provides unified access to a wide range of feature attribution methods, enabling deeper understanding of model behavior and facilitating reproducible interpretability research.

How It Works

Inseq integrates with Hugging Face Transformers, supporting both encoder-decoder and decoder-only architectures. It implements a wide range of attribution methods, including gradient-based (e.g., Integrated Gradients, DeepLIFT), attention-based, and perturbation-based techniques. The library allows for flexible post-processing of attribution maps via Aggregator classes and supports custom attribution targets using "step functions" to extract scores like logits, probabilities, or entropy at each generation step.

Quick Start & Requirements

Highlighted Details

  • Supports a broad spectrum of attribution methods, extending Captum's capabilities.
  • Offers visualization in notebooks, browsers, and the command line.
  • Includes a CLI for batch attribution on datasets and context dependence analysis.
  • Enables custom attribution targets and extraction of intermediate generation scores.

Maintenance & Community

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

  • Python support is restricted to versions 3.10–3.12.
  • Some dependencies (tokenizers, sentencepiece) may need additional system-level setup, such as a Rust toolchain or C++ build tools, when prebuilt wheels are unavailable.
Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Tri Dao (Chief Scientist at Together AI), and 1 more.

hnet by goombalab

Hierarchical sequence modeling with dynamic chunking

1.5% · 722 stars
Created 2 months ago · Updated 1 month ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Vincent Weisser (Cofounder of Prime Intellect), and 8 more.

galai by paperswithcode

Scientific language model API

0% · 3k stars
Created 2 years ago · Updated 2 years ago