ecco by jalammar

Python library for interactive NLP model visualization in Jupyter notebooks

Created 5 years ago

2,071 stars

Top 21.3% on SourcePulse

Project Summary

Ecco is a Python library for exploring and explaining the behavior of Transformer-based NLP models inside Jupyter notebooks. It targets researchers and practitioners who need to understand how models such as GPT2, BERT, and T5 arrive at their predictions, and it provides interactive visualizations of the models' internal states.

How It Works

Ecco builds on PyTorch and Hugging Face's transformers library to provide tools for analyzing pre-trained models. Its core approach is to capture and visualize intermediate states, neuron activations, and feature attributions. Attribution methods such as Integrated Gradients and DeepLift identify the input tokens that most influence a prediction, Non-negative Matrix Factorization (NMF) uncovers activation patterns within the model's feed-forward networks, and similarity measures based on Canonical Correlation Analysis (CCA) compare activation spaces.
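To make the attribution idea concrete, here is a minimal pure-Python sketch of Integrated Gradients on a toy function. It is not Ecco's or Captum's implementation: `toy_model`, `numerical_gradient`, and `integrated_gradients` are hypothetical names chosen for illustration, the gradient is approximated by finite differences rather than autograd, and the all-zero baseline is an assumption.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, eps: float = 1e-6) -> Vector:
    """Central-difference gradient of f at x (stand-in for autograd)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f: Callable[[Vector], float], x: Vector,
                         baseline: Vector, steps: int = 200) -> Vector:
    """Approximate IG_i = (x_i - b_i) * integral over alpha in [0, 1]
    of df/dx_i evaluated at b + alpha * (x - b), using the midpoint rule."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [di * t / steps for di, t in zip(diff, total)]


def toy_model(x: Vector) -> float:
    """Hypothetical scalar 'model output'; a real model would be a network."""
    return x[0] * x[1] + 3.0 * x[2] ** 2


x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]  # assumed reference input
attributions = integrated_gradients(toy_model, x, baseline)
print(attributions)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline.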

Quick Start & Requirements

Install with pip (pip install ecco) or conda (conda install -c conda-forge ecco).
Requires Python and PyTorch. Compatible with Hugging Face transformers models.
Official documentation: ecco.readthedocs.io.
Examples and Colab notebooks are available.

Highlighted Details

Supports a wide range of Hugging Face models (GPT2, BERT, RoBERTa, T5, T0) and allows adding custom local models.
Integrates multiple feature attribution methods via Captum for detailed input-output analysis.
Visualizes neuron activation patterns using Non-negative Matrix Factorization (NMF) and compares activation spaces with SVCCA, PWCCA, and CKA.
Offers interactive "logit lens" visualizations to track token processing through model layers.
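As an illustration of what comparing activation spaces means, the following is a minimal pure-Python sketch of linear CKA, one of the similarity measures listed above. It is not Ecco's code: the helper names and the toy activation matrices (rows are examples, columns are neurons) are assumptions made for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every neuron has zero mean."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def cross_product(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T b for a (n x p) and b (n x q), giving a p x q matrix."""
    return [[sum(a[i][j] * b[i][k] for i in range(len(a)))
             for k in range(len(b[0]))]
            for j in range(len(a[0]))]


def frobenius_sq(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    x = center_columns(x)
    y = center_columns(y)
    numerator = frobenius_sq(cross_product(y, x))
    denominator = (math.sqrt(frobenius_sq(cross_product(x, x)))
                   * math.sqrt(frobenius_sq(cross_product(y, y))))
    return numerator / denominator


# Toy activations: 4 examples, 2 neurons each (made-up numbers).
acts_a = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]]
# Same activations rotated by 90 degrees and scaled by 2.
acts_b = [[-2.0 * b, 2.0 * a] for a, b in acts_a]
# An unrelated set of activations with 3 neurons.
acts_c = [[0.5, 1.0, 0.0], [1.5, 0.2, 1.0], [0.1, 0.9, 2.0], [2.0, 0.3, 0.5]]

print(linear_cka(acts_a, acts_b))
print(linear_cka(acts_a, acts_c))
```

Linear CKA is invariant to rotations and uniform scaling of the activation space, so `acts_a` and `acts_b` score as identical even though no neuron matches one-to-one. The two inputs may also have different numbers of neurons, as with `acts_c`, which is what makes this kind of measure practical for comparing different layers.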

Maintenance & Community

Presented at ACL System Demonstrations 2021.
Discussion board and issue tracker available for help and bug reporting.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The library is an alpha release of a research project.
Focuses solely on exploring and understanding existing pre-trained models, not on training or fine-tuning.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days