bertviz by jessevig

Interactive tool for visualizing attention in Transformer language models

Created 6 years ago
7,642 stars

Top 6.8% on SourcePulse

Project Summary

BertViz is an interactive visualization tool for understanding attention mechanisms in Transformer-based NLP models like BERT, GPT-2, and BART. It offers multiple views (Head, Model, Neuron) to analyze attention patterns, aiding researchers and practitioners in debugging and interpreting model behavior.

How It Works

BertViz extends the Tensor2Tensor visualization tool with three distinct views for dissecting attention. The Head View displays attention for specific heads within a layer, the Model View provides a global overview across all layers and heads, and the Neuron View shows how individual neurons in the query and key vectors contribute to the attention computation. Together, these views give a comprehensive picture of how attention distributes information across tokens.
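The weights these views render come from scaled dot-product attention. As a rough illustration (a toy NumPy sketch, not BertViz code; all shapes and values here are made up), each row of the resulting matrix is one token's attention distribution, and the per-dimension query-key products summed inside the score are what the Neuron View breaks out:

```python
import numpy as np

def scaled_dot_product_attention(Q, K):
    """Attention weights of the kind BertViz visualizes: softmax(QK^T / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # Neuron View: element-wise q*k products feed these scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dim head (toy sizes)
K = rng.normal(size=(4, 8))
A = scaled_dot_product_attention(Q, K)
# Each row of A is a distribution over the 4 tokens; the Head View draws
# these weights as the thickness of lines between token pairs.
print(A.sum(axis=1))  # rows sum to 1
```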

Quick Start & Requirements

  • Install via pip: pip install bertviz
  • Requires Jupyter Notebook and ipywidgets: pip install jupyterlab ipywidgets
  • Supports Hugging Face models; requires output_attentions=True when loading models.
  • Colab integration is straightforward with !pip install bertviz.
  • Colab Tutorial
  • Documentation

Highlighted Details

  • Supports BERT, GPT-2, BART, T5, and other Hugging Face models.
  • Neuron View requires custom model versions included with BertViz for access to query/key vectors.
  • Visualizations can be returned as HTML objects for saving or custom embedding.
  • Supports visualizing attention for sentence pairs and encoder-decoder models.

Maintenance & Community

  • Developed by Jesse Vig.
  • Project is active and has been cited in research.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool may perform slowly with very long inputs or large models; filtering layers is recommended. Some Colab visualizations may fail with long inputs due to runtime disconnections. The Neuron View is limited to specific custom BERT, GPT-2, and RoBERTa models included with the tool. Attention visualization does not directly equate to prediction explanation.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 44 stars in the last 30 days

Explore Similar Projects

Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

transformer-debugger by openai

  • Tool for language model behavior investigation
  • Top 0.1% · 4k stars
  • Created 1 year ago; updated 1 year ago