BERT-of-Theseus by JetRunner

PyTorch code for BERT compression via progressive module replacement

created 5 years ago
313 stars

Top 87.4% on sourcepulse

Project Summary

This repository provides the official PyTorch implementation of BERT-of-Theseus, a method for compressing BERT models via progressive module replacement. It targets NLP researchers and practitioners who want to reduce model size and computational cost while maintaining downstream performance.

How It Works

BERT-of-Theseus compresses BERT by randomly substituting modules (groups of layers) of a large predecessor model with compact successor modules while fine-tuning on a downstream task. Because each predecessor module is replaced with some probability at every step, the successor modules learn to mimic their predecessors in context, and the fully assembled successor becomes a smaller, efficient model. The replacement probability can be held constant for a specified number of steps or increased over time by a replacement scheduler.
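The core idea can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual API: `theseus_forward`, `predecessor_modules`, and `successor_modules` are hypothetical names, and real modules would be `torch.nn.Module` stacks rather than plain callables.

```python
import random

def theseus_forward(predecessor_modules, successor_modules, hidden, p_replace):
    """One forward pass with progressive module replacement (sketch).

    Each predecessor module is independently swapped for its compact
    successor with probability p_replace; gradients then flow only
    through the modules that were actually used.
    """
    for pred, succ in zip(predecessor_modules, successor_modules):
        module = succ if random.random() < p_replace else pred
        hidden = module(hidden)
    return hidden
```

At `p_replace = 0` the model is the original predecessor; at `p_replace = 1` it is the pure successor used at inference time.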

Quick Start & Requirements

  • Install: Clone the repository and install the huggingface/transformers library.
  • Prerequisites: PyTorch, huggingface/transformers, GLUE dataset.
  • Usage: Run run_glue.py with specified arguments for compression, including --model_name_or_path to the predecessor model and --output_dir for the successor.
  • Pretrained Model: A 6-layer BERT-of-Theseus model pretrained on MNLI is available via canwenxu/BERT-of-Theseus-MNLI on Hugging Face Hub.
  • Documentation: Detailed argument descriptions are in the source code.
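A typical invocation might look like the following command sketch. The paths and the replacement-related flag names (`--replacing_rate`, `--steps_for_replacing`) are illustrative assumptions; check `run_glue.py --help` in the repository for the exact arguments.

```shell
# Hypothetical compression run on an MRPC predecessor (paths are placeholders)
python run_glue.py \
  --model_name_or_path /path/to/predecessor_model \
  --task_name MRPC \
  --do_train \
  --do_eval \
  --output_dir /path/to/successor_model \
  --replacing_rate 0.3 \
  --steps_for_replacing 2500
```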

Highlighted Details

  • Achieves competitive performance on GLUE tasks, outperforming DistilBERT (with the same 6-layer structure) on six tasks.
  • Offers two compression strategies: a linear replacement scheduler and a constant replacement rate.
  • Provides a general-purpose 6-layer MNLI-pretrained model for transfer learning.
  • Supports bug reporting and contributions for adding more tasks.
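The linear scheduler mentioned above can be expressed as a one-line function. This follows the paper's curriculum p(t) = min(1, k·t + b); the function and parameter names are illustrative, not the repository's actual API.

```python
def linear_replacement_rate(step, k, base_rate):
    """Linear replacement scheduler (sketch): the probability of using a
    successor module grows with the training step, capped at 1.0, so
    training ends with the pure successor model."""
    return min(1.0, k * step + base_rate)
```

With a constant-rate strategy, the same probability would simply be returned unchanged for a fixed number of steps.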

Maintenance & Community

The project accompanies a paper published at EMNLP 2020. Community contributions are encouraged via pull requests and issue reports, and third-party TensorFlow and Keras implementations are listed in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently, only GLUE tasks are supported, and the absence of an explicit license may hinder commercial adoption.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days
