TextPruner by airaria

PyTorch toolkit for pruning pre-trained language models

created 4 years ago
387 stars

Top 75.2% on sourcepulse

View on GitHub
Project Summary

TextPruner is a PyTorch-based toolkit for efficiently reducing the size and inference time of pre-trained language models. It offers training-free, structured pruning methods for researchers and practitioners looking to deploy large NLP models in resource-constrained environments.

How It Works

TextPruner implements two pruning strategies: vocabulary pruning and transformer pruning. Vocabulary pruning removes underutilized tokens from the model's embedding matrix and tokenizer, shrinking the model and potentially speeding up masked language modeling, whose output layer spans the full vocabulary. Transformer pruning removes the least important attention heads and feed-forward network (FFN) neurons within each layer, aiming to preserve task performance while significantly reducing model size; it supports both iterative and mask-based pruning.
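
As a concrete illustration of the transformer-pruning path, the sketch below uses the TransformerPruner and TransformerPruningConfig names from the project's documentation. The model choice, target sizes, and toy dataloader are assumptions for illustration; the batches are only used to estimate head and neuron importance, and the expected batch format may need adapting to your task.

```python
# Sketch of transformer pruning with TextPruner; the model, target sizes, and
# toy dataloader are illustrative assumptions, not a definitive recipe.
import torch
from torch.utils.data import DataLoader
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer
from textpruner import TransformerPruner, TransformerPruningConfig

model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base")
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

# Tiny stand-in for a downstream dev set (e.g. XNLI); the pruner runs these
# batches through the model to score attention heads and FFN neurons.
texts = ["A short example sentence.", "Another example sentence."]
encoded = tokenizer(texts, padding=True, return_tensors="pt")
examples = [
    {"input_ids": encoded["input_ids"][i],
     "attention_mask": encoded["attention_mask"][i],
     "labels": torch.tensor(0)}
    for i in range(len(texts))
]
dataloader = DataLoader(examples, batch_size=2)

# Shrink each FFN to 2048 neurons and keep 8 heads per layer, pruning
# iteratively over 4 rounds instead of in a single shot.
config = TransformerPruningConfig(
    target_ffn_size=2048,
    target_num_of_heads=8,
    pruning_method="iterative",
    n_iters=4,
)
pruner = TransformerPruner(model, transformer_pruning_config=config)
pruner.prune(dataloader=dataloader, save_model=True)
```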

Quick Start & Requirements

  • Install via pip: pip install textpruner
  • Requirements: Python >= 3.7, torch >= 1.7, transformers >= 4.0, sentencepiece, protobuf.
  • Official documentation: https://textpruner.readthedocs.io
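
For a first run, vocabulary pruning is the simplest entry point. The sketch below assumes the VocabularyPruner interface from the documentation linked above, where dataiter is an iterable of raw texts whose tokens should be kept; the model and corpus here are placeholders.

```python
# Minimal vocabulary-pruning sketch; swap in your own model and the texts of
# your downstream task. Tokens absent from the texts are dropped from the
# embedding matrix and the tokenizer.
from transformers import XLMRobertaForMaskedLM, XLMRobertaTokenizer
from textpruner import VocabularyPruner

model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

texts = [
    "TextPruner removes tokens that never occur in the given corpus.",
    "The embedding matrix and the tokenizer shrink accordingly.",
]

pruner = VocabularyPruner(model, tokenizer)
pruner.prune(dataiter=texts, save_model=True)  # saves the pruned model and tokenizer
```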

Highlighted Details

  • Supports vocabulary and transformer pruning, with a combined pipeline pruning option (see the sketch after this list).
  • Compatible with Hugging Face Transformers models like BERT, ALBERT, RoBERTa, ELECTRA, and XLM-RoBERTa.
  • Offers both a Python package API and a CLI tool for ease of use.
  • Demonstrates significant speedups (up to 2x) and size reductions (e.g., 62.5% reduction in vocab size) with minimal accuracy loss on tasks like XNLI.
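
The combined pipeline option chains both steps in a single call. This sketch assumes a PipelinePruner class that accepts the transformer pruning configuration and reuses the dataloader and texts from the earlier sketches; argument names may differ between versions, so check the documentation before relying on them.

```python
# Sketch of pipeline pruning: transformer pruning followed by vocabulary
# pruning in one pass. `dataloader` and `texts` are placeholders standing in
# for the importance-estimation batches and raw corpus shown earlier.
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer
from textpruner import PipelinePruner, TransformerPruningConfig

model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base")
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

dataloader = ...  # labeled batches from the downstream task (see the sketch above)
texts = [...]     # raw texts defining the vocabulary to keep (see the sketch above)

transformer_pruning_config = TransformerPruningConfig(
    target_ffn_size=2048,
    target_num_of_heads=8,
    pruning_method="iterative",
    n_iters=4,
)
pruner = PipelinePruner(model, tokenizer,
                        transformer_pruning_config=transformer_pruning_config)
pruner.prune(dataloader=dataloader, dataiter=texts, save_model=True)
```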

Maintenance & Community

  • The accompanying paper was accepted to the ACL 2022 System Demonstrations track.
  • Associated with the HFL research group (the Joint Laboratory of HIT and iFLYTEK Research).

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing terms before commercial use.

Limitations & Caveats

  • Does not support TensorFlow 2.
  • Transformer pruning is not supported for XLM, BART, T5, and mT5 models, though vocabulary pruning is.
  • Achieving optimal performance with transformer pruning may require careful tuning of parameters like n_iters and potentially using uneven head configurations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX):

  • wanda by locuslab: LLM pruning research paper implementation. 782 stars; created 2 years ago, updated 11 months ago.