Code for massive language model one-shot pruning (ICML 2023 paper)
This repository provides code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot." It enables researchers and practitioners to reproduce results and apply one-shot pruning techniques to large language models like OPT, BLOOM, and LLaMA, achieving significant compression with minimal accuracy loss.
How It Works
SparseGPT implements a one-shot pruning algorithm that removes the least important weights from large language models, layer by layer. It reaches high sparsity levels by choosing which weights to prune, and adjusting the remaining ones, so as to minimize the layer-wise reconstruction error, guided by an approximation of second-order (Hessian) information; this allows accurate compression without any retraining. The implementation builds on the authors' GPTQ code.
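As a rough illustration of the idea (a simplified sketch, not the repository's actual implementation; the function name and shapes are made up for the example), the weights of a linear layer can be ranked by an Optimal Brain Surgeon style saliency w_j^2 / [H^-1]_jj, where H = X^T X is the Hessian of the layer-wise reconstruction error over a small calibration set, and the lowest-scoring fraction is zeroed out:

```python
import torch

def prune_layer(weight, calib_inputs, sparsity, damp=0.01):
    # weight: (out_features, in_features); calib_inputs: (n_samples, in_features)
    # Hessian of the layer-wise reconstruction problem, H = X^T X,
    # plus a small damping term for numerical stability.
    H = calib_inputs.T @ calib_inputs
    H += damp * torch.mean(torch.diag(H)) * torch.eye(H.shape[0])
    hinv_diag = torch.linalg.inv(H).diag()

    # OBS-style saliency: removing w_j increases the reconstruction error
    # by roughly w_j^2 / [H^-1]_jj, so low-saliency weights are cheapest to prune.
    saliency = weight.pow(2) / hinv_diag.unsqueeze(0)

    # Zero out the lowest-saliency fraction of weights in each output row.
    n_prune = int(sparsity * weight.shape[1])
    prune_idx = torch.argsort(saliency, dim=1)[:, :n_prune]
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    return weight * mask

# Example usage on random data:
W = torch.randn(768, 768)       # layer weight matrix
X = torch.randn(2048, 768)      # calibration activations for this layer
W_sparse = prune_layer(W, X, sparsity=0.5)
```

The actual SparseGPT algorithm goes further: after each pruning decision it updates the remaining weights in the layer to compensate for the removed ones, which is what keeps accuracy high at such sparsities without retraining.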
Quick Start & Requirements
pip install torch==1.10.1+cu111 transformers==4.21.2 datasets==1.17.0 -f https://download.pytorch.org/whl/torch_stable.html
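With the dependencies installed, pruning is run through the per-model scripts in the repository; a command along the following lines (check the repository's README for the exact script names and flags) prunes OPT-125M to 50% unstructured sparsity using C4 calibration data:

```
python opt.py facebook/opt-125m c4 --sparsity 0.5
```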
Highlighted Details
Maintenance & Community
The project is maintained by the DAS Lab (IST-DASLab) at the Institute of Science and Technology Austria (ISTA). Further community engagement details are not specified in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
Some features are currently only available for OPT models, not BLOOM. Access to larger models like OPT-175B requires prior authorization from Meta and conversion to HuggingFace format.