sparsegpt by IST-DASLab

Code for one-shot pruning of massive language models (ICML 2023 paper)

created 2 years ago
819 stars

Top 44.2% on sourcepulse

Project Summary

This repository provides code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot." It enables researchers and practitioners to reproduce results and apply one-shot pruning techniques to large language models like OPT, BLOOM, and LLaMA, achieving significant compression with minimal accuracy loss.

How It Works

SparseGPT implements a one-shot pruning algorithm that removes less important weights from large language models without retraining. Working layer by layer, it uses an approximation to second-order (Hessian) information to decide which weights to prune and to update the remaining weights so as to minimize the error introduced by pruning, which lets it reach high sparsity levels with little accuracy loss. The implementation builds on the authors' open-source GPTQ code.
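For intuition, here is a minimal sketch of the classic Optimal Brain Surgeon (OBS) update that this second-order approach builds on: repeatedly prune the weight with the lowest saliency w_p^2 / [H^-1]_pp, then adjust the remaining weights to compensate. This is an illustrative toy (a single weight row, a fixed inverse Hessian), not the repository's algorithm, which uses far more efficient approximations to scale to billions of parameters.

```python
import torch

def obs_prune_row(w, H_inv, target_sparsity=0.5):
    """Toy OBS-style pruning of one weight row (illustration only).

    w:     (cols,) weights of a single output neuron.
    H_inv: (cols, cols) inverse of the layer Hessian X X^T.
    Reusing a fixed H_inv between pruning steps is itself a simplification.
    """
    w = w.clone()
    pruned = torch.zeros_like(w, dtype=torch.bool)
    n_prune = int(target_sparsity * w.numel())

    for _ in range(n_prune):
        # OBS saliency: loss increase caused by zeroing each remaining weight.
        saliency = w ** 2 / torch.diagonal(H_inv)
        saliency[pruned] = float("inf")      # never re-select pruned weights
        p = torch.argmin(saliency)

        # Compensating update on the remaining weights, then zero the pruned one.
        w -= (w[p] / H_inv[p, p]) * H_inv[:, p]
        w[p] = 0.0
        pruned[p] = True

    return w, pruned

# Hypothetical usage with random calibration inputs.
cols, nsamples = 64, 256
X = torch.randn(cols, nsamples)
H = X @ X.t() + 1e-2 * torch.eye(cols)       # dampened layer Hessian X X^T
w = torch.randn(cols)
w_pruned, mask = obs_prune_row(w, torch.linalg.inv(H))
print(mask.float().mean())                   # ~0.5 sparsity
```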

Quick Start & Requirements

  • Install: Clone the repository and install the pinned dependencies: torch==1.10.1+cu111 (from the PyTorch CUDA 11.1 wheel index), transformers==4.21.2, datasets==1.17.0
  • Prerequisites: CUDA 11.1, PyTorch, Transformers, Datasets; a quick environment sanity-check sketch follows this list.
  • Demo: A Colab notebook is available for trying SparseGPT: demo.ipynb
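The snippet below is a quick sanity check that the pinned dependencies are importable and that an OPT checkpoint loads through Transformers; the model choice (facebook/opt-125m) and the one-sentence perplexity computation are illustrative examples, not part of the repository.

```python
# Illustrative environment check; not repository code.
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

print(torch.__version__, transformers.__version__)  # expect 1.10.1+cu111 / 4.21.2

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("SparseGPT prunes large language models in one shot.",
                   return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"dense perplexity on this sentence: {loss.exp().item():.2f}")
```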

Highlighted Details

  • Supports unstructured, n:m (e.g. 2:4), and sparse + quantized pruning; a small masking sketch of the 2:4 pattern follows this list.
  • Enables pruning of OPT, BLOOM, and LLaMA models.
  • Evaluates pruned models on WikiText2, PTB, and C4 datasets.
  • Allows saving pruned model checkpoints and logging to Weights & Biases.
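To make the n:m pattern concrete (2:4 is the variant accelerated on recent NVIDIA GPUs), here is a generic masking sketch that keeps the n largest-magnitude weights in each group of m consecutive input weights. SparseGPT itself selects the weights to keep with its second-order criterion rather than magnitude, so this only illustrates the sparsity pattern, not the repository's selection rule.

```python
import torch

def apply_nm_mask(W, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    input weights (n:m sparsity); illustration only, not repository code."""
    rows, cols = W.shape
    assert cols % m == 0, "input dimension must be divisible by m"

    groups = W.abs().reshape(rows, cols // m, m)
    # Zero out the (m - n) smallest-magnitude entries in each group.
    _, drop_idx = torch.topk(groups, m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    return W * mask.reshape(rows, cols)

W = torch.randn(8, 16)
W_24 = apply_nm_mask(W, n=2, m=4)
print((W_24 == 0).float().mean())  # exactly 0.50 for the 2:4 pattern
```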

Maintenance & Community

The project comes from IST-DASLab, the Distributed Algorithms and Systems Lab at the Institute of Science and Technology Austria (ISTA). Further community engagement details are not specified in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some features are currently only available for OPT models, not BLOOM. Access to larger models like OPT-175B requires prior authorization from Meta and conversion to HuggingFace format.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

32 stars in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

wanda by locuslab

782 stars
LLM pruning research paper implementation
created 2 years ago
updated 11 months ago