sparsegpt by IST-DASLab

Code for one-shot pruning of massive language models (ICML 2023 paper)

Created 2 years ago
863 stars

Top 41.6% on SourcePulse

Project Summary

This repository provides code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot." It enables researchers and practitioners to reproduce results and apply one-shot pruning techniques to large language models like OPT, BLOOM, and LLaMA, achieving significant compression with minimal accuracy loss.

How It Works

SparseGPT is a one-shot pruning algorithm that removes the least important weights from large language models layer by layer. It reaches high sparsity levels by minimizing the layer-wise reconstruction error that pruning introduces, using an approximation of second-order (Hessian) information, which yields accurate compression without any retraining. The implementation builds on the group's GPTQ codebase.
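The weight update is the classic Optimal Brain Surgeon compensation applied layer by layer. The sketch below is a minimal, hypothetical illustration of that idea for a single linear layer (the function name prune_layer and the damping parameter damp are ours, not the repository's API): it scores weights by the OBS saliency, zeroes the lowest-scoring ones column by column, and pushes the induced error onto the columns not yet processed. The actual implementation selects masks adaptively in column blocks and works with a Cholesky factorization of the inverse Hessian for exactness and speed.

    import torch

    def prune_layer(W, X, sparsity=0.5, damp=0.01):
        # W: (rows, cols) layer weights; X: (cols, samples) calibration inputs.
        H = X @ X.T  # proxy Hessian of the layer-wise error ||W X - W_pruned X||^2
        H += damp * torch.diag(H).mean() * torch.eye(H.shape[0], dtype=H.dtype, device=H.device)
        Hinv = torch.cholesky_inverse(torch.linalg.cholesky(H))
        W = W.clone()
        scores = W ** 2 / torch.diag(Hinv)  # OBS saliency: loss increase from zeroing each weight
        mask = scores > torch.quantile(scores.flatten(), sparsity)
        for j in range(W.shape[1]):  # sweep columns left to right
            err = (W[:, j] * ~mask[:, j]) / Hinv[j, j]
            W[:, j] *= mask[:, j]  # zero the pruned weights in column j
            W[:, j + 1:] -= torch.outer(err, Hinv[j, j + 1:])  # compensate on later columns
        return W

Because the compensation only ever touches columns that have not been processed yet, a single left-to-right sweep suffices, which is what makes the method one-shot.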

Quick Start & Requirements

  • Install: Clone the repository and install the pinned dependencies: pip install torch==1.10.1+cu111 transformers==4.21.2 datasets==1.17.0 (example pruning commands follow this list)
  • Prerequisites: CUDA 11.1, PyTorch, Transformers, Datasets.
  • Demo: A Colab notebook is available for trying SparseGPT: demo.ipynb
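Pruning runs are driven by per-model scripts such as opt.py and bloom.py. The invocations below follow the pattern shown in the repository's README (model and calibration dataset are positional arguments); double-check flag names against the current code before relying on them:

    # 50% unstructured sparsity on OPT-125M, calibrated on C4
    python opt.py facebook/opt-125m c4 --sparsity .5

    # full 2:4 semi-structured sparsity
    python opt.py facebook/opt-125m c4 --prunen 2 --prunem 4

    # 50% sparsity combined with 4-bit weight quantization
    python opt.py facebook/opt-125m c4 --sparsity .5 --wbits 4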

Highlighted Details

  • Supports unstructured, semi-structured n:m, and joint sparse + quantized pruning (an n:m illustration follows this list).
  • Enables pruning of OPT, BLOOM, and LLaMA models.
  • Evaluates pruned models on WikiText2, PTB, and C4 datasets.
  • Allows saving pruned model checkpoints and logging to Weights & Biases.
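To make the n:m pattern concrete: in every group of m consecutive weights along a row, at most n may remain nonzero (2:4 is the layout accelerated by NVIDIA sparse tensor cores). The snippet below is a standalone, magnitude-based illustration of the mask shape only, not the repository's Hessian-aware selection; nm_mask is a hypothetical helper:

    import torch

    def nm_mask(W, n=2, m=4):
        rows, cols = W.shape  # cols must be divisible by m
        groups = W.abs().reshape(rows, cols // m, m)
        topk = groups.topk(n, dim=-1).indices  # n largest-magnitude weights per group
        mask = torch.zeros_like(groups, dtype=torch.bool)
        mask.scatter_(-1, topk, True)
        return mask.reshape(rows, cols)

    W = torch.randn(4, 8)
    print(W * nm_mask(W))  # exactly 2 nonzeros in every group of 4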

Maintenance & Community

The project is maintained by the Distributed Algorithms and Systems Lab (IST-DASLab) at the Institute of Science and Technology Austria. Further community engagement details are not specified in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some features are currently implemented only for OPT models, not BLOOM. Access to larger models such as OPT-175B requires prior authorization from Meta and conversion to the Hugging Face format.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Dan Guido (cofounder of Trail of Bits), and 6 more.

llm-compressor by vllm-project: Transformers-compatible library for LLM compression, optimized for vLLM deployment. Top 1.6% on SourcePulse · 3k stars · created 1 year ago · updated 19 hours ago.