sparsegpt by IST-DASLab

Code for one-shot pruning of massive language models (ICML 2023 paper)

Created 2 years ago
863 stars

Top 41.6% on SourcePulse

Project Summary

This repository provides code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot." It enables researchers and practitioners to reproduce results and apply one-shot pruning techniques to large language models like OPT, BLOOM, and LLaMA, achieving significant compression with minimal accuracy loss.

How It Works

SparseGPT is a one-shot pruning algorithm that removes the least important weights from large language models layer by layer. It reaches high sparsity levels by minimizing the layer-wise reconstruction error that pruning introduces, using an approximation of second-order (Hessian) information, which yields accurate compression without any retraining. The implementation builds on the group's GPTQ codebase.
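The weight update is the classic Optimal Brain Surgeon compensation applied layer by layer. The sketch below is a minimal, hypothetical illustration of that idea for a single linear layer (the function name prune_layer and the damping parameter damp are ours, not the repository's API): it scores weights by the OBS saliency, zeroes the lowest-scoring ones column by column, and pushes the induced error onto the columns not yet processed. The actual implementation selects masks adaptively in column blocks and works with a Cholesky factorization of the inverse Hessian for exactness and speed.

    import torch

    def prune_layer(W, X, sparsity=0.5, damp=0.01):
        # W: (rows, cols) layer weights; X: (cols, samples) calibration inputs.
        H = X @ X.T  # proxy Hessian of the layer-wise error ||W X - W_pruned X||^2
        H += damp * torch.diag(H).mean() * torch.eye(H.shape[0], dtype=H.dtype, device=H.device)
        Hinv = torch.cholesky_inverse(torch.linalg.cholesky(H))
        W = W.clone()
        scores = W ** 2 / torch.diag(Hinv)  # OBS saliency: loss increase from zeroing each weight
        mask = scores > torch.quantile(scores.flatten(), sparsity)
        for j in range(W.shape[1]):  # sweep columns left to right
            err = (W[:, j] * ~mask[:, j]) / Hinv[j, j]
            W[:, j] *= mask[:, j]  # zero the pruned weights in column j
            W[:, j + 1:] -= torch.outer(err, Hinv[j, j + 1:])  # compensate on later columns
        return W

Because the compensation only ever touches columns that have not been processed yet, a single left-to-right sweep suffices, which is what makes the method one-shot.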

Quick Start & Requirements

  • Install: Clone the repository and install the pinned dependencies: pip install torch==1.10.1+cu111 transformers==4.21.2 datasets==1.17.0 (example pruning commands follow this list)
  • Prerequisites: CUDA 11.1, PyTorch, Transformers, Datasets.
  • Demo: A Colab notebook is available for trying SparseGPT: demo.ipynb
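Pruning runs are driven by per-model scripts such as opt.py and bloom.py. The invocations below follow the pattern shown in the repository's README (model and calibration dataset are positional arguments); double-check flag names against the current code before relying on them:

    # 50% unstructured sparsity on OPT-125M, calibrated on C4
    python opt.py facebook/opt-125m c4 --sparsity .5

    # full 2:4 semi-structured sparsity
    python opt.py facebook/opt-125m c4 --prunen 2 --prunem 4

    # 50% sparsity combined with 4-bit weight quantization
    python opt.py facebook/opt-125m c4 --sparsity .5 --wbits 4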

Highlighted Details

  • Supports unstructured, semi-structured n:m, and joint sparse + quantized pruning (an n:m illustration follows this list).
  • Enables pruning of OPT, BLOOM, and LLaMA models.
  • Evaluates pruned models on WikiText2, PTB, and C4 datasets.
  • Allows saving pruned model checkpoints and logging to Weights & Biases.
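To make the n:m pattern concrete: in every group of m consecutive weights along a row, at most n may remain nonzero (2:4 is the layout accelerated by NVIDIA sparse tensor cores). The snippet below is a standalone, magnitude-based illustration of the mask shape only, not the repository's Hessian-aware selection; nm_mask is a hypothetical helper:

    import torch

    def nm_mask(W, n=2, m=4):
        rows, cols = W.shape  # cols must be divisible by m
        groups = W.abs().reshape(rows, cols // m, m)
        topk = groups.topk(n, dim=-1).indices  # n largest-magnitude weights per group
        mask = torch.zeros_like(groups, dtype=torch.bool)
        mask.scatter_(-1, topk, True)
        return mask.reshape(rows, cols)

    W = torch.randn(4, 8)
    print(W * nm_mask(W))  # exactly 2 nonzeros in every group of 4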

Maintenance & Community

The project is maintained by the Distributed Algorithms and Systems Lab (IST-DASLab) at the Institute of Science and Technology Austria. Further community engagement details are not specified in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some features are currently implemented only for OPT models, not BLOOM. Access to larger models such as OPT-175B requires prior authorization from Meta and conversion to the Hugging Face format.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Dan Guido (cofounder of Trail of Bits), and 6 more.

llm-compressor by vllm-project: Transformers-compatible library for LLM compression, optimized for vLLM deployment. Top 1.6% on SourcePulse · 3k stars · created 1 year ago · updated 19 hours ago.