wanda by locuslab

LLM pruning research paper implementation

Created 2 years ago
802 stars

Top 44.0% on SourcePulse

View on GitHub
Project Summary

Wanda is a PyTorch implementation for pruning Large Language Models (LLMs) by jointly considering weight magnitudes and activation norms. It targets researchers and practitioners aiming to reduce LLM size and computational cost while maintaining performance, offering a simple yet effective alternative to pure magnitude pruning.

How It Works

Wanda prunes weights on a per-output basis, scoring each weight by the product of its magnitude and the L2 norm of the corresponding input activations. This aims to remove the weights that contribute least to each output, which can yield better sparsity-accuracy trade-offs than methods relying on weight magnitudes alone.
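As a concrete illustration, here is a minimal pure-Python sketch of that per-output rule (function names and toy numbers are our own; the actual repository operates on PyTorch tensors and estimates activation norms from calibration data):

```python
# Minimal sketch of Wanda-style per-output pruning (illustrative only;
# the real implementation works on PyTorch tensors, not Python lists).

def wanda_scores(W, x_norms):
    """Importance of each weight: |W_ij| * ||X_j||_2, where x_norms[j]
    is the L2 norm of the j-th input feature's activations."""
    return [[abs(w) * n for w, n in zip(row, x_norms)] for row in W]

def prune_per_output(W, x_norms, sparsity):
    """Within each output row, zero the fraction `sparsity` of weights
    with the lowest scores (weights compete per output, not across the
    whole layer)."""
    scores = wanda_scores(W, x_norms)
    pruned = [list(row) for row in W]
    for i, s_row in enumerate(scores):
        k = int(len(s_row) * sparsity)
        drop = sorted(range(len(s_row)), key=lambda j: s_row[j])[:k]
        for j in drop:
            pruned[i][j] = 0.0
    return pruned

# One output row, four input features: the weight -4.0 has the largest
# magnitude, but its input activations are tiny, so Wanda prunes it.
W = [[1.0, -4.0, 0.5, 2.0]]
x_norms = [3.0, 0.1, 2.0, 1.0]
print(prune_per_output(W, x_norms, 0.5))  # [[1.0, 0.0, 0.0, 2.0]]
```

Note that pure magnitude pruning would have kept -4.0 and dropped the two smallest weights; weighting by activation norms is what changes the decision.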

Quick Start & Requirements

  • Install via pip install -e . (after cloning).
  • Requires PyTorch. CUDA is recommended for performance.
  • Example command: python main.py --model decapoda-research/llama-7b-hf --prune_method wanda --sparsity_ratio 0.5 --sparsity_type unstructured --save out/llama_7b/unstructured/wanda/
  • Official paper and project page links are provided.

Highlighted Details

  • Supports LLaMA and LLaMA-2 models.
  • Implements unstructured and structured (N:M) sparsity.
  • Includes code for ablation studies on weight updates and zero-shot evaluation via a modified EleutherAI LM Evaluation Harness.
  • Benchmarks show Wanda outperforming magnitude pruning and SparseGPT on LLaMA-2 perplexity across various sparsity types.
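The N:M (semi-structured) pattern mentioned above constrains sparsity so that only N of every M consecutive weights survive; 2:4 is the pattern accelerated by NVIDIA sparse tensor cores. A hypothetical sketch, reusing Wanda-style scores (assumes the row length is a multiple of m):

```python
# Hypothetical sketch of N:M semi-structured pruning: within every
# group of m consecutive weights, keep only the n with the highest
# importance scores and zero the rest.

def prune_n_m(row, scores, n=2, m=4):
    out = list(row)
    for g in range(0, len(row), m):
        ranked = sorted(range(g, g + m), key=lambda j: scores[j],
                        reverse=True)
        for j in ranked[n:]:  # zero everything below the top-n per group
            out[j] = 0.0
    return out

row    = [1.0, 2.0, 3.0, 4.0, -5.0, 0.1, 0.2, 6.0]
scores = [0.3, 2.0, 1.0, 0.5,  5.0, 0.1, 0.2, 6.0]
print(prune_n_m(row, scores))  # [0.0, 2.0, 3.0, 0.0, -5.0, 0.0, 0.0, 6.0]
```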

Maintenance & Community

  • Last updated October 2023.
  • Discussion happens via GitHub issues and email.

Licensing & Compatibility

  • MIT License.
  • Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The repository builds on SparseGPT's codebase and focuses on LLM pruning; code for pruning image classifiers lives in a separate directory. Inference speedup from structured sparsity requires PyTorch >= 2.1 with CUTLASS or cuSPARSELt kernels.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 14 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 2 more.

sparsegpt by IST-DASLab

Code for massive language model one-shot pruning (ICML 2023 paper)

Top 0.5% on SourcePulse · 836 stars
Created 2 years ago · Updated 1 year ago