wanda  by locuslab

LLM pruning research paper implementation

created 2 years ago
782 stars

Top 45.6% on sourcepulse

Project Summary

Wanda is a PyTorch implementation for pruning Large Language Models (LLMs) by jointly considering weight magnitudes and activation norms. It targets researchers and practitioners aiming to reduce LLM size and computational cost while maintaining performance, offering a simple yet effective alternative to pure magnitude pruning.

How It Works

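Wanda scores each weight by the product of its magnitude and the L2 norm of the corresponding input activation, with the norms estimated on a small set of calibration data. Scores are compared on a per-output basis, that is, within each row of the weight matrix rather than across the whole layer, and the lowest-scoring weights in each row are removed. Because the score accounts for how strongly each input feature is activated, it can retain small weights that sit on large activations, which tends to give a better sparsity-accuracy trade-off than methods relying solely on weight magnitudes.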
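The scoring and per-output comparison described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the repository's PyTorch code: the function names `wanda_scores` and `prune_per_output`, the nested-list layout (rows are outputs, columns are input features), and the toy numbers are all hypothetical.

```python
import math


def wanda_scores(weight, activations):
    """Score each weight as |w_ij| * ||x_j||_2.

    weight: list of rows, one row per output neuron (hypothetical layout).
    activations: list of calibration input vectors, one per token.
    """
    n_in = len(weight[0])
    # L2 norm of each input feature, taken across all calibration tokens.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weight]


def prune_per_output(weight, scores, sparsity_ratio):
    """Zero the lowest-scoring fraction of weights within each row."""
    pruned = []
    for row, row_scores in zip(weight, scores):
        n_prune = int(len(row) * sparsity_ratio)
        # Indices of the n_prune lowest scores in this row only.
        drop = set(sorted(range(len(row)), key=lambda j: row_scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example: features 1 and 2 carry large activations, 0 and 3 small ones.
weight = [[4.0, 1.0, -0.5, 1.0],
          [5.0, 0.01, 0.02, 6.0]]
activations = [[0.1, 3.0, 8.0, 0.2],
               [0.1, 4.0, 6.0, 0.2]]

scores = wanda_scores(weight, activations)
pruned = prune_per_output(weight, scores, sparsity_ratio=0.5)
print(pruned)
```

In the first row, the weight 4.0 is dropped even though it has the largest magnitude, because its input feature is barely active; pure magnitude pruning would have kept it and dropped -0.5 instead. The second row ends up with a different mask, since each output is pruned independently.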

Quick Start & Requirements

  • Install via pip install -e . (after cloning).
  • Requires PyTorch. CUDA is recommended for performance.
  • Example command: python main.py --model decapoda-research/llama-7b-hf --prune_method wanda --sparsity_ratio 0.5 --sparsity_type unstructured --save out/llama_7b/unstructured/wanda/
  • Official paper and project page links are provided.

Highlighted Details

  • Supports LLaMA and LLaMA-2 models.
  • Implements unstructured and structured (N:M) sparsity.
  • Includes code for ablation studies on weight updates and zero-shot evaluation via a modified LM Harness.
  • Benchmarks show Wanda outperforming magnitude pruning and SparseGPT on LLaMA-2 perplexity across various sparsity types.
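To make the N:M pattern mentioned in the list above concrete, here is a minimal sketch in plain Python, separate from the repository's code. The helper name `prune_n_of_m` and the example values are hypothetical. Conventions differ on whether N counts the kept or the zeroed weights; for the common 2:4 and 4:8 patterns both readings coincide, and this sketch zeroes the `n` lowest-scoring weights in every group of `m` consecutive weights.

```python
def prune_n_of_m(row, row_scores, n=2, m=4):
    """Zero the n lowest-scoring weights in each group of m consecutive weights."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = list(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Rank only within this group, so every group has the same sparsity.
        for j in sorted(group, key=lambda j: row_scores[j])[:n]:
            pruned[j] = 0.0
    return pruned


# Hypothetical weights and importance scores for one output row.
row = [0.9, -0.1, 0.4, 0.3, -0.2, 0.8, 0.05, -0.7]
row_scores = [0.9, 0.1, 0.4, 0.3, 0.2, 0.8, 0.05, 0.7]

result = prune_n_of_m(row, row_scores, n=2, m=4)
print(result)
```

Unlike unstructured pruning, the zeros here are spread evenly: every block of four weights keeps exactly two, which is the regularity that sparse hardware kernels rely on.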

Maintenance & Community

  • Last updated October 2023.
  • Issues and email are provided for discussion.

Licensing & Compatibility

  • MIT License.
  • Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The codebase is built upon the SparseGPT repository and focuses on LLM pruning; code for pruning image classifiers lives in a separate directory. Pruning alone does not make inference faster: speedups for structured (N:M) sparsity require PyTorch >= 2.1 and specific sparse kernels (CUTLASS or cuSPARSELt).

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 39 stars in the last 90 days
