LLM pruning research paper implementation
Wanda is a PyTorch implementation for pruning Large Language Models (LLMs) by jointly considering weight magnitudes and activation norms. It targets researchers and practitioners aiming to reduce LLM size and computational cost while maintaining performance, offering a simple yet effective alternative to pure magnitude pruning.
How It Works
Wanda prunes weights on a per-output basis, calculating importance as the product of weight magnitudes and the L2 norm of input activations. This approach aims to identify and remove weights that are less critical to the model's output, potentially leading to better sparsity-accuracy trade-offs compared to methods relying solely on weight magnitudes.
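As a rough illustration (not the repository's code), the per-output Wanda score and pruning mask for a single linear layer could be computed as below; weight, activations, and sparsity are hypothetical names for the layer weight, the calibration inputs, and the target sparsity ratio.

import torch

def wanda_prune_layer(weight, activations, sparsity):
    # weight:      (out_features, in_features) dense weight of a linear layer
    # activations: (num_tokens, in_features) calibration inputs to that layer
    # sparsity:    fraction of weights to zero in each output row, e.g. 0.5
    act_norm = activations.norm(p=2, dim=0)        # per-input-channel L2 norm
    importance = weight.abs() * act_norm           # Wanda score: |W| * ||X||_2
    num_prune = int(weight.shape[1] * sparsity)
    # Compare within each output row and drop the lowest-scoring weights.
    drop = importance.topk(num_prune, dim=1, largest=False).indices
    mask = torch.ones_like(weight, dtype=torch.bool).scatter_(1, drop, False)
    return weight * mask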
Quick Start & Requirements
Clone the repository, then install it in editable mode:
pip install -e .
Prune LLaMA-7B with Wanda at 50% unstructured sparsity:
python main.py --model decapoda-research/llama-7b-hf --prune_method wanda --sparsity_ratio 0.5 --sparsity_type unstructured --save out/llama_7b/unstructured/wanda/
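The same CLI is expected to accept structured N:M patterns (such as 2:4) via the sparsity type flag while keeping the 50% ratio; the pattern value and output path below are illustrative, so check the repository's README for the exact options:
python main.py --model decapoda-research/llama-7b-hf --prune_method wanda --sparsity_ratio 0.5 --sparsity_type 2:4 --save out/llama_7b/2-4/wanda/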
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository is built upon SparseGPT and focuses on LLM pruning; pruning for image classifiers is in a separate directory. Inference speedup for structured sparsity relies on PyTorch >= 2.1 and specific kernels (CUTLASS or CuSPARSELt).
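As a minimal sketch of how a 2:4-pruned layer could be accelerated (not the repository's code), PyTorch >= 2.1 exposes a prototype semi-structured sparse format that dispatches matmuls to CUTLASS or cuSPARSELt kernels; the example assumes a CUDA GPU with fp16 weights and illustrative layer sizes.

import torch
from torch.sparse import to_sparse_semi_structured

# Toy linear layer; semi-structured kernels need fp16/bf16 weights on a CUDA GPU.
linear = torch.nn.Linear(4096, 4096, bias=False).half().cuda()

# Enforce a 2:4 pattern: zero the 2 smallest-magnitude weights in every group of 4.
w4 = linear.weight.data.view(linear.out_features, -1, 4)
drop = w4.abs().topk(2, dim=-1, largest=False).indices
w4.scatter_(-1, drop, 0.0)

# Swap the dense weight for the semi-structured representation so that
# subsequent matmuls use the sparse kernels.
linear.weight = torch.nn.Parameter(to_sparse_semi_structured(linear.weight))

x = torch.randn(8, 4096, device="cuda", dtype=torch.float16)
y = linear(x)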