LLM-Pruner by horseee

LLM structural pruner for model compression

created 2 years ago
1,048 stars

Top 36.5% on sourcepulse

View on GitHub
Project Summary

LLM-Pruner offers structural pruning for large language models, enabling significant compression with minimal performance degradation. It targets researchers and practitioners aiming to reduce the computational footprint of LLMs like Llama, BLOOM, and Vicuna, facilitating deployment in resource-constrained environments.

How It Works

LLM-Pruner employs a three-stage process: Discovery, Estimation, and Recovery. The Discovery stage identifies minimally removable structural units (coupled groups) within the LLM. The Estimation stage quantifies the importance of these units using criteria such as first-order Taylor expansion or L1/L2 norms. Finally, the Recovery stage uses efficient post-training on datasets like Alpaca or LaMini-instruction to restore model performance. This approach allows for task-agnostic compression and efficient fine-tuning.
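
The Estimation criteria are simple to sketch. Below is a minimal, illustrative PyTorch sketch of per-channel Taylor and L2 importance scoring; the function names and per-channel granularity are assumptions for exposition, not the project's actual API (in the real pipeline, scores are pooled over an entire coupled group so all members are removed together):

```python
import torch

def taylor_importance(weight: torch.Tensor) -> torch.Tensor:
    # First-order Taylor criterion: |w * dL/dw|, aggregated per output channel.
    # Assumes loss.backward() has been run on a small calibration batch so that
    # weight.grad is populated (this is why a GPU is recommended).
    assert weight.grad is not None, "run a backward pass on calibration data first"
    return (weight * weight.grad).abs().sum(dim=1)

def l2_importance(weight: torch.Tensor) -> torch.Tensor:
    # Gradient-free alternative: L2 norm of each output channel's weights.
    return weight.norm(p=2, dim=1)

def channels_to_keep(weight: torch.Tensor, pruning_ratio: float = 0.25) -> torch.Tensor:
    # Rank channels by importance and keep the top (1 - pruning_ratio) fraction.
    scores = l2_importance(weight)
    n_keep = int(weight.shape[0] * (1 - pruning_ratio))
    return scores.topk(n_keep).indices.sort().values
```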

Quick Start & Requirements

  • Install via pip install -r requirement.txt.
  • Requires Python, PyTorch, and lm-evaluation-harness. A GPU is recommended for Taylor-based pruning and evaluation; a minimal environment check is sketched after this list.
  • The script/llama_prune.sh script automates downloading models and datasets for a minimal example.
  • Official quick-start and detailed instructions are available in the README.
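
Before running the script, a quick sanity check like the following confirms the environment can load a base model. The model identifier here is a placeholder, not a real repository id; substitute the checkpoint you intend to prune:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Taylor-based pruning needs a backward pass, so a GPU is strongly recommended.
print("CUDA available:", torch.cuda.is_available())

model_id = "path/or/hub-id-of-your-llama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters before pruning")
```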

Highlighted Details

  • Supports a wide range of LLMs including Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, and Baichuan.
  • Achieves efficient compression, with reported pruning and post-training times of 3 minutes and 3 hours, respectively.
  • Demonstrates competitive performance, with a fine-tuned LLaMA-5.4B model approaching the original LLaMA-7B's accuracy using only 50k samples.
  • Includes support for Grouped Query Attention (GQA) for Llama-3/3.1; see the sketch after this list for why GQA changes how heads are grouped for pruning.
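
GQA complicates head pruning because several query heads share one key/value head, so removal decisions must be made per KV group rather than per individual head. The following is an illustrative sketch of that grouping logic under the standard layout where consecutive query heads share a KV head; the names and shapes are assumptions, not the project's API:

```python
import torch

def grouped_head_selection(head_scores: torch.Tensor, n_q_heads: int,
                           n_kv_heads: int, pruning_ratio: float):
    # Each KV head serves n_q_heads // n_kv_heads query heads (e.g. 32 / 8 = 4
    # for Llama-3-8B), so importance is pooled per KV group and whole groups
    # are kept or pruned together.
    group = n_q_heads // n_kv_heads
    group_scores = head_scores.view(n_kv_heads, group).sum(dim=1)
    n_keep = int(n_kv_heads * (1 - pruning_ratio))
    kept_kv = group_scores.topk(n_keep).indices.sort().values
    # Expand the kept KV groups back to the query heads they serve.
    kept_q = (kept_kv.unsqueeze(1) * group + torch.arange(group)).flatten()
    return kept_q, kept_kv
```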

Maintenance & Community

  • The project is associated with the National University of Singapore.
  • Updates are posted to the README changelog, though the last commit was 10 months ago (see Health Check below).
  • A WeChat group is available for community discussion.

Licensing & Compatibility

  • The project is released under a permissive license, allowing for commercial use and integration with closed-source projects.
  • Specific model checkpoints used in experiments might have different licensing terms.

Limitations & Caveats

The project notes that while efficient, the compressed models can still exhibit issues like repetitive token generation or nonsensical outputs, indicating room for quality improvement. Manual intervention may also be required for some architectures to correctly map indices across concatenation operations (see the sketch below).
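
The index-mapping caveat concerns operations like torch.cat: when two pruned tensors feed a concatenation along the channel dimension, the kept indices of the second input must be offset by the first input's original width. A tiny illustrative example, with a hypothetical helper name:

```python
import torch

def remap_concat_indices(kept_a: torch.Tensor, kept_b: torch.Tensor,
                         width_a: int) -> torch.Tensor:
    # Indices into the second input must be shifted by the FIRST input's
    # ORIGINAL (unpruned) width so they address the right columns of the
    # concatenated tensor. Getting this offset wrong is the kind of manual
    # mapping the caveat above refers to.
    return torch.cat([kept_a, kept_b + width_a])

# e.g. input A keeps channels [0, 2] of 4, input B keeps [1, 3] of 4:
print(remap_concat_indices(torch.tensor([0, 2]), torch.tensor([1, 3]), width_a=4))
# tensor([0, 2, 5, 7])
```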

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 43 stars in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

wanda by locuslab

  • 782 stars
  • LLM pruning research paper implementation
  • created 2 years ago; updated 11 months ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle

  • 2k stars
  • Context window extension method for LLMs (research paper, models)
  • created 2 years ago; updated 1 year ago