Awesome-LLM-Prune by pprp

LLM pruning techniques for model compression

Created 1 year ago · 257 stars · Top 98.4% on SourcePulse

View on GitHub
Project Summary

This repository is an "awesome list" curating research papers and code related to the pruning of Large Language Models (LLMs). It serves as a comprehensive resource for researchers and practitioners aiming to reduce model size and improve efficiency while maintaining or enhancing performance. The list categorizes pruning techniques and provides links to papers, code repositories, and summaries of their findings, facilitating a quick overview of the LLM pruning landscape.

How It Works

The repository compiles a wide array of LLM pruning methodologies, including unstructured, structured, and semi-structured approaches. It highlights techniques that focus on weight updates, activation-based metrics, symbolic discovery of pruning metrics, and the impact of pruning on various downstream tasks. The listed papers explore different strategies such as layer-wise pruning, block-wise adaptation, and gradient-free methods, often comparing their effectiveness against established techniques like SparseGPT and Wanda.
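To make the taxonomy concrete, the sketch below (a minimal illustration, not code from any listed repository; PyTorch-based, with hypothetical shapes and a 50% sparsity target) contrasts unstructured magnitude pruning with the 2:4 semi-structured pattern:

```python
# Minimal sketch: unstructured magnitude pruning vs. 2:4 semi-structured
# pruning on a single weight matrix. Illustrative only; real methods prune
# layer by layer and often use calibration data.
import torch

def magnitude_prune_unstructured(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights across the whole matrix."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def magnitude_prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4 (N:M sparsity)."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pruning expects the input dim to be a multiple of 4"
    groups = weight.reshape(-1, 4)
    # Zero the 2 smallest-magnitude entries within each group of 4.
    _, drop_idx = groups.abs().topk(2, dim=1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop_idx, 0.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
print(magnitude_prune_unstructured(w, 0.5).eq(0).float().mean())  # ~0.5
print(magnitude_prune_2_4(w).eq(0).float().mean())                # exactly 0.5
```

Structured pruning, by contrast, removes whole units (rows, columns, attention heads, or layers), changing tensor shapes rather than merely masking entries.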

Quick Start & Requirements

This repository is a curated list of research papers and does not have a direct installation or execution command. Requirements would depend on the specific papers or code repositories linked within the list, which may include Python, specific deep learning frameworks (like PyTorch or TensorFlow), and potentially GPU acceleration.
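As a rough starting point before cloning any linked repository, a check like the following (a minimal sketch, assuming the implementation targets PyTorch with optional CUDA) verifies the common prerequisites:

```python
# Hypothetical prerequisite check; most linked implementations assume
# PyTorch, and many expect a CUDA-capable GPU for larger models.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```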

Highlighted Details

  • Diverse Pruning Techniques: Covers a broad spectrum of pruning methods, from simple magnitude-based pruning to more sophisticated techniques like those employing genetic programming (Pruner-Zero) or analyzing weight importance based on input/output connections (RIA).
  • Performance Evaluation: Many papers benchmark their methods against established techniques like SparseGPT and Wanda, evaluating performance not just on perplexity but also on downstream tasks like reasoning, generation, and retrieval (a sketch of Wanda-style scoring follows this list).
  • Novel Approaches: Features innovative methods such as the "Junk DNA Hypothesis", which challenges the assumption that small-magnitude weights are redundant; "LLM-Kick" for comprehensive compression benchmarking; and "NASH" for accelerating encoder-decoder models.
  • Efficiency Focus: Several entries highlight memory efficiency and inference speedups, crucial for deploying LLMs on resource-constrained devices.
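As a concrete example of the activation-aware scoring many of these papers compare against, Wanda scores each weight by its magnitude times the norm of its input activation and prunes the lowest-scoring weights within each output row. The sketch below is written from the paper's description rather than the linked locuslab/wanda code; the shapes, the 50% ratio, and the random stand-in for calibration activations are all illustrative:

```python
# Wanda-style scoring sketch: score = |W| * ||X||_2 per input feature,
# with pruning decided independently within each output row.
import torch

def wanda_style_mask(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """weight: (out_features, in_features); act_norm: (in_features,) L2 norms of inputs."""
    score = weight.abs() * act_norm          # broadcasts act_norm across rows
    k = int(weight.shape[1] * sparsity)
    # Drop the k lowest-scoring weights in each row (Wanda's comparison group).
    _, drop_idx = score.topk(k, dim=1, largest=False)
    mask = torch.ones_like(weight)
    mask.scatter_(1, drop_idx, 0.0)
    return mask

w = torch.randn(4, 8)
x_norm = torch.rand(8)                # stand-in for calibration-set activation norms
pruned_w = w * wanda_style_mask(w, x_norm, 0.5)  # each row keeps its 4 highest-scoring weights
```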

Maintenance & Community

The repository is an "awesome list," typically maintained by community contributions. Users are encouraged to submit pull requests or open issues for corrections, new papers, or discussions. Specific community channels like Discord or Slack are not mentioned in the provided README snippet.

Licensing & Compatibility

The repository itself, being a list of links and summaries, does not declare a specific license. Licensing for the individual code repositories and papers varies and should be checked on their respective pages; compatibility likewise depends on the tools and frameworks used in each cited work.

Limitations & Caveats

This repository is a collection of research papers, not a unified pruning framework; there is no single codebase to install or run. The effectiveness and applicability of each method can vary significantly with the specific LLM architecture, dataset, and downstream task. Some papers also have practical limitations, such as requiring extensive computation for their search methods or targeting specific model families (e.g., BERT rather than LLaMA).

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 28
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 2 more.

sparsegpt by IST-DASLab

Top 0.5% on SourcePulse · 836 stars
Code for massive language model one-shot pruning (ICML 2023 paper)
Created 2 years ago · Updated 1 year ago
Starred by Jared Palmer (ex-VP of AI at Vercel; founder of Turborepo; author of Formik and TSDX).

wanda by locuslab

Top 0.4% on SourcePulse · 802 stars
LLM pruning research paper implementation
Created 2 years ago · Updated 1 year ago