Awesome-LLM-Prune by pprp

LLM pruning techniques for model compression

Created 1 year ago · 257 stars · Top 98.4% on SourcePulse

View on GitHub
Project Summary

This repository is an "awesome list" curating research papers and code related to the pruning of Large Language Models (LLMs). It serves as a comprehensive resource for researchers and practitioners aiming to reduce model size and improve efficiency while maintaining or enhancing performance. The list categorizes pruning techniques and provides links to papers, code repositories, and summaries of their findings, facilitating a quick overview of the LLM pruning landscape.

How It Works

The repository compiles a wide array of LLM pruning methodologies, including unstructured, structured, and semi-structured approaches. It highlights techniques that focus on weight updates, activation-based metrics, symbolic discovery of pruning metrics, and the impact of pruning on various downstream tasks. The listed papers explore different strategies such as layer-wise pruning, block-wise adaptation, and gradient-free methods, often comparing their effectiveness against established techniques like SparseGPT and Wanda.
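To make the taxonomy concrete, the sketch below (a minimal illustration, not code from any listed repository; PyTorch-based, with hypothetical shapes and a 50% sparsity target) contrasts unstructured magnitude pruning with the 2:4 semi-structured pattern:

```python
# Minimal sketch: unstructured magnitude pruning vs. 2:4 semi-structured
# pruning on a single weight matrix. Illustrative only; real methods prune
# layer by layer and often use calibration data.
import torch

def magnitude_prune_unstructured(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights across the whole matrix."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def magnitude_prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4 (N:M sparsity)."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pruning expects the input dim to be a multiple of 4"
    groups = weight.reshape(-1, 4)
    # Zero the 2 smallest-magnitude entries within each group of 4.
    _, drop_idx = groups.abs().topk(2, dim=1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop_idx, 0.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
print(magnitude_prune_unstructured(w, 0.5).eq(0).float().mean())  # ~0.5
print(magnitude_prune_2_4(w).eq(0).float().mean())                # exactly 0.5
```

Structured pruning, by contrast, removes whole units (rows, columns, attention heads, or layers), changing tensor shapes rather than merely masking entries.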

Quick Start & Requirements

This repository is a curated list of research papers and does not have a direct installation or execution command. Requirements would depend on the specific papers or code repositories linked within the list, which may include Python, specific deep learning frameworks (like PyTorch or TensorFlow), and potentially GPU acceleration.
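As a rough starting point before cloning any linked repository, a check like the following (a minimal sketch, assuming the implementation targets PyTorch with optional CUDA) verifies the common prerequisites:

```python
# Hypothetical prerequisite check; most linked implementations assume
# PyTorch, and many expect a CUDA-capable GPU for larger models.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```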

Highlighted Details

  • Diverse Pruning Techniques: Covers a broad spectrum of pruning methods, from simple magnitude-based pruning to more sophisticated techniques like those employing genetic programming (Pruner-Zero) or analyzing weight importance based on input/output connections (RIA).
  • Performance Evaluation: Many papers benchmark their methods against established techniques like SparseGPT and Wanda, evaluating performance not just on perplexity but also on downstream tasks like reasoning, generation, and retrieval (a sketch of Wanda-style scoring follows this list).
  • Novel Approaches: Features innovative methods such as the "Junk DNA Hypothesis", which challenges the assumption that small-magnitude weights are redundant; "LLM-Kick" for comprehensive compression benchmarking; and "NASH" for accelerating encoder-decoder models.
  • Efficiency Focus: Several entries highlight memory efficiency and inference speedups, crucial for deploying LLMs on resource-constrained devices.
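As a concrete example of the activation-aware scoring many of these papers compare against, Wanda scores each weight by its magnitude times the norm of its input activation and prunes the lowest-scoring weights within each output row. The sketch below is written from the paper's description rather than the linked locuslab/wanda code; the shapes, the 50% ratio, and the random stand-in for calibration activations are all illustrative:

```python
# Wanda-style scoring sketch: score = |W| * ||X||_2 per input feature,
# with pruning decided independently within each output row.
import torch

def wanda_style_mask(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """weight: (out_features, in_features); act_norm: (in_features,) L2 norms of inputs."""
    score = weight.abs() * act_norm          # broadcasts act_norm across rows
    k = int(weight.shape[1] * sparsity)
    # Drop the k lowest-scoring weights in each row (Wanda's comparison group).
    _, drop_idx = score.topk(k, dim=1, largest=False)
    mask = torch.ones_like(weight)
    mask.scatter_(1, drop_idx, 0.0)
    return mask

w = torch.randn(4, 8)
x_norm = torch.rand(8)                # stand-in for calibration-set activation norms
pruned_w = w * wanda_style_mask(w, x_norm, 0.5)  # each row keeps its 4 highest-scoring weights
```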

Maintenance & Community

The repository is an "awesome list," typically maintained by community contributions. Users are encouraged to submit pull requests or open issues for corrections, new papers, or discussions. Specific community channels like Discord or Slack are not mentioned in the provided README snippet.

Licensing & Compatibility

The repository itself, being a list of links and summaries, does not declare a specific license. Licensing for the individual code repositories and papers varies and should be checked on their respective pages; compatibility likewise depends on the tools and frameworks used in each cited work.

Limitations & Caveats

This repository is a collection of research papers, not a unified pruning framework; there is no single codebase to install or run. The effectiveness and applicability of each method can vary significantly with the specific LLM architecture, dataset, and downstream task. Some papers also have practical limitations, such as requiring extensive computation for their search methods or targeting specific model families (e.g., BERT rather than LLaMA).

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 28
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 2 more.

sparsegpt by IST-DASLab

Top 0.5% on SourcePulse · 836 stars
Code for massive language model one-shot pruning (ICML 2023 paper)
Created 2 years ago · Updated 1 year ago
Starred by Jared Palmer (ex-VP of AI at Vercel; founder of Turborepo; author of Formik and TSDX).

wanda by locuslab

Top 0.4% on SourcePulse · 802 stars
LLM pruning research paper implementation
Created 2 years ago · Updated 1 year ago