LLM compression papers and tools for efficient training/inference
This repository is a curated collection of research papers and tools for Large Language Model (LLM) compression. It gives researchers and practitioners a single reference for accelerating LLM training and inference by reducing model size and compute requirements.
How It Works
The repository organizes LLM compression techniques into several key areas: Quantization, Pruning and Sparsity, Distillation, Efficient Prompting, KV Cache Compression, and Other methods. Each category lists relevant papers with links to their publications or code repositories, giving quick access to state-of-the-art techniques and their implementations.
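To give a concrete taste of the first category, the sketch below shows symmetric per-tensor int8 weight quantization, the simplest form of the quantization methods catalogued here. It is an illustrative example, not code from any listed tool; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0   # one float scale stored per tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale                         # ~4x smaller than float32, plus one scalar

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix; the error is bounded by roughly scale/2.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize_int8(q, s)).max())
```

The papers in the quantization section refine this basic idea with per-channel or per-group scales, lower bit widths (4-bit, 2-bit), and calibration or training to recover accuracy.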
Quick Start & Requirements
Because this repository is a curated list, there is nothing to install or run. Follow the links for individual papers and tools to find their respective requirements and setup instructions.
Maintenance & Community
The repository is actively maintained and welcomes community contributions. Users are encouraged to add new papers and tools related to LLM compression so the list stays current with the latest advancements.
Licensing & Compatibility
The repository itself is a collection of links and does not declare a license of its own. Users must adhere to the licenses of the individual papers and tools they access.
Limitations & Caveats
As a curated list, the repository provides no tooling or implementations of its own. Users must independently evaluate and integrate the referenced papers and tools, which vary widely in maturity, documentation, and compatibility.