Awesome-LLM-Compression by HuangOwen

LLM compression papers and tools for efficient training/inference

created 2 years ago
1,618 stars

Top 26.6% on sourcepulse

Project Summary

This repository serves as a curated collection of research papers and tools focused on Large Language Model (LLM) compression. It aims to provide a comprehensive resource for researchers and practitioners looking to accelerate LLM training and inference by reducing model size and computational requirements.

How It Works

The repository categorizes LLM compression techniques into several key areas: Quantization, Pruning and Sparsity, Distillation, Efficient Prompting, KV Cache Compression, and Other methods. Each category lists relevant research papers with links to their publications or code repositories, facilitating easy access to state-of-the-art techniques and their implementations.

Quick Start & Requirements

This repository is a curated list; there is nothing to install or run directly. Follow the links for individual papers and tools to find their respective requirements and setup instructions.

Highlighted Details

  • Extensive coverage of quantization techniques, including low-bit (e.g., 4-bit, 8-bit, 1-bit) quantization, mixed-precision, and outlier-aware methods.
  • Detailed sections on pruning and sparsity, covering structured and unstructured pruning, sensitivity analysis, and pruning-aware fine-tuning.
  • A thorough compilation of knowledge distillation methods for creating smaller, efficient LLMs.
  • Resources on efficient prompting strategies and KV cache compression techniques to optimize inference.
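To make the quantization category concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one of the simplest low-bit schemes surveyed in the list. This is an illustrative example, not code from any referenced paper or tool; the function names and the NumPy-based implementation are assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map floats to int8 in [-127, 127].

    A single scale is shared by the whole tensor, so the largest-magnitude
    weight maps to +/-127 and everything else is rounded proportionally.
    """
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and measure the quantization error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))  # bounded by scale / 2
```

Real methods catalogued in the repository (e.g. outlier-aware or mixed-precision quantization) refine this baseline, typically by using per-channel or per-group scales and by handling activation outliers separately.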

Maintenance & Community

The repository is actively maintained, with contributions welcomed from the community. It encourages users to add new papers and tools related to LLM compression, ensuring the list remains up-to-date with the latest advancements.

Licensing & Compatibility

The repository itself is a collection of links and does not impose a specific license. Users must adhere to the licenses of the individual papers and tools they access.

Limitations & Caveats

As a curated list, the repository does not provide direct tooling or implementation. Users must independently evaluate and integrate the referenced papers and tools, which may have varying levels of maturity, documentation, and compatibility.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 0

Star History

  • 133 stars in the last 90 days
