awesome-ml-model-compression by cedrickchee

ML model compression resource list

Created 6 years ago
537 stars

Top 59.2% on SourcePulse

Project Summary

This repository curates resources for machine learning model compression and acceleration, targeting researchers and engineers seeking to reduce model size, improve inference speed, and lower computational costs. It provides a comprehensive collection of papers, tools, and tutorials covering techniques like quantization, pruning, and distillation.

How It Works

The collection categorizes research and tools by compression technique, including quantization (low-bit precision), pruning (removing weights/neurons), distillation (transferring knowledge to smaller models), and low-rank approximation. It highlights papers and libraries that implement these methods, often with a focus on efficiency for mobile and edge devices, as well as recent advancements in LLM compression.
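As an illustration of the "low-bit precision" idea behind the quantization resources listed, the following is a minimal sketch of symmetric per-tensor int8 post-training quantization using only NumPy. It is not taken from any specific tool in the list; real toolkits (e.g. TensorFlow Model Optimization Toolkit, bitsandbytes) additionally handle calibration, per-channel scales, and optimized kernels.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single symmetric scale factor."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy example: a 4x4 float32 weight matrix shrinks 4x in storage,
# with round-off error bounded by half a quantization step.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

The same recipe generalizes to the 4-bit and FP8 schemes covered in the list by changing the target range and rounding rule.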

Quick Start & Requirements

  • Install/Run: No direct installation or execution commands are provided as this is a curated list of resources.
  • Prerequisites: None for browsing; using the linked tools and code repositories may require specific ML frameworks such as TensorFlow or PyTorch.
  • Resources: Links to various tools like TensorFlow Model Optimization Toolkit, Bitsandbytes, and XNNPACK are provided.

Highlighted Details

  • Extensive coverage of quantization techniques, including FP8, 4-bit precision (e.g., GPTQ, k-bit), and LLM-specific methods like SmoothQuant and ZeroQuant.
  • Detailed sections on pruning, including magnitude pruning, structured pruning, and recent one-shot methods like SparseGPT for LLMs.
  • Inclusion of parameter-efficient fine-tuning (PEFT) methods like LoRA and QLoRA for LLMs.
  • Links to practical guides and blog posts explaining quantization and compression for various models.
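To make the pruning bullet concrete, here is a hedged sketch of global magnitude pruning, the simplest technique in the list's pruning section: zero out the smallest-magnitude fraction of weights. One-shot LLM methods like SparseGPT are far more sophisticated; this only illustrates the basic idea and is not drawn from any particular referenced implementation.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of w with the lowest-|w| `sparsity` fraction zeroed."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value across the whole tensor.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

# Prune half the weights of a toy 8x8 matrix.
rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.5)
print("achieved sparsity:", (pruned == 0).mean())
```

Structured pruning variants instead remove whole rows, channels, or attention heads so that the resulting sparsity maps onto real hardware speedups.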

Maintenance & Community

  • The repository is maintained by Cedric Chee, with contributions welcomed via Pull Requests.
  • Links to related lists and specific code repositories (e.g., from Hugging Face, NVIDIA, MIT) are included.

Licensing & Compatibility

  • Code: MIT License.
  • Text content: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
  • Generally permissive for research and commercial use, with attribution required for text content.

Limitations & Caveats

This is a curated list of research and tools, not a runnable software package. Users must independently evaluate and integrate the referenced papers and libraries. Some advanced techniques may require specific hardware (e.g., GPUs) or significant computational resources for implementation.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.

Explore Similar Projects

neural-compressor by intel

  • Python library for model compression (quantization, pruning, distillation, NAS)
  • 2k stars · Top 0.2% on SourcePulse
  • Created 5 years ago · Updated 16 hours ago