awesome-ml-model-compression by cedrickchee

ML model compression resource list

Created 6 years ago
537 stars

Top 59.2% on SourcePulse

Project Summary

This repository curates resources for machine learning model compression and acceleration, targeting researchers and engineers seeking to reduce model size, improve inference speed, and lower computational costs. It provides a comprehensive collection of papers, tools, and tutorials covering techniques like quantization, pruning, and distillation.

How It Works

The collection categorizes research and tools by compression technique, including quantization (low-bit precision), pruning (removing weights/neurons), distillation (transferring knowledge to smaller models), and low-rank approximation. It highlights papers and libraries that implement these methods, often with a focus on efficiency for mobile and edge devices, as well as recent advancements in LLM compression.
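As an illustration of the "low-bit precision" idea behind the quantization resources listed, the following is a minimal sketch of symmetric per-tensor int8 post-training quantization using only NumPy. It is not taken from any specific tool in the list; real toolkits (e.g. TensorFlow Model Optimization Toolkit, bitsandbytes) additionally handle calibration, per-channel scales, and optimized kernels.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single symmetric scale factor."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy example: a 4x4 float32 weight matrix shrinks 4x in storage,
# with round-off error bounded by half a quantization step.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

The same recipe generalizes to the 4-bit and FP8 schemes covered in the list by changing the target range and rounding rule.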

Quick Start & Requirements

  • Install/Run: No direct installation or execution commands are provided as this is a curated list of resources.
  • Prerequisites: None for browsing; using the linked tools and code repositories may require specific ML frameworks such as TensorFlow or PyTorch.
  • Resources: Links to various tools like TensorFlow Model Optimization Toolkit, Bitsandbytes, and XNNPACK are provided.

Highlighted Details

  • Extensive coverage of quantization techniques, including FP8, 4-bit precision (e.g., GPTQ, k-bit), and LLM-specific methods like SmoothQuant and ZeroQuant.
  • Detailed sections on pruning, including magnitude pruning, structured pruning, and recent one-shot methods like SparseGPT for LLMs.
  • Inclusion of parameter-efficient fine-tuning (PEFT) methods like LoRA and QLoRA for LLMs.
  • Links to practical guides and blog posts explaining quantization and compression for various models.
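To make the pruning bullet concrete, here is a hedged sketch of global magnitude pruning, the simplest technique in the list's pruning section: zero out the smallest-magnitude fraction of weights. One-shot LLM methods like SparseGPT are far more sophisticated; this only illustrates the basic idea and is not drawn from any particular referenced implementation.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of w with the lowest-|w| `sparsity` fraction zeroed."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value across the whole tensor.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

# Prune half the weights of a toy 8x8 matrix.
rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.5)
print("achieved sparsity:", (pruned == 0).mean())
```

Structured pruning variants instead remove whole rows, channels, or attention heads so that the resulting sparsity maps onto real hardware speedups.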

Maintenance & Community

  • The repository is maintained by Cedric Chee, with contributions welcomed via Pull Requests.
  • Links to related lists and specific code repositories (e.g., from Hugging Face, NVIDIA, MIT) are included.

Licensing & Compatibility

  • Code: MIT License.
  • Text content: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
  • Generally permissive for research and commercial use, with attribution required for text content.

Limitations & Caveats

This is a curated list of research and tools, not a runnable software package. Users must independently evaluate and integrate the referenced papers and libraries. Some advanced techniques may require specific hardware (e.g., GPUs) or significant computational resources for implementation.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.

Explore Similar Projects

neural-compressor by intel

  • Python library for model compression (quantization, pruning, distillation, NAS)
  • 2k stars · Top 0.2% on SourcePulse
  • Created 5 years ago · Updated 16 hours ago