awesome-ml-model-compression by cedrickchee

ML model compression resource list

Created 6 years ago · 527 stars

Top 60.8% on sourcepulse

Project Summary

This repository curates resources for machine learning model compression and acceleration, targeting researchers and engineers seeking to reduce model size, improve inference speed, and lower computational costs. It provides a comprehensive collection of papers, tools, and tutorials covering techniques like quantization, pruning, and distillation.

How It Works

The collection categorizes research and tools by compression technique, including quantization (low-bit precision), pruning (removing weights/neurons), distillation (transferring knowledge to smaller models), and low-rank approximation. It highlights papers and libraries that implement these methods, often with a focus on efficiency for mobile and edge devices, as well as recent advancements in LLM compression.
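To make the low-bit quantization idea above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names and the per-tensor scaling scheme are illustrative only; the libraries linked in the list (e.g., Bitsandbytes, TensorFlow Model Optimization Toolkit) use more sophisticated per-channel and calibration-based variants.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.5, -1.2, 0.03, 2.54]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# each recovered weight lies within one quantization step (s) of the original
```

Storing `q` as int8 plus a single float scale is what shrinks the model: 1 byte per weight instead of 4, at the cost of the rounding error bounded by the step size.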

Quick Start & Requirements

  • Install/Run: No direct installation or execution commands are provided as this is a curated list of resources.
  • Prerequisites: Access to the linked research papers and code repositories, and, for using the referenced tools, specific ML frameworks such as TensorFlow or PyTorch.
  • Resources: Links to various tools like TensorFlow Model Optimization Toolkit, Bitsandbytes, and XNNPACK are provided.

Highlighted Details

  • Extensive coverage of quantization techniques, including FP8, 4-bit precision (e.g., GPTQ, k-bit), and LLM-specific methods like SmoothQuant and ZeroQuant.
  • Detailed sections on pruning, including magnitude pruning, structured pruning, and recent one-shot methods like SparseGPT for LLMs.
  • Inclusion of parameter-efficient fine-tuning (PEFT) methods like LoRA and QLoRA for LLMs.
  • Links to practical guides and blog posts explaining quantization and compression for various models.
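For intuition about the pruning methods listed above, magnitude pruning (the simplest variant) can be sketched in a few lines of plain Python. The helper below is illustrative and not taken from any linked library; one-shot LLM methods like SparseGPT use much more involved, second-order criteria.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of smallest-magnitude weights.

    Ties at the threshold are all pruned, so the achieved sparsity
    can slightly exceed the requested fraction.
    """
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # threshold = magnitude of the n_prune-th smallest weight
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

print(magnitude_prune([0.1, -0.5, 0.03, 2.0], sparsity=0.5))
# the two smallest-magnitude weights (0.1 and 0.03) are zeroed
```

The resulting zeros only save memory or compute when paired with sparse storage formats or hardware support, which is why structured pruning (removing whole neurons or channels) is often preferred in practice.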

Maintenance & Community

  • The repository is maintained by Cedric Chee, with contributions welcomed via Pull Requests.
  • Links to related lists and specific code repositories (e.g., from Hugging Face, NVIDIA, MIT) are included.

Licensing & Compatibility

  • Code: MIT License.
  • Text content: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
  • Both licenses are permissive for research and commercial use; the text content additionally requires attribution and share-alike distribution under CC BY-SA 4.0.

Limitations & Caveats

This is a curated list of research and tools, not a runnable software package. Users must independently evaluate and integrate the referenced papers and libraries. Some advanced techniques may require specific hardware (e.g., GPUs) or significant computational resources for implementation.

Health Check
  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 16 stars in the last 90 days

