ML model compression resource list
This repository curates resources for machine learning model compression and acceleration, targeting researchers and engineers seeking to reduce model size, improve inference speed, and lower computational costs. It provides a comprehensive collection of papers, tools, and tutorials covering techniques like quantization, pruning, and distillation.
How It Works
The collection categorizes research and tools by compression technique, including quantization (low-bit precision), pruning (removing weights or neurons), distillation (transferring knowledge from a large teacher to a smaller student model), and low-rank approximation. It highlights papers and libraries that implement these methods, often with an emphasis on efficiency for mobile and edge devices, and it also tracks recent advances in LLM compression.
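For orientation, here is a minimal PyTorch sketch of the four technique families named above. The toy model, target rank, temperature, and all other hyperparameters are illustrative assumptions, not drawn from any particular entry in the list.

```python
# Minimal sketches of four compression techniques; the toy model and all
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# A small stand-in for any network to be compressed.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# 1) Quantization: convert Linear layers to int8 for smaller, faster inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# 2) Pruning: zero out the 50% of weights with the smallest L1 magnitude.
prune.l1_unstructured(model[0], name="weight", amount=0.5)
prune.remove(model[0], "weight")  # bake the sparsity into the weight tensor

# 3) Distillation: train a small student against a large teacher's soft targets.
def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale to compensate for the temperature's gradient damping
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# 4) Low-rank approximation: replace a weight matrix W with a rank-r
#    factorization, reducing parameters from m*n to r*(m + n).
W = model[0].weight.detach()                    # shape (64, 128)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
r = 16                                          # assumed target rank
W_lowrank = (U[:, :r] * S[:r]) @ Vh[:r, :]      # best rank-r approximation of W
```

In practice these steps are typically followed by fine-tuning to recover accuracy; the papers and libraries collected here cover those training recipes in far greater depth.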
Quick Start & Requirements
There is nothing to install: this is a reading list, not a package. Browse the categorized links and consult each referenced paper or library for its own setup and hardware requirements.
Highlighted Details
Maintenance & Community
The repository was last updated roughly 10 months ago and is currently inactive.
Licensing & Compatibility
Limitations & Caveats
This is a curated list of research and tools, not a runnable software package. Users must independently evaluate and integrate the referenced papers and libraries. Some advanced techniques may require specific hardware (e.g., GPUs) or significant computational resources for implementation.