Awesome-Quantization-Papers by Zhen-Dong

Paper list for neural network quantization research

Created 3 years ago
717 stars

Top 48.0% on SourcePulse

Project Summary

This repository serves as a curated and actively updated catalog of research papers on neural network quantization, focusing on techniques for efficient deep learning inference. It targets researchers, engineers, and practitioners in AI and machine learning who need to stay abreast of the latest advancements in model compression and optimization. The primary benefit is a structured overview of quantization methods, categorized by model architecture and application, facilitating targeted research and development.

How It Works

The repository organizes papers by conference (e.g., ICLR, NeurIPS, CVPR) and model type (e.g., Transformers, CNNs, Diffusion Models, Vision Transformers). Each entry includes a link to the paper and often keywords indicating the quantization approach (e.g., PTQ for Post-Training Quantization, Extreme for binary/ternary quantization, MP for mixed-precision). This structured approach allows users to quickly identify relevant research and understand the landscape of quantization techniques.
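To make the PTQ label concrete, the sketch below shows uniform symmetric post-training quantization of a weight tensor to int8. This is a minimal illustration only; the function names are illustrative, and the actual methods in the listed papers (e.g., GPTQ, SmoothQuant) add calibration data, error compensation, or activation smoothing on top of this basic scheme.

```python
import numpy as np

def quantize_ptq(weights, num_bits=8):
    """Uniform symmetric PTQ: map floats to signed integers with one scale.

    Illustrative sketch only -- not taken from any specific paper in the list.
    """
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for int8
    scale = np.abs(weights).max() / qmax          # per-tensor scale factor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return q.astype(np.float32) * scale

# Round-trip a small weight tensor and inspect the quantization error.
w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, s = quantize_ptq(w)
w_hat = dequantize(q, s)
```

"Non-uniform" methods replace the single linear scale with learned or clustered quantization levels, "Extreme" restricts values to 1-2 bits, and "MP" (mixed-precision) assigns different bit-widths per layer or channel.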

Quick Start & Requirements

This is a curated list of papers; no installation or execution is required. The content is accessible via the GitHub repository.

Highlighted Details

  • Comprehensive coverage of recent AI conferences and arXiv preprints.
  • Categorization by model architecture (LLMs, ViTs, CNNs, Diffusion Models) and task (Image Classification, Object Detection, Super Resolution).
  • Keywords and labels (PTQ, Non-uniform, Extreme, MP) to quickly identify quantization methods.
  • Active updates, with recent additions from ICLR-25, ECCV-24, NeurIPS-24, ICML-24, and CVPR-24.

Maintenance & Community

The repository is actively maintained and welcomes contributions to expand its scope. It acknowledges collaborators and encourages community engagement through starring and sharing.

Licensing & Compatibility

No license is confirmed in this summary; curated paper lists of this kind are typically released under permissive terms (e.g., MIT, Apache 2.0) allowing broad use and contribution. The linked papers remain subject to their respective publication licenses.

Limitations & Caveats

This repository is a bibliography and does not provide code implementations or benchmarks for the papers listed. Users must refer to individual papers for implementation details and performance validation.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 31 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.3% · 2k stars
Post-training quantization (SmoothQuant) for large language models
Created 2 years ago · Updated 1 year ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

gptq by IST-DASLab

0.1% · 2k stars
Code for GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers
Created 2 years ago · Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 16 hours ago