Paper list for neural network quantization research
This repository is a curated, actively updated catalog of research papers on neural network quantization, with a focus on techniques for efficient deep learning inference. It targets researchers, engineers, and practitioners in AI and machine learning who need to keep up with advances in model compression and optimization. Its primary benefit is a structured overview of quantization methods, categorized by model architecture and application, which facilitates targeted research and development.
How It Works
The repository organizes papers by conference (e.g., ICLR, NeurIPS, CVPR) and model type (e.g., Transformers, CNNs, Diffusion Models, Vision Transformers). Each entry links to the paper and is often tagged with keywords indicating the quantization approach (e.g., PTQ for post-training quantization, Extreme for binary/ternary quantization, MP for mixed precision). This structure lets users quickly identify relevant research and survey the landscape of quantization techniques.
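To make the PTQ keyword concrete, below is a minimal sketch of uniform affine post-training quantization, the basic scheme most PTQ papers build on. It assumes NumPy and 8-bit unsigned integers; the function names (`quantize`, `dequantize`) are illustrative and are not taken from the repository or any listed paper.

```python
# Minimal sketch of uniform affine post-training quantization (PTQ).
# Illustrative only; assumes 8-bit unsigned integers and per-tensor scaling.
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Map float values to integers using a scale and zero-point
    derived from the tensor's observed min/max range."""
    qmin, qmax = 0, 2**num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against constant tensors
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values; the gap to the original tensor
    is the quantization error that PTQ methods try to minimize."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
print("max abs error:", np.abs(weights - dequantize(q, scale, zp)).max())
```

Real PTQ methods differ mainly in how they choose the clipping range, the granularity (per-tensor vs. per-channel), and whether they calibrate on sample data; the papers tagged PTQ in this list explore those design choices.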
Quick Start & Requirements
This is a curated list of papers; no installation or execution is required. The content is accessible via the GitHub repository.
Maintenance & Community
The repository is actively maintained and welcomes contributions to expand its scope. It acknowledges collaborators and encourages community engagement through starring and sharing.
Licensing & Compatibility
The repository itself is typically licensed under permissive terms (e.g., MIT, Apache 2.0) allowing broad use and contribution. The linked papers are subject to their respective publication licenses.
Limitations & Caveats
This repository is a bibliography and does not provide code implementations or benchmarks for the papers listed. Users must refer to individual papers for implementation details and performance validation.