Awesome-LLM-Quantization by pprp

Curated list of resources for LLM quantization research

Created 1 year ago
303 stars

Top 88.2% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository is a curated list of resources focused on Large Language Model (LLM) quantization techniques. It aims to provide researchers and engineers with a comprehensive overview of papers, methods, and tools for reducing LLM size and computational requirements, enabling deployment on resource-constrained devices.

How It Works

The list primarily categorizes and summarizes key research papers on LLM quantization. It highlights techniques like GPTQ (post-training quantization using Hessian information for 3-4 bit weights), SmoothQuant (migrating quantization difficulty from activations to weights via a scaling-factor transformation for W8A8), and AWQ (activation-aware weight quantization protecting salient weights for W4A16). These methods reduce model footprint and computational load while aiming to preserve accuracy.
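As a quick illustration of the SmoothQuant idea, the sketch below applies a per-channel smoothing factor s = max|X|^alpha / max|W|^(1-alpha) so that (X/s)(sW) equals XW exactly while activation outliers are partially migrated into the weights. This is a minimal NumPy sketch under assumed shapes; the function name and default alpha are illustrative, not the paper's reference implementation.

    import numpy as np

    def smoothquant_rescale(X, W, alpha=0.5, eps=1e-8):
        # X: (tokens, in_features) calibration activations
        # W: (in_features, out_features) linear-layer weights
        # Returns (X_hat, W_hat) with X_hat @ W_hat == X @ W, but with
        # activation outliers partially migrated into the weights.
        act_max = np.abs(X).max(axis=0) + eps          # per-channel activation range
        w_max = np.abs(W).max(axis=1) + eps            # per-channel weight range
        s = act_max ** alpha / w_max ** (1.0 - alpha)  # smoothing factor per input channel
        return X / s, W * s[:, None]

After this rescaling, the easier-to-quantize X_hat and W_hat can both be mapped to INT8 (W8A8) with standard per-tensor or per-channel quantizers.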

Highlighted Details

  • GPTQ: A one-shot post-training quantization method reducing bit-width to 3-4 bits per weight, with experiments on 2-bit and ternary quantization.
  • SmoothQuant: A post-training quantization framework targeting W8A8 by migrating quantization difficulty from activations to weights using a mathematically equivalent transformation.
  • AWQ: A low-bit weight-only quantization method (W4A16) that protects salient weights by analyzing activation data to determine optimal per-channel scaling factors (a basic W4 group-quantization sketch follows this list).
  • OWQ: Outlier-aware weight quantization for efficient fine-tuning and inference of LLMs.
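For context on what W4A16 weight-only quantization looks like in practice, here is a minimal group-wise 4-bit round-trip sketch (quantize, then dequantize for the fp16 matmul). It deliberately omits AWQ's activation-aware scale search; the function name and the group size of 128 are assumptions for illustration only.

    import numpy as np

    def w4_groupwise_roundtrip(W, group_size=128):
        # W: (out_features, in_features); each row is split into groups of
        # `group_size` input channels that share one scale and zero-point.
        out_f, in_f = W.shape
        assert in_f % group_size == 0
        Wg = W.reshape(out_f, in_f // group_size, group_size)
        w_min = Wg.min(axis=-1, keepdims=True)
        w_max = Wg.max(axis=-1, keepdims=True)
        scale = np.maximum(w_max - w_min, 1e-8) / 15.0   # 4 bits -> integer levels 0..15
        zero = np.round(-w_min / scale)                  # asymmetric zero-point
        q = np.clip(np.round(Wg / scale + zero), 0, 15)  # stored 4-bit codes
        W_dq = (q - zero) * scale                        # what the fp16 matmul actually uses
        return W_dq.reshape(out_f, in_f)

Methods like AWQ improve on this baseline by choosing per-channel scales from activation statistics before rounding, so that the most salient weight channels lose less precision.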

Maintenance & Community

Contributions are welcomed via issues or pull requests for new resources, broken links, or outdated information.

Licensing & Compatibility

This repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The list focuses on research papers and techniques; it does not provide direct code implementations or pre-quantized models. Information may become outdated as the field rapidly evolves.

Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 18
  • Issues (30d): 0
  • Star History: 32 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.3% · 2k stars
Post-training quantization research paper for large language models
Created 2 years ago · Updated 1 year ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

gptq by IST-DASLab

0.1% · 2k stars
Code for GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers
Created 2 years ago · Updated 1 year ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

0.3% · 3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago · Updated 2 months ago