Awesome-LLM-Quantization by pprp

Curated list of resources for LLM quantization research

Created 1 year ago · 258 stars · Top 98.6% on sourcepulse

Project Summary

This repository is a curated list of resources focused on Large Language Model (LLM) quantization techniques. It aims to provide researchers and engineers with a comprehensive overview of papers, methods, and tools for reducing LLM size and computational requirements, enabling deployment on resource-constrained devices.

How It Works

The list primarily categorizes and summarizes key research papers on LLM quantization. It highlights techniques such as GPTQ (one-shot post-training quantization that uses second-order Hessian information to quantize weights to 3-4 bits), SmoothQuant (which migrates quantization difficulty from activations to weights via a mathematically equivalent per-channel scaling transformation, targeting W8A8), and AWQ (activation-aware weight quantization that protects salient weights, targeting W4A16). These methods reduce model footprint and computational load while aiming to preserve accuracy.
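
To make the SmoothQuant idea concrete, here is a minimal sketch of its equivalent transformation Y = (X diag(s)^-1)(diag(s) W) on a toy linear layer. The alpha-based smoothing formula follows the SmoothQuant paper, but the helper name `smoothing_factors` and all tensor shapes are illustrative, not the paper's implementation:

```python
import torch

def smoothing_factors(x_absmax, w_absmax, alpha=0.5):
    # SmoothQuant smoothing factor for input channel j:
    #   s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    # alpha controls how much quantization difficulty migrates
    # from activations to weights.
    return (x_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)

torch.manual_seed(0)
X = torch.randn(4, 8)
X[:, -1] *= 50.0                      # simulate one outlier activation channel
W = torch.randn(8, 16)

s = smoothing_factors(X.abs().amax(dim=0), W.abs().amax(dim=1))
X_smooth = X / s                      # activations become easier to quantize
W_smooth = W * s[:, None]             # weights absorb the scale

# The transformation is mathematically equivalent: the output is unchanged.
assert torch.allclose(X @ W, X_smooth @ W_smooth, atol=1e-3)
```

After smoothing, the outlier channel's dynamic range moves into the weights, which tolerate quantization better than activations, making 8-bit activation quantization (the "A8" in W8A8) far less lossy.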

Highlighted Details

  • GPTQ: A one-shot post-training quantization method reducing bit-width to 3-4 bits per weight, with experiments on 2-bit and ternary quantization.
  • SmoothQuant: A post-training quantization framework targeting W8A8 by migrating quantization difficulty from activations to weights using a mathematically equivalent transformation.
  • AWQ: A low-bit weight-only quantization method (W4A16) that protects salient weights by analyzing activation data to determine optimal per-channel scaling factors (see the sketch after this list).
  • OWQ: Outlier-aware weight quantization for efficient fine-tuning and inference of LLMs.
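
As a rough illustration of the AWQ intuition, the sketch below compares plain round-to-nearest (RTN) 4-bit weight quantization against the same quantizer with a per-input-channel scale derived from activation magnitudes. This is a simplification under stated assumptions, not the official AWQ implementation: real AWQ grid-searches the scaling exponent on calibration data and quantizes in groups, and the names `quantize_w4` and `x_absmax` plus the 0.5 exponent are illustrative choices here:

```python
import torch

def quantize_w4(w, s=None):
    # Symmetric round-to-nearest 4-bit quantization of a weight matrix
    # w of shape (out_features, in_features), with an optional AWQ-style
    # per-input-channel scale s that is folded back out afterwards, so
    # the layer computes the same function up to quantization error.
    if s is None:
        s = torch.ones(w.shape[1])
    w_scaled = w * s                                      # upscale salient channels
    step = w_scaled.abs().amax(dim=1, keepdim=True) / 7   # int4 range [-8, 7]
    q = torch.clamp(torch.round(w_scaled / step), -8, 7)
    return (q * step) / s                                 # dequantize, unscale

torch.manual_seed(0)
W = torch.randn(32, 128)
x_absmax = torch.rand(128) * 10 + 0.1    # stand-in for calibration statistics
X = torch.randn(256, 128) * x_absmax     # activations with per-channel spread

# Heuristic scale s = x_absmax ** 0.5; real AWQ searches this exponent.
s = x_absmax.pow(0.5)
err_rtn = (X @ (quantize_w4(W) - W).T).pow(2).mean().item()
err_awq = (X @ (quantize_w4(W, s) - W).T).pow(2).mean().item()
print(f"output MSE  plain RTN: {err_rtn:.4f}  AWQ-style: {err_awq:.4f}")
```

Scaling up the channels that see large activations shrinks their post-unscaling quantization error, pushing the error toward channels that matter less for the layer's output.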

Maintenance & Community

Contributions are welcome via issues or pull requests, whether to add new resources, fix broken links, or update outdated information.

Licensing & Compatibility

This repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The list focuses on research papers and techniques; it does not provide direct code implementations or pre-quantized models. Information may become outdated as the field rapidly evolves.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 53 stars in the last 90 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 1 more.

AQLM by Vahe1994

0.1% · 1k stars
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 1 year ago · updated 2 months ago
Starred by Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley) and Yang Song (Professor at Caltech; Research Scientist at OpenAI).

vector-quantize-pytorch by lucidrains

0.4% · 3k stars
PyTorch library for vector quantization techniques
Created 5 years ago · updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

GPTQ-for-LLaMa by qwopqwop200

0.0% · 3k stars
4-bit quantization for LLaMA models using GPTQ
Created 2 years ago · updated 1 year ago