Awesome-LLM-Quantization by pprp

Curated list of resources for LLM quantization research

Created 1 year ago · 258 stars · Top 98.6% on sourcepulse

Project Summary

This repository is a curated list of resources focused on Large Language Model (LLM) quantization techniques. It aims to provide researchers and engineers with a comprehensive overview of papers, methods, and tools for reducing LLM size and computational requirements, enabling deployment on resource-constrained devices.

How It Works

The list primarily categorizes and summarizes key research papers on LLM quantization. It highlights techniques such as GPTQ (one-shot post-training quantization that uses second-order Hessian information to quantize weights to 3-4 bits), SmoothQuant (which migrates quantization difficulty from activations to weights via a mathematically equivalent per-channel scaling transformation, targeting W8A8), and AWQ (activation-aware weight quantization that protects salient weights, targeting W4A16). These methods reduce model footprint and computational load while aiming to preserve accuracy.
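
To make the SmoothQuant idea concrete, here is a minimal sketch of its equivalent transformation Y = (X diag(s)^-1)(diag(s) W) on a toy linear layer. The alpha-based smoothing formula follows the SmoothQuant paper, but the helper name `smoothing_factors` and all tensor shapes are illustrative, not the paper's implementation:

```python
import torch

def smoothing_factors(x_absmax, w_absmax, alpha=0.5):
    # SmoothQuant smoothing factor for input channel j:
    #   s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    # alpha controls how much quantization difficulty migrates
    # from activations to weights.
    return (x_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)

torch.manual_seed(0)
X = torch.randn(4, 8)
X[:, -1] *= 50.0                      # simulate one outlier activation channel
W = torch.randn(8, 16)

s = smoothing_factors(X.abs().amax(dim=0), W.abs().amax(dim=1))
X_smooth = X / s                      # activations become easier to quantize
W_smooth = W * s[:, None]             # weights absorb the scale

# The transformation is mathematically equivalent: the output is unchanged.
assert torch.allclose(X @ W, X_smooth @ W_smooth, atol=1e-3)
```

After smoothing, the outlier channel's dynamic range moves into the weights, which tolerate quantization better than activations, making 8-bit activation quantization (the "A8" in W8A8) far less lossy.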

Highlighted Details

  • GPTQ: A one-shot post-training quantization method reducing bit-width to 3-4 bits per weight, with experiments on 2-bit and ternary quantization.
  • SmoothQuant: A post-training quantization framework targeting W8A8 by migrating quantization difficulty from activations to weights using a mathematically equivalent transformation.
  • AWQ: A low-bit weight-only quantization method (W4A16) that protects salient weights by analyzing activation data to determine optimal per-channel scaling factors (see the sketch after this list).
  • OWQ: Outlier-aware weight quantization for efficient fine-tuning and inference of LLMs.
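
As a rough illustration of the AWQ intuition, the sketch below compares plain round-to-nearest (RTN) 4-bit weight quantization against the same quantizer with a per-input-channel scale derived from activation magnitudes. This is a simplification under stated assumptions, not the official AWQ implementation: real AWQ grid-searches the scaling exponent on calibration data and quantizes in groups, and the names `quantize_w4` and `x_absmax` plus the 0.5 exponent are illustrative choices here:

```python
import torch

def quantize_w4(w, s=None):
    # Symmetric round-to-nearest 4-bit quantization of a weight matrix
    # w of shape (out_features, in_features), with an optional AWQ-style
    # per-input-channel scale s that is folded back out afterwards, so
    # the layer computes the same function up to quantization error.
    if s is None:
        s = torch.ones(w.shape[1])
    w_scaled = w * s                                      # upscale salient channels
    step = w_scaled.abs().amax(dim=1, keepdim=True) / 7   # int4 range [-8, 7]
    q = torch.clamp(torch.round(w_scaled / step), -8, 7)
    return (q * step) / s                                 # dequantize, unscale

torch.manual_seed(0)
W = torch.randn(32, 128)
x_absmax = torch.rand(128) * 10 + 0.1    # stand-in for calibration statistics
X = torch.randn(256, 128) * x_absmax     # activations with per-channel spread

# Heuristic scale s = x_absmax ** 0.5; real AWQ searches this exponent.
s = x_absmax.pow(0.5)
err_rtn = (X @ (quantize_w4(W) - W).T).pow(2).mean().item()
err_awq = (X @ (quantize_w4(W, s) - W).T).pow(2).mean().item()
print(f"output MSE  plain RTN: {err_rtn:.4f}  AWQ-style: {err_awq:.4f}")
```

Scaling up the channels that see large activations shrinks their post-unscaling quantization error, pushing the error toward channels that matter less for the layer's output.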

Maintenance & Community

Contributions are welcome via issues or pull requests, whether to add new resources, fix broken links, or update outdated information.

Licensing & Compatibility

This repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The list focuses on research papers and techniques; it does not provide direct code implementations or pre-quantized models. Information may become outdated as the field rapidly evolves.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 53 stars in the last 90 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 1 more.

AQLM by Vahe1994

0.1% · 1k stars
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 1 year ago · updated 2 months ago
Starred by Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley) and Yang Song (Professor at Caltech; Research Scientist at OpenAI).

vector-quantize-pytorch by lucidrains

0.4% · 3k stars
PyTorch library for vector quantization techniques
Created 5 years ago · updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

GPTQ-for-LLaMa by qwopqwop200

0.0% · 3k stars
4-bit quantization for LLaMA models using GPTQ
Created 2 years ago · updated 1 year ago