QuaRot by spcl

Code for a NeurIPS 2024 research paper on LLM quantization

Created 1 year ago
424 stars

Top 69.5% on SourcePulse

Project Summary

QuaRot introduces an end-to-end 4-bit quantization scheme for Large Language Models (LLMs), targeting researchers and engineers seeking to reduce model size and inference costs. By rotating LLMs to remove outliers in hidden states and activations without altering outputs, QuaRot enables all matrix multiplications to operate at 4-bit precision, eliminating the need for higher-precision channels.

How It Works

QuaRot employs a rotation-based quantization approach. It applies orthogonal rotations to the LLM's hidden states and activations, spreading outlier magnitudes across channels so that all values fit within the range representable by 4-bit integers. Because the rotations are orthogonal, the model's outputs are unchanged; this computational invariance permits aggressive quantization of all model components, including weights, activations, and the KV cache, yielding significant memory and computational savings.

Quick Start & Requirements

  • Install by cloning the repository and running pip install -e . (editable) or pip install .
  • Requires a C++ compiler for kernel compilation.
  • Official documentation and citation details are available in the repository.

Highlighted Details

  • Achieves 4-bit end-to-end quantization for LLMs, including weights, activations, and KV cache.
  • Demonstrates minimal performance degradation: the quantized LLaMa2-70B model loses at most 0.29 WikiText perplexity and retains 99% of its zero-shot task performance.
  • Addresses outlier issues in quantization by rotating hidden states and activations.

Maintenance & Community

The project is associated with the NeurIPS 2024 paper "QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs." Further community or maintenance details are not specified in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be research-oriented, and its readiness for production deployment may require further evaluation.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (author of LLaMA-Factory), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

  • Top 0.3% on SourcePulse
  • 3k stars
  • Weight quantization research paper for LLM compression/acceleration
  • Created 2 years ago, updated 2 months ago