SpinQuant by facebookresearch

Code for research paper on LLM quantization via learned rotations

Created 1 year ago
327 stars

Top 83.4% on SourcePulse

Project Summary

SpinQuant addresses the challenge of reducing the computational and memory footprint of Large Language Models (LLMs) through advanced quantization techniques. It is designed for researchers and engineers working on LLM deployment and optimization, offering a method to achieve significant compression with minimal accuracy loss.

How It Works

SpinQuant introduces learned rotations, specifically utilizing Cayley transforms, to mitigate the impact of outliers in LLM weights and activations during quantization. This approach differs from static or random rotation methods by learning optimal rotation matrices, thereby improving quantization performance and reducing the accuracy gap compared to full-precision models.
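The parameterization at the heart of this idea can be sketched in a few lines of PyTorch (a minimal illustration, not the repository's actual implementation; the function and variable names are my own): a skew-symmetric matrix is mapped to an exactly orthogonal rotation via the Cayley transform, so the rotation can be optimized with ordinary gradients while never leaving the orthogonal manifold.

```python
import torch

def cayley_rotation(A: torch.Tensor) -> torch.Tensor:
    """Cayley transform: map a skew-symmetric A to an orthogonal R.

    R = (I - A)(I + A)^{-1}. I + A is always invertible because a
    skew-symmetric matrix has purely imaginary eigenvalues.
    """
    I = torch.eye(A.shape[0], dtype=A.dtype)
    return (I - A) @ torch.linalg.inv(I + A)

torch.manual_seed(0)
M = torch.randn(4, 4, dtype=torch.float64)
A = M - M.T                      # skew-symmetric parameterization
R = cayley_rotation(A)

# R is orthogonal, so inserting a rotation pair preserves the network's
# output -- (W R)(R^T x) == W x -- while redistributing outlier energy
# across channels before quantization.
print(torch.allclose(R @ R.T, torch.eye(4, dtype=torch.float64), atol=1e-8))
```

Because any update to the skew-symmetric `A` still yields an orthogonal `R`, the rotation can be trained end-to-end without an explicit orthogonality penalty.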

Quick Start & Requirements

  • Installation: Clone the repository, install PyTorch with CUDA support, install the Python requirements, then build and install the fast-hadamard-transform package from source.
    git clone https://github.com/facebookresearch/SpinQuant.git
    cd SpinQuant
    # Install PyTorch with CUDA from https://pytorch.org/get-started/locally/
    pip install -r requirements.txt
    # Install fast-hadamard-transform
    git clone https://github.com/Dao-AILab/fast-hadamard-transform.git
    cd fast-hadamard-transform
    pip install .
    
  • Prerequisites: Python 3.9, PyTorch >= 2.0 with CUDA support.
  • Usage: Scripts are provided for optimizing rotation matrices (10_optimize_rotation.sh, 11_optimize_rotation_fsdp.sh) and evaluating quantized models (2_eval_ptq.sh). Export to ExecuTorch is also supported (31_optimize_rotation_executorch.sh, 32_eval_ptq_executorch.sh).
  • Resources: Requires access to HuggingFace models (via access_token) and potentially large datasets for evaluation.

Highlighted Details

  • Achieves W4A4KV4 quantization with only a 2.9-point accuracy gap for LLaMA-2 7B on zero-shot reasoning.
  • Outperforms LLM-QAT by 19.1 points and SmoothQuant by 25.0 points on the same zero-shot reasoning benchmarks.
  • Supports exporting quantized models to ExecuTorch for real-time speedups.
  • Provides pre-trained quantized models for Llama-3.2 and Llama-2 variants.
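The gains above come from taming activation outliers before quantization. A toy sketch (my own illustration, not code from the repository; it assumes simple symmetric per-tensor 4-bit quantization and uses a fixed Hadamard rotation as a stand-in for SpinQuant's learned rotation) shows how rotating first shrinks the error caused by a single outlier channel:

```python
import torch

torch.manual_seed(0)

def quantize_4bit(x: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor 4-bit quantization (integer levels in [-7, 7])."""
    scale = x.abs().max() / 7
    return torch.round(x / scale).clamp(-7, 7) * scale

# Activations with one outlier channel, as commonly observed in LLMs.
X = torch.randn(64, 64)
X[:, 0] *= 30  # this channel alone dictates the quantization scale

# Normalized 64x64 Hadamard rotation (R @ R.T == I), built by Kronecker powers.
H = torch.tensor([[1.0, 1.0], [1.0, -1.0]])
R = H
for _ in range(5):
    R = torch.kron(R, H)
R = R / 8.0

err_plain = (quantize_4bit(X) - X).pow(2).mean().item()
err_rotated = (quantize_4bit(X @ R) @ R.T - X).pow(2).mean().item()
print(err_rotated < err_plain)  # the rotation spreads the outlier's energy
```

Without the rotation, the outlier channel inflates the quantization scale and the remaining channels are rounded to near zero; after rotating, the outlier's energy is spread evenly, the scale drops, and rotating back with `R.T` recovers the original basis.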

Maintenance & Community

The project is from Meta AI (facebookresearch) and is associated with the paper "SpinQuant: LLM Quantization with Learned Rotations." Contact information for Zechun Liu and Changsheng Zhao is provided.

Licensing & Compatibility

The project is licensed under CC-BY-NC 4.0, which restricts commercial use.

Limitations & Caveats

The CC-BY-NC 4.0 license prohibits commercial use. The results reported in the paper were produced with an internal Meta codebase; the released code is a reproduction built on HuggingFace, so minor discrepancies from the published numbers are possible.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Jeremy Howard (Cofounder of fast.ai).

QuaRot by spcl

0.5%
424 stars
Code for a NeurIPS 2024 research paper on LLM quantization
Created 1 year ago
Updated 9 months ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.3%
2k stars
Post-training quantization research paper for large language models
Created 2 years ago
Updated 1 year ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

gptq by IST-DASLab

0.1%
2k stars
Code for GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers
Created 2 years ago
Updated 1 year ago
Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Yang Song (Professor at Caltech; Research Scientist at OpenAI), and 1 more.

vector-quantize-pytorch by lucidrains

0.5%
4k stars
PyTorch library for vector quantization techniques
Created 5 years ago
Updated 2 weeks ago