SINQ by huawei-csl

Fast, high-quality quantization for making Large Language Models smaller

Created 6 months ago
608 stars

Top 53.8% on SourcePulse

View on GitHub
Summary

SINQ (Sinkhorn-Normalized Quantization) is a novel, calibration-free technique for drastically reducing Large Language Model (LLM) memory footprints with minimal accuracy loss. It enables deploying large models on resource-constrained hardware via a fast, plug-and-play compression solution.

How It Works

SINQ uses "dual scaling": separate row and column scale factors mitigate the outlier sensitivity common in single-scale methods. Its Sinkhorn-style normalization iteratively balances row and column variances, yielding a more even error distribution and stable quantization even at low bit-widths (e.g., 3-bit). The approach is model-agnostic and training-free.
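The dual-scaling idea can be sketched as a toy example. This is an illustrative sketch only, not the authors' implementation: the function names, the use of standard deviation as the balancing statistic, and the uniform quantizer are all assumptions.

```python
import numpy as np

def sinkhorn_dual_scale(W, iters=10, eps=1e-8):
    """Toy dual scaling: alternately normalize row and column
    standard deviations, Sinkhorn-style, tracking the scale products."""
    W = W.astype(np.float64).copy()
    row = np.ones(W.shape[0])
    col = np.ones(W.shape[1])
    for _ in range(iters):
        r = W.std(axis=1) + eps     # per-row spread
        W /= r[:, None]
        row *= r
        c = W.std(axis=0) + eps     # per-column spread
        W /= c[None, :]
        col *= c
    return W, row, col              # W == original / (row ⊗ col)

def quantize(W, bits=3):
    """Uniform round-to-nearest quantization over the full range."""
    levels = 2 ** bits - 1
    lo, hi = W.min(), W.max()
    step = (hi - lo) / levels
    return np.round((W - lo) / step) * step + lo

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[3, 5] = 25.0  # inject one outlier, which wrecks a single global scale

# Single-scale baseline: quantize the raw matrix directly.
err_single = np.abs(quantize(W) - W).mean()

# Dual scale: quantize the variance-balanced matrix, then rescale back.
Wn, row, col = sinkhorn_dual_scale(W)
W_hat = quantize(Wn) * row[:, None] * col[None, :]
err_dual = np.abs(W_hat - W).mean()
```

In this toy setup the outlier inflates the single global quantization step, so `err_dual` comes out well below `err_single`: the row/column scales confine the outlier's damage to its own row and column instead of the whole matrix.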

Quick Start & Requirements

Installation involves cloning the repository, installing dependencies (pip install -r req.txt), and then the package (pip install .). Requirements include PyTorch and CUDA-enabled hardware. Quantization is primarily via Python API calls.

Highlighted Details

  • Speed: quantizes ~2x faster than HQQ and >31x faster than AWQ/GPTQ.
  • Quality: Higher accuracy compared to HQQ, AWQ, GPTQ.
  • Flexibility: Offers calibration-free (SINQ) and calibrated (A-SINQ) versions, supporting NF4 and 2-8 bit quantization.
  • Efficiency: Enables running DeepSeekV2.5-236B on ~110 GB memory (vs ~472 GB) with minimal perplexity loss.
  • Compatibility: Integrates with the lm-eval framework.
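The memory figure above is consistent with simple back-of-envelope arithmetic. This sketch is illustrative only: real footprints also include quantization scales, embeddings, and activation memory, which the source does not break down.

```python
params = 236e9  # DeepSeekV2.5 parameter count

fp16_gb = params * 2 / 1e9      # 2 bytes per weight at fp16
avg_bits = 110e9 * 8 / params   # implied average bits/weight at ~110 GB

print(round(fp16_gb), round(avg_bits, 1))  # prints: 472 3.7
```

So ~472 GB at fp16 versus ~110 GB quantized implies a bit under 4 bits per weight on average, in line with the low-bit regimes SINQ targets.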

Maintenance & Community

Active development is ongoing, with planned Hugging Face Transformers integration and pre-quantized model releases. No specific community channels or external sponsorships are detailed.

Licensing & Compatibility

The specific open-source license for SINQ is not explicitly stated in the provided README. Users must consult the repository's LICENSE file for usage restrictions, especially for commercial applications.

Limitations & Caveats

The project is under active development with "coming soon" features. The absence of a clearly stated license is a significant adoption blocker. Paper results were obtained using a different evaluation framework than the integrated lm-eval, requiring careful consideration for direct benchmarking comparisons.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
1
Star History
6 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Maxime Labonne (Head of Post-Training at Liquid AI), and 5 more.

AQLM by Vahe1994

0%
1k
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 2 years ago
Updated 1 month ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

2.5%
15k
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 1 month ago