SINQ by huawei-csl

Fast, high-quality quantization for making Large Language Models smaller

Created 6 months ago
608 stars

Top 53.8% on SourcePulse

View on GitHub
Summary

SINQ (Sinkhorn-Normalized Quantization) is a novel, calibration-free technique for drastically reducing Large Language Model (LLM) memory footprints with minimal accuracy loss. It enables deploying large models on resource-constrained hardware via a fast, plug-and-play compression solution.

How It Works

SINQ uses "dual scaling": separate row and column scale factors mitigate the outlier sensitivity common in single-scale methods. Its Sinkhorn-style normalization iteratively balances row and column variances, yielding a more even error distribution and stable quantization even at low bit-widths (e.g., 3-bit). The approach is model-agnostic and training-free.
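The dual-scaling idea can be sketched as a toy example. This is an illustrative sketch only, not the authors' implementation: the function names, the use of standard deviation as the balancing statistic, and the uniform quantizer are all assumptions.

```python
import numpy as np

def sinkhorn_dual_scale(W, iters=10, eps=1e-8):
    """Toy dual scaling: alternately normalize row and column
    standard deviations, Sinkhorn-style, tracking the scale products."""
    W = W.astype(np.float64).copy()
    row = np.ones(W.shape[0])
    col = np.ones(W.shape[1])
    for _ in range(iters):
        r = W.std(axis=1) + eps     # per-row spread
        W /= r[:, None]
        row *= r
        c = W.std(axis=0) + eps     # per-column spread
        W /= c[None, :]
        col *= c
    return W, row, col              # W == original / (row ⊗ col)

def quantize(W, bits=3):
    """Uniform round-to-nearest quantization over the full range."""
    levels = 2 ** bits - 1
    lo, hi = W.min(), W.max()
    step = (hi - lo) / levels
    return np.round((W - lo) / step) * step + lo

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[3, 5] = 25.0  # inject one outlier, which wrecks a single global scale

# Single-scale baseline: quantize the raw matrix directly.
err_single = np.abs(quantize(W) - W).mean()

# Dual scale: quantize the variance-balanced matrix, then rescale back.
Wn, row, col = sinkhorn_dual_scale(W)
W_hat = quantize(Wn) * row[:, None] * col[None, :]
err_dual = np.abs(W_hat - W).mean()
```

In this toy setup the outlier inflates the single global quantization step, so `err_dual` comes out well below `err_single`: the row/column scales confine the outlier's damage to its own row and column instead of the whole matrix.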

Quick Start & Requirements

Installation involves cloning the repository, installing dependencies (pip install -r req.txt), and then the package (pip install .). Requirements include PyTorch and CUDA-enabled hardware. Quantization is primarily via Python API calls.

Highlighted Details

  • Speed: quantizes ~2x faster than HQQ and >31x faster than AWQ/GPTQ.
  • Quality: Higher accuracy compared to HQQ, AWQ, GPTQ.
  • Flexibility: Offers calibration-free (SINQ) and calibrated (A-SINQ) versions, supporting NF4 and 2-8 bit quantization.
  • Efficiency: Enables running DeepSeekV2.5-236B on ~110 GB memory (vs ~472 GB) with minimal perplexity loss.
  • Compatibility: Integrates with the lm-eval framework.
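The memory figure above is consistent with simple back-of-envelope arithmetic. This sketch is illustrative only: real footprints also include quantization scales, embeddings, and activation memory, which the source does not break down.

```python
params = 236e9  # DeepSeekV2.5 parameter count

fp16_gb = params * 2 / 1e9      # 2 bytes per weight at fp16
avg_bits = 110e9 * 8 / params   # implied average bits/weight at ~110 GB

print(round(fp16_gb), round(avg_bits, 1))  # prints: 472 3.7
```

So ~472 GB at fp16 versus ~110 GB quantized implies a bit under 4 bits per weight on average, in line with the low-bit regimes SINQ targets.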

Maintenance & Community

Active development is ongoing, with planned Hugging Face Transformers integration and pre-quantized model releases. No specific community channels or external sponsorships are detailed.

Licensing & Compatibility

The specific open-source license for SINQ is not explicitly stated in the provided README. Users must consult the repository's LICENSE file for usage restrictions, especially for commercial applications.

Limitations & Caveats

The project is under active development with "coming soon" features. The absence of a clearly stated license is a significant adoption blocker. Paper results were obtained using a different evaluation framework than the integrated lm-eval, requiring careful consideration for direct benchmarking comparisons.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
1
Star History
6 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Maxime Labonne (Head of Post-Training at Liquid AI), and 5 more.

AQLM by Vahe1994

0%
1k
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 2 years ago
Updated 1 month ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

2.5%
15k
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 1 month ago