huawei-csl/SINQ — Fast, high-quality quantization for making Large Language Models smaller
Summary
SINQ (Sinkhorn-Normalized Quantization) is a novel, calibration-free technique for drastically reducing Large Language Model (LLM) memory footprints without accuracy loss. It enables deploying large models on resource-constrained hardware via a fast, plug-and-play compression solution.
How It Works
SINQ utilizes "dual-scaling" with separate row and column scale factors, mitigating outlier vulnerability common in single-scale methods. Its Sinkhorn-normalized optimization iteratively balances variance, leading to more even error distribution and stable quantization, even at low bit-widths (e.g., 3-bit). The approach is model-agnostic and training-free.
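The dual-scaling idea can be sketched as follows. This is an illustrative NumPy toy, not the project's implementation: the function name, iteration count, and the uniform rounding grid are all assumptions; SINQ's actual optimization and quantizer differ in detail.

```python
import numpy as np

def sinkhorn_dual_scale_quantize(W, bits=3, iters=20):
    """Toy dual-scaling quantizer in the spirit of SINQ (illustrative only).

    Alternately rescales rows and columns so their standard deviations are
    balanced (a Sinkhorn-like normalization), then rounds the balanced
    matrix to a uniform integer grid.
    """
    W = W.astype(np.float64)
    r = np.ones(W.shape[0])  # per-row scale factors
    c = np.ones(W.shape[1])  # per-column scale factors
    for _ in range(iters):
        # Normalize row std-devs of the rescaled matrix, then column std-devs.
        row_std = (W / (r[:, None] * c[None, :])).std(axis=1) + 1e-8
        r *= row_std
        col_std = (W / (r[:, None] * c[None, :])).std(axis=0) + 1e-8
        c *= col_std
    Wn = W / (r[:, None] * c[None, :])  # variance-balanced matrix
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(Wn).max() / qmax      # uniform grid spacing
    Q = np.clip(np.round(Wn / step), -qmax - 1, qmax)  # integer codes
    W_hat = Q * step * r[:, None] * c[None, :]         # dequantized weights
    return Q.astype(np.int8), r, c, step, W_hat
```

Because outliers inflate the std of their own row and column, the dual rescaling shrinks them before rounding, which is the intuition behind the method's robustness at low bit-widths.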
Quick Start & Requirements
Installation involves cloning the repository, installing dependencies (pip install -r req.txt), and then installing the package itself (pip install .). PyTorch and CUDA-enabled hardware are required. Quantization is driven primarily through Python API calls.
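Those steps might look like the following (the repository URL is assumed from the project name; req.txt is the requirements file named above):

```shell
# Clone the repository (URL assumed from the huawei-csl/SINQ project name)
git clone https://github.com/huawei-csl/SINQ.git
cd SINQ

# Install dependencies, then the package itself
pip install -r req.txt
pip install .
```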
Highlighted Details
Evaluation is integrated with the lm-eval framework.
Maintenance & Community
Active development is ongoing, with planned Hugging Face Transformers integration and pre-quantized model releases. No specific community channels or external sponsorships are detailed.
Licensing & Compatibility
The specific open-source license for SINQ is not explicitly stated in the provided README. Users must consult the repository's LICENSE file for usage restrictions, especially for commercial applications.
Limitations & Caveats
The project is under active development, with several features marked "coming soon". The absence of a clearly stated license is a significant adoption blocker. Paper results were obtained with a different evaluation framework than the integrated lm-eval, so direct benchmark comparisons should be made with care.