Model quantizer for fast, accurate post-training quantization, skipping calibration
HQQ (Half-Quadratic Quantization) is a fast, calibration-free quantization library for large machine learning models, supporting 1-8 bits. It enables efficient quantization of LLMs and vision models, significantly reducing VRAM usage and accelerating inference with minimal accuracy loss.
How It Works
HQQ employs a novel quantization approach that avoids the need for calibration data, drastically speeding up the quantization process. It quantizes weights into groups, offering flexibility with an axis parameter for grouping (0 or 1). The dequantization step is a linear operation, allowing seamless integration with optimized CUDA/Triton kernels and torch.compile for enhanced performance.
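The group-wise quantize/dequantize round trip described above can be sketched in plain PyTorch. This is a minimal illustration of calibration-free affine quantization with a grouping axis, not HQQ's actual solver (HQQ additionally refines the scale and zero-point via a half-quadratic optimization step); the function names are hypothetical:

```python
import torch

def quantize_groupwise(W, nbits=4, group_size=64, axis=1):
    # Split the weight matrix into groups along the chosen axis and
    # compute a per-group scale and zero-point from min/max -- no
    # calibration data required.
    shape = W.shape
    Wg = W.reshape(-1, group_size) if axis == 1 else W.T.reshape(-1, group_size)
    wmin = Wg.min(dim=1, keepdim=True).values
    wmax = Wg.max(dim=1, keepdim=True).values
    qmax = 2**nbits - 1
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = -wmin / scale
    Wq = torch.clamp(torch.round(Wg / scale + zero), 0, qmax)
    return Wq, scale, zero, shape

def dequantize_groupwise(Wq, scale, zero, shape, axis=1):
    # Dequantization is a linear operation: W ~ (Wq - zero) * scale,
    # which is what lets it fuse with optimized kernels / torch.compile.
    W = (Wq - zero) * scale
    return W.reshape(shape) if axis == 1 else W.reshape(shape[1], shape[0]).T

torch.manual_seed(0)
W = torch.randn(128, 128)
Wq, s, z, shp = quantize_groupwise(W, nbits=4, group_size=64, axis=1)
W_hat = dequantize_groupwise(Wq, s, z, shp, axis=1)
print((W - W_hat).abs().mean())  # small mean reconstruction error
```

Because the dequantization step is linear, the reconstruction can be folded directly into the matmul of the following layer, which is why no special inference path is needed.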
Quick Start & Requirements
Install with pip install hqq, or from source with pip install git+https://github.com/mobiusml/hqq.git. To quantize a model, replace torch.nn.Linear layers with HQQLinear, configured via BaseQuantizeConfig.
Highlighted Details
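The quick-start steps above can be sketched as follows. The BaseQuantizeConfig and HQQLinear names follow the project; the exact keyword arguments used here (nbits, group_size, quant_config, compute_dtype, device) are assumptions, so check the repository for the current API. The import is guarded so the sketch degrades gracefully when hqq is not installed:

```python
import torch

try:
    # HQQLinear wraps an existing torch.nn.Linear and quantizes its weights.
    from hqq.core.quantize import BaseQuantizeConfig, HQQLinear
    HAS_HQQ = True
except ImportError:
    HAS_HQQ = False

# A standard float linear layer, as found inside an LLM block.
layer = torch.nn.Linear(1024, 1024, bias=False)

if HAS_HQQ:
    # 4-bit weights grouped in blocks of 64 (values chosen for illustration).
    cfg = BaseQuantizeConfig(nbits=4, group_size=64)
    # Drop-in replacement: the quantized layer keeps the same forward signature.
    layer = HQQLinear(layer, quant_config=cfg,
                      compute_dtype=torch.float16, device="cpu")

print("quantized with hqq" if HAS_HQQ else "hqq not installed; layer left in float")
```

No calibration dataset or forward passes over sample data are needed before the layer is usable, which is the main practical difference from calibration-based quantizers.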
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Kernel and backend support can depend on the grouping axis: some optimized backends work only with axis=1, while others expect axis=0.