intel: Quantization algorithm for LLMs and VLMs
AutoRound is an advanced quantization algorithm designed to significantly reduce the memory footprint and computational cost of Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling efficient inference across diverse hardware. It targets researchers and engineers who need to deploy large models in resource-constrained environments while maintaining high accuracy, even at 2-bit precision.
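To make the memory claim concrete, here is a back-of-the-envelope sketch: weight storage is roughly params × bits / 8 bytes, plus per-group metadata when weights are quantized in groups. The group size of 128 and the 4 bytes of metadata per group (one FP16 scale plus one FP16 zero point) are assumptions for illustration, not AutoRound's actual storage layout, and the helper name `weight_memory_bytes` is hypothetical. Activations, KV cache, and any layers left unquantized are ignored.

```python
import math


def weight_memory_bytes(num_params, bits, group_size=None, overhead_bytes_per_group=4):
    """Rough storage estimate for a model's weights (hypothetical helper).

    overhead_bytes_per_group is an assumed per-group metadata cost
    (e.g. one FP16 scale plus one FP16 zero point); real formats differ.
    """
    payload = num_params * bits / 8          # packed integer (or float) weights
    if group_size is None:
        return payload                       # no per-group metadata
    num_groups = math.ceil(num_params / group_size)
    return payload + num_groups * overhead_bytes_per_group


if __name__ == "__main__":
    n = 7_000_000_000                        # hypothetical 7B-parameter model
    for bits, group in [(16, None), (4, 128), (2, 128)]:
        gb = weight_memory_bytes(n, bits, group) / 1e9
        print(f"{bits:>2}-bit: {gb:.2f} GB")
```

Under these assumptions a hypothetical 7-billion-parameter model goes from 14 GB of 16-bit weights to about 3.72 GB at 4 bits and about 1.97 GB at 2 bits; the per-group metadata becomes a larger share of the total as the bit width drops.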
How It Works
AutoRound employs a novel signed gradient descent method to jointly tune the rounding values and the min-max clipping thresholds of the weights. It converges quickly, typically within 200 steps, while reaching state-of-the-art accuracy. The algorithm also supports mixed-bit tuning, LM-head quantization, and export to popular formats such as GPTQ, AWQ, and GGUF, which gives flexibility in deployment.
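The core idea can be illustrated with a minimal pure-Python sketch; this is not AutoRound's implementation. It assumes a single linear output, per-tensor asymmetric min-max quantization, and a learnable rounding offset v in [-0.5, 0.5] per weight, updated by signed gradient descent to minimize the output error on a few calibration inputs. Gradients pass through the rounding with a straight-through estimator. The min-max clipping thresholds that AutoRound also tunes are omitted for brevity, the step size 1/steps is an assumption, and all function names are hypothetical.

```python
import math
import random


def quantize(w, v, bits):
    """Asymmetric min-max quantization of weights w with per-weight rounding offsets v."""
    qmax = 2 ** bits - 1
    w_min, w_max = min(w), max(w)
    scale = (w_max - w_min) / qmax or 1e-8  # guard against a constant tensor
    zero_point = round(-w_min / scale)
    deq = []
    for wi, vi in zip(w, v):
        q = math.floor(wi / scale + vi + 0.5) + zero_point  # round(w / s + v)
        q = max(0, min(qmax, q))                             # clamp to the integer grid
        deq.append(scale * (q - zero_point))                 # dequantize
    return deq, scale


def output_error(xs, w, wq):
    """Per-sample output differences x . (wq - w) for one linear output."""
    return [sum(xi * (a - b) for xi, a, b in zip(x, wq, w)) for x in xs]


def sign_round(w, xs, bits=4, steps=200):
    """Hypothetical sketch: tune rounding offsets v with signed gradient descent."""
    lr = 1.0 / steps               # assumed step size
    v = [0.0] * len(w)             # v = 0 reproduces round-to-nearest (RTN)
    wq, scale = quantize(w, v, bits)
    rtn_loss = best_loss = sum(d * d for d in output_error(xs, w, wq)) / len(xs)
    best_v = list(v)
    for _ in range(steps):
        diffs = output_error(xs, w, wq)
        # dL/dv_i with a straight-through estimator (d round(z)/dz treated as 1):
        # L = mean(d^2), d = x . (wq - w), d wq_i / d v_i ~= scale
        grad = [sum(2.0 * d * x[i] * scale for d, x in zip(diffs, xs)) / len(xs)
                for i in range(len(w))]
        # Signed gradient descent: step by lr * sign(grad), keep v in [-0.5, 0.5]
        v = [max(-0.5, min(0.5, vi - lr * ((g > 0) - (g < 0))))
             for vi, g in zip(v, grad)]
        wq, _ = quantize(w, v, bits)
        loss = sum(d * d for d in output_error(xs, w, wq)) / len(xs)
        if loss < best_loss:       # keep the best offsets seen so far
            best_loss, best_v = loss, list(v)
    return best_v, best_loss, rtn_loss
```

Because the search starts from v = 0 and keeps the best offsets seen, the returned loss is never worse than plain round-to-nearest on the calibration samples; how much it improves depends on the data. Using only the sign of the gradient makes every step the same size, so over `steps` iterations each offset can move by at most steps × lr = 1, exactly the width of the [-0.5, 0.5] range.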
Quick Start & Requirements
Install: pip install auto-round (GPU), pip install auto-round[cpu] (CPU), or pip install auto-round-lib (HPU).
Usage: command-line interface (auto-round -h) or Python API.