Quantization algorithm for LLMs and VLMs
AutoRound is an advanced quantization algorithm designed to significantly reduce the memory footprint and computational cost of Large Language Models (LLMs) and Vision-Language Models (VLMs), enabling efficient inference across diverse hardware. It targets researchers and engineers seeking to deploy large models on resource-constrained environments while maintaining high accuracy, even at 2-bit precision.
How It Works
AutoRound employs a novel sign gradient descent method to fine-tune both rounding values and min-max clipping thresholds. This approach allows for rapid convergence, typically within 200 steps, to achieve state-of-the-art accuracy. The algorithm supports mixed-bit tuning, LM-head quantization, and export to popular formats like GPTQ, AWQ, and GGUF, offering flexibility in deployment.
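The core idea above can be illustrated with a self-contained toy: tune per-element rounding offsets with signed-gradient steps (via a straight-through estimator for the non-differentiable round) to minimize the layer's *output* reconstruction error, where plain round-to-nearest is not optimal. This is a minimal sketch of the technique, not the AutoRound implementation; the layer sizes, single per-tensor scale, and learning rate are illustrative assumptions (AutoRound also tunes min-max clipping, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer: calibration inputs X and float weights W (sizes are arbitrary).
X = rng.normal(size=(64, 16))
W = rng.normal(size=(16, 8))

bits = 3
qmax = 2 ** (bits - 1) - 1            # symmetric integer range [-qmax, qmax]
s = np.abs(W).max() / qmax            # single per-tensor scale (a simplification)

def dequant(v):
    """Fake-quantize W with learnable rounding offsets v."""
    q = np.clip(np.round(W / s + v), -qmax, qmax)
    return s * q

def loss(v):
    """Reconstruction error of the layer's output, not of the weights."""
    return np.mean((X @ W - X @ dequant(v)) ** 2)

v = np.zeros_like(W)                  # rounding offsets, kept in [-0.5, 0.5]
lr = 0.5 / 200                        # enough total budget to flip a rounding decision
best = baseline = loss(v)             # baseline = plain round-to-nearest

for _ in range(200):                  # AutoRound reports convergence in ~200 steps
    # Straight-through estimator: treat round() as identity when differentiating,
    # so dL/dv is proportional to -s * X^T (X W - X W_quantized).
    err = X @ W - X @ dequant(v)
    grad = -2.0 * s * (X.T @ err) / err.size
    v = np.clip(v - lr * np.sign(grad), -0.5, 0.5)   # signed-gradient step
    best = min(best, loss(v))

print(f"round-to-nearest: {baseline:.5f}  tuned: {best:.5f}")
```

Only the *sign* of the gradient is used, so the step size directly bounds how far each offset can drift, which is what keeps the offsets inside the valid rounding window without extra machinery.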
Quick Start & Requirements
Install via pip: pip install auto-round (GPU), pip install auto-round[cpu] (CPU), or pip install auto-round-lib (HPU). After installation, quantize models through the command line (run auto-round -h for options) or the Python API.
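A sketch of a typical command-line invocation follows; the model name is a placeholder, and the flag names reflect the project's documented CLI but may differ between versions, so check auto-round -h for your installed release.

```shell
# Quantize a small model to 4-bit weights and export in GPTQ format.
# Flag names are assumptions based on the project's CLI; verify with `auto-round -h`.
auto-round \
  --model facebook/opt-125m \
  --bits 4 \
  --group_size 128 \
  --format auto_gptq \
  --output_dir ./quantized_model
```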