Vahe1994/AQLM: PyTorch code for LLM compression via Additive Quantization (AQLM)
This repository provides the official PyTorch implementation of AQLM (Additive Quantization of Language Models) and PV-Tuning, techniques for extreme compression of Large Language Models (LLMs). They enable significant reductions in model size and memory footprint while maintaining high accuracy, targeting researchers and practitioners who need to deploy LLMs efficiently.
How It Works
AQLM achieves extreme compression by quantizing LLM weights with additive quantization, which decomposes each weight group into a sum of vectors drawn from learned codebooks; only the small codeword indices need to be stored. PV-Tuning further improves accuracy with a finetuning algorithm that outperforms traditional approaches such as the straight-through estimator. Together they yield highly compressed models with minimal performance degradation.
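To make the idea concrete, here is a minimal NumPy sketch of additive quantization: a weight group is encoded as one index per codebook and reconstructed as the sum of the selected codewords. The sizes, the random codebooks, and the greedy residual encoder are illustrative assumptions; the real AQLM learns its codebooks and uses beam search over much larger code spaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, for illustration only.
d = 8   # dimension of one weight group (sub-vector)
K = 16  # codewords per codebook
M = 2   # number of additive codebooks

codebooks = rng.standard_normal((M, K, d))
w = rng.standard_normal(d)  # one weight group to compress

# Greedy encoding: for each codebook in turn, pick the codeword closest
# to the current residual. (AQLM itself uses beam search instead.)
codes = []
residual = w.copy()
for m in range(M):
    idx = int(np.argmin(((codebooks[m] - residual) ** 2).sum(axis=1)))
    codes.append(idx)
    residual = residual - codebooks[m][idx]

# Decoding: the compressed representation is just M small indices;
# the weight group is reconstructed as a sum of codewords.
w_hat = sum(codebooks[m][codes[m]] for m in range(M))

rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"codes: {codes}, relative reconstruction error: {rel_err:.3f}")
```

Storing `M` indices of `log2(K)` bits each per `d` weights is what drives the extreme compression ratios; accuracy then hinges on how well the codebooks are learned, which is where PV-Tuning's finetuning comes in.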
Quick Start & Requirements
pip install "aqlm[gpu,cpu]>=1.1.6"