ARB-LLM: LLM compression via advanced binarization
Top 29.9% on SourcePulse
ARB-LLM addresses the high memory and computational demands of Large Language Models (LLMs) by introducing a novel 1-bit post-training quantization (PTQ) technique. Targeting researchers and engineers deploying LLMs, it significantly reduces model size and resource requirements while preserving performance, offering a practical solution for efficient LLM deployment.
How It Works
ARB-LLM employs an Alternating Refined Binarization (ARB) algorithm that progressively updates the binarization parameters, narrowing the distribution gap between binarized and full-precision weights and thereby reducing quantization error. The ARB-X and ARB-RC extensions target specific characteristics of LLM weight distributions, such as column deviation, and a Column-Group Bitmap (CGB) strategy refines the weight partitioning further. Together, these yield better compression and accuracy than prior binarization methods.
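To make the alternating refinement concrete, below is a minimal, illustrative sketch, not the official ARB-LLM code: it approximates each weight row as alpha * B + mu with B in {-1, +1} and alternately re-solves the shift mu and scale alpha in closed form. The per-row parameterization, iteration count, and function name are assumptions, and the ARB-X, ARB-RC, and CGB refinements are omitted.

```python
import torch

def alternating_refined_binarization(W: torch.Tensor, iters: int = 5) -> torch.Tensor:
    """Illustrative alternating-refinement loop for 1-bit weight binarization.

    Approximates each row of W as alpha * B + mu, with B in {-1, +1}; mu and
    alpha are re-derived in closed form at every iteration so the binarized
    weights track the full-precision distribution more closely.
    """
    mu = W.mean(dim=1, keepdim=True)                    # initial per-row shift
    alpha = (W - mu).abs().mean(dim=1, keepdim=True)    # initial per-row scale
    for _ in range(iters):
        R = W - mu                                      # residual once the shift is removed
        B = torch.sign(R)
        B[B == 0] = 1.0                                 # keep the matrix strictly binary
        alpha = (R * B).mean(dim=1, keepdim=True)       # scale minimizing ||R - alpha*B||^2
        mu = (W - alpha * B).mean(dim=1, keepdim=True)  # refined shift given alpha and B
    return alpha * B + mu                               # dequantized approximation of W

# Quantization error shrinks (or plateaus) as the parameters are refined.
W = torch.randn(1024, 1024)
print(torch.mean((W - alternating_refined_binarization(W)) ** 2).item())
```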
Quick Start & Requirements
Clone the repository: git clone https://github.com/ZHITENGLI/ARB-LLM.git. Set up a Conda environment (conda create -n arbllm python=3.11, conda activate arbllm) and install dependencies (pip install torch torchvision torchaudio, pip install -r requirements.txt). GPU acceleration (CUDA) is required, as indicated by example commands using "cuda:0". Official repository: https://github.com/ZHITENGLI/ARB-LLM.git.
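As a quick sanity check before running the examples, the short snippet below (an assumption, not part of the repository) uses the pip-installed torch to confirm that the "cuda:0" device referenced in the example commands is actually available:

```python
import torch

# The example commands target "cuda:0", so fail fast if no GPU is visible.
if not torch.cuda.is_available():
    raise RuntimeError("No CUDA device detected; ARB-LLM's examples expect cuda:0")
print(f"cuda:0 -> {torch.cuda.get_device_name(0)}")
```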
Highlighted Details
Maintenance & Community
Code released February 16, 2025, indicating recent activity. No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README. The project is based on BiLLM.
Licensing & Compatibility
Released under the Apache 2.0 license. Because the project builds on BiLLM, any inherited licensing or compatibility constraints should be reviewed independently.
Limitations & Caveats
The provided README does not explicitly detail limitations, unsupported platforms, or known bugs. As a post-training quantization method, its performance characteristics may differ from quantization-aware training approaches. CUDA is a prerequisite.