FujitsuResearch/OneCompression: LLM compression toolkit for enhanced efficiency and accuracy
Summary
Fujitsu One Compression (OneComp) is a Python package for LLM compression, offering advanced quantization methods like QEP and AutoBit to reduce model size and memory footprint while improving accuracy. It targets researchers and engineers, enabling efficient deployment and fine-tuning of compressed LLMs, with seamless integration for serving frameworks like vLLM.
How It Works
OneComp employs several novel compression strategies. Quantization Error Propagation (QEP) corrects quantization errors by propagating them to subsequent layers, improving post-training accuracy. AutoBit uses integer linear programming (ILP) to assign mixed-precision bitwidths automatically, optimizing per-layer bit allocation within a VRAM budget. JointQ jointly optimizes weights and scale parameters to improve group-wise quantization accuracy. Rotation Preprocessing learns rotation matrices that reduce quantization error, and LoRA SFT Post-Process enables supervised fine-tuning of quantized models via LoRA adapters.
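As a rough illustration of the QEP idea, the NumPy sketch below quantizes two linear layers: naively (each layer in isolation), and then QEP-style, where the second layer is calibrated against the actual quantized output of the first layer, so layer 1's quantization error is absorbed before layer 2 is quantized. The round-to-nearest quantizer, layer shapes, and least-squares correction are illustrative assumptions for this sketch, not OneComp's actual implementation.

```python
# Illustrative QEP-style sketch (NOT OneComp's API): propagate layer 1's
# quantization error into the calibration input used for layer 2.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=4):
    """Symmetric round-to-nearest quantization with a per-tensor scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# Calibration activations and two full-precision linear layers.
X = rng.standard_normal((256, 32))
W1 = rng.standard_normal((32, 32)) * 0.1
W2 = rng.standard_normal((32, 16)) * 0.1

# Naive baseline: quantize each layer independently.
Q1, Q2 = quantize(W1), quantize(W2)
err_naive = np.linalg.norm(X @ W1 @ W2 - X @ Q1 @ Q2)

# QEP-style: calibrate layer 2 on the *quantized* layer-1 output, correcting
# W2 by least squares to absorb layer 1's error before quantizing W2 itself.
H = X @ Q1                       # error-carrying input propagated from layer 1
target = X @ W1 @ W2             # full-precision reference output
W2_corr, *_ = np.linalg.lstsq(H, target, rcond=None)
Q2_qep = quantize(W2_corr)
err_qep = np.linalg.norm(target - H @ Q2_qep)

print(f"naive error: {err_naive:.3f}  QEP-style error: {err_qep:.3f}")
```

On this synthetic calibration set, the propagated variant yields a smaller end-to-end error than independent per-layer quantization, which is the intuition behind QEP.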
Quick Start & Requirements
Users: install PyTorch (CPU or CUDA build), then run pip install onecomp.
Developers (uv): install uv, clone the repository, cd OneCompression, then run uv sync --extra cu128 --extra dev --extra visualize (adjust the CUDA extra to match your toolkit version).
Developers (pip): clone the repository, install PyTorch with CUDA, then run pip install -e ".[dev]".
Prerequisites: PyTorch; CUDA 11.8-12.8.
Docs: https://FujitsuResearch.github.io/OneCompression/