FujitsuResearch/OneCompression: LLM compression toolkit for enhanced efficiency and accuracy
Summary
Fujitsu One Compression (OneComp) is a Python package for LLM compression, offering advanced quantization methods like QEP and AutoBit to reduce model size and memory footprint while improving accuracy. It targets researchers and engineers, enabling efficient deployment and fine-tuning of compressed LLMs, with seamless integration for serving frameworks like vLLM.
How It Works
OneComp employs several novel compression strategies. Quantization Error Propagation (QEP) corrects quantization errors by propagating them to subsequent layers, improving post-training accuracy. AutoBit uses integer linear programming (ILP) to assign mixed-precision bitwidths automatically, optimizing per-layer bit allocation within a VRAM budget. JointQ jointly optimizes weights and scale parameters to improve group-wise quantization accuracy. Rotation Preprocessing learns rotation matrices that reduce quantization error, and LoRA SFT Post-Process enables supervised fine-tuning of quantized models via LoRA adapters.
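As a rough illustration of the QEP idea, the NumPy sketch below quantizes two linear layers: naively (each layer in isolation), and then QEP-style, where the second layer is calibrated against the actual quantized output of the first layer, so layer 1's quantization error is absorbed before layer 2 is quantized. The round-to-nearest quantizer, layer shapes, and least-squares correction are illustrative assumptions for this sketch, not OneComp's actual implementation.

```python
# Illustrative QEP-style sketch (NOT OneComp's API): propagate layer 1's
# quantization error into the calibration input used for layer 2.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=4):
    """Symmetric round-to-nearest quantization with a per-tensor scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# Calibration activations and two full-precision linear layers.
X = rng.standard_normal((256, 32))
W1 = rng.standard_normal((32, 32)) * 0.1
W2 = rng.standard_normal((32, 16)) * 0.1

# Naive baseline: quantize each layer independently.
Q1, Q2 = quantize(W1), quantize(W2)
err_naive = np.linalg.norm(X @ W1 @ W2 - X @ Q1 @ Q2)

# QEP-style: calibrate layer 2 on the *quantized* layer-1 output, correcting
# W2 by least squares to absorb layer 1's error before quantizing W2 itself.
H = X @ Q1                       # error-carrying input propagated from layer 1
target = X @ W1 @ W2             # full-precision reference output
W2_corr, *_ = np.linalg.lstsq(H, target, rcond=None)
Q2_qep = quantize(W2_corr)
err_qep = np.linalg.norm(target - H @ Q2_qep)

print(f"naive error: {err_naive:.3f}  QEP-style error: {err_qep:.3f}")
```

On this synthetic calibration set, the propagated variant yields a smaller end-to-end error than independent per-layer quantization, which is the intuition behind QEP.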
Quick Start & Requirements
Users: install PyTorch (CPU or CUDA build), then run pip install onecomp.
Developers (uv): install uv, clone the repository, cd OneCompression, then run uv sync --extra cu128 --extra dev --extra visualize (adjust the CUDA extra to match your toolkit version).
Developers (pip): clone the repository, install PyTorch with CUDA, then run pip install -e ".[dev]".
Prerequisites: PyTorch; CUDA 11.8-12.8.
Docs: https://FujitsuResearch.github.io/OneCompression/