OneCompression by FujitsuResearch

LLM compression toolkit for enhanced efficiency and accuracy

Created 1 week ago

305 stars

Top 87.8% on SourcePulse

Project Summary

Summary

Fujitsu One Compression (OneComp) is a Python package for LLM compression. It offers quantization methods such as QEP and AutoBit that reduce model size and memory footprint while improving accuracy. It targets researchers and engineers who need to deploy and fine-tune compressed LLMs efficiently, and it integrates with serving frameworks such as vLLM.

How It Works

OneComp combines several compression strategies. Quantization Error Propagation (QEP) corrects quantization errors by propagating them to subsequent layers, improving the accuracy of post-training quantization. AutoBit uses integer linear programming (ILP) to assign mixed-precision bitwidths automatically, choosing a bitwidth for each layer so that the model fits within a VRAM budget. JointQ jointly optimizes weights and scale parameters to improve the accuracy of group-wise quantization. Rotation Preprocessing learns rotation matrices that reduce quantization error, and LoRA SFT Post-Process allows quantized models to be fine-tuned.
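
The per-layer bit allocation idea behind AutoBit can be illustrated with a toy sketch. This is not OneComp's implementation or API: it replaces the ILP solver with brute-force enumeration, which is only practical for a handful of layers, and the layer names, parameter counts, and error scores below are hypothetical values invented for illustration.

```python
from itertools import product

# Hypothetical per-layer data: parameter count and an error score for each
# candidate bitwidth (lower is better). Real scores would be measured.
LAYERS = [
    {"name": "layer0", "params": 4_000_000, "error": {2: 90, 3: 40, 4: 15, 8: 1}},
    {"name": "layer1", "params": 8_000_000, "error": {2: 30, 3: 12, 4: 5, 8: 1}},
    {"name": "layer2", "params": 2_000_000, "error": {2: 120, 3: 60, 4: 18, 8: 2}},
]
BITWIDTHS = (2, 3, 4, 8)


def weight_bytes(params, bits):
    """Storage needed for `params` weights at `bits` bits each."""
    return params * bits / 8


def assign_bits(layers, bitwidths, budget_bytes):
    """Pick one bitwidth per layer, minimizing total error within the budget.

    Brute-force stand-in for an ILP solver: tries every combination.
    """
    best = None
    for choice in product(bitwidths, repeat=len(layers)):
        size = sum(weight_bytes(l["params"], b) for l, b in zip(layers, choice))
        if size > budget_bytes:
            continue  # violates the memory budget
        err = sum(l["error"][b] for l, b in zip(layers, choice))
        if best is None or err < best[0]:
            best = (err, size, choice)
    if best is None:
        raise ValueError("no bitwidth assignment fits the budget")
    err, size, choice = best
    return {l["name"]: b for l, b in zip(layers, choice)}, err, size


# Usage: allocate bits under a 6 MB weight budget.
bits, err, size = assign_bits(LAYERS, BITWIDTHS, 6_000_000)
print(bits, err, size)
```

The objective and constraint are both sums of per-layer terms, which is what makes the problem expressible as an ILP with one binary variable per (layer, bitwidth) pair. A solver scales to real model depths where enumeration does not.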

Quick Start & Requirements

  • Users: install PyTorch (CPU or CUDA build), then run pip install onecomp.
  • Developers (uv): install uv, clone the repository, cd OneCompression, then run uv sync --extra cu128 --extra dev --extra visualize (adjust the CUDA extra to match your setup).
  • Developers (pip): clone the repository, install PyTorch with CUDA, then run pip install -e ".[dev]".
  • Prerequisites: PyTorch, CUDA (11.8-12.8).
  • Docs: https://FujitsuResearch.github.io/OneCompression/

Highlighted Details

  • vLLM Integration: Serve OneComp-quantized models with vLLM via built-in

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 7
Issues (30d): 0
Star History: 306 stars in the last 12 days
