OneCompression by FujitsuResearch

LLM compression toolkit for enhanced efficiency and accuracy

Created 1 month ago
377 stars

Top 75.2% on SourcePulse

Project Summary

Summary

Fujitsu One Compression (OneComp) is a Python package for LLM compression. It offers quantization methods such as QEP and AutoBit that reduce model size and memory footprint while improving the accuracy of the quantized model. It targets researchers and engineers, supports efficient deployment and fine-tuning of compressed LLMs, and integrates with serving frameworks such as vLLM.
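
To make the memory savings concrete, the sketch below estimates weight storage at a given bitwidth. It is a back-of-the-envelope calculation, not part of OneComp: it assumes one 16-bit scale per group of weights and ignores zero points, activations, and the KV cache. The function name and the 7B-parameter example are hypothetical.

```python
from typing import Optional


def weight_memory_bytes(n_params: int, bits: int, group_size: Optional[int] = None,
                        scale_bits: int = 16) -> float:
    """Approximate weight storage in bytes.

    Assumption: group-wise quantization stores one `scale_bits`-wide scale per
    `group_size` weights; zero points and other metadata are ignored.
    """
    bits_per_weight = float(bits)
    if group_size is not None:
        bits_per_weight += scale_bits / group_size  # amortized scale overhead
    return n_params * bits_per_weight / 8


n = 7_000_000_000  # hypothetical 7B-parameter model
fp16 = weight_memory_bytes(n, 16)
int4 = weight_memory_bytes(n, 4, group_size=128)
print(f"FP16 weights:       {fp16 / 1e9:.2f} GB")  # 14.00 GB
print(f"4-bit, group 128:   {int4 / 1e9:.2f} GB")  # 3.61 GB
```

The group-scale term is why "4-bit" models end up slightly above a quarter of their FP16 size: with a group size of 128, each weight carries an extra 16/128 = 0.125 bits.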

How It Works

OneComp combines several compression strategies:
  • Quantization Error Propagation (QEP): corrects quantization error by propagating it to subsequent layers, improving accuracy after post-training quantization.
  • AutoBit: uses integer linear programming (ILP) to assign mixed-precision bitwidths automatically, optimizing per-layer bit allocation within a VRAM budget.
  • JointQ: jointly optimizes weights and scale parameters to improve the accuracy of group-wise quantization.
  • Rotation Preprocessing: learns rotation matrices that reduce quantization error.
  • LoRA SFT Post-Process: enables supervised fine-tuning (SFT) of quantized models with LoRA adapters.
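
The sketch below illustrates two of these ideas on toy data: symmetric group-wise round-to-nearest quantization (the setting JointQ targets, although JointQ additionally optimizes weights and scales jointly) and choosing one bitwidth per layer under a memory budget (the problem AutoBit addresses). It is not OneComp's code. The error proxy (squared weight error), the layer sizes, and all function names are hypothetical, and exhaustive search stands in for the ILP solver, which only works for a handful of layers.

```python
import itertools
import random


def quantize_groupwise(weights, bits, group_size=32):
    """Symmetric round-to-nearest quantization with one scale per group."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed integers
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude in the group to qmax (guard all-zero groups).
        scale = max(abs(w) for w in group) / qmax or 1.0
        for w in group:
            q = max(-qmax - 1, min(qmax, round(w / scale)))  # clamp to signed range
            out.append(q * scale)  # dequantize so the error can be measured
    return out


def quant_error(weights, bits, group_size=32):
    """Squared weight error: a simple stand-in for a real sensitivity metric."""
    deq = quantize_groupwise(weights, bits, group_size)
    return sum((w - d) ** 2 for w, d in zip(weights, deq))


def allocate_bits(layers, candidate_bits, avg_bits_budget, group_size=32):
    """Pick one bitwidth per layer, minimizing total error under a bit budget.

    Exhaustive search replaces the ILP described for AutoBit; it is
    exponential in the number of layers, so it only suits toy examples.
    """
    sizes = [len(w) for w in layers]
    budget = avg_bits_budget * sum(sizes)  # total weight bits allowed (scales ignored)
    errors = [{b: quant_error(w, b, group_size) for b in candidate_bits} for w in layers]
    best, best_err = None, float("inf")
    for choice in itertools.product(candidate_bits, repeat=len(layers)):
        if sum(b * n for b, n in zip(choice, sizes)) > budget:
            continue  # violates the memory budget
        err = sum(e[b] for e, b in zip(errors, choice))
        if err < best_err:
            best, best_err = choice, err
    return best, best_err


rng = random.Random(0)
# Hypothetical layers with different sizes and weight spreads.
layers = [[rng.gauss(0.0, std) for _ in range(n)]
          for n, std in [(256, 0.02), (512, 0.05), (256, 0.2), (128, 1.0)]]
choice, err = allocate_bits(layers, candidate_bits=(2, 3, 4), avg_bits_budget=3.0)
print("bits per layer:", choice, "total squared error:", round(err, 4))
```

Because this error proxy is unnormalized, layers with larger weights tend to receive more bits; a practical allocator would rely on a calibrated sensitivity measure and a proper solver rather than brute force.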

Quick Start & Requirements

  • Users: install PyTorch (CPU or CUDA build), then run pip install onecomp.
  • Developers using uv: install uv, clone the repository, cd OneCompression, then run uv sync --extra cu128 --extra dev --extra visualize (replace cu128 with the extra matching your CUDA version).
  • Developers using pip: clone the repository, install PyTorch with CUDA support, then run pip install -e ".[dev]".
  • Prerequisites: PyTorch and CUDA 11.8 to 12.8.
  • Docs: https://FujitsuResearch.github.io/OneCompression/
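
Before choosing a CUDA extra, it can help to confirm which CUDA version the installed PyTorch build targets. The minimal sketch below relies only on `torch.version.cuda` (a version string, or `None` for CPU-only builds). The `cuXYZ` naming for extras other than `cu128` is an assumption inferred from the example command, so check the repository's pyproject.toml for the extras that actually exist; the helper names are hypothetical.

```python
import importlib.util

SUPPORTED_RANGE = ((11, 8), (12, 8))  # CUDA range stated in the prerequisites


def parse_cuda(version):
    """Turn '12.4' or '12.4.1' into (12, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_supported(version):
    low, high = SUPPORTED_RANGE
    return low <= parse_cuda(version) <= high


def uv_extra(version):
    """Hypothetical mapping '12.8' -> 'cu128', following the cu128 example."""
    major, minor = parse_cuda(version)
    return f"cu{major}{minor}"


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed; install it before `pip install onecomp`.")
    else:
        import torch

        cuda = torch.version.cuda  # None for CPU-only builds
        if cuda is None:
            print("CPU-only PyTorch build detected.")
        else:
            status = "supported" if cuda_supported(cuda) else "outside 11.8-12.8"
            print(f"CUDA {cuda}: {status}; candidate uv extra: {uv_extra(cuda)}")
```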

Highlighted Details

  • vLLM Integration: Serve OneComp-quantized models with vLLM via built-in
Health Check
Last Commit

6 days ago

Responsiveness

Inactive

Pull Requests (30d)
7
Issues (30d)
3
Star History
50 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (author of LLaMA-Factory), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

0.3%
4k
Weight quantization research paper for LLM compression/acceleration
Created 3 years ago
Updated 10 months ago