Model compression toolbox for LLMs and diffusion models
DeepCompressor is a PyTorch-based toolbox for compressing Large Language Models (LLMs) and Diffusion Models, targeting researchers and engineers aiming to deploy these models efficiently. It offers advanced quantization techniques, including 4-bit and 8-bit precision for weights and activations, significantly reducing memory footprint and latency while preserving model accuracy.
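To make the weight-quantization idea concrete, here is a minimal, hypothetical PyTorch sketch of per-channel symmetric 4-bit fake quantization. The function names and scheme are illustrative assumptions, not DeepCompressor's actual API:

```python
import torch

def quantize_weight_int4(w: torch.Tensor):
    """Fake-quantize a weight matrix to signed 4-bit, one scale per output channel."""
    qmax = 7  # symmetric signed 4-bit range [-7, 7]
    scale = (w.abs().amax(dim=1, keepdim=True) / qmax).clamp_min(1e-8)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q.to(torch.int8), scale  # int8 used as a container for 4-bit values

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, s = quantize_weight_int4(w)
print((w - dequantize(q, s)).abs().mean())  # mean quantization error
```

Real W4 schemes add refinements on top of this (group-wise scales, calibration, error compensation), but the storage and latency savings come from exactly this kind of mapping to a narrow integer range.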
How It Works
The toolbox implements state-of-the-art quantization algorithms such as AWQ, GPTQ, and SmoothQuant, alongside its novel contributions: QoQ (W4A8KV4 for LLMs) and SVDQuant (W4A4 for diffusion models). QoQ addresses overheads in low-bit LLM serving by optimizing dequantization and KV cache handling, while SVDQuant enables aggressive 4-bit quantization of diffusion models by absorbing outliers into a low-rank branch, paired with the fused Nunchaku inference engine for efficient deployment.
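The outlier-absorption idea behind SVDQuant can be sketched in a few lines: keep a low-rank approximation of the weight in high precision and quantize only the residual, whose outliers are smaller. The rank value and helper below are illustrative assumptions, not the toolbox's implementation:

```python
import torch

def svd_absorb(w: torch.Tensor, rank: int = 32):
    """Split w into a high-precision low-rank branch plus a 4-bit residual."""
    # Low-rank branch kept in high precision absorbs the dominant outliers.
    u, s, v = torch.svd_lowrank(w, q=rank)
    low_rank = u @ torch.diag(s) @ v.T
    # The residual, now better behaved, is quantized to signed 4-bit.
    residual = w - low_rank
    qmax = 7
    scale = (residual.abs().amax() / qmax).clamp_min(1e-8)
    q = torch.clamp(torch.round(residual / scale), -qmax, qmax)
    return low_rank, q.to(torch.int8), scale

w = torch.randn(2048, 2048)
low_rank, q, scale = svd_absorb(w)
w_hat = low_rank + q.to(torch.float32) * scale  # reconstructed weight
print((w - w_hat).abs().mean())
```

At inference, the low-rank branch and the quantized branch run side by side; fusing them into one kernel, as Nunchaku does, avoids paying twice for memory traffic.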
Quick Start & Requirements
Create the conda environment from the provided spec, activate it, then install the Python dependencies with Poetry:

```bash
conda env create -f environment.yml
poetry install
```

Highlighted Details
QoQ enables W4A8KV4 quantization for low-latency LLM serving, while SVDQuant brings W4A4 quantization to diffusion models, served through the fused Nunchaku inference engine.
Maintenance & Community
The project is maintained by MIT HAN Lab, known for efficient generative-AI research. Related projects have garnered significant attention (9k+ GitHub stars, 1M+ Hugging Face downloads).
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification.
Limitations & Caveats
Beyond the unspecified license, which could impact commercial adoption, the README provides extensive benchmarks but does not detail hardware requirements beyond GPU acceleration.