LightCompress by ModelTC

PyTorch toolkit for LLM quantization research and deployment

created 1 year ago
526 stars

Top 60.9% on sourcepulse

View on GitHub
Project Summary

LLMC (now LightCompress) is a PyTorch-based toolkit for compressing Large Language Models (LLMs), offering a versatile solution for enhancing efficiency and reducing model size. It targets researchers and engineers working with LLMs who need to optimize performance and memory footprint without significant accuracy degradation. The toolkit provides state-of-the-art compression algorithms and best practices for LLM post-training quantization.

How It Works

LLMC implements a modular approach to LLM compression, supporting a wide array of quantization and sparsity techniques. It integrates advanced algorithms like AWQ, GPTQ, SmoothQuant, and QuaRot, alongside integer and floating-point quantization methods. The toolkit focuses on maintaining accuracy comparable to original models and provides best practices for optimal performance and efficiency, enabling users to achieve a balance between compression and accuracy.
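To make the idea concrete: the simplest of these techniques ("naive" quantization) maps floating-point weights to integers with a single scale factor. The sketch below is a generic illustration of symmetric per-tensor INT8 post-training quantization, not LLMC's actual API; the function names are hypothetical.

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Symmetric per-tensor INT8 PTQ: weight ≈ scale * w_q (illustrative, not LLMC's API)."""
    scale = weight.abs().max() / 127.0  # largest magnitude maps to 127
    w_q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return w_q, scale

def dequantize(w_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original weights."""
    return w_q.to(torch.float32) * scale

# Round-trip error is bounded by half a quantization step (scale / 2).
w = torch.randn(512, 512)
w_q, scale = quantize_int8(w)
assert (w - dequantize(w_q, scale)).abs().max() <= scale / 2
```

Algorithms like AWQ and GPTQ improve on this baseline by choosing scales per channel or group and by compensating for the rounding error using activation statistics, which is what keeps accuracy close to the original model at 4-bit widths.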

Quick Start & Requirements

  • Install: Docker images are available on Docker Hub (llmcompression/llmc:pure-latest) and Alibaba Cloud (registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest).
  • Prerequisites: PyTorch and, for GPU acceleration, a CUDA-capable setup; the required Python version follows from the project's dependencies (check the repository for exact constraints).
  • Resources: Supports quantization of large models like Llama3.1-405B and DeepSeek-R1-671B on a single A100/H100/H800 GPU.
  • Docs: English, Chinese
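Assuming Docker is installed, the prebuilt images listed above can be pulled directly (the tags are taken verbatim from the install notes):

```shell
# Pull the prebuilt image from Docker Hub
docker pull llmcompression/llmc:pure-latest

# Or use the Alibaba Cloud mirror
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
```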

Highlighted Details

  • Supports quantization of large-scale MoE models (e.g., DeepSeek-V3, DeepSeek-R1) and VLM models (e.g., Qwen2-VL, Llama3.2).
  • Enables exporting real quantized models (INT4/INT8) compatible with backends like VLLM, SGLang, AutoAWQ, and MLC-LLM.
  • Offers comprehensive support for various quantization algorithms (e.g., Naive, AWQ, GPTQ, SmoothQuant, QuaRot) and pruning methods (e.g., Wanda, Naive).
  • Includes evaluation capabilities via lm-evaluation-harness and integration with OpenCompass.

Maintenance & Community

  • Active development with frequent updates, including support for new models and techniques.
  • Community channels available via Discord and Tencent QQ Group (526192592).

Licensing & Compatibility

  • Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

  • While many models are supported out of the box, integrating a custom model may require consulting llmc/models/*.py. Certain operations (e.g., handling FP8 weights) carry hardware requirements that are implied rather than documented explicitly.
Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 29
  • Issues (30d): 2
  • Star History: 62 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 6 more.

AutoGPTQ by AutoGPTQ

  • LLM quantization package using GPTQ algorithm
  • Top 0.1% · 5k stars
  • created 2 years ago · updated 3 months ago