LightCompress by ModelTC

PyTorch toolkit for LLM quantization research and deployment

created 1 year ago
526 stars

Top 60.9% on sourcepulse

View on GitHub
Project Summary

LLMC (now LightCompress) is a PyTorch-based toolkit for compressing Large Language Models (LLMs), offering a versatile solution for enhancing efficiency and reducing model size. It targets researchers and engineers working with LLMs who need to optimize performance and memory footprint without significant accuracy degradation. The toolkit provides state-of-the-art compression algorithms and best practices for LLM post-training quantization.

How It Works

LLMC implements a modular approach to LLM compression, supporting a wide array of quantization and sparsity techniques. It integrates advanced algorithms like AWQ, GPTQ, SmoothQuant, and QuaRot, alongside integer and floating-point quantization methods. The toolkit focuses on maintaining accuracy comparable to original models and provides best practices for optimal performance and efficiency, enabling users to achieve a balance between compression and accuracy.
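To make the idea concrete: the simplest of these techniques ("naive" quantization) maps floating-point weights to integers with a single scale factor. The sketch below is a generic illustration of symmetric per-tensor INT8 post-training quantization, not LLMC's actual API; the function names are hypothetical.

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Symmetric per-tensor INT8 PTQ: weight ≈ scale * w_q (illustrative, not LLMC's API)."""
    scale = weight.abs().max() / 127.0  # largest magnitude maps to 127
    w_q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return w_q, scale

def dequantize(w_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original weights."""
    return w_q.to(torch.float32) * scale

# Round-trip error is bounded by half a quantization step (scale / 2).
w = torch.randn(512, 512)
w_q, scale = quantize_int8(w)
assert (w - dequantize(w_q, scale)).abs().max() <= scale / 2
```

Algorithms like AWQ and GPTQ improve on this baseline by choosing scales per channel or group and by compensating for the rounding error using activation statistics, which is what keeps accuracy close to the original model at 4-bit widths.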

Quick Start & Requirements

  • Install: Docker images are available on Docker Hub (llmcompression/llmc:pure-latest) and Alibaba Cloud (registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest).
  • Prerequisites: PyTorch and, for GPU acceleration, a CUDA-capable setup; the required Python version follows from the project's dependencies (check the repository for exact constraints).
  • Resources: Supports quantization of large models like Llama3.1-405B and DeepSeek-R1-671B on a single A100/H100/H800 GPU.
  • Docs: English, Chinese
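Assuming Docker is installed, the prebuilt images listed above can be pulled directly (the tags are taken verbatim from the install notes):

```shell
# Pull the prebuilt image from Docker Hub
docker pull llmcompression/llmc:pure-latest

# Or use the Alibaba Cloud mirror
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest
```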

Highlighted Details

  • Supports quantization of large-scale MoE models (e.g., DeepSeek-V3, DeepSeek-R1) and VLM models (e.g., Qwen2-VL, Llama3.2).
  • Enables exporting real quantized models (INT4/INT8) compatible with backends like VLLM, SGLang, AutoAWQ, and MLC-LLM.
  • Offers comprehensive support for various quantization algorithms (e.g., Naive, AWQ, GPTQ, SmoothQuant, QuaRot) and pruning methods (e.g., Wanda, Naive).
  • Includes evaluation capabilities via lm-evaluation-harness and integration with OpenCompass.

Maintenance & Community

  • Active development with frequent updates, including support for new models and techniques.
  • Community channels available via Discord and Tencent QQ Group (526192592).

Licensing & Compatibility

  • Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

  • While many models are supported out of the box, integrating a custom model may require consulting llmc/models/*.py. Certain operations (e.g., handling FP8 weights) carry hardware requirements that are implied rather than documented explicitly.
Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 29
  • Issues (30d): 2
  • Star History: 62 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 6 more.

AutoGPTQ by AutoGPTQ

  • LLM quantization package using GPTQ algorithm
  • Top 0.1% · 5k stars
  • created 2 years ago · updated 3 months ago