PyTorch toolkit for LLM quantization research and deployment
LLMC is a PyTorch-based toolkit for compressing Large Language Models (LLMs), offering a versatile solution for enhancing efficiency and reducing model size. It targets researchers and engineers working with LLMs who need to optimize performance and memory footprint without significant accuracy degradation. The toolkit provides state-of-the-art compression algorithms and best practices for LLM post-training quantization.
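To make the idea of post-training weight quantization concrete, here is a minimal sketch of symmetric per-channel INT8 quantization. This is a generic NumPy illustration of the underlying technique, not LLMC's actual API; the function names are hypothetical.

```python
import numpy as np

def quantize_per_channel_int8(w):
    """Symmetric per-output-channel INT8 quantization of a weight matrix.

    Illustrative sketch only; LLMC implements far more sophisticated
    algorithms (AWQ, GPTQ, etc.) on top of this basic idea.
    """
    # One scale per output channel (row), chosen so max |w| maps to 127.
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Recover an approximate float weight matrix from INT8 values.
    return q.astype(np.float32) * scales

# Example: quantize a small weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_per_channel_int8(w)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()
```

Rounding to the nearest quantization level bounds the per-element error by half a scale step, which is why per-channel scales (rather than one global scale) preserve accuracy better.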
How It Works
LLMC implements a modular approach to LLM compression, supporting a wide array of quantization and sparsity techniques. It integrates advanced algorithms like AWQ, GPTQ, SmoothQuant, and QuaRot, alongside integer and floating-point quantization methods. The toolkit focuses on maintaining accuracy comparable to original models and provides best practices for optimal performance and efficiency, enabling users to achieve a balance between compression and accuracy.
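One of the named algorithms, SmoothQuant, migrates activation outliers into the weights via per-channel scaling before quantization. The sketch below shows that core equivalence in plain NumPy (the `alpha` parameter and function name follow the published SmoothQuant idea; this is not LLMC's implementation):

```python
import numpy as np

def smooth_scales(act_absmax, w_absmax, alpha=0.5):
    # SmoothQuant-style per-channel factor: s_j = a_j^alpha / w_j^(1-alpha).
    s = act_absmax**alpha / w_absmax**(1 - alpha)
    return np.where(s == 0, 1.0, s)

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 8)).astype(np.float32)
X[:, 3] *= 50.0  # simulate an activation outlier channel
W = rng.standard_normal((8, 4)).astype(np.float32)

s = smooth_scales(np.abs(X).max(axis=0), np.abs(W).max(axis=1))
X_s = X / s            # activations become easier to quantize
W_s = W * s[:, None]   # weights absorb the scale

# The matmul result is mathematically unchanged: (X/s) @ (diag(s) W) = X @ W.
assert np.allclose(X @ W, X_s @ W_s, rtol=1e-4, atol=1e-4)
```

Because the product is unchanged, the smoothing step costs nothing at inference time once the scales are folded into the weights; only the quantization-friendliness of the activations improves.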
Quick Start & Requirements
Prebuilt Docker images are available on Docker Hub (llmcompression/llmc:pure-latest) and Alibaba Cloud (registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Custom model integration requires modifying llmc/models/*.py. Specific hardware requirements for certain operations (e.g., FP8 weights) are implied rather than documented.