AngelSlim by Tencent

Model compression toolkit for efficient AI

Created 7 months ago
462 stars

Top 65.5% on SourcePulse

View on GitHub
Project Summary

AngelSlim is a model compression toolkit built for usability, comprehensiveness, and efficiency, aimed at engineers and researchers working with large AI models. It provides a unified framework for applying a range of compression techniques, making model deployment more accessible and performant, and streamlines the end-to-end compression workflow so advanced techniques are readily available.

How It Works

The toolkit integrates mainstream compression algorithms, including quantization (e.g., FP8, INT4, NVFP4, Tequila) and speculative decoding (Eagle3), into a unified, user-friendly framework. It focuses on performance optimization across the end-to-end compression workflow, from training to deployment. AngelSlim continuously researches and incorporates novel compression algorithms, offering a path to significantly reduce model size and inference costs while maintaining accuracy.
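To make the quantization idea concrete, here is a minimal, self-contained sketch of symmetric INT4 weight quantization in NumPy. This illustrates the general technique (mapping float weights to a signed 4-bit range with a per-tensor scale); it is not AngelSlim's API, and real INT4 schemes such as GPTQ or AWQ add per-group scales and error compensation on top of this.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor INT4: map floats to integers in [-8, 7]."""
    scale = np.abs(w).max() / 7.0  # use 7 so the max weight maps within range
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
err = np.abs(w - w_hat).max()  # rounding error is at most half a step (scale / 2)
```

The 4-bit integers can then be packed two per byte, giving roughly an 8x size reduction over FP32 at the cost of the bounded rounding error above.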

Quick Start & Requirements

  • Primary install: pip install angelslim or clone and python setup.py install.
  • Prerequisites: a GPU is essential for acceptable performance; the documentation does not pin specific CUDA versions. A Python 3 environment is required.
  • Links: 📖 Documentation, 🤗 Hugging Face, 🤖 ModelScope, 💬 WeChat, 🫨 Discord.

Highlighted Details

  • Supports a broad spectrum of models including Large Language Models (LLMs), Vision Language Models (VLMs), Diffusion Models, and Speech Models from various providers like Tencent, Qwen, and DeepSeek.
  • Offers a comprehensive suite of compression techniques, featuring advanced quantization algorithms (FP8-Static/Dynamic, INT4-GPTQ/AWQ, NVFP4, Tequila) and speculative decoding (Eagle3) with early-exit mechanisms (SpecExit).
  • Demonstrates significant performance gains: Eagle3 speculative decoding achieves up to a 1.9x speedup with longer accepted draft lengths, while quantization methods such as FP8 and INT4 show minimal accuracy degradation.
  • Enables efficient deployment of large models, such as Qwen3-235B, on single GPUs through optimized quantization and speculative decoding frameworks.
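The speedup from speculative decoding comes from a cheap draft model proposing several tokens that the expensive target model then verifies in one pass. The toy sketch below shows the accept/verify loop for the greedy case; `draft_next` and `target_next` are stand-in functions invented for illustration, not AngelSlim or Eagle3 APIs, and production implementations verify all drafts in a single batched forward pass rather than one call per token.

```python
def draft_next(ctx):
    # Cheap stand-in draft model: guesses the next token as (last + 1) mod 10.
    return (ctx[-1] + 1) % 10

def target_next(ctx):
    # Stand-in target model: same rule, except it emits 0 after a 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

def speculative_step(ctx, k=4):
    """Draft k tokens, keep the longest prefix the target agrees with,
    then append one token from the target itself so each step progresses."""
    proposals, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_next(tmp)
        proposals.append(t)
        tmp.append(t)
    accepted = []
    for t in proposals:
        if target_next(ctx + accepted) == t:
            accepted.append(t)  # draft matched the target: token accepted
        else:
            break               # first mismatch ends the accepted prefix
    accepted.append(target_next(ctx + accepted))  # target's own next token
    return ctx + accepted

seq = speculative_step([1], k=4)  # accepts 2, 3, 4, then the target emits 0
```

The "accept length" metrics reported for Eagle3 correspond to how long the accepted prefix is on average: the longer the draft model's guesses survive verification, the fewer expensive target passes are needed per generated token.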

Maintenance & Community

The project shows active development with frequent releases (e.g., v0.3, v0.2) and ongoing additions of new models and algorithms. Community engagement is facilitated through WeChat, Discord, and GitHub Issues for discussions and support.

Licensing & Compatibility

The code is released under a custom "License for AngelSlim." Its specific terms, including compatibility with commercial use and closed-source linking, are not detailed and warrant review before adoption.

Limitations & Caveats

Some advanced features, such as token pruning for VLMs and audio models, are listed as "Under Development." The absence of a clearly defined, standard open-source license may pose adoption challenges for certain use cases.

Health Check
Last Commit

23 hours ago

Responsiveness

Inactive

Pull Requests (30d)
10
Issues (30d)
4
Star History
164 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

0.4%
2k
Speculative decoding research paper for faster LLM inference
Created 2 years ago
Updated 5 days ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 2 more.

Model-Optimizer by NVIDIA

2.2%
2k
Library for optimizing deep learning models for GPU inference
Created 1 year ago
Updated 17 hours ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

0.1%
3k
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago
Updated 7 months ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.1%
3k
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago
Updated 17 hours ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Dan Guido (Cofounder of Trail of Bits), and 6 more.

llm-compressor by vllm-project

0.7%
3k
Transformers-compatible library for LLM compression, optimized for vLLM deployment
Created 1 year ago
Updated 19 hours ago