MindPipe by MAC-AutoML

LLM/LVLM compression and evaluation framework

Created 4 months ago
1,005 stars

Top 36.7% on SourcePulse

Project Summary

MindPipe is a unified framework for compressing Large Language Models (LLMs) and Large Vision-Language Models (LVLMs). A single command-line interface covers post-training quantization, quantization-aware training, and several pruning strategies, along with evaluation. It targets researchers and engineers who need reproducible experiments and deployment across different hardware, notably NVIDIA GPUs and Huawei Ascend NPUs, behind a consistent abstraction layer.

How It Works

The framework is built around a single main.py entry point that drives the quantization, pruning, and evaluation pipelines. A device abstraction layer handles cache management, synchronization, and dtype policies uniformly across GPUs and NPUs. MindPipe integrates 11 quantization methods (PTQ/QAT) and 7 pruning techniques, and supports both text-only and multimodal architectures through a shared model adapter. Results are serialized to JSON for aggregation and analysis.
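
The JSON serialization step can be sketched as follows. This is a minimal illustration, not MindPipe's actual code: the function names save_result and aggregate, the one-file-per-run layout, and the record fields are all assumptions made for the example, and the metric values are placeholders rather than measured results.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_result(out_dir: Path, run_name: str, record: Dict) -> Path:
    """Write one run's record to <out_dir>/<run_name>.json (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{run_name}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def aggregate(out_dir: Path) -> List[Dict]:
    """Load every JSON record in out_dir, ordered by file name."""
    return [json.loads(p.read_text()) for p in sorted(out_dir.glob("*.json"))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = Path(tmp)
        # Placeholder records: the field names and values are illustrative only.
        save_result(results, "fp16_baseline", {"method": "fp16", "score": 0.0})
        save_result(results, "awq_w4a16", {"method": "awq", "score": 0.0})
        for row in aggregate(results):
            print(row["method"], row["score"])
```

Keeping one self-describing file per run means a later analysis script only needs a directory glob, which is one plausible reason to prefer JSON over ad hoc log parsing.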

Quick Start & Requirements

  • Installation: Activate the conda environment (conda activate mindpipe), initialize the submodules (git submodule update --init --recursive), then install the dependencies (python -m pip install -r requirements.txt).
  • Prerequisites: An NVIDIA GPU or a Huawei Ascend NPU is required. VLMEvalKit integration requires either initializing its submodule or setting the VLMEVALKIT_ROOT environment variable.
  • Usage: The main.py script is the primary interface, configured through command-line arguments for each compression task and evaluation. The README gives example commands for full-precision evaluation, quantization, and pruning.
  • Links: No external quick-start or demo links are provided; the README is the primary resource.
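
The two ways of locating VLMEvalKit mentioned in the prerequisites (the VLMEVALKIT_ROOT environment variable or an initialized submodule) could be resolved along these lines. This is a sketch under assumptions: the function name resolve_vlmevalkit_root and the idea that the environment variable takes precedence are hypothetical, and the submodule path is passed in by the caller because the summary does not state where it lives.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_vlmevalkit_root(
    submodule_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the VLMEvalKit directory, preferring VLMEVALKIT_ROOT if set.

    Assumption: the environment variable overrides the submodule path.
    """
    env = os.environ if env is None else env
    override = env.get("VLMEVALKIT_ROOT")
    candidate = Path(override) if override else submodule_dir
    # An uninitialized git submodule is left as an empty directory,
    # so an empty or missing directory is treated as "not available".
    if not candidate.is_dir() or not any(candidate.iterdir()):
        raise FileNotFoundError(
            f"VLMEvalKit not found at {candidate}; run "
            "'git submodule update --init --recursive' or set VLMEVALKIT_ROOT"
        )
    return candidate
```

Failing early with a message that names both remedies avoids a less obvious import error later in a multimodal evaluation run.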

Highlighted Details

  • Implements 11 quantization methods (e.g., AWQ, GPTQ, FlatQuant, QLoRA) and 7 pruning methods (e.g., Wanda, SparseGPT, LLM-Pruner).
  • Supports a wide range of models, including LLaMA-family, Qwen (text/VL), MiniCPM-V, LLaVA, and InternVL.
  • Integrates VLMEvalKit for multimodal evaluation; AWQ W4A16 was recently validated on VLM benchmarks.
  • Ensures reproducibility through shared utilities for model loading, dataset handling, and result serialization.
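
For readers unfamiliar with the W4A16 notation used above, it denotes 4-bit weights with 16-bit activations. The sketch below shows the weight side only, using plain group-wise round-to-nearest "fake" quantization (quantize, then immediately dequantize). It is deliberately not AWQ, which additionally rescales weights based on activation statistics, and it is not taken from MindPipe; the function name fake_quant_w4 and the group size are assumptions for illustration.

```python
from typing import List


def fake_quant_w4(weights: List[float], group_size: int = 4) -> List[float]:
    """Group-wise asymmetric 4-bit round-to-nearest, returned dequantized."""
    levels = 2 ** 4 - 1  # 4 bits give integer codes 0..15
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Step between adjacent codes; fall back to 1.0 for a constant group.
        scale = (hi - lo) / levels or 1.0
        for w in group:
            code = round((w - lo) / scale)
            code = max(0, min(levels, code))  # clamp to the 4-bit range
            out.append(code * scale + lo)     # dequantize back to float
    return out


if __name__ == "__main__":
    original = [0.0, 7.4, 15.0, 3.0]
    print(fake_quant_w4(original))
```

Each group stores only its 4-bit codes plus a scale and offset, so smaller groups track the weight distribution more closely at the cost of more metadata.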

Maintenance & Community

The provided README does not contain information regarding specific maintainers, community channels (e.g., Discord, Slack), or project sponsorships.

Licensing & Compatibility

The README does not specify a software license, nor does it address commercial use or integration with closed-source projects.

Limitations & Caveats

Some algorithms, such as QuaRot, SpinQuant, and MQuant, are not yet marked as NPU-ready. QA-LoRA is a CUDA-only implementation and does not produce AutoGPTQ packed checkpoints. QLoRA's NPU support relies on an in-tree fake-quant fallback. Whether a model can be reloaded after custom runtime wrappers have been applied depends on the method.
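
One way a pipeline could surface these backend caveats before starting a long run is a simple pre-flight check. The sketch below is hypothetical: the names NOT_NPU_READY and check_backend, the lowercase method identifiers, and the device strings are assumptions, and the set lists only the methods named in this section, which may not be exhaustive.

```python
# Methods this section names as not NPU-ready or CUDA-only.
# Assumption: identifiers are lowercase strings; the real CLI may differ.
NOT_NPU_READY = {"quarot", "spinquant", "mquant", "qa-lora"}


def check_backend(method: str, device: str) -> None:
    """Raise early if a method listed as unsupported is requested on an NPU."""
    if device == "npu" and method.lower() in NOT_NPU_READY:
        raise ValueError(
            f"{method} is not marked as NPU-ready; use a CUDA device instead"
        )


if __name__ == "__main__":
    check_backend("QuaRot", "cuda")  # passes silently
    try:
        check_backend("QuaRot", "npu")
    except ValueError as err:
        print(err)
```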

Health Check

  • Last commit: 11 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 10
  • Star history: 1,000 stars in the last 30 days
