MAC-AutoML
LLM/LVLM compression and evaluation framework
Top 36.7% on SourcePulse
MindPipe is a unified framework for compressing Large Language Models (LLMs) and Large Vision-Language Models (LVLMs). It offers a single command-line interface for post-training quantization, quantization-aware training, and several pruning strategies, together with evaluation. Aimed at researchers and engineers, it provides a consistent device abstraction layer so that experiments stay reproducible and deployable across hardware, notably NVIDIA GPUs and Huawei Ascend NPUs.
How It Works
The framework centers on a single main.py entrypoint that drives the quantization, pruning, and evaluation pipelines. A device abstraction layer handles cache management, synchronization, and dtype policies uniformly across GPUs and NPUs. MindPipe integrates 11 quantization methods (PTQ and QAT) and 7 pruning techniques, and supports both text-only and multimodal architectures through a shared model adapter. Results are serialized to JSON for straightforward aggregation and analysis.
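A device abstraction of the kind described above might look roughly like the following. This is a minimal sketch, not MindPipe's code: DeviceBackend, resolve_backend, and the compute_dtype values are hypothetical names and policies chosen for illustration. It assumes PyTorch, plus the torch_npu plugin on Ascend hardware, and falls back to a no-op CPU backend when neither is installed.

```python
from dataclasses import dataclass
from typing import Callable


def _noop() -> None:
    """Stand-in for cache/sync calls on devices that need neither."""


@dataclass(frozen=True)
class DeviceBackend:
    # Hypothetical container; field names are illustrative only.
    name: str                        # "cuda", "npu", or "cpu"
    empty_cache: Callable[[], None]  # release cached allocator memory
    synchronize: Callable[[], None]  # block until queued kernels finish
    compute_dtype: str               # assumed dtype policy, not MindPipe's


def resolve_backend(preferred: str = "auto") -> DeviceBackend:
    """Pick a backend once so pipeline code never branches on device type."""
    cpu = DeviceBackend("cpu", _noop, _noop, "float32")
    try:
        import torch
    except ImportError:
        return cpu  # no PyTorch available: nothing to cache or synchronize

    if preferred in ("auto", "npu"):
        try:
            import torch_npu  # noqa: F401  (Ascend plugin; registers torch.npu)
            if torch.npu.is_available():
                return DeviceBackend(
                    "npu", torch.npu.empty_cache, torch.npu.synchronize, "float16"
                )
        except ImportError:
            pass  # plugin not installed, try CUDA next

    if preferred in ("auto", "cuda") and torch.cuda.is_available():
        return DeviceBackend(
            "cuda", torch.cuda.empty_cache, torch.cuda.synchronize, "float16"
        )
    return cpu


backend = resolve_backend()
backend.empty_cache()   # same call regardless of the device underneath
backend.synchronize()
print(backend.name, backend.compute_dtype)
```

Resolving the backend once and passing it around keeps the quantization and pruning code free of per-device conditionals, which is the usual motivation for such a layer.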
Quick Start & Requirements
Setup: conda activate mindpipe, git submodule update --init --recursive, and python -m pip install -r requirements.txt.
Prerequisite: the VLMEVALKIT_ROOT environment variable.
The main.py script serves as the primary interface, configurable via numerous command-line arguments for specific compression tasks and evaluations. Example commands for full-precision evaluation, quantization, and pruning are detailed in the README.
Highlighted Details
Maintenance & Community
The provided README does not contain information regarding specific maintainers, community channels (e.g., Discord, Slack), or project sponsorships.
Licensing & Compatibility
The README does not specify the software license or provide details on compatibility for commercial use or integration with closed-source projects.
Limitations & Caveats
Certain algorithms like QuaRot, SpinQuant, and MQuant are not yet marked as NPU-ready. QA-LoRA is a CUDA-only implementation and does not produce AutoGPTQ packed checkpoints. QLoRA's NPU support relies on an in-tree fake-quant fallback mechanism. Model reload functionality after applying custom runtime wrappers is method-dependent.
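The caveats above can be encoded as a small pre-flight check that runs before a job is launched. The sketch below is illustrative only: check_support, Support, and the lowercase method keys are hypothetical and not part of MindPipe, and the table covers only the limitations listed here, not the project's full support matrix.

```python
from typing import NamedTuple


class Support(NamedTuple):
    ok: bool
    note: str


# Hypothetical lowercase keys; MindPipe's real method identifiers may differ.
NOT_NPU_READY = {"quarot", "spinquant", "mquant"}
CUDA_ONLY = {"qa-lora"}
NPU_VIA_FALLBACK = {"qlora"}


def check_support(method: str, device: str) -> Support:
    """Report whether a method/device pair hits one of the listed caveats."""
    key = method.lower()
    if device == "cuda":
        return Support(True, "no caveat listed above")
    if device == "npu":
        if key in CUDA_ONLY:
            return Support(False, "CUDA-only implementation")
        if key in NOT_NPU_READY:
            return Support(False, "not yet marked NPU-ready")
        if key in NPU_VIA_FALLBACK:
            return Support(True, "runs through the fake-quant fallback")
        return Support(True, "no caveat listed above")
    return Support(False, f"unknown device: {device}")


for method in ("QuaRot", "QA-LoRA", "QLoRA"):
    print(method, check_support(method, "npu"))
```

Failing fast on an unsupported pairing is cheaper than discovering it partway through a long compression run.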