LLM structural pruner for model compression
LLM-Pruner offers structural pruning for large language models, enabling significant compression with minimal performance degradation. It targets researchers and practitioners aiming to reduce the computational footprint of LLMs such as Llama, BLOOM, and Vicuna, facilitating deployment in resource-constrained environments.
How It Works
LLM-Pruner employs a three-stage process: Discovery, Estimation, and Recovery. The Discovery stage analyzes structural dependencies to identify groups of coupled structures, the minimal units that can be removed together. The Estimation stage scores the importance of these units using criteria such as first-order Taylor expansion or L1/L2 weight norms, as sketched below. Finally, the Recovery stage uses efficient post-training on datasets like Alpaca or LaMini-instruction to restore model performance. This approach enables task-agnostic compression and efficient fine-tuning.
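A minimal sketch of the Estimation stage, using a toy linear layer as a stand-in for one prunable structural group; the names toy_loss, taylor_importance, and l2_importance are illustrative, not LLM-Pruner's API:

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(16, 8)   # stand-in for one prunable group of output channels
x = torch.randn(4, 16)

# Backpropagate a scalar loss so layer.weight.grad is populated.
toy_loss = layer(x).pow(2).mean()
toy_loss.backward()

# First-order Taylor importance per output channel (row of the weight matrix):
# an estimate of the loss change if that channel's weights were zeroed.
taylor_importance = (layer.weight.grad * layer.weight).abs().sum(dim=1)

# Magnitude-based alternative: L2 norm of each output channel.
l2_importance = layer.weight.norm(p=2, dim=1)

# Channels with the lowest scores are candidates for structural removal.
num_prune = 2
prune_idx = torch.argsort(taylor_importance)[:num_prune]
print("channels to prune:", prune_idx.tolist())

Because whole channels, rather than individual weights, are removed, the pruned model stays dense and needs no sparse kernels at inference time.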
Quick Start & Requirements
pip install -r requirement.txt
Evaluation relies on lm-evaluation-harness. A GPU is recommended for Taylor-based pruning and evaluation. The script/llama_prune.sh script automates downloading models and datasets for a minimal example.
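If the example script saves the pruned network as a pickled checkpoint, reloading might look like the sketch below; the path and the dict layout are assumptions about the script's output format, so verify them against the repository:

import torch

# Hypothetical path and layout: assumes the script pickles the whole model
# object (not just a state_dict), so no architecture rebuild is needed.
ckpt = torch.load("prune_log/llama_prune/pytorch_model.bin", map_location="cpu")
model, tokenizer = ckpt["model"], ckpt["tokenizer"]
model.eval()

# Assumes a Hugging Face-style causal LM interface on the loaded object.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))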
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project notes that, while pruning is efficient, compressed models can still exhibit issues such as repetitive token generation or nonsensical outputs, indicating room for quality improvement. Manual intervention may be required for certain model architectures to map index concatenations correctly.