Code for LLM pre-training acceleration via structured pruning (ICLR 2024)
This repository provides the codebase for Sheared-LLaMA, a structured pruning technique that significantly accelerates language model pre-training by creating smaller, performant models from larger ones. It targets researchers and practitioners aiming to develop efficient, smaller-scale LLMs without the prohibitive cost of training from scratch.
How It Works
Sheared-LLaMA leverages MosaicML's Composer package, implementing pruning and dynamic data loading as callbacks. The core idea is to prune existing large models (like LLaMA-2) to a target smaller architecture, achieving performance comparable to models trained from scratch but at a fraction of the cost. This approach integrates pruning directly into the training loop, allowing for efficient mask learning and model compression.
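As a rough illustration (not the repository's actual code), pruning logic can be hooked into Composer's training loop by subclassing Callback and overriding an event method such as batch_end. The mask-parameter naming ("z_mask") and the clamping rule below are assumptions made for this sketch:

import torch
from composer.core import Callback, State
from composer.loggers import Logger

class PruningMaskCallback(Callback):
    """Hypothetical callback that post-processes learned pruning masks each batch."""

    def batch_end(self, state: State, logger: Logger) -> None:
        step = int(state.timestamp.batch.value)  # current training batch index
        with torch.no_grad():
            for name, param in state.model.named_parameters():
                if "z_mask" in name:  # assumed naming convention for mask parameters
                    param.clamp_(0.0, 1.0)  # keep mask scores in a valid range
        logger.log_metrics({"pruning/batch": step})

A callback like this is passed to Composer's Trainer through its callbacks argument; a dynamic data loading callback would be registered the same way.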
Quick Start & Requirements
First install PyTorch with CUDA 11.8 support and Flash Attention 1.0.3, then install the dependencies and the package:

pip install -r requirement.txt
pip install -e .

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
autoresume compatibility is not guaranteed for the pruning stage.