Research paper implementation for memory-efficient LM fine-tuning
Top 34.9% on sourcepulse
MeZO offers a memory-efficient method for fine-tuning large language models (LLMs) by leveraging zeroth-order optimization, enabling training on hardware typically limited to inference. This approach is beneficial for researchers and practitioners with constrained GPU resources who need to adapt LLMs for specific tasks.
How It Works
MeZO adapts classical zeroth-order stochastic gradient descent (SGD) to operate in-place, eliminating backpropagation and its associated memory overhead: gradients are estimated from two forward passes through random parameter perturbations, so memory use stays close to that of inference. This allows fine-tuning of significantly larger models on a given GPU than traditional gradient-based optimizers such as Adam permit. The method is also compatible with parameter-efficient tuning techniques such as LoRA and prefix tuning.
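For illustration, here is a minimal sketch of a MeZO-style in-place zeroth-order SGD step in PyTorch. It is not the repository's implementation: the names `zo_perturb` and `mezo_step` and the `loss_fn(model, batch)` interface are assumptions made for this sketch.

```python
# Minimal sketch of a MeZO-style in-place zeroth-order SGD step (illustrative,
# not the repository's API).
import torch


def zo_perturb(model, eps, seed, scale):
    """Perturb all parameters in place by scale * eps * z with z ~ N(0, I),
    regenerating z from `seed` so the perturbation vector is never stored."""
    gen = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        z = torch.randn(p.shape, generator=gen)
        p.data.add_(z.to(device=p.device, dtype=p.dtype), alpha=scale * eps)


@torch.no_grad()
def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    """One MeZO update: two forward passes, no backpropagation."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    zo_perturb(model, eps, seed, +1)                 # theta + eps * z
    loss_plus = loss_fn(model, batch).item()

    zo_perturb(model, eps, seed, -2)                 # theta - eps * z
    loss_minus = loss_fn(model, batch).item()

    zo_perturb(model, eps, seed, +1)                 # restore theta

    grad_est = (loss_plus - loss_minus) / (2 * eps)  # projected gradient estimate

    # SGD update along the same direction z, regenerated from the stored seed.
    gen = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        z = torch.randn(p.shape, generator=gen)
        p.data.add_(z.to(device=p.device, dtype=p.dtype), alpha=-lr * grad_est)

    return loss_plus
```

Because the same seed regenerates z for the perturbations and for the update, the step keeps only two scalar losses and the model's own weights in memory, which is what enables training at roughly inference-level memory cost.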
Quick Start & Requirements
The implementation builds on the Hugging Face Trainer; refer to the large_models folder for implementation details.
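As a hypothetical quick-start sketch (not the repository's run scripts), the `mezo_step` function from the sketch above can drive a small Hugging Face causal LM directly; the model name, batch construction, and hyperparameters below are illustrative only.

```python
# Hypothetical usage sketch reusing the mezo_step function defined above.
# Model choice, data, and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"   # small stand-in; the paper scales to OPT-30B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # forward passes only; no gradients or optimizer states allocated


def loss_fn(model, batch):
    # Standard causal-LM loss computed in a single forward pass.
    return model(**batch, labels=batch["input_ids"]).loss


texts = ["MeZO fine-tunes language models with forward passes only."]
batch = tokenizer(texts, return_tensors="pt", padding=True)

for step in range(10):
    loss = mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3)
    print(f"step {step}: loss = {loss:.4f}")
```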
Specific hardware requirements depend on the model size; a single A100 80GB GPU can train a 30B-parameter OPT model.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository structure suggests separate implementations for medium and large models, with the latter being clearer and more extensible. The specific license and its implications for commercial use are not detailed in the provided README.