MeZO by princeton-nlp

Research paper implementation for memory-efficient LM fine-tuning

created 2 years ago
1,118 stars

Top 34.9% on sourcepulse

View on GitHub
Project Summary

MeZO offers a memory-efficient method for fine-tuning large language models (LLMs) by leveraging zeroth-order optimization, enabling training on hardware typically limited to inference. This approach is beneficial for researchers and practitioners with constrained GPU resources who need to adapt LLMs for specific tasks.

How It Works

MeZO adapts classical zeroth-order stochastic gradient descent (SGD) to operate in-place, eliminating the need for backpropagation and its associated memory overhead. This allows fine-tuning of significantly larger models on the same hardware compared to traditional gradient-based methods like Adam. The method is also compatible with parameter-efficient tuning techniques such as LoRA and prefix tuning.
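
At its core, each MeZO step perturbs all weights in place with Gaussian noise, evaluates the loss at the two perturbed points, and uses the scaled loss difference as a projected gradient; the noise is regenerated from a stored random seed rather than kept in memory, which is where the savings come from. The PyTorch sketch below illustrates the idea; the function name, hyperparameters, and `loss_fn` interface are illustrative assumptions, not the repository's API.

```python
import torch

def mezo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6):
    """One zeroth-order step in the spirit of MeZO: two forward passes, no backprop.

    Illustrative sketch only; the official implementation lives in the repo's
    large_models folder and differs in details. `loss_fn(model, batch)` is
    assumed to return a scalar loss tensor from a forward pass.
    """
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same Gaussian noise z from `seed` and nudge the
        # weights in place, so z never has to be stored.
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen)
            p.data.add_(scale * eps * z.to(device=p.device, dtype=p.dtype))

    with torch.no_grad():
        perturb(+1.0)                     # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                     # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                     # restore theta

        # Projected gradient estimate and in-place SGD update,
        # reusing the same seed to regenerate z one more time.
        grad_est = (loss_plus - loss_minus) / (2 * eps)
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen)
            p.data.add_(-lr * grad_est * z.to(device=p.device, dtype=p.dtype))

    return loss_plus
```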

Quick Start & Requirements

  • Installation: The implementation is built on HuggingFace's Trainer; see the large_models folder for details. A hedged sketch of how a zeroth-order step can hook into Trainer follows this list.
  • Prerequisites: Python and HuggingFace Transformers (Trainer). Hardware requirements depend on model size; a single 80GB A100 GPU can fine-tune a 30B-parameter OPT model with MeZO.
  • Resources: The primary benefit is reduced GPU memory usage, allowing larger models to be fine-tuned.
  • Links: arXiv
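
As a rough illustration of where such a step could be wired into HuggingFace's Trainer, the hypothetical subclass below replaces the usual backprop-based training step with the `mezo_step` sketch from the previous section. The class name and details are assumptions for illustration; the official code in the large_models folder hooks into Trainer differently and should be preferred.

```python
from transformers import Trainer

class ZerothOrderTrainer(Trainer):
    """Hypothetical Trainer subclass; not the repository's actual class."""

    def training_step(self, model, inputs, num_items_in_batch=None):
        # No gradients are needed: only forward passes are performed.
        model.eval()

        def loss_fn(m, batch):
            return m(**batch).loss.detach()

        loss = mezo_step(model, loss_fn, inputs)  # sketch from "How It Works"
        return loss.detach()
```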

Highlighted Details

  • Achieves comparable performance to Adam fine-tuning on multiple tasks, with up to 12x memory reduction.
  • Can optimize non-differentiable objectives (e.g., accuracy, F1), since only forward evaluations are needed; see the example after this list.
  • Compatible with full-parameter and parameter-efficient tuning (LoRA, prefix tuning).
  • Demonstrates superior results over zero-shot and in-context learning.
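
Because the update only needs function values rather than gradients, a non-differentiable metric can stand in for the loss. A minimal sketch of such an objective, assuming a sequence-classification model and a HuggingFace-style batch with `input_ids`, `attention_mask`, and `labels` (these names are assumptions for illustration):

```python
import torch

def negative_accuracy(model, batch):
    """Non-differentiable objective: negative accuracy on the batch.

    Can be passed as `loss_fn` to the zeroth-order step sketched above;
    the batch keys and classification head are assumptions.
    """
    with torch.no_grad():
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        preds = logits.argmax(dim=-1)
        acc = (preds == batch["labels"]).float().mean()
    return -acc  # minimizing this maximizes accuracy
```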

Maintenance & Community

Recent activity (last commit, maintainer responsiveness, open pull requests and issues) is summarized in the Health Check section below.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The repository contains separate implementations for medium-sized and large models, with the large_models implementation being the clearer and more extensible of the two. As noted under Licensing & Compatibility, the license and its implications for commercial use are not stated in the README.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

18 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

Medusa by FasterDecoding

Framework for accelerating LLM generation using multiple decoding heads

created 1 year ago, updated 1 year ago
3k stars

Top 0.2% on sourcepulse