Research paper implementation for memory-efficient LM fine-tuning
Top 34.9% on sourcepulse
MeZO offers a memory-efficient method for fine-tuning large language models (LLMs) by leveraging zeroth-order optimization, enabling training on hardware typically limited to inference. This approach is beneficial for researchers and practitioners with constrained GPU resources who need to adapt LLMs for specific tasks.
How It Works
MeZO adapts classical zeroth-order stochastic gradient descent (SGD) to operate in-place, eliminating backpropagation and its associated memory overhead: gradients are estimated from two forward passes through random parameter perturbations, so memory use stays close to that of inference. This allows fine-tuning of significantly larger models on a given GPU than traditional gradient-based optimizers such as Adam permit. The method is also compatible with parameter-efficient tuning techniques such as LoRA and prefix tuning.
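For illustration, here is a minimal sketch of a MeZO-style in-place zeroth-order SGD step in PyTorch. It is not the repository's implementation: the names `zo_perturb` and `mezo_step` and the `loss_fn(model, batch)` interface are assumptions made for this sketch.

```python
# Minimal sketch of a MeZO-style in-place zeroth-order SGD step (illustrative,
# not the repository's API).
import torch


def zo_perturb(model, eps, seed, scale):
    """Perturb all parameters in place by scale * eps * z with z ~ N(0, I),
    regenerating z from `seed` so the perturbation vector is never stored."""
    gen = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        z = torch.randn(p.shape, generator=gen)
        p.data.add_(z.to(device=p.device, dtype=p.dtype), alpha=scale * eps)


@torch.no_grad()
def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    """One MeZO update: two forward passes, no backpropagation."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    zo_perturb(model, eps, seed, +1)                 # theta + eps * z
    loss_plus = loss_fn(model, batch).item()

    zo_perturb(model, eps, seed, -2)                 # theta - eps * z
    loss_minus = loss_fn(model, batch).item()

    zo_perturb(model, eps, seed, +1)                 # restore theta

    grad_est = (loss_plus - loss_minus) / (2 * eps)  # projected gradient estimate

    # SGD update along the same direction z, regenerated from the stored seed.
    gen = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        z = torch.randn(p.shape, generator=gen)
        p.data.add_(z.to(device=p.device, dtype=p.dtype), alpha=-lr * grad_est)

    return loss_plus
```

Because the same seed regenerates z for the perturbations and for the update, the step keeps only two scalar losses and the model's own weights in memory, which is what enables training at roughly inference-level memory cost.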
Quick Start & Requirements
The implementation builds on the Hugging Face Trainer; refer to the large_models folder for implementation details.
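As a hypothetical quick-start sketch (not the repository's run scripts), the `mezo_step` function from the sketch above can drive a small Hugging Face causal LM directly; the model name, batch construction, and hyperparameters below are illustrative only.

```python
# Hypothetical usage sketch reusing the mezo_step function defined above.
# Model choice, data, and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"   # small stand-in; the paper scales to OPT-30B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # forward passes only; no gradients or optimizer states allocated


def loss_fn(model, batch):
    # Standard causal-LM loss computed in a single forward pass.
    return model(**batch, labels=batch["input_ids"]).loss


texts = ["MeZO fine-tunes language models with forward passes only."]
batch = tokenizer(texts, return_tensors="pt", padding=True)

for step in range(10):
    loss = mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3)
    print(f"step {step}: loss = {loss:.4f}")
```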
Specific hardware requirements depend on the model size; a single A100 80GB GPU can train a 30B-parameter OPT model.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository structure suggests separate implementations for medium and large models, with the latter being clearer and more extensible. The specific license and its implications for commercial use are not detailed in the provided README.