APOLLO by zhuhanqing

Memory-efficient optimizer for LLM training

created 8 months ago
251 stars

Top 99.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

APOLLO is a memory-efficient optimizer for large language model (LLM) pre-training and fine-tuning, targeting researchers and practitioners facing memory constraints. It achieves SGD-like memory costs while maintaining AdamW-level performance by approximating gradient scaling factors using low-rank auxiliary spaces and random projections, avoiding costly SVD operations.
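
To make the memory claim concrete, the back-of-envelope sketch below compares optimizer-state sizes for full AdamW moments versus rank-r auxiliary states; the layer shape, rank, and state dtype are illustrative assumptions, not figures reported by the project.

    def adamw_state_bytes(m: int, n: int, dtype_bytes: int = 4) -> int:
        # AdamW keeps two full-size moments (exp_avg, exp_avg_sq) per (m x n) weight.
        return 2 * m * n * dtype_bytes

    def apollo_state_bytes(m: int, n: int, rank: int, dtype_bytes: int = 4) -> int:
        # An APOLLO-style optimizer keeps its moments only for the projected
        # gradient, i.e. a (rank x n) tensor instead of the full (m x n) weight.
        return 2 * rank * n * dtype_bytes

    # Assumed shape of a single large projection layer in an LLM (illustrative).
    m, n = 4096, 11008
    print(adamw_state_bytes(m, n) / 2**20, "MiB of AdamW states")
    print(apollo_state_bytes(m, n, rank=256) / 2**20, "MiB of rank-256 auxiliary states")
    print(apollo_state_bytes(m, n, rank=1) / 2**20, "MiB of rank-1 (APOLLO-Mini-style) states")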

How It Works

APOLLO leverages two key ideas: structured learning-rate updates and optimizer-state redundancy reduction. It observes that channel-wise or tensor-wise gradient scaling is sufficient for LLMs, exploiting the redundancy in AdamW's element-wise updates. APOLLO approximates these scaling factors in a low-rank auxiliary space built from random projections, yielding significant memory savings. APOLLO-Mini goes further, using rank-1 tensor-wise scaling to reach SGD-level optimizer-state costs while still outperforming Adam(W).
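
The minimal PyTorch sketch below illustrates this idea, not the official implementation: AdamW-style moments are maintained only for a randomly projected gradient, and channel-wise scaling factors estimated there are applied to the raw gradient. The function name, the fixed projection (APOLLO refreshes it periodically), and the default rank are illustrative assumptions.

    import torch

    def apollo_like_step(W, G, state, lr=1e-3, rank=256, betas=(0.9, 0.999), eps=1e-8):
        # One conceptual update for a weight W with gradient G, both of shape (m, n).
        m, n = G.shape
        if "P" not in state:
            # Random projection into a rank-r auxiliary space (resampling schedule omitted).
            state["P"] = torch.randn(rank, m, device=G.device) / rank ** 0.5
            state["m1"] = torch.zeros(rank, n, device=G.device)
            state["m2"] = torch.zeros(rank, n, device=G.device)
            state["t"] = 0
        state["t"] += 1
        R = state["P"] @ G                                  # projected gradient, shape (r, n)
        # AdamW-style first/second moments kept only in the low-rank space.
        state["m1"].mul_(betas[0]).add_(R, alpha=1 - betas[0])
        state["m2"].mul_(betas[1]).addcmul_(R, R, value=1 - betas[1])
        m_hat = state["m1"] / (1 - betas[0] ** state["t"])
        v_hat = state["m2"] / (1 - betas[1] ** state["t"])
        R_tilde = m_hat / (v_hat.sqrt() + eps)              # low-rank Adam-style update
        # Channel-wise scaling factors estimated from the auxiliary space.
        s = R_tilde.norm(dim=0) / (R.norm(dim=0) + eps)     # shape (n,)
        W.add_(G * s, alpha=-lr)                            # scale the raw gradient per channel

In the rank-1, tensor-wise setting (the APOLLO-Mini configuration), the per-channel factor collapses to a single scalar ratio of norms, which is what pushes optimizer-state memory down to SGD-like levels.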

Quick Start & Requirements

  • Install via pip: pip install apollo-torch
  • Install from source: git clone https://github.com/zhuhanqing/APOLLO.git && cd APOLLO && pip install -e .
  • Experiment dependencies: pip install -r exp_requirements.txt
  • Requires PyTorch.
  • Official documentation and Hugging Face Transformers integration are available; a usage sketch follows this list.
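
For orientation, here is a hypothetical usage sketch: the apollo_torch import, the APOLLOAdamW class name, and the parameter-group keys are assumptions modeled on GaLore-style optimizers, so check the official README and the Hugging Face Transformers docs for the actual API.

    import torch
    from torch import nn
    # NOTE: the import and all group keys below are assumptions, not the verified API.
    from apollo_torch import APOLLOAdamW  # hypothetical class name

    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

    # Apply low-rank APOLLO scaling only to 2-D weight matrices; biases and
    # norm parameters stay in a plain group.
    lowrank_params = [p for p in model.parameters() if p.dim() == 2]
    other_params = [p for p in model.parameters() if p.dim() != 2]

    param_groups = [
        {"params": other_params},
        {"params": lowrank_params,
         "rank": 256,               # auxiliary-space rank (1 for an APOLLO-Mini-style setup)
         "proj": "random",          # random projection instead of SVD
         "scale_type": "channel",   # channel-wise scaling; "tensor" for APOLLO-Mini
         "update_proj_gap": 200},   # steps between projection refreshes (assumed key)
    ]
    optimizer = APOLLOAdamW(param_groups, lr=1e-2)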

Highlighted Details

  • Achieves up to 3x higher throughput on A100-80GB GPUs by enabling 4x larger batch sizes.
  • Enables pre-training LLaMA-13B on A100-80GB GPUs with naive DDP.
  • Allows LLaMA-7B training from scratch in under 12GB memory when combined with quantization.
  • Validated by a third-party Julia implementation and integrated into LLaMA-Factory and Hugging Face Transformers.

Maintenance & Community

  • Active development with recent integrations into major frameworks.
  • Core contributors' contact information is provided for inquiries.
  • Paper accepted to MLSys'25 with an outstanding paper honorable mention.

Licensing & Compatibility

  • The majority of the code is licensed under CC-BY-NC.
  • GaLore components are under Apache 2.0 license.
  • CC-BY-NC may restrict commercial use or linking with closed-source projects.

Limitations & Caveats

The primary license (CC-BY-NC) may impose restrictions on commercial applications. The project acknowledges ongoing work on its to-do list, including FSDP support.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Elvis Saravia (founder of DAIR.AI), and 2 more.

dolma by allenai

0.6% · 1k stars
Toolkit for curating datasets for language model pre-training
created 2 years ago · updated 2 days ago