APOLLO by zhuhanqing

Memory-efficient optimizer for LLM training

Created 10 months ago
257 stars

Top 98.4% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

APOLLO is a memory-efficient optimizer for large language model (LLM) pre-training and fine-tuning, targeting researchers and practitioners facing memory constraints. It achieves SGD-like memory costs while maintaining AdamW-level performance by approximating gradient scaling factors using low-rank auxiliary spaces and random projections, avoiding costly SVD operations.
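
To make the memory claim concrete, here is a back-of-the-envelope comparison for a single weight matrix. The shapes and the rank are assumptions chosen for illustration, not measured numbers from the project.

```python
# Rough optimizer-state memory for one 4096 x 11008 weight matrix (a LLaMA-7B MLP
# projection), assuming fp32 state and an APOLLO rank of 256; purely illustrative.
n, m, r = 4096, 11008, 256

adamw_state  = 2 * n * m            # element-wise first + second moments
apollo_state = 2 * r * m + r * n    # moments in the r x m auxiliary space + random projection
                                    # (the projection can also be regenerated from a seed)

print(f"AdamW : {adamw_state  * 4 / 2**20:6.1f} MiB")   # ~344 MiB
print(f"APOLLO: {apollo_state * 4 / 2**20:6.1f} MiB")   # ~25.5 MiB
```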

How It Works

APOLLO leverages two key ideas: structured learning-rate updates and reduction of optimizer-state redundancy. It observes that channel-wise or tensor-wise gradient scaling is sufficient for LLM training, exploiting the redundancy in AdamW's element-wise learning rates. APOLLO approximates these scaling factors in a low-rank auxiliary space built from random projections, which avoids costly SVD operations and yields significant memory savings. APOLLO-Mini goes further, using rank-1 tensor-wise scaling to reach SGD-level memory cost while still outperforming Adam(W). A minimal sketch of the idea follows.
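
The mechanism can be paraphrased in a few lines of PyTorch. This is an illustrative sketch, not the library's implementation: the function name, the fixed random projection P, and the defaults are assumptions for exposition, and the real optimizer also handles projection refresh, weight decay, and integration with standard training loops.

```python
import torch

def apollo_channelwise_step(G, P, M, V, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Illustrative APOLLO-style step for one weight matrix (not the library code).

    G: full gradient, shape (n, m)
    P: fixed random projection, shape (r, n) with r << n (the auxiliary space)
    M, V: Adam moments kept only in the compressed (r, m) space
    Returns an update with the same shape as G.
    """
    R = P @ G                                            # compress the gradient; no SVD involved
    M.mul_(betas[0]).add_(R, alpha=1 - betas[0])         # first moment in the auxiliary space
    V.mul_(betas[1]).addcmul_(R, R, value=1 - betas[1])  # second moment in the auxiliary space
    m_hat = M / (1 - betas[0] ** step)
    v_hat = V / (1 - betas[1] ** step)
    R_adam = m_hat / (v_hat.sqrt() + eps)                # what AdamW would do, but in rank-r space

    # Channel-wise scaling factor: how much Adam would rescale each column,
    # estimated from the low-rank statistics and applied to the raw gradient.
    scale = R_adam.norm(dim=0) / (R.norm(dim=0) + eps)   # shape (m,)
    # APOLLO-Mini instead uses a single tensor-wise factor with r = 1:
    # scale = R_adam.norm() / (R.norm() + eps)
    return -lr * G * scale
```

The only persistent per-matrix state is M and V (each r x m) plus the projection, which is why the footprint approaches SGD's as the rank shrinks toward 1.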

Quick Start & Requirements

  • Install via pip: pip install apollo-torch
  • Install from source: git clone https://github.com/zhuhanqing/APOLLO.git && cd APOLLO && pip install -e .
  • Experiment dependencies: pip install -r exp_requirements.txt
  • Requires PyTorch.
  • Official documentation and a Hugging Face Transformers integration are available; a usage sketch follows this list.
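
For orientation, here is a minimal training-setup sketch. The class name APOLLOAdamW and the parameter-group keys follow the upstream README at the time of writing and may differ between versions; treat them as illustrative rather than a stable API.

```python
import torch
from apollo_torch import APOLLOAdamW  # name as documented upstream; may change

# Stand-in for an LLM block.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 11008),
    torch.nn.Linear(11008, 4096),
)

# Matrix-shaped weights get the low-rank APOLLO state; everything else stays regular.
lowrank_params = [p for p in model.parameters() if p.dim() == 2]
regular_params = [p for p in model.parameters() if p.dim() != 2]

param_groups = [
    {"params": regular_params},
    {
        "params": lowrank_params,
        "rank": 256,               # auxiliary-space rank (1 for APOLLO-Mini)
        "proj": "random",          # random projection instead of SVD
        "scale_type": "channel",   # "tensor" for APOLLO-Mini
        "update_proj_gap": 200,    # assumed key: how often the projection is refreshed
    },
]
optimizer = APOLLOAdamW(param_groups, lr=1e-2)

loss = model(torch.randn(8, 4096)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```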

Highlighted Details

  • Achieves up to 3x higher training throughput on A100-80GB GPUs by enabling 4x larger batch sizes.
  • Enables LLaMA-13B pre-training on A100-80GB GPUs with naive DDP.
  • Allows LLaMA-7B training from scratch in under 12GB of memory when combined with quantization.
  • Validated by a third-party Julia implementation and integrated into LLaMA-Factory and Hugging Face Transformers.

Maintenance & Community

  • Active development with recent integrations into major frameworks.
  • Core contributors' contact information is provided for inquiries.
  • Paper accepted to MLSys'25 with an outstanding paper honorable mention.

Licensing & Compatibility

  • The majority of the project is licensed under CC-BY-NC.
  • GaLore components are under the Apache 2.0 license.
  • CC-BY-NC may restrict commercial use or linking with closed-source projects.

Limitations & Caveats

The primary license (CC-BY-NC) may restrict commercial applications. The project's to-do list still includes open items such as FSDP support.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Elvis Saravia (Founder of DAIR.AI), and 2 more.

Explore Similar Projects

YaFSDP by yandex

Sharded data parallelism framework for transformer-like neural networks
Top 0.1% on SourcePulse · 979 stars
Created 1 year ago · Updated 3 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 39 more.

unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency
Top 0.5% on SourcePulse · 47k stars
Created 1 year ago · Updated 9 hours ago