Memory-efficient optimizer for LLM training
APOLLO is a memory-efficient optimizer for large language model (LLM) pre-training and fine-tuning, targeting researchers and practitioners facing memory constraints. It achieves SGD-like memory costs while maintaining AdamW-level performance by approximating gradient scaling factors using low-rank auxiliary spaces and random projections, avoiding costly SVD operations.
How It Works
APOLLO rests on two key ideas: structured learning-rate updates and reduction of optimizer-state redundancy. It observes that channel-wise or tensor-wise gradient scaling is sufficient for LLM training, exposing redundancy in AdamW's element-wise learning-rate adaptation. APOLLO approximates these scaling factors in a low-rank auxiliary space built from random projections, yielding large memory savings without any SVD. APOLLO-Mini goes further, using a rank-1 projection with a single tensor-wise scale to reach SGD-level memory costs while still outperforming Adam(W).
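The core update can be pictured as follows. This is a minimal illustrative sketch, not the repository's implementation: the function name, the fixed random projection, and the omission of weight decay, projection refresh, and other details are all simplifications.

# Illustrative sketch of an APOLLO-style step (not the repository's code).
# Assumes a 2-D weight matrix; weight decay and projection refresh are omitted.
import torch

def apollo_like_step(param, grad, state, rank=256, lr=1e-3,
                     betas=(0.9, 0.999), eps=1e-8):
    n, m = grad.shape
    if "proj" not in state:
        # Cheap fixed random projection into a rank-`rank` auxiliary space (no SVD).
        state["proj"] = torch.randn(m, rank, device=grad.device) / rank ** 0.5
        state["m"] = torch.zeros(n, rank, device=grad.device)
        state["v"] = torch.zeros(n, rank, device=grad.device)
        state["step"] = 0
    state["step"] += 1
    b1, b2 = betas

    g_low = grad @ state["proj"]                               # (n, rank) projected gradient
    state["m"].mul_(b1).add_(g_low, alpha=1 - b1)              # first moment, low-rank space
    state["v"].mul_(b2).addcmul_(g_low, g_low, value=1 - b2)   # second moment, low-rank space
    m_hat = state["m"] / (1 - b1 ** state["step"])
    v_hat = state["v"] / (1 - b2 ** state["step"])
    update_low = m_hat / (v_hat.sqrt() + eps)                  # AdamW-style update, low-rank

    # Channel-wise scaling: how strongly AdamW would rescale each output channel.
    scale = update_low.norm(dim=1) / (g_low.norm(dim=1) + eps)
    # APOLLO-Mini corresponds roughly to rank=1 with a single tensor-wise scale.
    param.add_(grad * scale.unsqueeze(1), alpha=-lr)           # SGD-like step on the raw gradient

Only the rank-sized moments are stored per weight matrix, which is where the memory saving over full AdamW comes from.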
Quick Start & Requirements
Install from PyPI:
pip install apollo-torch
Or install from source:
git clone https://github.com/zhuhanqing/APOLLO.git && cd APOLLO && pip install -e .
To run the provided experiments, also install the additional dependencies:
pip install -r exp_requirements.txt
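Once installed, usage is expected to resemble a drop-in PyTorch optimizer. The sketch below is hypothetical: the class name APOLLOAdamW, the parameter-group keys, and their values are assumptions and should be checked against the repository's README.

# Hypothetical usage sketch -- class name, parameter-group keys, and values
# are assumptions, not a confirmed apollo-torch API; see the project README.
import torch
from apollo_torch import APOLLOAdamW  # assumed import path

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.Linear(4096, 1024))

# Low-rank scaling typically targets the large 2-D weight matrices only.
lowrank_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]
param_groups = [
    {"params": other_params},
    {"params": lowrank_params, "rank": 256, "proj": "random", "scale_type": "channel"},  # assumed keys
]
optimizer = APOLLOAdamW(param_groups, lr=1e-3)  # assumed signature

loss = model(torch.randn(8, 1024)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()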
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The primary license (CC-BY-NC) does not permit commercial use. Several items remain on the project's to-do list, including FSDP support.