Research paper for memory-reduced deep network training
This repository provides memory-reduced variants of popular deep learning optimizers (AdamW, Adan, Lion) by reusing gradient space. It targets researchers and practitioners training large models who face memory constraints, offering significant memory savings without compromising training dynamics.
How It Works
The core innovation is gradient-space reuse: when a gradient's historical information is no longer required by the optimizer's update rule, its allocated memory is repurposed to store intermediate variables. Applying this technique to AdamW, Adan, and Lion yields AdamW-R, Adan-R, and Lion-R, respectively. The aim is to shrink the optimizer's memory footprint, enabling larger models or batch sizes on limited hardware.
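A minimal sketch of the idea in PyTorch, assuming an AdamW-style update; the function name `adamw_r_step` and its signature are illustrative, not the repository's actual API. Once the raw gradient has been folded into the moment estimates, its buffer is dead for the remainder of the step, so the intermediate denominator is written into it instead of a freshly allocated tensor:

```python
import torch

def adamw_r_step(param, exp_avg, exp_avg_sq, step, lr=1e-3,
                 betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """One AdamW-style update that reuses the gradient tensor's storage
    for the intermediate denominator (illustrative sketch only)."""
    beta1, beta2 = betas
    grad = param.grad

    # Standard moment updates; after these two lines the raw gradient
    # values are no longer needed by the update rule.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step

    # Reuse the dead gradient buffer for the denominator instead of
    # allocating another parameter-sized tensor.
    denom = grad  # alias: the writes below overwrite the stale gradient
    torch.sqrt(exp_avg_sq, out=denom)
    denom.div_(bias_c2 ** 0.5).add_(eps)

    # Decoupled weight decay, then the Adam update via the reused buffer.
    param.data.mul_(1 - lr * weight_decay)
    param.data.addcdiv_(exp_avg, denom, value=-lr / bias_c1)
```

Because the denominator is written into the gradient's existing storage, no extra parameter-sized buffer is allocated during the step, which is where the memory saving comes from; the computed update values themselves are unchanged.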
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats