memory_reduced_optimizer by adonis-dym

Research paper for memory-reduced deep network training

created 9 months ago
533 stars

Top 60.2% on sourcepulse

View on GitHub
Project Summary

This repository provides memory-reduced variants of popular deep learning optimizers (AdamW, Adan, Lion) by reusing gradient space. It targets researchers and practitioners training large models who face memory constraints, offering significant memory savings without compromising training dynamics.

How It Works

The core innovation is gradient space reutilization. When a gradient's historical information is no longer required by the optimizer's update rule, its allocated memory is repurposed to store intermediate variables. This technique is applied to AdamW, Adan, and Lion, creating AdamW-R, Adan-R, and Lion-R, respectively. This approach aims to reduce the optimizer's memory footprint, enabling larger models or batch sizes on limited hardware.
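
A minimal sketch of the idea, assuming an AdamW-style update in PyTorch; this is illustrative and not the repository's actual code. Once the gradient has been folded into the first- and second-moment buffers, its tensor is overwritten with the update direction rather than allocating a fresh intermediate. The function name adamw_style_step and the state dictionary layout are assumptions made for this example.

```python
import torch

@torch.no_grad()
def adamw_style_step(p, state, lr=1e-3, betas=(0.9, 0.999),
                     eps=1e-8, weight_decay=1e-2):
    """Illustrative AdamW-style step that reuses the gradient's memory."""
    grad = p.grad
    exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
    state["step"] += 1
    t = state["step"]

    # Fold the gradient into the moment estimates (the only places it is needed).
    exp_avg.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])

    # The raw gradient is no longer required by the update rule, so its
    # storage is reused for the bias-corrected update direction instead of
    # allocating a new temporary tensor.
    bias1 = 1 - betas[0] ** t
    bias2 = 1 - betas[1] ** t
    torch.sqrt(exp_avg_sq, out=grad)               # grad now holds sqrt(v_t)
    grad.div_(bias2 ** 0.5).add_(eps)              # bias-corrected denominator
    grad.reciprocal_().mul_(exp_avg).div_(bias1)   # grad now holds m_hat / denom

    # Decoupled weight decay, then the parameter update from the reused buffer.
    p.mul_(1 - lr * weight_decay)
    p.add_(grad, alpha=-lr)
```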

Quick Start & Requirements

  • Install by placing the provided optimizer files directly into your project directory; a usage sketch follows this list.
  • Requires PyTorch.
  • See the paper for detailed experimental results.

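A minimal usage sketch under the assumptions above. The module and class names (adamw_r, AdamWR) are illustrative, since the README does not document exact import paths; substitute the names of the files you copied into your project.

```python
# Hypothetical usage after copying the provided optimizer files into your project.
# The import below assumes a file named adamw_r.py defining an AdamWR class;
# adjust to match the actual file and class names in the repository.
import torch
import torch.nn as nn

from adamw_r import AdamWR  # assumed name, for illustration only

model = nn.Linear(1024, 1024)
optimizer = AdamWR(model.parameters(), lr=1e-3, weight_decay=1e-2)

x, y = torch.randn(32, 1024), torch.randn(32, 1024)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # used as a drop-in replacement for torch.optim.AdamW
```
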
Highlighted Details

  • Achieves 6-25% memory savings across various models (ViT, ConvNeXt, BLOOM, LLaMA-2, etc.) compared to standard optimizers.
  • Memory reduction is demonstrated with and without ZeRO optimization.
  • AdamW-R and Adan-R produce training dynamics identical to the original optimizers.
  • Lion-R's dynamics are theoretically equivalent to Lion's, with minimal impact on training outcomes.

Maintenance & Community

  • Developed by adonis-dym, with Yiming Dong and Zhouchen Lin as authors.
  • The paper won the PRCV Best Paper Award.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The specific license is not declared, which may impact commercial use or integration into closed-source projects.
  • The README does not detail installation beyond placing files in the project directory, suggesting potential manual integration effort.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
