memory_reduced_optimizer by adonis-dym

Research paper for memory-reduced deep network training

Created 11 months ago
529 stars

Top 59.9% on SourcePulse

Project Summary

This repository provides memory-reduced variants of popular deep learning optimizers (AdamW, Adan, Lion) by reusing gradient space. It targets researchers and practitioners training large models who face memory constraints, offering significant memory savings without compromising training dynamics.

How It Works

The core innovation is gradient space reutilization. When a gradient's historical information is no longer required by the optimizer's update rule, its allocated memory is repurposed to store intermediate variables. This technique is applied to AdamW, Adan, and Lion, creating AdamW-R, Adan-R, and Lion-R, respectively. This approach aims to reduce the optimizer's memory footprint, enabling larger models or batch sizes on limited hardware.
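To make the idea concrete, here is a minimal pure-Python sketch of an AdamW-style step that reuses the gradient buffer: once the gradient has been folded into the moment estimates, its storage is overwritten in place with the intermediate update direction instead of allocating a fresh buffer. This is an illustration of the technique only, not the repository's PyTorch implementation; the function name and signature are invented for the example.

```python
import math

def adamw_r_step(param, grad, exp_avg, exp_avg_sq, step,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW step that reuses the gradient buffer in place.

    Illustrative sketch of gradient space reutilization, NOT the
    repository's code: after exp_avg and exp_avg_sq have absorbed the
    new gradient, the update rule no longer needs the raw gradient, so
    grad[i] is repurposed to hold the update direction.
    """
    bc1 = 1 - beta1 ** step   # bias correction for the first moment
    bc2 = 1 - beta2 ** step   # bias correction for the second moment
    for i in range(len(param)):
        g = grad[i]
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        # The raw gradient value has been consumed; reuse its slot for
        # the bias-corrected update direction (no extra buffer needed).
        grad[i] = (exp_avg[i] / bc1) / (math.sqrt(exp_avg_sq[i] / bc2) + eps)
        # Decoupled weight decay, as in AdamW.
        param[i] -= lr * (grad[i] + wd * param[i])
    return param
```

In a real PyTorch optimizer the same trick amounts to in-place tensor operations on `p.grad` once its value is no longer needed, which is what lets the -R variants keep the update mathematically unchanged while shrinking peak memory.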

Quick Start & Requirements

  • Install by placing the provided optimizer files directly into your project directory.
  • Requires PyTorch.
  • See the paper for detailed experimental results.

Highlighted Details

  • Achieves 6-25% memory savings across various models (ViT, ConvNeXt, BLOOM, LLaMA-2, etc.) compared to standard optimizers.
  • Memory reduction is demonstrated with and without ZeRO optimization.
  • AdamW-R and Adan-R produce training dynamics identical to their original optimizers.
  • Lion-R is theoretically equivalent, with minimal impact on training outcomes.

Maintenance & Community

  • Developed by adonis-dym; the accompanying paper is authored by Yiming Dong and Zhouchen Lin.
  • The paper won the PRCV Best Paper Award.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The specific license is not declared, which may impact commercial use or integration into closed-source projects.
  • The README does not detail installation beyond placing files in the project directory, suggesting potential manual integration effort.
Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Elvis Saravia (Founder of DAIR.AI), and 2 more.

YaFSDP by yandex

Top 0.1% · 975 stars
Sharded data parallelism framework for transformer-like neural networks
Created 1 year ago · Updated 3 months ago
Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

Top 0.4% · 455 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago · Updated 5 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 11 more.

Liger-Kernel by linkedin

Top 0.6% · 6k stars
Triton kernels for efficient LLM training
Created 1 year ago · Updated 1 day ago