RAdam by LiyuanLucasLiu

Optimizer for neural network training, addressing adaptive learning rate variance

created 6 years ago
2,553 stars

Top 18.8% on sourcepulse

View on GitHub
Project Summary

RAdam (Rectified Adam) addresses the instability and convergence issues observed in Adam, particularly during the early stages of training deep neural networks. It is designed for researchers and practitioners working with adaptive learning rate optimizers, offering a theoretically grounded alternative to Adam with improved robustness and potentially better performance.

How It Works

RAdam hypothesizes that the large variance of adaptive learning rates in the early phase of training is a primary cause of instability, since the second-moment estimate is then based on only a few gradient samples. It reduces this variance analytically by introducing a rectification term that scales the adaptive step according to the estimated variance of the adaptive learning rate, and falls back to an un-adapted, momentum-only update while that variance is still intractably large. This provides a more stable and reliable learning signal, especially when gradients are sparse or noisy.
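A minimal sketch of the rectification idea in Python is shown below. The helper name radam_step_size and the simplified scaling are illustrative rather than the repository's exact code; the formulas follow the paper.

```python
import math

def radam_step_size(step, beta2=0.999, lr=1e-3):
    """Illustrative per-step scaling used by RAdam's rectification.

    Returns (use_adaptive, scale): early on, when the variance of the
    adaptive learning rate is intractable, fall back to an SGD-with-
    momentum style step; later, apply the rectified adaptive step.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum length of the approximated SMA
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t > 4.0:  # variance is tractable -> apply the rectified adaptive step
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        return True, lr * r_t
    return False, lr  # un-adapted, momentum-only step
```

With beta2 = 0.999, the adaptive step is not activated until after the first few updates, which mirrors the warmup-like behavior the paper describes.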

Quick Start & Requirements

  • Install: Typically integrated into deep learning frameworks (e.g., PyTorch, TensorFlow). The README links to unofficial PyTorch and Keras implementations.
  • Prerequisites: A standard deep learning environment (Python plus framework libraries). The optimizer itself does not require specific hardware such as GPUs, but it is typically used in training setups that do.
  • Setup: Drop-in replacement for Adam within existing training loops (a minimal example follows this list).
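
As an example, the swap in a PyTorch training loop might look like the sketch below. It assumes the repository's radam.py is importable; recent PyTorch releases also ship a built-in torch.optim.RAdam that can be used instead.

```python
import torch
from radam import RAdam  # assumes the repository's radam.py is on the path

model = torch.nn.Linear(10, 2)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # previous Adam setup
optimizer = RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.0)

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for _ in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```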

Highlighted Details

  • Addresses the underlying cause of Adam's need for learning rate warmup (a typical warmup baseline is sketched after this list).
  • Demonstrates improved convergence and stability in Transformer models for neural machine translation (NMT).
  • Offers a theoretically sound variant of Adam.
  • Claims consistent improvements over vanilla Adam across various tasks.
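
For context, the learning-rate warmup that RAdam aims to make unnecessary is usually a scheduler layered on top of Adam, roughly like the sketch below (the 1,000-step linear ramp is an illustrative choice, not a recommendation from the project).

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 1000  # illustrative value; tuned per task in practice
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)
# inside the training loop: call optimizer.step(), then scheduler.step()
```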

Maintenance & Community

The project is associated with authors from Microsoft and the University of Illinois Urbana-Champaign. No dedicated community channel is mentioned, but unofficial implementations and discussions exist on platforms such as GitHub, Medium, and Twitter.

Licensing & Compatibility

The project's code and research are generally available for use, but specific licensing terms should be checked in the repository and in any third-party implementations before adoption. The accompanying research paper, "On the Variance of the Adaptive Learning Rate and Beyond", was published at ICLR 2020.

Limitations & Caveats

The project is described as an "early-release beta" with potential "rough edges." While RAdam often improves performance out-of-the-box, tuning hyperparameters, including the learning rate, may still be necessary, especially if warmup was already tuned in a baseline Adam setup.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 90 days
