RAdam by LiyuanLucasLiu

Optimizer for neural network training, addressing adaptive learning rate variance

created 6 years ago
2,553 stars

Top 18.8% on sourcepulse

View on GitHub
Project Summary

RAdam (Rectified Adam) addresses the instability and convergence issues observed in Adam, particularly during the early stages of training deep neural networks. It is designed for researchers and practitioners working with adaptive learning rate optimizers, offering a theoretically grounded alternative to Adam with improved robustness and potentially better performance.

How It Works

RAdam hypothesizes that the large variance of adaptive learning rates in the early phase of training is a primary cause of instability, since the second-moment estimate is then based on only a few gradient samples. It reduces this variance analytically by introducing a rectification term that scales the adaptive step according to the estimated variance of the adaptive learning rate, and falls back to an un-adapted, momentum-only update while that variance is still intractably large. This provides a more stable and reliable learning signal, especially when gradients are sparse or noisy.
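A minimal sketch of the rectification idea in Python is shown below. The helper name radam_step_size and the simplified scaling are illustrative rather than the repository's exact code; the formulas follow the paper.

```python
import math

def radam_step_size(step, beta2=0.999, lr=1e-3):
    """Illustrative per-step scaling used by RAdam's rectification.

    Returns (use_adaptive, scale): early on, when the variance of the
    adaptive learning rate is intractable, fall back to an SGD-with-
    momentum style step; later, apply the rectified adaptive step.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum length of the approximated SMA
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t > 4.0:  # variance is tractable -> apply the rectified adaptive step
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        return True, lr * r_t
    return False, lr  # un-adapted, momentum-only step
```

With beta2 = 0.999, the adaptive step is not activated until after the first few updates, which mirrors the warmup-like behavior the paper describes.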

Quick Start & Requirements

  • Install: Typically integrated into deep learning frameworks (e.g., PyTorch, TensorFlow). The README links to unofficial PyTorch and Keras implementations.
  • Prerequisites: A standard deep learning environment (Python plus framework libraries). The optimizer itself does not require specific hardware such as GPUs, but it is typically used in training setups that do.
  • Setup: Drop-in replacement for Adam within existing training loops (a minimal example follows this list).
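
As an example, the swap in a PyTorch training loop might look like the sketch below. It assumes the repository's radam.py is importable; recent PyTorch releases also ship a built-in torch.optim.RAdam that can be used instead.

```python
import torch
from radam import RAdam  # assumes the repository's radam.py is on the path

model = torch.nn.Linear(10, 2)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # previous Adam setup
optimizer = RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.0)

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for _ in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```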

Highlighted Details

  • Addresses the underlying cause of Adam's need for learning rate warmup (a typical warmup baseline is sketched after this list).
  • Demonstrates improved convergence and stability in Transformer models for neural machine translation (NMT).
  • Offers a theoretically sound variant of Adam.
  • Claims consistent improvements over vanilla Adam across various tasks.
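
For context, the learning-rate warmup that RAdam aims to make unnecessary is usually a scheduler layered on top of Adam, roughly like the sketch below (the 1,000-step linear ramp is an illustrative choice, not a recommendation from the project).

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 1000  # illustrative value; tuned per task in practice
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)
# inside the training loop: call optimizer.step(), then scheduler.step()
```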

Maintenance & Community

The project is associated with authors from Microsoft and the University of Illinois Urbana-Champaign. No dedicated community channel is mentioned, but unofficial implementations and discussions exist on platforms such as GitHub, Medium, and Twitter.

Licensing & Compatibility

The project's code and research are generally available for use, but specific licensing terms should be checked in the repository and in any third-party implementations before adoption. The accompanying research paper, "On the Variance of the Adaptive Learning Rate and Beyond", was published at ICLR 2020.

Limitations & Caveats

The project is described as an "early-release beta" with potential "rough edges." While RAdam often improves performance out-of-the-box, tuning hyperparameters, including the learning rate, may still be necessary, especially if warmup was already tuned in a baseline Adam setup.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 90 days
