PyTorch implementation of Adan optimizer for faster deep model training
Adan is an adaptive Nesterov momentum algorithm designed to accelerate deep model optimization. It targets researchers and practitioners in deep learning, offering faster convergence and potentially better performance than standard optimizers like AdamW, particularly for large models and datasets.
How It Works
Adan reformulates vanilla Nesterov acceleration so that it never needs a gradient at an extrapolated point. Alongside the usual first-moment ($\beta_1$) and second-moment ($\beta_3$) exponential moving averages, it tracks a third EMA of the gradient difference between consecutive steps (controlled by $\beta_2$), which supplies the look-ahead correction. This design makes the algorithm robust to hyperparameter choices and lets it tolerate substantially higher peak learning rates than Adam or AdamW.
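A minimal sketch of the per-parameter update, written from the description above; the function name, the omission of bias correction, and the first-step handling are illustrative simplifications, not the repository's actual code:

```python
import torch

def adan_step(param, grad, prev_grad, m, v, n,
              lr=1e-3, betas=(0.98, 0.92, 0.99),
              eps=1e-8, weight_decay=0.02):
    """One simplified Adan update. Bias correction and edge cases
    (e.g., the very first step, where the gradient difference is
    taken as zero) are omitted for brevity."""
    b1, b2, b3 = betas
    diff = grad - prev_grad                          # gradient difference
    m.mul_(b1).add_(grad, alpha=1 - b1)              # EMA of gradients
    v.mul_(b2).add_(diff, alpha=1 - b2)              # EMA of gradient differences
    g_hat = grad + (1 - b2) * diff                   # Nesterov-style lookahead gradient
    n.mul_(b3).addcmul_(g_hat, g_hat, value=1 - b3)  # EMA of squared lookahead gradient
    denom = n.sqrt().add_(eps)
    param.addcdiv_(m + (1 - b2) * v, denom, value=-lr)
    param.div_(1 + lr * weight_decay)                # proximal-style weight decay
```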
Quick Start & Requirements
python3 -m pip install git+https://github.com/sail-sg/Adan.git

Or, to install the unfused version from source:

git clone https://github.com/sail-sg/Adan.git
cd Adan
python3 setup.py install --unfused
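After installation, usage follows the standard PyTorch optimizer pattern. A short sketch: the import path and the (0.98, 0.92, 0.99) beta defaults follow the repository's README, while the model and other values here are illustrative:

```python
import torch
from adan import Adan

model = torch.nn.Linear(10, 2)
criterion = torch.nn.MSELoss()
optimizer = Adan(model.parameters(), lr=1e-3,
                 betas=(0.98, 0.92, 0.99), weight_decay=0.02)

x, y = torch.randn(8, 10), torch.randn(8, 2)
for _ in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```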
Highlighted Details
The implementation supports built-in gradient clipping (via a max_grad_norm argument) and offers two weight decay implementations, selected by the no_prox flag, as sketched below.
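A hedged sketch of passing those options; both argument names appear in the repository's optimizer signature, but the values chosen here are illustrative:

```python
import torch
from adan import Adan

model = torch.nn.Linear(10, 2)
optimizer = Adan(model.parameters(), lr=1e-3, weight_decay=0.02,
                 max_grad_norm=1.0,  # > 0 enables built-in gradient clipping
                 no_prox=False)      # selects the proximal weight decay variant
```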
Maintenance & Community
Licensing & Compatibility
The repository includes a LICENSE file, but its terms are not detailed in the README; review the license directly before commercial use.
Limitations & Caveats