Optimizer for language model pre-training, claiming 2x speedup over Adam
Top 76.2% on sourcepulse
Sophia is a second-order optimizer designed to reduce training costs and accelerate convergence for large language models. It targets researchers and practitioners who want to cut computational expenses, positioning itself as a faster alternative to Adam with a claimed reduction of up to 50% in training time and compute.
How It Works
Sophia employs a scalable stochastic second-order optimization approach: an inexpensive stochastic estimate of the diagonal of the Hessian is used as a preconditioner, combined with an elementwise clipping mechanism that bounds the magnitude of each update. The aim is to outperform Adam by reaching similar validation loss in fewer steps, with less total compute and lower wall-clock time. The optimizer supports both the Hutchinson and Gauss-Newton-Bartlett Hessian estimators.
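To make the idea concrete, below is a minimal sketch of the mechanism described above: a momentum term preconditioned by a cheap Hessian-diagonal estimate, with the resulting update clipped elementwise. This is an illustrative paraphrase, not the repository's implementation; the function names, hyperparameter defaults, and the choice of a Hutchinson-style estimator (rather than Gauss-Newton-Bartlett) are assumptions.

```python
import torch

def hutchinson_hessian_diag(loss, params):
    # Hutchinson-style diagonal estimate: for a random vector u, E[u * (H u)] = diag(H),
    # so a single Hessian-vector product gives a cheap, unbiased estimate.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    u = [torch.randn_like(p) for p in params]
    hvp = torch.autograd.grad(grads, params, grad_outputs=u)  # Hessian-vector product H u
    return [ui * hi for ui, hi in zip(u, hvp)]

@torch.no_grad()
def sophia_like_step(param, grad, m, h, lr=2e-4, beta1=0.965, rho=0.04, eps=1e-12):
    # m: exponential moving average of gradients (momentum).
    # h: running estimate of the Hessian diagonal, assumed to be refreshed every
    #    few steps with an estimator such as hutchinson_hessian_diag above.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Precondition by the curvature estimate, then clip each coordinate of the
    # update to [-1, 1] so a poorly estimated (or near-zero) curvature entry
    # cannot produce an arbitrarily large step.
    update = torch.clamp(m / torch.clamp(rho * h, min=eps), -1.0, 1.0)
    param.add_(update, alpha=-lr)
```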
Quick Start & Requirements
pip install Sophia-Optimizer
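A hedged quick-start sketch follows; the import path, the SophiaG class name, and the hyperparameter values are assumptions based on the reference implementation, so consult the repository for the exact interface.

```python
import torch
from Sophia import SophiaG  # assumed import path; verify against the installed package

model = torch.nn.Linear(512, 512)
# Hyperparameters here (betas, rho, weight decay) are illustrative, not prescriptive.
optimizer = SophiaG(model.parameters(), lr=2e-4, betas=(0.965, 0.99),
                    rho=0.04, weight_decay=1e-1)

# Dummy data standing in for a real pre-training data pipeline.
loader = [(torch.randn(8, 512), torch.randn(8, 512)) for _ in range(10)]

for x, y in loader:
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # The reference training scripts additionally refresh the Hessian-diagonal
    # estimate every k steps; see the experiments folder for the exact loop.
```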
Example training scripts are available in the experiments folder after cloning the repository.
Highlighted Details
Performance is sensitive to the rho value (e.g., 0.03-0.04).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats