Optimizer for language model pre-training, claiming 2x speedup over Adam
Top 74.7% on SourcePulse
Sophia is a second-order optimizer designed to cut training costs and accelerate convergence for large language models. Aimed at researchers and practitioners looking to reduce computational expense, it positions itself as a faster alternative to Adam, claiming up to a 50% reduction in training time and compute.
How It Works
Sophia employs a scalable stochastic second-order optimization approach. It uses an inexpensive stochastic estimate of the Hessian's diagonal as a preconditioner, combined with a clipping mechanism to manage update magnitudes. This method aims to provide superior performance over Adam by achieving similar validation loss with fewer steps, less total compute, and reduced wall-clock time. The optimizer supports both Hutchinson and Gauss-Newton-Bartlett Hessian estimators.
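As a rough illustration of the update rule described above, the sketch below combines a gradient EMA, a diagonal-Hessian preconditioner, and per-coordinate clipping. It is a simplification under stated assumptions: the Hessian-diagonal estimate h is assumed to be refreshed elsewhere every few steps by one of the two estimators, weight decay and bias correction are omitted, and the names and defaults are illustrative rather than the repository's API.

```python
import numpy as np

def sophia_step(theta, grad, m, h, lr=1e-4, beta1=0.965, rho=0.04, eps=1e-12):
    """One illustrative Sophia-style update on a flat parameter vector.

    m is the EMA of gradients; h is the estimated Hessian diagonal
    (maintained elsewhere via Hutchinson or Gauss-Newton-Bartlett estimates).
    """
    # Momentum: exponential moving average of gradients.
    m = beta1 * m + (1.0 - beta1) * grad
    # Precondition by the Hessian diagonal, then clip each coordinate to
    # [-1, 1] so that poorly estimated curvature cannot produce huge steps.
    update = np.clip(m / np.maximum(rho * h, eps), -1.0, 1.0)
    theta = theta - lr * update
    return theta, m
```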
Quick Start & Requirements
pip install Sophia-Optimizer
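The snippet below is a minimal training-loop sketch, not the package's documented API: the import path, the SophiaG constructor signature, and the update_hessian() call are assumptions based on the upstream research repository and may differ in the PyPI package.

```python
import torch
import torch.nn.functional as F

# Assumed import path; the upstream repository exposes a SophiaG optimizer
# with an Adam-like constructor (lr, betas, rho, weight_decay).
from sophia import SophiaG

model = torch.nn.Linear(512, 10)
optimizer = SophiaG(model.parameters(), lr=2e-4, betas=(0.965, 0.99),
                    rho=0.04, weight_decay=1e-1)

k = 10  # Hessian refresh interval
for step in range(100):
    x = torch.randn(32, 512)
    y = torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)

    if step % k == k - 1:
        # Gauss-Newton-Bartlett estimate: backprop a loss on labels sampled
        # from the model's own logits, then refresh the Hessian diagonal.
        logits = model(torch.randn(32, 512))
        sampled_y = torch.distributions.Categorical(logits=logits).sample()
        F.cross_entropy(logits, sampled_y).backward()
        optimizer.update_hessian()  # assumed API, mirroring the upstream scripts
        optimizer.zero_grad(set_to_none=True)
```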
Example training scripts are provided in the experiments folder after cloning the repository.
Highlighted Details
The key hyperparameter to tune is the rho value (e.g., 0.03-0.04).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats