Grokfast: research code for accelerating grokking via gradient amplification
Grokfast accelerates the "grokking" phenomenon in machine learning, where models generalize only long after they have overfit the training data. This project offers a simple, drop-in method for practitioners who want to shorten that delay across diverse tasks such as image, language, and graph modeling.
How It Works
Grokfast operates by treating the sequence of parameter gradients over training steps as a signal and decomposing it into fast-varying and slow-varying components. It then amplifies the slow-varying component, which is hypothesized to drive generalization. This is achieved by integrating custom gradient filtering functions (EMA or MA) directly into the optimization loop, modifying gradients before the optimizer step. The approach aims to hasten the transition from overfitting to generalization without altering the model architecture or the rest of the training process.
Quick Start & Requirements
Clone the repository, then install the dependencies listed in requirements.txt:

pip install -r requirements.txt

To apply Grokfast in your own project, copy grokfast.py into it and import its functions, as sketched below.
Highlighted Details
Two gradient filters are provided: gradfilter_ema (Exponential Moving Average) and gradfilter_ma (Moving Average); a sketch of the MA variant follows.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The provided hyperparameter recommendations are drawn from the authors' experiments and may require further tuning for optimal performance on new tasks. The additional memory used by gradfilter_ma grows linearly with window_size, since a window of past gradients is buffered for every parameter.