Optimization framework for training large models
MARS is a unified optimization framework designed to accelerate the training of large deep learning models by combining variance reduction techniques with preconditioned gradient methods. It targets researchers and engineers working with large language models and vision models, offering improved convergence and performance over standard optimizers like AdamW.
How It Works
MARS introduces a "scaled stochastic recursive momentum" to reduce gradient variance and a "preconditioned update" to approximate second-order methods. This dual approach aims to achieve better gradient complexity and per-iteration complexity, leading to faster convergence to critical points. It offers three instantiations: MARS-AdamW, MARS-Lion, and MARS-Shampoo, differing in their Hessian matrix approximations.
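For concreteness, here is a minimal NumPy sketch of a MARS-AdamW-style update following the description above: a variance-reduced (corrected) gradient is formed from the gradient change between consecutive iterates, clipped, and then fed through an AdamW-style preconditioned step. The default hyperparameter values, the clipping rule, and the scaling of the correction term are illustrative assumptions and may differ from the repository's implementation.

import numpy as np

def mars_adamw_step(x, grad_t, grad_prev, state, lr=3e-3, beta1=0.95, beta2=0.99,
                    gamma=0.025, eps=1e-8, weight_decay=0.0):
    # Variance-reduced gradient: current gradient plus a scaled correction
    # built from the gradient difference between consecutive iterates.
    c = grad_t + gamma * (beta1 / (1.0 - beta1)) * (grad_t - grad_prev)
    # Clip the corrected gradient to unit norm to keep the correction stable.
    c_norm = np.linalg.norm(c)
    if c_norm > 1.0:
        c = c / c_norm
    # AdamW-style preconditioned update applied to the corrected gradient.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * c
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * c * c
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return x - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * x)

# Toy usage on a 4-dimensional parameter vector with loss ||x||^2.
state = {"t": 0, "m": np.zeros(4), "v": np.zeros(4)}
x = np.ones(4)
x = mars_adamw_step(x, grad_t=2 * x, grad_prev=2 * x, state=state)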
Quick Start & Requirements
pip install torch==2.1.2 transformers==4.33.0 datasets tiktoken numpy==1.26.4 wandb
torchrun --standalone --nproc_per_node=8 MARS/train_mars.py config/${your_config_file}
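The training scripts above are config-driven. If the MARS optimizer is exposed as a standard torch.optim-style class, drop-in usage would look roughly like the sketch below; the import path, class name, and constructor arguments are assumptions for illustration, not the repository's confirmed API.

import torch
from mars import MARS  # hypothetical import path; check the repository's optimizer module

model = torch.nn.Linear(1024, 1024)
optimizer = MARS(model.parameters(), lr=3e-3, betas=(0.95, 0.99), weight_decay=0.1)  # hypothetical signature

x = torch.randn(32, 1024)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()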
Multi-GPU training is launched via torchrun (the repository includes A100 examples, among others).
Highlighted Details
MARS supports both approximate (MARS-approx) and exact gradient calculations, with the former being faster but slightly less performant.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify a license, which may impact commercial adoption. The MARS-Lion and MARS-Shampoo instantiations require additional hyperparameter tuning. The "exact" MARS variant doubles the per-iteration computational cost.
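A plausible reading of that cost difference: the correction term needs a gradient at the previous iterate, which the exact variant recomputes on the current mini-batch (a second forward/backward pass each step), whereas MARS-approx reuses the gradient already computed at the previous step. A minimal sketch, with a toy quadratic loss standing in for one forward/backward pass and illustrative function names:

import numpy as np

def grad(params, batch):
    # Toy quadratic loss; stands in for one forward/backward pass.
    return params - batch.mean(axis=0)

def exact_step_grads(params, prev_params, batch):
    # Exact variant: two gradient evaluations on the SAME mini-batch per step.
    return grad(params, batch), grad(prev_params, batch)

def approx_step_grads(params, stored_prev_grad, batch):
    # Approximate variant: one evaluation; reuse the gradient stored last step.
    return grad(params, batch), stored_prev_grad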