Muon by KellerJordan

Optimizer for neural network hidden layers

created 8 months ago · 1,380 stars · Top 29.9% on sourcepulse

View on GitHub
Project Summary

Muon is a PyTorch optimizer specifically designed to accelerate the training of large neural networks by optimizing only the hidden layers. It targets researchers and engineers working with deep learning models, particularly transformers, aiming to reduce training time and computational cost.

How It Works

Muon optimizes only the higher-dimensional (≥2D) parameters in a network's hidden layers, typically weight matrices and convolutional filters. For each such parameter it computes an SGD update with momentum (Nesterov acceleration is supported) and then approximately orthogonalizes that update matrix via a Newton-Schulz iteration before applying it; the default hyperparameters reportedly perform well without tuning. The authors claim this is more efficient than general-purpose optimizers like AdamW for these parameter types, while the remaining parameters (biases, norms, embeddings, output heads) are handled by a secondary AdamW.
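
Below is a minimal sketch of that orthogonalization step. The quintic coefficients and five-step default follow the write-ups linked from the README, but the function name is ours and the code is illustrative rather than the repository's actual implementation:

    import torch

    def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        # Approximately map a 2D update matrix to the nearest semi-orthogonal
        # matrix via a quintic Newton-Schulz iteration (illustrative sketch).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)           # scale so the spectral norm is ~<= 1
        transposed = X.size(0) > X.size(1)
        if transposed:
            X = X.T                         # iterate on the wide orientation
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if transposed else X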

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/KellerJordan/Muon
  • Requires PyTorch.
  • Usage involves splitting model parameters: hidden-layer weights (≥2D) go to Muon, while everything else (<2D parameters, heads, embeddings) goes to AdamW; see the sketch after this list.
  • Official documentation and examples are available via links in the README.
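
A minimal sketch of that parameter split, assuming the package exports a Muon class accepting params, lr, and momentum as in the README's examples; the toy model, variable names, and exact constructor signature here are illustrative, so consult the repository for the current API:

    import torch
    import torch.nn as nn
    from muon import Muon  # assumed import, following the README's examples

    # Toy model: embedding -> hidden layer -> output head.
    model = nn.Sequential(
        nn.Embedding(1000, 64),  # embedding: AdamW
        nn.Linear(64, 64),       # hidden layer: weight to Muon, bias to AdamW
        nn.ReLU(),
        nn.Linear(64, 10),       # output head: AdamW
    )
    embed, hidden, head = model[0], model[1], model[3]

    muon_params = [p for p in hidden.parameters() if p.ndim >= 2]
    adamw_params = (
        [p for p in hidden.parameters() if p.ndim < 2]
        + list(embed.parameters())
        + list(head.parameters())
    )

    # lr=0.02 and momentum=0.95 are the defaults quoted in the README; the
    # constructor signature itself is an assumption to verify against the repo.
    opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)
    opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4)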

Highlighted Details

  • Lowered the CIFAR-10 speedrun record (training to 94% accuracy) from 3.3 to 2.6 A100-seconds, a roughly 21% reduction in training time.
  • Used to train a transformer to GPT-2 (XL) performance for $175 in compute.
  • Improved GPT-2 (small) training speed by 1.35x.
  • Adopted by Kimi.ai for scaled LLM training.

Maintenance & Community

  • Muon has been adopted by Kimi.ai, a frontier AI lab, for production-scale training.
  • Further learning resources include blog posts and a technical report.

Licensing & Compatibility

  • The repository does not explicitly state a license. The absence of a license implies all rights are reserved, potentially restricting commercial use or closed-source linking.

Limitations & Caveats

The lack of an explicit open-source license is a significant caveat, potentially limiting its adoption in commercial or closed-source projects. The README does not detail compatibility with other deep learning frameworks or specific hardware requirements beyond standard PyTorch dependencies.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 4
  • Issues (30d): 4
  • Star History: 810 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack
  Efficiently train foundation models with PyTorch
  Top 0.4% · 258 stars · created 1 year ago · updated 1 week ago

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 3 more.

modded-nanogpt by KellerJordan
  Language model training speedrun on 8x H100 GPUs
  Top 2.6% · 3k stars · created 1 year ago · updated 2 weeks ago