Muon by KellerJordan

Optimizer for neural network hidden layers

Created 10 months ago
1,720 stars

Top 24.8% on SourcePulse

View on GitHub
Project Summary

Muon is a PyTorch optimizer specifically designed to accelerate the training of large neural networks by optimizing only the hidden layers. It targets researchers and engineers working with deep learning models, particularly transformers, aiming to reduce training time and computational cost.

How It Works

Muon optimizes only the higher-dimensional parameters (≥2D) of a network's hidden layers, typically weight matrices and convolutional filters. For each such parameter it maintains an SGD-style momentum buffer (with Nesterov acceleration by default) and orthogonalizes the resulting update via a Newton-Schulz iteration before applying it; the default hyperparameters often perform well without tuning. This selective strategy is claimed to be more efficient than general-purpose optimizers like AdamW for these specific parameter types.
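The sketch below illustrates the orthogonalization step on a single 2D update. It is a minimal sketch, not the repository's code: the quintic Newton-Schulz coefficients and step count are the commonly cited values from the Muon write-up and may differ from the current implementation.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately push the singular values of a 2D update matrix G toward 1
    # (i.e., orthogonalize it) with a quintic Newton-Schulz iteration.
    # Illustrative sketch only; coefficients are the commonly cited Muon values.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)        # scale so the spectral norm is <= 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T                      # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Schematic use inside a momentum-SGD-style step (hypothetical names):
#   buf    = momentum * buf + grad
#   update = newton_schulz_orthogonalize(buf)
#   weight -= lr * update
```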

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/KellerJordan/Muon
  • Requires PyTorch.
  • Usage involves splitting model parameters into those optimized by Muon (hidden-layer parameters with ≥2 dimensions) and those optimized by AdamW (parameters with <2 dimensions, plus heads and embeddings); see the sketch after this list.
  • Official documentation and examples are available via links in the README.
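A minimal sketch of that parameter split, assuming a hypothetical toy model. The `muon` import path and the `Muon` constructor arguments are assumptions based on the README's described usage and may differ from the current API, so consult the repository's examples.

```python
import torch
from torch import nn
from muon import Muon  # assumed import path; see the repository README

# Hypothetical toy model: embedding -> hidden linear layers -> output head.
model = nn.Sequential(
    nn.Embedding(1000, 64),  # embedding     -> AdamW
    nn.Linear(64, 64),       # hidden layer  -> Muon
    nn.ReLU(),
    nn.Linear(64, 64),       # hidden layer  -> Muon
    nn.ReLU(),
    nn.Linear(64, 1000),     # output head   -> AdamW
)

hidden = nn.ModuleList([model[1], model[3]])
# Hidden-layer weights with >= 2 dimensions go to Muon.
muon_params = [p for p in hidden.parameters() if p.ndim >= 2]
# Everything else (biases, embeddings, output head) stays on AdamW.
adamw_params = [p for p in hidden.parameters() if p.ndim < 2]
adamw_params += list(model[0].parameters()) + list(model[5].parameters())

opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)   # assumed signature
opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4)

# Training loop: compute the loss, call loss.backward(), then step both optimizers.
```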

Highlighted Details

  • Lowered the CIFAR-10 speedrun record (training to 94% accuracy) from 3.3 to 2.6 A100-seconds.
  • Used to train a transformer to GPT-2 (XL) performance for $175 in compute.
  • Improved GPT-2 (small) training speed by 1.35x.
  • Adopted by Kimi.ai for scaled LLM training.

Maintenance & Community

  • The project is associated with Kimi.ai, a frontier AI lab.
  • Further learning resources include blog posts and a technical report.

Licensing & Compatibility

  • The repository does not explicitly state a license. The absence of a license implies all rights are reserved, potentially restricting commercial use or closed-source linking.

Limitations & Caveats

The lack of an explicit open-source license is a significant caveat, potentially limiting its adoption in commercial or closed-source projects. The README does not detail compatibility with other deep learning frameworks or specific hardware requirements beyond standard PyTorch dependencies.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 6

Star History

  • 193 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI) and Daniel Han (Cofounder of Unsloth).

cifar10-airbench by KellerJordan

  • 1.0% · 295 stars
  • Fast CIFAR-10 training benchmarks
  • Created 1 year ago; updated 2 months ago
  • Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Vincent Weisser (Cofounder of Prime Intellect), and 4 more.

Sophia by Liuhong99

  • 0.1% · 970 stars
  • Optimizer for language model pre-training (research paper)
  • Created 2 years ago; updated 1 year ago
  • Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

  • 0.2% · 407 stars
  • Lightweight training framework for model pre-training
  • Created 1 year ago; updated 4 weeks ago
  • Starred by Victor Taelin (Author of Bend, Kind, HVM), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 2 more.

nanoT5 by PiotrNawrot

  • 0.2% · 1k stars
  • PyTorch code for T5 pre-training and fine-tuning on a single GPU
  • Created 2 years ago; updated 1 year ago