Muon by KellerJordan

Optimizer for neural network hidden layers

created 8 months ago · 1,380 stars · Top 29.9% on sourcepulse

View on GitHub
Project Summary

Muon is a PyTorch optimizer specifically designed to accelerate the training of large neural networks by optimizing only the hidden layers. It targets researchers and engineers working with deep learning models, particularly transformers, aiming to reduce training time and computational cost.

How It Works

Muon optimizes only the higher-dimensional (≥2D) parameters in a network's hidden layers, typically weight matrices and convolutional filters. For each such parameter it computes an SGD update with momentum (Nesterov acceleration is supported) and then approximately orthogonalizes that update matrix via a Newton-Schulz iteration before applying it; the default hyperparameters reportedly perform well without tuning. The authors claim this is more efficient than general-purpose optimizers like AdamW for these parameter types, while the remaining parameters (biases, norms, embeddings, output heads) are handled by a secondary AdamW.
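
Below is a minimal sketch of that orthogonalization step. The quintic coefficients and five-step default follow the write-ups linked from the README, but the function name is ours and the code is illustrative rather than the repository's actual implementation:

    import torch

    def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        # Approximately map a 2D update matrix to the nearest semi-orthogonal
        # matrix via a quintic Newton-Schulz iteration (illustrative sketch).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)           # scale so the spectral norm is ~<= 1
        transposed = X.size(0) > X.size(1)
        if transposed:
            X = X.T                         # iterate on the wide orientation
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if transposed else X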

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/KellerJordan/Muon
  • Requires PyTorch.
  • Usage involves splitting model parameters: hidden-layer weights (≥2D) go to Muon, while everything else (<2D parameters, heads, embeddings) goes to AdamW; see the sketch after this list.
  • Official documentation and examples are available via links in the README.
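
A minimal sketch of that parameter split, assuming the package exports a Muon class accepting params, lr, and momentum as in the README's examples; the toy model, variable names, and exact constructor signature here are illustrative, so consult the repository for the current API:

    import torch
    import torch.nn as nn
    from muon import Muon  # assumed import, following the README's examples

    # Toy model: embedding -> hidden layer -> output head.
    model = nn.Sequential(
        nn.Embedding(1000, 64),  # embedding: AdamW
        nn.Linear(64, 64),       # hidden layer: weight to Muon, bias to AdamW
        nn.ReLU(),
        nn.Linear(64, 10),       # output head: AdamW
    )
    embed, hidden, head = model[0], model[1], model[3]

    muon_params = [p for p in hidden.parameters() if p.ndim >= 2]
    adamw_params = (
        [p for p in hidden.parameters() if p.ndim < 2]
        + list(embed.parameters())
        + list(head.parameters())
    )

    # lr=0.02 and momentum=0.95 are the defaults quoted in the README; the
    # constructor signature itself is an assumption to verify against the repo.
    opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)
    opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4)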

Highlighted Details

  • Lowered the CIFAR-10 speedrun record (training to 94% accuracy) from 3.3 to 2.6 A100-seconds, a roughly 21% reduction in training time.
  • Used to train a transformer to GPT-2 (XL) performance for $175 in compute.
  • Improved GPT-2 (small) training speed by 1.35x.
  • Adopted by Kimi.ai for scaled LLM training.

Maintenance & Community

  • Muon has been adopted by Kimi.ai, a frontier AI lab, for production-scale training.
  • Further learning resources include blog posts and a technical report.

Licensing & Compatibility

  • The repository does not explicitly state a license. The absence of a license implies all rights are reserved, potentially restricting commercial use or closed-source linking.

Limitations & Caveats

The lack of an explicit open-source license is a significant caveat, potentially limiting its adoption in commercial or closed-source projects. The README does not detail compatibility with other deep learning frameworks or specific hardware requirements beyond standard PyTorch dependencies.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 4
  • Issues (30d): 4
  • Star History: 810 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack
  Efficiently train foundation models with PyTorch
  Top 0.4% · 258 stars · created 1 year ago · updated 1 week ago

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 3 more.

modded-nanogpt by KellerJordan
  Language model training speedrun on 8x H100 GPUs
  Top 2.6% · 3k stars · created 1 year ago · updated 2 weeks ago