microsoft/dion: Orthonormal updates for faster distributed ML training
Top 73.5% on SourcePulse
Dion and Muon are PyTorch optimizers designed to accelerate neural network training by employing orthonormal weight updates, offering faster convergence than traditional methods like Adam/AdamW. They are particularly beneficial for large-scale distributed training scenarios, targeting researchers and engineers working with modern PyTorch and DTensor-based parallelism.
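To show how such an optimizer would drop into a standard training loop, here is a hypothetical usage sketch. The import path dion.Dion and the constructor arguments lr and rank_fraction are assumptions inferred from the description here, not confirmed API; consult the repository's README for the real interface.

```python
import torch

# Assumed import path; the actual module layout may differ.
from dion import Dion

model = torch.nn.Linear(1024, 1024)

# Hypothetical constructor: `rank_fraction` stands in for the low-rank
# compression hyperparameter described below; real argument names may differ.
optimizer = Dion(model.parameters(), lr=0.01, rank_fraction=0.25)

for _ in range(10):
    loss = model(torch.randn(32, 1024)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```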
How It Works
Dion utilizes amortized power iteration for orthonormalization, enabling direct application on sharded matrices and supporting low-rank compression via a rank fraction hyperparameter. This approach reduces communication overhead compared to Muon's Newton-Schulz iterations, which require reconstructing full matrices from shards. Dion also incorporates an error feedback mechanism to mitigate information loss from compression.
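As a rough illustration of the amortized power-iteration idea, the following is a minimal single-device sketch, not the exact Dion algorithm: the function name, the QR-based orthonormalization, and the momentum and scaling details are illustrative assumptions, and the real optimizer operates on sharded DTensors.

```python
import torch

def dion_like_step(M: torch.Tensor, Q: torch.Tensor, lr: float):
    """One amortized power-iteration step with error feedback (sketch).

    M: (m, n) momentum buffer for one weight matrix.
    Q: (n, r) warm-started right factor, where r is set by a rank fraction.
    Returns (delta, Q): a low-rank, approximately orthonormal update
    scaled by lr, plus the refreshed right factor for the next step.
    """
    P = M @ Q                      # single power-iteration step: (m, r)
    P, _ = torch.linalg.qr(P)      # orthonormalize columns of the left factor
    R = M.t() @ P                  # refreshed right factor: (n, r)
    M.sub_(P @ R.t())              # error feedback: keep the untransmitted residual in M
    Q = R / R.norm(dim=0, keepdim=True).clamp_min(1e-8)  # unit-norm columns for reuse
    delta = -lr * (P @ Q.t())      # low-rank update direction
    return delta, Q

# Toy usage: a 256x256 weight matrix with rank fraction 1/4.
m = n = 256
M = torch.randn(m, n)              # stand-in momentum buffer
Q = torch.randn(n, n // 4)
delta, Q = dion_like_step(M, Q, lr=0.01)
```

Because Q is warm-started across steps, each step performs only one matrix multiply per factor instead of a full orthogonalization, which is what makes the scheme amenable to sharded matrices.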
Quick Start & Requirements
Install the optimizers with pip install git+https://github.com/microsoft/dion.git. To reproduce training runs, clone the repository, install the training extras (pip install -e .[train]), download the FineWeb dataset, and run the training scripts (e.g., torchrun --standalone --nproc_per_node=8 train.py --config configs/dion_160m.yaml).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Setting replicate_mesh_grad_sync=True leads to decoupled momentum states across data-parallel processes.