PyTorch code for scaling Diffusion Transformers with Mixture of Experts
This repository provides the official PyTorch implementation for DiT-MoE, a scaled-up version of Diffusion Transformers utilizing Mixture of Experts (MoE). It addresses the challenge of efficiently scaling large diffusion models, offering performance competitive with dense networks and optimized inference for researchers and practitioners in generative AI.
How It Works
DiT-MoE integrates a Mixture of Experts architecture into the Diffusion Transformer framework. Only a subset of experts is activated for each token, so the model can scale to billions of parameters while keeping per-token compute close to that of a much smaller dense network. The implementation supports both standard diffusion training and rectified flow-based training, which the authors highlight as leading to better performance and faster convergence.
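To make the sparse-activation idea concrete, below is a minimal sketch of top-k expert routing in a transformer feed-forward layer. It is illustrative only: the class name, dimensions, and expert count are assumptions and do not mirror the repository's actual modules, router, or load-balancing losses.

```python
# Minimal sketch of top-k Mixture-of-Experts routing (illustrative only; the
# repository's expert modules, router, and auxiliary losses differ in detail).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, dim=1152, hidden_dim=4608, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)
        weights, expert_idx = probs.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (expert_idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(256, 1152)  # e.g. flattened image-patch tokens
layer = MoEFeedForward()
y = layer(tokens)                # only 2 of the 8 expert MLPs run for each token
```

For the rectified-flow variant mentioned above, one common formulation samples t in [0, 1], forms the interpolation x_t = (1 - t) * x_0 + t * eps between data and noise, and trains the network to predict the velocity eps - x_0; this is a general description of rectified flow, not a claim about the repository's exact objective.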
Quick Start & Requirements
Training scripts (train.py, train_deepspeed.py) and a sampling script (sample.py) are provided. Training uses torch.distributed, and inference supports torch.float16. The ImageNet dataset and a pretrained VAE model are required.
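The exact launch commands and flags live in the repository's scripts; as a rough illustration of float16 inference, the sketch below runs a forward pass in half precision on latent noise with class labels. DummyDiTMoE is a hypothetical stand-in with a DiT-like (x_t, t, class) signature, not the repository's network or API.

```python
# Hedged sketch of torch.float16 inference; DummyDiTMoE is a placeholder that
# mimics a DiT-style forward signature and must be swapped for the real model.
import torch
import torch.nn as nn

class DummyDiTMoE(nn.Module):
    """Stand-in with a DiT-like forward(x, t, y); replace with the actual DiT-MoE model."""
    def __init__(self, channels=4):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, t, y):
        return self.proj(x)  # real models also condition on timestep t and class label y

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DummyDiTMoE().to(device).eval()
if device == "cuda":
    model = model.half()  # torch.float16 inference, as the repo supports on GPU
dtype = torch.float16 if device == "cuda" else torch.float32

batch = 4
z = torch.randn(batch, 4, 32, 32, device=device, dtype=dtype)  # VAE latent-space noise
t = torch.full((batch,), 999, device=device)                   # example diffusion timestep
y = torch.randint(0, 1000, (batch,), device=device)            # ImageNet class labels

with torch.no_grad():
    pred = model(z, t, y)  # predicted noise/velocity; an iterative sampler and VAE decode follow
```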
Highlighted Details
Inference can be run in torch.float16.
Maintenance & Community
The codebase is based on DiT and DeepSeek-MoE. Links to Hugging Face checkpoints are provided.
Licensing & Compatibility
The repository does not explicitly state a license. It is based on other repositories, so their licenses may apply. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify a license, which may impact commercial use. Detailed setup instructions for the environment are linked to external resources.