PyTorch code for scaling Diffusion Transformers with Mixture of Experts
This repository provides the official PyTorch implementation for DiT-MoE, a scaled-up version of Diffusion Transformers utilizing Mixture of Experts (MoE). It addresses the challenge of efficiently scaling large diffusion models, offering performance competitive with dense networks and optimized inference for researchers and practitioners in generative AI.
How It Works
DiT-MoE integrates a Mixture of Experts architecture into the Diffusion Transformer framework. Only a subset of experts is activated for each token, so the model can scale to billions of parameters while keeping per-token compute close to that of a much smaller dense network. The implementation supports both standard diffusion training and rectified flow-based training, which the authors highlight as leading to better performance and faster convergence.
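To make the sparse-activation idea concrete, below is a minimal sketch of top-k expert routing in a transformer feed-forward layer. It is illustrative only: the class name, dimensions, and expert count are assumptions and do not mirror the repository's actual modules, router, or load-balancing losses.

```python
# Minimal sketch of top-k Mixture-of-Experts routing (illustrative only; the
# repository's expert modules, router, and auxiliary losses differ in detail).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, dim=1152, hidden_dim=4608, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)
        weights, expert_idx = probs.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (expert_idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(256, 1152)  # e.g. flattened image-patch tokens
layer = MoEFeedForward()
y = layer(tokens)                # only 2 of the 8 expert MLPs run for each token
```

For the rectified-flow variant mentioned above, one common formulation samples t in [0, 1], forms the interpolation x_t = (1 - t) * x_0 + t * eps between data and noise, and trains the network to predict the velocity eps - x_0; this is a general description of rectified flow, not a claim about the repository's exact objective.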
Quick Start & Requirements
Training scripts (train.py, train_deepspeed.py) and a sampling script (sample.py) are provided. Training uses torch.distributed, and inference supports torch.float16. The ImageNet dataset and a pretrained VAE model are required.
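The exact launch commands and flags live in the repository's scripts; as a rough illustration of float16 inference, the sketch below runs a forward pass in half precision on latent noise with class labels. DummyDiTMoE is a hypothetical stand-in with a DiT-like (x_t, t, class) signature, not the repository's network or API.

```python
# Hedged sketch of torch.float16 inference; DummyDiTMoE is a placeholder that
# mimics a DiT-style forward signature and must be swapped for the real model.
import torch
import torch.nn as nn

class DummyDiTMoE(nn.Module):
    """Stand-in with a DiT-like forward(x, t, y); replace with the actual DiT-MoE model."""
    def __init__(self, channels=4):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, t, y):
        return self.proj(x)  # real models also condition on timestep t and class label y

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DummyDiTMoE().to(device).eval()
if device == "cuda":
    model = model.half()  # torch.float16 inference, as the repo supports on GPU
dtype = torch.float16 if device == "cuda" else torch.float32

batch = 4
z = torch.randn(batch, 4, 32, 32, device=device, dtype=dtype)  # VAE latent-space noise
t = torch.full((batch,), 999, device=device)                   # example diffusion timestep
y = torch.randint(0, 1000, (batch,), device=device)            # ImageNet class labels

with torch.no_grad():
    pred = model(z, t, y)  # predicted noise/velocity; an iterative sampler and VAE decode follow
```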
Highlighted Details
Inference can be run in torch.float16.
Maintenance & Community
The codebase is based on DiT and DeepSeek-MoE. Links to Hugging Face checkpoints are provided.
Licensing & Compatibility
The repository does not explicitly state a license. It is based on other repositories, so their licenses may apply. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify a license, which may impact commercial use. Detailed setup instructions for the environment are linked to external resources.