DiT-MoE by feizc

PyTorch code for scaling Diffusion Transformers with Mixture of Experts

Created 1 year ago
382 stars

Top 74.7% on SourcePulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation of DiT-MoE, a Diffusion Transformer scaled up with a Mixture of Experts (MoE) architecture. It addresses the challenge of efficiently scaling large diffusion models, offering performance competitive with dense networks along with optimized inference, and is aimed at researchers and practitioners in generative AI.

How It Works

DiT-MoE integrates a Mixture of Experts architecture into the Diffusion Transformer framework: a router activates only a small subset of expert feed-forward networks for each token, so models can scale to billions of parameters while per-token compute stays close to that of a much smaller dense model. The implementation supports both standard diffusion training and rectified flow-based training, with the latter reported to give better performance and faster convergence.
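
To make the routing idea concrete, below is a minimal sketch of a top-k MoE feed-forward layer in PyTorch. The class name, expert count, and top_k value are illustrative assumptions; the repository's actual MoE layer (expert configuration, shared experts, load-balancing losses) is defined in its model code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoEFeedForward(nn.Module):
        # Illustrative top-k expert routing; hyperparameters are placeholders,
        # not the repository's actual configuration.
        def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(dim, num_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(num_experts)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (num_tokens, dim) -- route every token to its top-k experts.
            scores = F.softmax(self.gate(x), dim=-1)           # (num_tokens, num_experts)
            weights, indices = scores.topk(self.top_k, dim=-1)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                for k in range(self.top_k):
                    mask = indices[:, k] == e                  # tokens whose k-th choice is expert e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

In a transformer block, a layer like this replaces the dense MLP, so only the selected experts' weights are touched per token even as the total parameter count grows.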

Quick Start & Requirements

  • Install/Run: Training scripts (train.py, train_deepspeed.py) and a sampling script (sample.py) are provided; a hedged sampling sketch follows this list.
  • Prerequisites: PyTorch, DeepSpeed (for larger models), torch.distributed for multi-GPU runs, and torch.float16 for inference. Training requires the ImageNet dataset and a pre-trained VAE.
  • Resources: A multi-GPU NVIDIA setup, and potentially multi-node training, is recommended for larger models; DeepSpeed with ZeRO-2/ZeRO-3 configurations is suggested for efficient large-model training.
  • Links: Official PyTorch Implementation, Pre-trained Checkpoints, VAE
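
A minimal sampling sketch, assuming the checkpoint layout mirrors the original DiT codebase and the VAE is loaded through the diffusers AutoencoderKL API. The module name DiT_MoE_models, the "DiT-MoE-XL/2" key, the checkpoint filename, and the num_experts argument are hypothetical; consult sample.py for the real entry points.

    import torch
    from diffusers.models import AutoencoderKL

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Build the model and load a pre-trained checkpoint (names are assumptions).
    from models import DiT_MoE_models                  # assumed module layout
    model = DiT_MoE_models["DiT-MoE-XL/2"](num_experts=8).to(device)
    model.load_state_dict(torch.load("dit_moe_checkpoint.pt", map_location="cpu"))
    model = model.half().eval()                        # float16 inference

    # VAE used to decode latents back to pixels; the README links the exact one.
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device)

    labels = torch.tensor([207, 360, 387, 974], device=device)   # example ImageNet classes
    z = torch.randn(len(labels), 4, 32, 32, device=device, dtype=torch.float16)

    with torch.no_grad():
        # Run the repo's DDPM or rectified-flow sampler here; in this sketch
        # `z` simply stands in for the resulting latents.
        latents = z
        images = vae.decode(latents.float() / 0.18215).sample    # (N, 3, 256, 256)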

Highlighted Details

  • Official PyTorch implementation of DiT-MoE, scaling up to 16 billion parameters.
  • Supports both standard diffusion and rectified flow (RF) training, with RF reported to improve performance and convergence (a sketch of the RF objective follows this list).
  • Includes scripts for expert routing analysis and visualization.
  • Optimized inference using torch.float16.
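
Since rectified flow is highlighted as the better-performing training recipe, here is a hedged sketch of the standard rectified-flow objective: the network predicts the constant velocity (noise minus data) along the straight path between a data latent and Gaussian noise. The model signature model(x_t, t, y) and the uniform timestep sampling are assumptions; the repository's RF training code may weight timesteps differently.

    import torch

    def rectified_flow_loss(model, x0, y):
        # Illustrative rectified-flow loss for a class-conditional model.
        # Assumes model(x_t, t, y) predicts the velocity field.
        noise = torch.randn_like(x0)                   # the pure-noise endpoint
        t = torch.rand(x0.shape[0], device=x0.device)  # timesteps uniform in [0, 1]
        t_ = t.view(-1, 1, 1, 1)
        x_t = (1.0 - t_) * x0 + t_ * noise             # straight-line interpolation
        target = noise - x0                            # constant velocity along that line
        pred = model(x_t, t, y)
        return ((pred - target) ** 2).mean()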

Maintenance & Community

The codebase is based on DiT and DeepSeek-MoE. Links to Hugging Face checkpoints are provided.

Licensing & Compatibility

The repository does not explicitly state a license. It is based on other repositories, so their licenses may apply. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify a license, which may impact commercial use. Detailed environment setup instructions are deferred to external resources linked from the README.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 9 more.

DiT by facebookresearch
PyTorch implementation for diffusion models with transformers (DiT)

  • Top 0.3% on SourcePulse
  • 8k stars
  • Created 2 years ago, updated 1 year ago