DiT-MoE by feizc

PyTorch code for scaling Diffusion Transformers with Mixture of Experts

created 1 year ago
355 stars

Top 79.7% on sourcepulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation of DiT-MoE, a scaled-up Diffusion Transformer built on a Mixture of Experts (MoE). It addresses the challenge of efficiently scaling large diffusion models, offering performance competitive with dense networks and optimized inference for researchers and practitioners in generative AI.

How It Works

DiT-MoE integrates a Mixture of Experts architecture into the Diffusion Transformer framework. Each token activates only a small subset of expert parameters, which lets the model scale to billions of parameters while maintaining computational efficiency. The implementation supports both standard diffusion training and rectified flow-based training; the latter is highlighted as yielding better performance and faster convergence.
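As an illustration of the sparse-expert idea only, the sketch below shows a minimal top-k routed MoE feed-forward block in PyTorch; the class and argument names (MoEFeedForward, num_experts, top_k) are assumptions for exposition, not the repository's actual modules.

```python
# Minimal sketch of a top-k routed MoE feed-forward block in a DiT-style
# transformer. Names and shapes are illustrative, not the repo's real API.
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router producing expert logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> flatten tokens for per-token routing
        b, t, d = x.shape
        tokens = x.reshape(-1, d)
        logits = self.gate(tokens)                               # (N, num_experts)
        weights, indices = logits.softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)        # renormalise over chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (indices == e)                                # tokens that routed to expert e
            if mask.any():
                token_idx, slot = mask.nonzero(as_tuple=True)
                out[token_idx] += weights[token_idx, slot, None] * expert(tokens[token_idx])
        return out.reshape(b, t, d)
```

Because each token is processed by only top_k of the num_experts MLPs, the parameter count can grow with the number of experts while per-token compute stays close to that of a dense block, which is the efficiency argument behind DiT-MoE's scaling.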

Quick Start & Requirements

  • Install/Run: Training scripts (train.py, train_deepspeed.py) and a sampling script (sample.py) are provided; a hedged launch/inference sketch follows this list.
  • Prerequisites: PyTorch; DeepSpeed (for larger models); torch.distributed for multi-GPU runs; torch.float16 for inference. Requires the ImageNet dataset and a pre-trained VAE.
  • Resources: A multi-GPU NVIDIA setup, and potentially multi-node training, is recommended for larger models; DeepSpeed ZeRO-2/ZeRO-3 configurations are suggested for efficient large-model training.
  • Links: Official PyTorch Implementation, Pre-trained Checkpoints, VAE
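For orientation only, here is a hedged sketch of the multi-GPU and float16 patterns the bullets above refer to; the launch command, flags, and the model(x_t, t, y) call signature are assumptions, not taken from the repository's scripts.

```python
# Sketch of the distributed-training and fp16-inference patterns implied by
# the requirements above; not the repository's actual train.py / sample.py.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp() -> int:
    # Assumes a launcher such as `torchrun --nproc_per_node=8 train.py ...`
    # (command and flags are illustrative, not the repo's documented ones).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    return local_rank

def wrap_model(model: torch.nn.Module, local_rank: int) -> DDP:
    # Each rank holds a replica; gradients are all-reduced across GPUs.
    return DDP(model.to(local_rank), device_ids=[local_rank])

@torch.no_grad()
def denoise_step_fp16(model, x_t, t, y):
    # "Optimized inference using torch.float16": run the denoiser under
    # float16 autocast; the model(x_t, t, y) signature is an assumption.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        return model(x_t, t, y)
```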

Highlighted Details

  • Official PyTorch implementation of DiT-MoE, scaling up to 16 billion parameters.
  • Supports both standard diffusion and rectified flow (RF) training for improved performance; a sketch of the RF objective follows this list.
  • Includes scripts for expert routing analysis and visualization.
  • Optimized inference using torch.float16.
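For context on the rectified-flow highlight, the sketch below shows the standard RF training objective (linear interpolation between clean latents and noise, with the network regressing the velocity); it is the generic formulation, not code taken from this repository's training scripts.

```python
# Generic rectified-flow objective: interpolate linearly between clean
# latents x0 and Gaussian noise, and regress the constant velocity (noise - x0).
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0, y):
    # x0: clean (VAE) latents, y: class labels; model(x_t, t, y) is assumed
    # to predict the velocity field.
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)          # t ~ U(0, 1)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))                # broadcast over spatial dims
    x_t = (1.0 - t_) * x0 + t_ * noise                      # straight-line path
    target_velocity = noise - x0
    pred = model(x_t, t, y)
    return F.mse_loss(pred, target_velocity)
```

The straight-line path gives a simple regression target at every timestep, which is consistent with the claim above that RF training converges faster than standard diffusion training.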

Maintenance & Community

The codebase is based on DiT and DeepSeek-MoE. Links to Hugging Face checkpoints are provided.

Licensing & Compatibility

The repository does not explicitly state a license. It is based on other repositories, so their licenses may apply. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify a license, which may affect commercial use. Detailed environment setup instructions are deferred to external resources.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

45 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

  • PyTorch code for consistency models research paper
  • 6k stars
  • created 2 years ago, updated 1 year ago