MDT by sail-sg

Image synthesis research paper (ICCV 2023)

created 2 years ago
574 stars

Project Summary

This repository provides the official implementation of the Masked Diffusion Transformer (MDT) and its improved version, MDTv2, which achieved state-of-the-art image synthesis performance on ImageNet at release. It is aimed at researchers and practitioners in computer vision and generative modeling who want to improve image synthesis quality and training efficiency.

How It Works

MDT addresses the limited contextual reasoning of diffusion models by introducing a mask latent modeling scheme. During training, it works in the latent space. It masks a subset of the image tokens and uses an asymmetric diffusion transformer to predict the masked tokens from the unmasked ones. This strengthens the model's ability to learn relationships among the semantic parts of objects, so it can reconstruct a full image from incomplete contextual input. MDTv2 refines the approach with a more efficient macro network structure and training strategy, which yields faster convergence and stronger performance.
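
The masking step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code. It assumes DiT-style conventions: an 8x-downsampling VAE turns a 256x256 image into a 32x32 latent, and the "/2" in a name like XL/2 means 2x2 patches. The helper names num_latent_tokens and random_token_mask and the 0.3 mask ratio are hypothetical choices made for this example.

```python
import random
from typing import List, Optional, Tuple


def num_latent_tokens(image_size: int, vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Token count for a DiT-style latent transformer (assumed 8x VAE downsampling)."""
    latent_size = image_size // vae_downsample   # e.g. 256 -> 32
    tokens_per_side = latent_size // patch_size  # e.g. 32 -> 16 with 2x2 patches ("/2")
    return tokens_per_side * tokens_per_side


def random_token_mask(
    num_tokens: int, mask_ratio: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split token indices into (visible, masked) at random."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)                      # random permutation of token positions
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(order[:num_masked])     # positions the model must predict
    visible = sorted(order[num_masked:])    # positions kept as context
    return visible, masked


n = num_latent_tokens(256)  # 256x256 image -> 32x32 latent -> 16x16 = 256 tokens
visible, masked = random_token_mask(n, mask_ratio=0.3, seed=0)  # 0.3 is an illustrative ratio
tokens = [f"tok{i}" for i in range(n)]          # stand-ins for latent patch embeddings
visible_tokens = [tokens[i] for i in visible]   # the unmasked context
print(n, len(visible_tokens), len(masked))      # prints: 256 180 76
```

A real implementation would apply this to batched tensors, for example by drawing permutations with torch.randperm, rather than to Python lists. The visible tokens are the context from which the masked positions are predicted.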

Quick Start & Requirements

  • Install: run pip install -e . from the repository root, then pip install git+https://github.com/sail-sg/Adan.git for the Adan optimizer.
  • Prerequisites: PyTorch >= 2.0 and the Adan optimizer. The ImageNet dataset is required for training and evaluation.
  • Pretrained Models: Available on Hugging Face (shgao/MDT-XL2).
  • Demo: https://huggingface.co/spaces/shgao/MDT
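
The following sketch checks the PyTorch >= 2.0 prerequisite before installation. It reads the installed version through the standard library's importlib.metadata instead of importing torch. The function names are hypothetical and are not part of the MDT repository.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0+cu118' or '1.13.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_meets_minimum(minimum: Tuple[int, int] = (2, 0), version: Optional[str] = None) -> bool:
    """Return True if the given (or installed) PyTorch version is at least `minimum`."""
    if version is None:
        try:
            version = metadata.version("torch")  # version of the installed distribution
        except metadata.PackageNotFoundError:
            return False  # PyTorch is not installed at all
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    print("PyTorch >= 2.0 available:", torch_meets_minimum())
```

The check compares integer tuples rather than raw strings, so a version such as "10.2" correctly counts as newer than "2.0". Build suffixes like "+cu118" are ignored.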

Highlighted Details

  • Achieves a state-of-the-art FID of 1.58 on ImageNet 256x256 with MDTv2-XL/2.
  • MDTv2 learns more than 10x faster than DiT, the previous state of the art.
  • MDTv2 achieves a 5x learning speedup over the original MDT.
  • Codebase built upon DiT and ADM.
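
For reference, FID (Fréchet Inception Distance) is the metric behind the 1.58 figure. It fits Gaussians to Inception-network features of real images (mean μ_r, covariance Σ_r) and generated images (mean μ_g, covariance Σ_g), then measures the Fréchet distance between them. Lower values are better. ADM and DiT, on which this codebase builds, typically report it over 50,000 generated samples.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```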

Maintenance & Community

  • Contributors: Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan.
  • Acknowledgements: DiT and ADM projects.

Licensing & Compatibility

  • License: Not stated in the README. Publishing the code publicly as an official codebase does not by itself grant open-source rights, so the terms for commercial use or closed-source linking should be confirmed before use.

Limitations & Caveats

  • The README does not specify a license, creating uncertainty for commercial use or integration into closed-source projects.
  • Evaluation requires following the separate instructions in the evaluations folder, which add setup steps beyond the core repository.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days