MDT by sail-sg

Image synthesis research paper (ICCV 2023)

Created 2 years ago

593 stars

Top 54.9% on SourcePulse

Project Summary

This repository provides the official implementation for Masked Diffusion Transformer (MDT) and its improved version, MDTv2, which achieves state-of-the-art image synthesis performance. It is designed for researchers and practitioners in computer vision and generative modeling looking to advance image synthesis quality and training efficiency.

How It Works

MDT addresses the limited contextual reasoning in diffusion models by introducing a mask latent modeling scheme. It operates in the latent space, masking certain tokens and using an asymmetric diffusion transformer to predict these masked tokens from unmasked ones. This approach enhances the model's ability to learn relationships among object semantic parts, enabling reconstruction of full images from incomplete contextual inputs. MDTv2 further optimizes this with a more efficient macro network structure and training strategy, leading to faster convergence and stronger performance.
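The masking step described above can be illustrated with a minimal sketch. It is not the repository's code: the function name split_tokens, the mask ratio of 0.3, and the token count of 256 (a 32x32 latent grid split into 2x2 patches) are all assumptions chosen for illustration. It only shows how a token sequence might be partitioned into a visible subset, which the transformer would see, and a masked subset, which it would be trained to predict.

```python
import random


def split_tokens(num_tokens, mask_ratio, seed=None):
    """Randomly partition token indices into (visible, masked) lists.

    Hypothetical helper for illustration; not part of the MDT codebase.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)  # random permutation of token positions
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(order[:num_masked])   # positions hidden from the model
    visible = sorted(order[num_masked:])  # positions the model conditions on
    return visible, masked


# Assumed setup: 16 x 16 = 256 latent tokens, 30% of them masked.
visible, masked = split_tokens(num_tokens=256, mask_ratio=0.3, seed=0)
print(len(visible), len(masked))
```

In a real training loop the visible indices would select rows of the latent token tensor, and the model's predictions at the masked positions would be compared against the corresponding targets.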

Quick Start & Requirements

Install: run pip install -e . from the repository root, then install the Adan optimizer with pip install git+https://github.com/sail-sg/Adan.git
Prerequisites: PyTorch >= 2.0, Adan optimizer. Requires ImageNet dataset for training/evaluation.
Pretrained Models: Available on Hugging Face (shgao/MDT-XL2).
Demo: https://huggingface.co/spaces/shgao/MDT

Highlighted Details

Achieves SOTA FID score of 1.58 on ImageNet 256x256 with MDTv2-XL/2.
MDTv2 offers >10x faster learning speed compared to previous SOTA DiT.
MDTv2 demonstrates a 5x acceleration over the original MDT.
Codebase built upon DiT and ADM.

Maintenance & Community

Contributors: Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan.
Acknowledgements: DiT and ADM projects.

Licensing & Compatibility

License: Not stated in the README. The repository is presented as the official codebase, but it gives no license terms, so commercial use and linking from closed-source projects would need clarification from the authors.

Limitations & Caveats

The README does not specify a license, creating uncertainty for commercial use or integration into closed-source projects.
Evaluation requires following the separate instructions in the evaluations folder, so it is a multi-step process rather than a single command.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days