Image synthesis research paper (ICCV 2023)
This repository provides the official implementation for Masked Diffusion Transformer (MDT) and its improved version, MDTv2, which achieves state-of-the-art image synthesis performance. It is designed for researchers and practitioners in computer vision and generative modeling looking to advance image synthesis quality and training efficiency.
How It Works
MDT addresses the limited contextual reasoning in diffusion models by introducing a mask latent modeling scheme. It operates in the latent space, masking certain tokens and using an asymmetric diffusion transformer to predict these masked tokens from unmasked ones. This approach enhances the model's ability to learn relationships among object semantic parts, enabling reconstruction of full images from incomplete contextual inputs. MDTv2 further optimizes this with a more efficient macro network structure and training strategy, leading to faster convergence and stronger performance.
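The masking step described above can be illustrated with a minimal sketch. It assumes a single-channel latent grid, a hypothetical illustrative mask_ratio, and an identity function standing in for the encoder. The helpers patchify, random_mask and fill_masked are hypothetical names, not the repository's code. The sketch splits the latent into patch tokens and randomly hides a fraction of them. Only the visible tokens are "encoded", and a full-length sequence is then rebuilt for a decoder, with a placeholder mask token at every hidden position. Diffusion noise and the actual transformer blocks are omitted.

```python
import random
from typing import List, Sequence, Tuple


def patchify(latent: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W latent grid (one channel, for simplicity) into
    flattened patch tokens, in row-major patch order."""
    h, w = len(latent), len(latent[0])
    assert h % patch == 0 and w % patch == 0, "grid must divide evenly into patches"
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([latent[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens


def random_mask(num_tokens: int, mask_ratio: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly choose which token indices are masked; return sorted (visible, masked)."""
    num_masked = int(num_tokens * mask_ratio)
    perm = list(range(num_tokens))
    rng.shuffle(perm)
    return sorted(perm[num_masked:]), sorted(perm[:num_masked])


def fill_masked(visible_feats: Sequence[Sequence[float]], visible: Sequence[int],
                num_tokens: int, mask_token: Sequence[float]) -> List[List[float]]:
    """Rebuild a full-length sequence for the decoder: encoded features go back
    to their original positions; masked positions get the placeholder mask token
    (a learnable vector in a real model, fixed zeros here)."""
    full = [list(mask_token) for _ in range(num_tokens)]
    for idx, feat in zip(visible, visible_feats):
        full[idx] = list(feat)
    return full


# Usage: a 4x4 latent split into four 2x2 patch tokens, half of them masked.
rng = random.Random(0)
latent = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(latent, patch=2)
visible, masked = random_mask(len(tokens), mask_ratio=0.5, rng=rng)
encoded = [tokens[i] for i in visible]  # identity stand-in for the encoder
decoder_in = fill_masked(encoded, visible, len(tokens), mask_token=[0.0] * 4)
```

The asymmetry shows up in the sizes: the encoder stand-in only ever sees len(visible) tokens, while the decoder input always has the full token count. A model trained on this setup must infer the hidden parts from the surrounding context.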
Quick Start & Requirements
Install the package and the Adan optimizer:
pip install -e .
pip install git+https://github.com/sail-sg/Adan.git
Pretrained model: shgao/MDT-XL2
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
evaluations folder, suggesting a multi-step process beyond the core repository.