hustvl: Image generation research paper using latent diffusion
This project addresses the optimization dilemma in latent diffusion models (LDMs): increasing the tokenizer's feature dimension improves reconstruction quality but degrades generation performance unless substantially larger models and longer training are used. It offers a solution for researchers and practitioners seeking faster, more efficient training of high-fidelity diffusion models, achieving state-of-the-art results with significantly reduced training times.
How It Works
The core innovation is the Vision foundation model Aligned Variational AutoEncoder (VA-VAE), which aligns the latent space with pre-trained vision foundation models. This approach mitigates the difficulty of learning unconstrained high-dimensional latent spaces, enabling faster convergence for diffusion transformers. The project also introduces LightningDiT, an enhanced diffusion transformer (DiT) baseline built upon VA-VAE, featuring improved training strategies and architectural designs for accelerated training and superior generation quality.
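The alignment idea can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's exact VF loss (which combines cosine and distance-matrix terms): VAE latents are projected into the foundation model's feature space and pulled toward the frozen features via a cosine-similarity objective. The function name and margin parameter are assumptions for illustration.

```python
import numpy as np

def alignment_loss(z_proj, f_frozen, margin=0.0):
    """Toy alignment objective between projected VAE latents and
    frozen vision-foundation-model features (both shape (N, D))."""
    zn = z_proj / np.linalg.norm(z_proj, axis=1, keepdims=True)
    fn = f_frozen / np.linalg.norm(f_frozen, axis=1, keepdims=True)
    cos = (zn * fn).sum(axis=1)          # per-sample cosine similarity
    # Hinge on (1 - cos): perfectly aligned pairs contribute zero loss.
    return np.maximum(1.0 - cos - margin, 0.0).mean()
```

During VA-VAE training such a term would be added to the usual reconstruction and KL losses, constraining the otherwise unconstrained high-dimensional latent space.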
Quick Start & Requirements
conda create -n lightningdit python=3.10.12
conda activate lightningdit
pip install -r requirements.txt

Highlighted Details
Maintenance & Community
The project is associated with hustvl and builds upon DiT, FastDiT, and SiT. Code for VA-VAE is based on LDM and MAR.
Licensing & Compatibility
Limitations & Caveats
The FID results reported by the inference script are for reference; final FID-50k requires evaluation using OpenAI's guided-diffusion repository.
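For reference, FID between two image sets modeled as Gaussians in feature space is computed from their means and covariances. A minimal numpy sketch of the standard formula (this is the generic FID definition, not the project's evaluation code, which defers to OpenAI's guided-diffusion suite):

```python
import numpy as np

def fid(mu1, cov1, mu2, cov2):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})."""
    diff = mu1 - mu2
    # Tr((C1 C2)^{1/2}) equals the sum of square roots of the
    # eigenvalues of C1 @ C2 (real and nonnegative for PSD inputs).
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)
```

Identical statistics give an FID of zero; the reported FID-50k uses 50,000 samples to estimate these statistics robustly.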