LightningDiT by hustvl

Image generation research paper using latent diffusion

Created 1 year ago

1,356 stars

Top 29.5% on SourcePulse

Project Summary

This project addresses the optimization dilemma in latent diffusion models (LDMs) where improving reconstruction quality via larger tokenizers hinders generation performance due to increased computational costs. It offers a solution for researchers and practitioners seeking faster, more efficient training of high-fidelity diffusion models, achieving state-of-the-art results with significantly reduced training times.

How It Works

The core innovation is the Vision foundation model Aligned Variational AutoEncoder (VA-VAE), which aligns the latent space with pre-trained vision foundation models. This approach mitigates the difficulty of learning unconstrained high-dimensional latent spaces, enabling faster convergence for diffusion transformers. The project also introduces LightningDiT, an enhanced diffusion transformer (DiT) baseline built upon VA-VAE, featuring improved training strategies and architectural designs for accelerated training and superior generation quality.

Quick Start & Requirements

Installation: conda create -n lightningdit python=3.10.12, conda activate lightningdit, pip install -r requirements.txt.
Prerequisites: Python 3.10.12, PyTorch.
Training: Requires 8 x H800 GPUs for ~10 hours to achieve FID 2.1 within 64 epochs.
Resources: Pre-trained weights and latent statistics are available for download.
Links: Papers With Code, CVPR 2025 Paper, NeurIPS 2024 Paper.

Highlighted Details

Achieves FID 1.35 on ImageNet-256, surpassing DiT.
Offers over 21x faster convergence compared to original DiT implementations.
Reaches FID 2.11 in approximately 10 hours with 8 GPUs.
VA-VAE selected for Oral Presentation at CVPR 2025.

Maintenance & Community

The project is associated with hustvl and builds upon DiT, FastDiT, and SiT. Code for VA-VAE is based on LDM and MAR.

Licensing & Compatibility

License: MIT.
Compatibility: Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The FID results reported by the inference script are for reference; final FID-50k requires evaluation using OpenAI's guided-diffusion repository.

Health Check

Last Commit

3 weeks ago

Responsiveness

1 day

Pull Requests (30d)

0

Issues (30d)

0

Star History

45 stars in the last 30 days

Explore Similar Projects

SFD by yuemingPAN

Novel latent diffusion paradigm for accelerated, high-fidelity image generation

Created 2 months ago

Updated 3 weeks ago

fm-boosting by CompVis

Boosting latent diffusion models for high-resolution image synthesis

Created 2 years ago

Updated 2 months ago

Starred by

Jiaming Song

Jiaming Song(Chief Scientist at Luma AI).

Kandinsky-3 by ai-forever

Text-to-image diffusion model for multifunctional generative tasks

Created 2 years ago

Updated 11 months ago

MDT by sail-sg

Image synthesis research paper (ICCV 2023)

Created 2 years ago

Updated 1 year ago

Starred by

Tobi Lutke

Tobi Lutke(Cofounder of Shopify),

Christian Laforte

Christian Laforte(Distinguished Engineer at NVIDIA; Former CTO at Stability AI), and

3 more.

taesd by madebyollin

Tiny AutoEncoder for Stable Diffusion latents

Created 2 years ago

Updated 2 weeks ago

InstaFlow by gnobitab

One-step image generator using Rectified Flow (ICLR 2024)

Created 2 years ago

Updated 1 year ago

Starred by

Chenlin Meng

Chenlin Meng(Cofounder of Pika) and

Jiaming Song

Jiaming Song(Chief Scientist at Luma AI).

ddrm by bahjat-kawar

Research paper for diffusion-based image restoration

Created 4 years ago

Updated 3 years ago

Starred by

Robin Rombach

Robin Rombach(Cofounder of Black Forest Labs),

Patrick von Platen

Patrick von Platen(Author of Hugging Face Diffusers; Research Engineer at Mistral), and

2 more.

Kandinsky-2 by ai-forever

Multilingual text-to-image latent diffusion model

Created 3 years ago

Updated 1 year ago

Starred by

Chip Huyen

Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"),

Zhiqiang Xie

Zhiqiang Xie(Coauthor of SGLang), and

1 more.

Sana by NVlabs

Image synthesis research paper using a linear diffusion transformer

Created 1 year ago

Updated 3 weeks ago

Starred by

Aravind Srinivas

Aravind Srinivas(Cofounder of Perplexity),

Jeremy Howard

Jeremy Howard(Cofounder of fast.ai), and

1 more.

improved-diffusion by openai

Image diffusion codebase for research

Created 4 years ago

Updated 1 year ago

Starred by

Aravind Srinivas

Aravind Srinivas(Cofounder of Perplexity),

Junyang Lin

Junyang Lin(Core Maintainer at Alibaba Qwen), and

8 more.

guided-diffusion by openai

Image synthesis codebase for diffusion models

Created 4 years ago

Updated 1 year ago

Starred by

Aravind Srinivas

Aravind Srinivas(Cofounder of Perplexity),

Benjamin Bolte

Benjamin Bolte(Cofounder of K-Scale Labs), and

13 more.

latent-diffusion by CompVis

Image synthesis research paper using latent diffusion models

Created 4 years ago

Updated 1 year ago

Feedback? Help us improve.