REPA-E by End2End-Diffusion

End-to-end VAE and diffusion model tuning

Created 6 months ago
340 stars

Top 81.1% on SourcePulse

Project Summary

REPA-E enables end-to-end training of Latent Diffusion Models (LDMs) by jointly optimizing the VAE tokenizer and diffusion model, overcoming previous training instabilities. This approach significantly accelerates training and improves generation quality, offering a drop-in replacement VAE (E2E-VAE) that enhances existing LDM architectures. The project targets researchers and practitioners in generative AI seeking faster, more effective LDM training.

How It Works

REPA-E introduces a representation-alignment (REPA) loss to enable stable joint training of the VAE and diffusion model; the standard diffusion loss alone is ineffective for this joint tuning. The REPA loss aligns the diffusion model's intermediate representations with features from a frozen pretrained vision encoder, and backpropagating it end-to-end updates both the diffusion model and the VAE. This also improves the VAE itself, creating an "E2E-VAE" with better latent structure.
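The alignment idea can be sketched in a few lines. This is an illustrative toy, not the repo's implementation: it assumes the REPA loss is a mean (1 - cosine similarity) between projected diffusion-model features and frozen-encoder features, and all names (repa_loss, feats) are hypothetical.

```python
import math

def repa_loss(model_feats, encoder_feats):
    """Mean (1 - cosine similarity) across token pairs.

    model_feats:   projected diffusion-model features, one vector per token
    encoder_feats: frozen pretrained-encoder features, same shape
    """
    total = 0.0
    for a, b in zip(model_feats, encoder_feats):
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        cos = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
        total += 1.0 - cos  # 0 when a token pair is perfectly aligned
    return total / len(model_feats)

feats = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
opposite = [[-x for x in row] for row in feats]
print(repa_loss(feats, feats))     # ~0.0: identical features are fully aligned
print(repa_loss(feats, opposite))  # ~2.0: anti-aligned features hit the maximum
```

In REPA-E this alignment term is backpropagated through both the diffusion model and the VAE, which is what makes joint tuning work where the diffusion loss alone does not.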

Quick Start & Requirements

  • Install: Clone the repository and create a conda environment using environment.yml.
  • Data: Requires ImageNet-1K dataset, preprocessed via preprocessing.py.
  • Pre-trained Checkpoints: Needs VAE checkpoints (SD-VAE, IN-VAE, VA-VAE) downloaded to a pretrained/ directory.
  • Training: Launch training via accelerate launch train_repae.py, passing arguments for the model, VAE, and encoder type.
  • Links: Project Page, Models, Paper
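The training step above boils down to a single launch command. Only the entry point (accelerate launch train_repae.py) comes from this summary; every flag name and value below is an assumption standing in for "arguments for model, VAE, and encoder type" — check the repo's README for the exact CLI.

```python
# Hedged sketch: assembling a REPA-E launch command.
# All flag names and values are assumptions, not the documented CLI.
import shlex

cmd = (
    "accelerate launch train_repae.py "
    "--model SiT-XL/2 "   # LDM architecture (assumed flag)
    "--vae sd-vae "       # VAE checkpoint under pretrained/ (assumed flag)
    "--enc-type clip"     # alignment encoder (assumed flag)
)
args = shlex.split(cmd)
print(args[:3])  # ['accelerate', 'launch', 'train_repae.py']
```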

Highlighted Details

  • Achieves state-of-the-art FID scores on ImageNet 256x256 (1.26 with CFG, 1.83 without CFG).
  • Offers over 17x speedup compared to REPA and 45x over vanilla training.
  • E2E-VAE serves as a drop-in replacement, improving convergence and generation quality.
  • Supports SiT-based LDM architectures and multiple alignment encoders (e.g., DINOv2, MoCo, CLIP).

Maintenance & Community

The project is an initial release (April 2025) from the authors of the ICCV 2025 paper. Further community engagement details are not yet specified.

Licensing & Compatibility

The repository does not explicitly state a license. The code builds upon several open-source projects, including 1d-tokenizer, edm2, LightningDiT, REPA, and Taming-Transformers. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is an initial release, and comprehensive documentation beyond the README and setup instructions may be limited. Specific hardware requirements (e.g., GPU, CUDA version) are implied by the use of accelerate and torchrun but not explicitly detailed.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Starred by Tobi Lutke (Cofounder of Shopify), Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), and 3 more.
