End-to-end VAE and diffusion model tuning
REPA-E enables end-to-end training of Latent Diffusion Models (LDMs) by jointly optimizing the VAE tokenizer and diffusion model, overcoming previous training instabilities. This approach significantly accelerates training and improves generation quality, offering a drop-in replacement VAE (E2E-VAE) that enhances existing LDM architectures. The project targets researchers and practitioners in generative AI seeking faster, more effective LDM training.
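As a concrete illustration of the drop-in claim, swapping a diffusers-format VAE into an existing pipeline takes a single assignment. This is a hedged sketch: the checkpoint ID "REPA-E/e2e-vae" is a placeholder, and the actual E2E-VAE release format may differ from what the diffusers API expects.

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Placeholder checkpoint ID -- check the REPA-E repository for the real one.
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-vae", torch_dtype=torch.float16)

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Drop-in swap: the latent shape and scaling factor must match the VAE
# the diffusion model was trained against.
pipe.vae = vae
```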
How It Works
REPA-E introduces a representation-alignment (REPA) loss to facilitate stable joint training of the VAE and diffusion model. This contrasts with standard diffusion losses, which are ineffective for joint training. The REPA loss aligns the VAE's latent space with the diffusion model's learned representations, enabling efficient end-to-end tuning. This method also improves the VAE itself, creating an "E2E-VAE" that offers better latent structure.
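A minimal sketch of what such an alignment loss can look like, assuming (as in REPA) a frozen pretrained vision encoder such as DINOv2 as the alignment target and a small trainable projection head; the function and argument names here are illustrative, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def repa_alignment_loss(diffusion_feats: torch.Tensor,
                        encoder_feats: torch.Tensor,
                        proj_head: torch.nn.Module) -> torch.Tensor:
    """Negative cosine similarity between projected diffusion-model features
    and patch features from a frozen pretrained vision encoder, averaged
    over batch and tokens.

    diffusion_feats: (B, N, D_model) hidden states from the diffusion network
    encoder_feats:   (B, N, D_enc) patch features from the frozen encoder
    proj_head:       small trainable MLP mapping D_model -> D_enc
    """
    pred = F.normalize(proj_head(diffusion_feats), dim=-1)
    target = F.normalize(encoder_feats.detach(), dim=-1)  # stop-grad on target
    return -(pred * target).sum(dim=-1).mean()

# Hypothetical use inside a training step:
#   loss = diffusion_loss + lambda_repa * repa_alignment_loss(h, enc_feats, proj)
```

Per the description above, it is this alignment term, rather than the standard diffusion loss, that makes jointly optimizing the VAE encoder stable.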
Quick Start & Requirements
Create the environment from environment.yml, run preprocessing.py to prepare the dataset, and place pretrained checkpoints in the pretrained/ directory. Training is then launched with accelerate launch train_repae.py with arguments specifying the model, VAE, and encoder type, as sketched below.
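Put together, a run might look like the following. The environment name and flag names are illustrative placeholders inferred from the description above, not the repository's documented arguments:

```bash
# Illustrative only -- consult the repository README for the exact
# environment name and flag names; those below are placeholders.
conda env create -f environment.yml
conda activate repa-e
python preprocessing.py   # prepare the dataset
accelerate launch train_repae.py \
    --model "SiT-XL/2" \
    --vae "f8d4" \
    --enc-type "dinov2-vit-b"
```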
Maintenance & Community
The project is an initial release (April 2025) from the authors of the ICCV 2025 paper. Further community engagement details are not yet specified.
Licensing & Compatibility
The repository does not explicitly state a license. The code builds upon several open-source projects, including 1d-tokenizer, edm2, LightningDiT, REPA, and Taming-Transformers. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is an initial release, and comprehensive documentation beyond the README and setup instructions may be limited. Specific hardware requirements (e.g., GPU, CUDA version) are implied by the use of accelerate and torchrun but not explicitly detailed.