REPA-E by End2End-Diffusion

End-to-end VAE and diffusion model tuning

Created 6 months ago
340 stars

Top 81.1% on SourcePulse

Project Summary

REPA-E enables end-to-end training of Latent Diffusion Models (LDMs) by jointly optimizing the VAE tokenizer and diffusion model, overcoming previous training instabilities. This approach significantly accelerates training and improves generation quality, offering a drop-in replacement VAE (E2E-VAE) that enhances existing LDM architectures. The project targets researchers and practitioners in generative AI seeking faster, more effective LDM training.

How It Works

REPA-E introduces a representation-alignment (REPA) loss to enable stable joint training of the VAE and diffusion model; the standard diffusion loss alone is ineffective for this joint tuning. The REPA loss aligns the diffusion model's intermediate representations with features from a frozen pretrained vision encoder, and backpropagating it end-to-end updates both the diffusion model and the VAE. This also improves the VAE itself, creating an "E2E-VAE" with better latent structure.
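The alignment idea can be sketched in a few lines. This is an illustrative toy, not the repo's implementation: it assumes the REPA loss is a mean (1 - cosine similarity) between projected diffusion-model features and frozen-encoder features, and all names (repa_loss, feats) are hypothetical.

```python
import math

def repa_loss(model_feats, encoder_feats):
    """Mean (1 - cosine similarity) across token pairs.

    model_feats:   projected diffusion-model features, one vector per token
    encoder_feats: frozen pretrained-encoder features, same shape
    """
    total = 0.0
    for a, b in zip(model_feats, encoder_feats):
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        cos = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
        total += 1.0 - cos  # 0 when a token pair is perfectly aligned
    return total / len(model_feats)

feats = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
opposite = [[-x for x in row] for row in feats]
print(repa_loss(feats, feats))     # ~0.0: identical features are fully aligned
print(repa_loss(feats, opposite))  # ~2.0: anti-aligned features hit the maximum
```

In REPA-E this alignment term is backpropagated through both the diffusion model and the VAE, which is what makes joint tuning work where the diffusion loss alone does not.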

Quick Start & Requirements

  • Install: Clone the repository and create a conda environment using environment.yml.
  • Data: Requires ImageNet-1K dataset, preprocessed via preprocessing.py.
  • Pre-trained Checkpoints: Needs VAE checkpoints (SD-VAE, IN-VAE, VA-VAE) downloaded to a pretrained/ directory.
  • Training: Launch training via accelerate launch train_repae.py, passing arguments for the model, VAE, and encoder type.
  • Links: Project Page, Models, Paper
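The training step above boils down to a single launch command. Only the entry point (accelerate launch train_repae.py) comes from this summary; every flag name and value below is an assumption standing in for "arguments for model, VAE, and encoder type" — check the repo's README for the exact CLI.

```python
# Hedged sketch: assembling a REPA-E launch command.
# All flag names and values are assumptions, not the documented CLI.
import shlex

cmd = (
    "accelerate launch train_repae.py "
    "--model SiT-XL/2 "   # LDM architecture (assumed flag)
    "--vae sd-vae "       # VAE checkpoint under pretrained/ (assumed flag)
    "--enc-type clip"     # alignment encoder (assumed flag)
)
args = shlex.split(cmd)
print(args[:3])  # ['accelerate', 'launch', 'train_repae.py']
```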

Highlighted Details

  • Achieves state-of-the-art FID scores on ImageNet 256x256 (1.26 with CFG, 1.83 without CFG).
  • Offers over 17x speedup compared to REPA and 45x over vanilla training.
  • E2E-VAE serves as a drop-in replacement, improving convergence and generation quality.
  • Supports SiT-based LDM architectures and multiple alignment encoders (e.g., DINOv2, MoCo, CLIP).

Maintenance & Community

The project is an initial release (April 2025) from the authors of the ICCV 2025 paper. Further community engagement details are not yet specified.

Licensing & Compatibility

The repository does not explicitly state a license. The code builds upon several open-source projects, including 1d-tokenizer, edm2, LightningDiT, REPA, and Taming-Transformers. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is an initial release, and comprehensive documentation beyond the README and setup instructions may be limited. Specific hardware requirements (e.g., GPU, CUDA version) are implied by the use of accelerate and torchrun but not explicitly detailed.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Starred by Tobi Lutke (Cofounder of Shopify), Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), and 3 more.
