VAE trainer for latent diffusion models
This repository provides a distributed VAE trainer for the Variational Autoencoders (VAEs) used in latent diffusion models such as Stable Diffusion. It targets researchers and engineers in generative AI who need to train high-quality VAEs for image synthesis. The main benefit is efficient, stable, high-fidelity VAE training via multi-GPU data parallelism and a combination of adversarial, perceptual, and reconstruction losses.
How It Works
The trainer uses PyTorch's DistributedDataParallel (DDP) for multi-GPU acceleration. It adds an adversarial (GAN) loss, implemented with a VGG16-based discriminator and a hinge objective, to improve image quality, and uses LPIPS for perceptual loss. For training stability it applies gradient normalization and fixes the variance at 0.1 rather than learning it, deviating from the standard learnable-variance setup. The reconstruction loss combines LPIPS with a modified MSE computed on a low-pass filtered version of the image, balancing fine detail against color accuracy.
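The actual implementation lives in vae_trainer.py; the sketch below is only an assumption-laden illustration of the loss terms just described (hinge GAN objectives, LPIPS, and MSE on a low-pass filtered image), using the lpips package and a simple average-pool blur as stand-ins for whatever the trainer really uses. The fixed 0.1 variance and gradient normalization are not shown.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips; VGG-based perceptual distance

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual term of the reconstruction loss


def low_pass(x: torch.Tensor, k: int = 4) -> torch.Tensor:
    # Crude low-pass filter: average-pool, then upsample back to the input size,
    # keeping coarse structure and color while discarding high-frequency detail.
    h, w = x.shape[-2:]
    x = F.avg_pool2d(x, kernel_size=k)
    return F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)


def reconstruction_loss(x: torch.Tensor, x_rec: torch.Tensor) -> torch.Tensor:
    # LPIPS drives perceptual detail; MSE on the low-passed images anchors color,
    # mirroring the detail/color split described above. Inputs assumed in [-1, 1].
    perceptual = lpips_fn(x_rec, x).mean()
    color = F.mse_loss(low_pass(x_rec), low_pass(x))
    return perceptual + color


def hinge_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    # Hinge objective for the discriminator.
    return F.relu(1.0 - real_logits).mean() + F.relu(1.0 + fake_logits).mean()


def hinge_g_loss(fake_logits: torch.Tensor) -> torch.Tensor:
    # Hinge objective for the generator (the VAE decoder).
    return -fake_logits.mean()
```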
Quick Start & Requirements
Launch multi-GPU training with:

torchrun --nproc_per_node=8 vae_trainer.py

Requirements: PyTorch and a dataset in webdataset format (e.g., prepared with img2dataset). Key command-line options include --learning_rate_vae, --vae_ch, --vae_ch_mult, and --do_ganloss.
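A fuller invocation might tune those options, for example as below; the values and exact argument formats are illustrative assumptions, not documented defaults:

torchrun --nproc_per_node=8 vae_trainer.py --learning_rate_vae 1e-4 --vae_ch 128 --do_ganloss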
Highlighted Details
Maintenance & Community
The repository was last updated 11 months ago and the project appears inactive.
Licensing & Compatibility
The README does not specify a license, which could impact commercial use or integration into closed-source projects.
Limitations & Caveats
Beyond the unspecified license noted above, the setup requires the dataset in webdataset format, necessitating an additional preprocessing step (a rough img2dataset sketch follows below).
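As an illustration of that preprocessing step, the snippet below uses the img2dataset Python API to build webdataset shards from a list of image URLs; the file names, image size, and worker count are placeholder assumptions rather than values prescribed by the repository.

```python
# Hypothetical preprocessing sketch: turn a URL list into webdataset shards.
from img2dataset import download

download(
    url_list="image_urls.txt",       # one image URL per line (placeholder file)
    output_folder="vae_train_data",  # will contain .tar webdataset shards
    output_format="webdataset",      # shard format expected by the trainer
    image_size=256,                  # resize on download; pick your target resolution
    processes_count=8,               # parallel download workers
)
```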