CompVis
Boosting latent diffusion models for high-resolution image synthesis
CompVis/fm-boosting introduces FMBoost, a method to significantly accelerate high-resolution image synthesis from latent diffusion models (LDMs). It targets researchers and practitioners in generative AI seeking faster, higher-fidelity image generation, enabling the creation of detailed images at resolutions up to $2048^2$ in under a second.
How It Works
The approach combines three components: Diffusion Models (DMs) for sample diversity, Flow Matching (FM) models for fast training and inference, and Variational AutoEncoders (VAEs) for efficient latent-to-pixel mapping. First, a DM generates a low-resolution latent. Then, Coupling Flow Matching (CFM) maps this latent directly to a higher-resolution latent space. Finally, a pre-trained VAE decoder translates the high-resolution latent into the final pixel-space image. Each stage does the job it is best suited to, so the pipeline gains speed without giving up image quality.
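The three stages described above can be illustrated with a toy sketch. This is not the repository's code: every function name here is hypothetical, the diffusion model and VAE decoder are replaced by trivial stand-ins, and the nearest-neighbour upsampling before the flow step, the 4-channel latent, and the 8x decoder factor are assumptions made for illustration. The flow step integrates an ODE dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps, which is one common way to sample from a flow matching model.

```python
import numpy as np


def sample_lowres_latent(rng, channels=4, size=8):
    # Stand-in for the diffusion model: a random low-resolution latent.
    return rng.standard_normal((channels, size, size))


def upsample_nearest(latent, factor):
    # Nearest-neighbour upsampling along both spatial axes (assumed step).
    return latent.repeat(factor, axis=1).repeat(factor, axis=2)


def toy_velocity(x, t, target):
    # Hypothetical velocity field; a trained network would go here.
    # On a straight path x_t = (1 - t) * x_0 + t * x_1 the velocity
    # x_1 - x_0 can be written as (x_1 - x_t) / (1 - t).
    return (target - x) / (1.0 - t)


def integrate_flow(x0, velocity, num_steps=10):
    # Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1.
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # the largest t used is 1 - dt, so 1 - t never hits zero
        x = x + dt * velocity(x, t)
    return x


def toy_decode(latent, factor=8):
    # Stand-in for the VAE decoder: keep 3 channels and upsample to "pixels".
    return upsample_nearest(latent[:3], factor)


def run_pipeline(seed=0):
    rng = np.random.default_rng(seed)
    z_low = sample_lowres_latent(rng)        # stage 1: low-resolution latent
    z_start = upsample_nearest(z_low, 2)     # bring it onto the larger grid
    # Made-up "true" high-resolution latent, used only by the toy velocity.
    z_target = z_start + 0.1 * rng.standard_normal(z_start.shape)
    z_high = integrate_flow(                 # stage 2: flow to high resolution
        z_start, lambda x, t: toy_velocity(x, t, z_target)
    )
    image = toy_decode(z_high)               # stage 3: latent to pixel space
    return z_low, z_high, z_target, image


z_low, z_high, z_target, image = run_pipeline(seed=0)
print(z_low.shape, z_high.shape, image.shape)
```

In a real setup, `toy_velocity` would be replaced by the trained flow matching network and `toy_decode` by the pre-trained VAE decoder; only the overall data flow is meant to carry over.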
Quick Start & Requirements
Run training with python3 train.py --config <config_file> --name <your-name>. Inference details and pre-trained models are pending release.
(… sd_ae.ckpt). Training requires a configuration file (an example is provided) and a custom dataset with pre-computed latents (image, latent, latent_lowres). Python 3 is the specified environment.
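As a rough illustration of what one training sample with pre-computed latents might look like, the sketch below builds and checks a dictionary with the three fields named above (image, latent, latent_lowres). The container type, array shapes, channel counts, and both helper functions are assumptions for illustration; the repository's actual dataset format may differ.

```python
import numpy as np

# Field names taken from the description above; everything else is assumed.
REQUIRED_KEYS = ("image", "latent", "latent_lowres")


def make_toy_sample(rng, image_size=64, vae_factor=8, lowres_factor=2):
    # Random placeholder arrays standing in for a real image and its latents.
    latent_size = image_size // vae_factor
    lowres_size = latent_size // lowres_factor
    return {
        "image": rng.random((3, image_size, image_size), dtype=np.float32),
        "latent": rng.standard_normal(
            (4, latent_size, latent_size), dtype=np.float32
        ),
        "latent_lowres": rng.standard_normal(
            (4, lowres_size, lowres_size), dtype=np.float32
        ),
    }


def check_sample(sample):
    # Fail early if a field is missing or the low-res latent is not smaller.
    missing = [key for key in REQUIRED_KEYS if key not in sample]
    if missing:
        raise KeyError(f"sample is missing fields: {missing}")
    if sample["latent_lowres"].shape[-1] >= sample["latent"].shape[-1]:
        raise ValueError("latent_lowres should be smaller than latent")
    return {key: sample[key].shape for key in REQUIRED_KEYS}


sample = make_toy_sample(np.random.default_rng(0))
print(check_sample(sample))
```

A check like this is cheap to run once over a dataset before launching training, so shape or naming mismatches surface before any compute is spent.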
Maintenance & Community
Developed by the CompVis Group at LMU Munich. No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The repository's license is not specified in the README, which may pose a barrier for commercial or closed-source integration.
Limitations & Caveats
Pre-trained checkpoints and inference notebooks are slated for future release, meaning immediate out-of-the-box inference is not yet possible. The README primarily details the training pipeline.