fm-boosting by CompVis

Boosting latent diffusion models for high-resolution image synthesis

Created 2 years ago

256 stars

Top 98.5% on SourcePulse

Project Summary

CompVis/fm-boosting introduces FMBoost, a method for significantly accelerating high-resolution image synthesis with latent diffusion models (LDMs). It is aimed at generative-AI researchers and practitioners who want faster, higher-fidelity image generation, and it can produce detailed images at resolutions up to $2048^2$ px in under a second.

How It Works

The approach combines three model families: Diffusion Models (DMs) for sample diversity, Flow Matching (FM) models for fast training and inference, and Variational Autoencoders (VAEs) for efficient latent-to-pixel mapping. First, a DM generates a low-resolution latent. Next, Coupling Flow Matching (CFM) maps this latent directly to a higher-resolution latent. Finally, a pre-trained VAE decoder translates the high-resolution latent into the final pixel-space image. Because the DM only has to sample at low resolution and the faster FM model handles the resolution increase, the pipeline gains speed without giving up quality.
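The three-stage pipeline can be sketched as a toy NumPy program. This is not the repository's code: the DM sampler and VAE decoder are hypothetical placeholders, the flow is assumed to start from a nearest-neighbour upsampled low-resolution latent and is integrated with a plain Euler solver, and the VAE is assumed to have Stable Diffusion's 8× spatial downsampling and 4 latent channels. The interpolation, any noise augmentation, and the ODE solver actually used by FMBoost may differ.

```python
import numpy as np

# NOTE: every function here is a hypothetical stand-in, not the fm-boosting API.

def sample_low_res_latent(rng, size=32, channels=4):
    """Stage 1 placeholder: a real DM/LDM sampler would return this latent."""
    return rng.standard_normal((channels, size, size))

def upsample_nearest(z, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)

def cfm_integrate(x0, velocity_fn, steps=4):
    """Stage 2: Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def decode(z, vae_factor=8):
    """Stage 3 placeholder: map a (4, h, w) latent to a (3, 8h, 8w) 'image' in [-1, 1]."""
    return upsample_nearest(np.tanh(z[:3]), vae_factor)

def fmboost_sketch(rng, velocity_fn, low_res=32, latent_factor=4, steps=4):
    z_low = sample_low_res_latent(rng, low_res)      # e.g. (4, 32, 32)
    x0 = upsample_nearest(z_low, latent_factor)      # e.g. (4, 128, 128)
    z_high = cfm_integrate(x0, velocity_fn, steps)   # refined high-res latent
    return decode(z_high)                            # e.g. (3, 1024, 1024)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    zero_velocity = lambda x, t: np.zeros_like(x)    # untrained toy velocity field
    image = fmboost_sketch(rng, zero_velocity)
    print(image.shape)  # (3, 1024, 1024)
```

With the zero velocity field the flow is the identity, so the output is simply the decoded upsampled latent; a trained CFM network would take the place of zero_velocity.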

Quick Start & Requirements

Install/Run: Training is initiated via python3 train.py --config <config_file> --name <your-name>. Inference details and pre-trained models are pending release.
Prerequisites: Download the autoencoder checkpoint sd_ae.ckpt. Training also needs a configuration file (an example is provided) and a custom dataset of pre-computed latents, with each sample providing image, latent, and latent_lowres. Python 3 is the specified environment.
Links: Project page and ECCV 2024 Oral presentation.
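A single training step on one dataset sample might look like the sketch below. It assumes a conditional flow-matching objective with a straight-line path $x_t = (1 - t)\,x_0 + t\,x_1$, where $x_0$ is the upsampled latent_lowres and $x_1$ is the high-resolution latent, so the regression target is $x_1 - x_0$. The optional Gaussian noise on $x_0$, the nearest-neighbour upsampling, and the array shapes are hypothetical; the actual loss, time sampling, and data layout are defined by the repository's config and code, not by this sketch.

```python
import numpy as np

def upsample_nearest(z, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)

def cfm_pair(latent_lowres, latent, t, noise_std=0.0, rng=None):
    """Build the interpolated input x_t and the velocity target for one sample."""
    factor = latent.shape[-1] // latent_lowres.shape[-1]
    x0 = upsample_nearest(latent_lowres, factor)
    if noise_std > 0.0:  # hypothetical noise augmentation of the source latent
        x0 = x0 + noise_std * rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * latent   # straight-line path from x0 to x1
    target = latent - x0                # constant velocity along that path
    return x_t, target

def cfm_loss(velocity_fn, sample, rng, noise_std=0.0):
    """Mean-squared error between predicted and target velocity at a random t."""
    t = rng.uniform(0.0, 1.0)
    x_t, target = cfm_pair(sample["latent_lowres"], sample["latent"], t, noise_std, rng)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical pre-computed sample: field names follow the README, shapes are made up.
    sample = {
        "image": rng.uniform(-1, 1, (3, 512, 512)),
        "latent": rng.standard_normal((4, 64, 64)),
        "latent_lowres": rng.standard_normal((4, 16, 16)),
    }
    # An oracle that knows the true velocity gives zero loss when noise_std == 0.
    oracle = lambda x, t: sample["latent"] - upsample_nearest(sample["latent_lowres"], 4)
    print(cfm_loss(oracle, sample, rng))  # 0.0
```

For a single coupled pair the straight path has a constant velocity, which is why the target above does not depend on t.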

Highlighted Details

Achieves high-fidelity image synthesis at $1024^2$ and $2048^2$ pixels.
Generates high-resolution images in an average of $0.347$ seconds.
Enables cascading generation to increase resolution from $128^2$ px to $2048^2$ px.
Leverages Latent Consistency Models (LCMs) distilled from SD1.5 and SDXL.
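The cascading highlight is easiest to see as resolution arithmetic. Going from $128^2$ px to $2048^2$ px is a 16× increase per side, or 256× more pixels. The sketch below assumes a VAE with an 8× spatial downsampling factor (as in Stable Diffusion's autoencoder) and a hypothetical fixed upsampling factor per stage; the stage factors FMBoost actually uses are not stated here.

```python
def cascade_plan(start_px, target_px, stage_factor=4, vae_factor=8):
    """List (pixel side, latent side) for each stage of a hypothetical cascade."""
    if start_px % vae_factor or target_px % vae_factor:
        raise ValueError("resolutions must be divisible by the VAE factor")
    plan = [(start_px, start_px // vae_factor)]
    px = start_px
    while px < target_px:
        px *= stage_factor
        if px > target_px:
            raise ValueError("target is not reachable with this stage factor")
        plan.append((px, px // vae_factor))
    return plan

if __name__ == "__main__":
    print(cascade_plan(128, 2048, stage_factor=4))
    # [(128, 16), (512, 64), (2048, 256)]
    print(len(cascade_plan(128, 2048, stage_factor=2)) - 1)  # 4 upsampling stages
```

A larger per-stage factor means fewer flow-matching passes, but each pass has a bigger resolution jump to bridge.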

Maintenance & Community

Developed by the CompVis Group at LMU Munich. No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README, which may pose a barrier for commercial or closed-source integration.

Limitations & Caveats

Pre-trained checkpoints and inference notebooks are slated for future release, meaning immediate out-of-the-box inference is not yet possible. The README primarily details the training pipeline.


Explore Similar Projects

SFD by yuemingPAN

diffusion-4k by zhang0jhon

Phased-Consistency-Model by G-U-N

piecewise-rectified-flow by magic-research

distrifuser by mit-han-lab

FastGen by NVlabs

hart by mit-han-lab

LightningDiT by hustvl

Real-Time-Latent-Consistency-Model by radames

latent-consistency-model by luosiallen

Sana by NVlabs

latent-diffusion by CompVis