fm-boosting  by CompVis

Boosting latent diffusion models for high-resolution image synthesis

Created 2 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

CompVis/fm-boosting introduces FMBoost, a method that substantially accelerates high-resolution image synthesis with latent diffusion models (LDMs). It targets generative-AI researchers and practitioners who want faster, higher-fidelity image generation, and it can produce detailed images at resolutions up to $2048^2$ pixels in under a second.

How It Works

The approach combines three components: Diffusion Models (DMs) for sample diversity, Flow Matching (FM) for fast training and inference, and Variational Autoencoders (VAEs) for efficient latent-to-pixel mapping. First, a DM generates a low-resolution latent. Next, Coupling Flow Matching (CFM) maps this latent directly to a higher-resolution latent space. Finally, a pre-trained VAE decoder translates the high-resolution latent into the final pixel-space image. Speed and quality come from this division of labor: the costly diffusion sampling runs only at low resolution, while the flow model handles the upsampling in latent space.
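The three-stage pipeline described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the repository's code: latents are toy single-channel nested lists, the low-resolution latent is upsampled with nearest-neighbour interpolation to give the flow's starting point, the flow ODE is integrated with a plain Euler solver, and `sample_lowres`, `velocity`, and `decode` are hypothetical placeholders for the diffusion model, the trained CFM network, and the VAE decoder.

```python
from typing import Callable, List

Grid = List[List[float]]  # toy single-channel latent: rows of floats


def upsample_nearest(z: Grid, factor: int) -> Grid:
    """Repeat every value `factor` times along both spatial axes."""
    return [[v for v in row for _ in range(factor)] for row in z for _ in range(factor)]


def integrate_flow(x0: Grid, velocity: Callable[[Grid, float], Grid], steps: int) -> Grid:
    """Euler-integrate dx/dt = velocity(x, t) from t = 0 to t = 1."""
    x = [row[:] for row in x0]
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [[xv + dt * vv for xv, vv in zip(xr, vr)] for xr, vr in zip(x, v)]
    return x


def boosted_sample(
    sample_lowres: Callable[[], Grid],        # hypothetical: low-res DM sampler
    velocity: Callable[[Grid, float], Grid],  # hypothetical: trained CFM network
    decode: Callable[[Grid], Grid],           # hypothetical: VAE decoder
    factor: int = 2,
    steps: int = 4,
) -> Grid:
    z_lo = sample_lowres()                      # 1) low-resolution latent from the DM
    x0 = upsample_nearest(z_lo, factor)         # flow starts at the upsampled latent (assumption)
    z_hi = integrate_flow(x0, velocity, steps)  # 2) CFM: low-res -> high-res latent
    return decode(z_hi)                         # 3) VAE decoder: latent -> pixels


if __name__ == "__main__":
    target = [[0.5] * 4 for _ in range(4)]  # pretend "true" high-res latent

    def oracle_velocity(x: Grid, t: float) -> Grid:
        # Straight-line velocity toward `target`; stands in for a trained network.
        return [[(tv - xv) / (1.0 - t) for xv, tv in zip(xr, tr)] for xr, tr in zip(x, target)]

    out = boosted_sample(lambda: [[0.0, 1.0], [1.0, 0.0]], oracle_velocity, decode=lambda z: z)
    print(len(out), len(out[0]))  # 4 4
```

The oracle velocity field is only a stand-in that knows the answer; in the real method a trained network predicts the velocity, and the number of Euler steps trades speed against accuracy.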

Quick Start & Requirements

  • Install/Run: Training is initiated via python3 train.py --config <config_file> --name <your-name>. Inference details and pre-trained models are pending release.
  • Prerequisites: Download the autoencoder checkpoint (sd_ae.ckpt). Training also requires a configuration file (an example is provided) and a custom dataset with pre-computed latents for each sample (image, latent, latent_lowres). The specified environment is Python 3.
  • Links: Project page and ECCV 2024 Oral presentation.
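Because the training dataset stores each high-resolution latent next to its low-resolution counterpart, one Coupling Flow Matching training step can be sketched as a regression onto a straight-line velocity. This sketch assumes the standard linear-interpolation flow-matching objective, with the upsampled `latent_lowres` as the source x0 and `latent` as the target x1. The record layout (a dict of nested float lists), the `model` callable, and nearest-neighbour upsampling are hypothetical stand-ins rather than the repository's data loader or network; the `image` field is unused here, and any refinements in the actual training code are not reproduced.

```python
import random
from typing import Callable, Dict, List, Optional

Grid = List[List[float]]


def upsample_nearest(z: Grid, factor: int) -> Grid:
    """Repeat every value `factor` times along both spatial axes."""
    return [[v for v in row for _ in range(factor)] for row in z for _ in range(factor)]


def cfm_loss(record: Dict[str, Grid], model: Callable[[Grid, float], Grid],
             factor: int, t: float) -> float:
    """MSE between the predicted velocity and the straight-line target velocity."""
    x0 = upsample_nearest(record["latent_lowres"], factor)  # source: upsampled low-res latent
    x1 = record["latent"]                                   # target: high-res latent
    # Point on the straight path x_t = (1 - t) * x0 + t * x1
    xt = [[(1 - t) * a + t * b for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    # The velocity of that path is constant: x1 - x0
    target_v = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    pred_v = model(xt, t)
    sq = [(p - q) ** 2 for rp, rq in zip(pred_v, target_v) for p, q in zip(rp, rq)]
    return sum(sq) / len(sq)


def training_loss(batch: List[Dict[str, Grid]], model: Callable[[Grid, float], Grid],
                  factor: int = 2, rng: Optional[random.Random] = None) -> float:
    """Average CFM loss over a batch, drawing one time t ~ U[0, 1) per record."""
    rng = rng or random.Random(0)
    losses = [cfm_loss(rec, model, factor, rng.random()) for rec in batch]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rec = {"latent_lowres": [[0.0]], "latent": [[1.0, 1.0], [1.0, 1.0]]}
    zero_model = lambda x, t: [[0.0] * len(row) for row in x]
    print(training_loss([rec], zero_model))  # 1.0
```

An untrained model that always predicts zero scores the mean squared length of x1 - x0; a perfect model drives the loss to zero for every t.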

Highlighted Details

  • Achieves high-fidelity image synthesis at $1024^2$ and $2048^2$ pixels.
  • Generates high-resolution images in an average of $0.347$ seconds.
  • Enables cascading generation to increase resolution from $128^2$ px to $2048^2$ px.
  • Leverages Latent Consistency Models (LCM) distilled from SD1.5 and SDXL.
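The cascading figure follows from simple arithmetic: going from $128^2$ to $2048^2$ px is a 16x increase per side. The hypothetical helper below lists the intermediate resolutions for a chosen per-stage factor together with the matching latent sizes, assuming the Stable Diffusion autoencoder's 8x spatial downsampling (suggested by the sd_ae.ckpt checkpoint but not stated in the README). The per-stage factor is a free parameter here, not the value the authors used.

```python
from typing import List, Tuple


def cascade_plan(start_px: int, target_px: int, stage_factor: int,
                 vae_downsample: int = 8) -> List[Tuple[int, int]]:
    """Return (pixel side, latent side) for each resolution in the cascade."""
    if stage_factor < 2 or target_px % start_px:
        raise ValueError("need stage_factor >= 2 and target_px a multiple of start_px")
    plan = []
    side = start_px
    while side < target_px:
        plan.append((side, side // vae_downsample))
        side *= stage_factor
    if side != target_px:
        raise ValueError("target_px is not reachable with this stage_factor")
    plan.append((side, side // vae_downsample))
    return plan


if __name__ == "__main__":
    print(cascade_plan(128, 2048, 2))
    # [(128, 16), (256, 32), (512, 64), (1024, 128), (2048, 256)]
```

With a 2x factor the cascade needs four upsampling stages; with 4x it needs two, passing through $512^2$ px.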

Maintenance & Community

Developed by the CompVis Group at LMU Munich. No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README, which may pose a barrier for commercial or closed-source integration.

Limitations & Caveats

Pre-trained checkpoints and inference notebooks are slated for a future release, so out-of-the-box inference is not yet possible. The README mainly documents the training pipeline.

Health Check
Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.5%
5k
Image synthesis research paper using a linear diffusion transformer
Created 1 year ago
Updated 3 weeks ago