Phased-Consistency-Model by G-U-N

Research paper on improving consistency models for text-to-image generation

Created 1 year ago
495 stars

Top 62.5% on SourcePulse

View on GitHub
Project Summary

Phased Consistency Models (PCMs) enhance consistency models for high-resolution, text-conditioned image generation, addressing limitations of prior methods such as LCM. The project targets researchers and practitioners in generative AI seeking faster, higher-quality image synthesis from text prompts, offering improved flexibility and consistency over existing techniques.

How It Works

PCM tackles limitations in Consistency Models (CMs) and Latent Consistency Models (LCMs) by phasing the ODE trajectory into multiple sub-trajectories. This approach, focused on distillation, simplifies training compared to methods like CTM while mitigating stochasticity error accumulation. By learning from arbitrary pairs along the ODE trajectory, PCM achieves more stable and higher-quality results, particularly in low-step inference regimes.
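The core idea — splitting the ODE trajectory into sub-trajectories, each with its own consistency target — can be illustrated with a minimal sketch. This is not the repository's actual implementation; the schedule values and the `split_into_phases` helper are purely illustrative assumptions.

```python
import numpy as np

def split_into_phases(timesteps, num_phases):
    """Split a discretized ODE trajectory into contiguous sub-trajectories.

    In a phased consistency setup, the consistency mapping is only enforced
    within each phase: every timestep maps to the terminal timestep of its
    own phase rather than all the way to t = 0. (Illustrative sketch, not
    the repository's actual training code.)
    """
    phases = np.array_split(np.asarray(timesteps), num_phases)
    # For each timestep, the consistency target is the last (lowest-noise)
    # timestep of the phase it belongs to.
    targets = {int(t): int(phase[-1]) for phase in phases for t in phase}
    return phases, targets

# A toy 8-step schedule (descending noise levels) split into 4 phases:
schedule = [999, 875, 750, 625, 500, 375, 250, 0]
phases, targets = split_into_phases(schedule, 4)
```

Because each phase has a nearby target, errors do not need to be propagated across the whole trajectory at once, which is one intuition for the improved stability in low-step regimes described above.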

Quick Start & Requirements

  • Install: Primarily via the Hugging Face diffusers library.
  • Requirements: Python, PyTorch, and Hugging Face libraries. A CUDA-capable GPU is recommended for training and efficient inference; no specific CUDA version is stated, but recent versions are generally compatible.
  • Resources: Training scripts for PCM-LoRA with Stable Diffusion XL and v1.5 are provided. Training was done on 8 A100s, though single-GPU training is suggested as viable.
  • Links: Paper, Project Page, Hugging Face Models, Hugging Face Demo.
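In practice, using PCM-LoRA through diffusers typically looks like the sketch below. The repository id and weight file name are placeholders, not the project's actual identifiers — check the project's Hugging Face page for the real ones — and the guidance setting is an assumption to verify against the README.

```python
# Illustrative sketch of attaching a PCM-LoRA to a diffusers pipeline.
# The repo id and weight file name below are PLACEHOLDERS; consult the
# project's Hugging Face page for the actual identifiers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Load the distilled PCM-LoRA weights (placeholder names).
pipe.load_lora_weights("<pcm-lora-repo-id>", weight_name="<pcm-sdxl-lora>.safetensors")

# Few-step inference, e.g. 4 steps as highlighted in the README.
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=0.0,  # distilled models often skip CFG; verify for PCM
).images[0]
image.save("pcm_sample.png")
```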

Highlighted Details

  • Achieves high-quality text-conditioned image synthesis in 1, 2, 4, 8, and 16 inference steps.
  • Outperforms current fast generation models like SDXL-Turbo, SD-Turbo, SDXL-Lightning, InstaFlow, LCM, and SimpleCTM in benchmarks.
  • Demonstrates better generation diversity compared to HyperSD.
  • Offers PCM-LoRA weights for Stable Diffusion v1.5, SDXL, and Stable Diffusion 3.

Maintenance & Community

  • Active development with recent updates in July 2024, including release of training scripts for SDXL and SD3, and bug fixes.
  • Primary contact: Fu-Yun Wang (fywang@link.cuhk.edu.hk).

Licensing & Compatibility

  • The README does not explicitly state a license. However, the project is associated with academic research and Hugging Face, suggesting a permissive or research-oriented license. Further clarification is needed for commercial use.

Limitations & Caveats

  • The README notes that adversarial loss might slightly harm FID scores at NFE >= 4, though it generally yields better visual effects. License status requires verification for commercial applications.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.4% · 4k stars
Image synthesis research paper using a linear diffusion transformer
Created 11 months ago · Updated 5 days ago
Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 10 more.

consistency_models by openai

0.1% · 6k stars
PyTorch code for consistency models research paper
Created 2 years ago · Updated 1 year ago