Phased-Consistency-Model by G-U-N

Research paper on improving consistency models for text-to-image generation

created 1 year ago
487 stars

Top 64.1% on sourcepulse

Project Summary

Phased Consistency Models (PCM) enhance consistency models for high-resolution, text-conditioned image generation, addressing limitations of prior methods such as Latent Consistency Models (LCM). The project targets researchers and practitioners in generative AI seeking faster, higher-quality image synthesis from text prompts, and offers improved flexibility and consistency over existing techniques.

How It Works

PCM tackles limitations of Consistency Models (CMs) and Latent Consistency Models (LCMs) by splitting the ODE trajectory into multiple sub-trajectories and enforcing self-consistency within each phase. As a pure distillation approach, it is simpler to train than methods like CTM, while the phasing mitigates the accumulation of stochasticity error across sampling steps. By learning from arbitrary pairs along the ODE trajectory, PCM achieves more stable and higher-quality results, particularly in low-step inference regimes.
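
As a rough illustration of the phasing idea, the sketch below (helper names are illustrative assumptions, not code from the repo) splits the training timestep range into equal sub-trajectories and maps each timestep to the lower boundary of its own phase, rather than all the way to t = 0 as a vanilla consistency model would:

```python
import torch

def phase_edges(num_phases: int, num_train_steps: int = 1000) -> torch.Tensor:
    # Split the full timestep range [0, T] into `num_phases` equal sub-trajectories.
    return torch.linspace(0, num_train_steps, num_phases + 1).long()

def phase_target(t: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # Map each timestep to the lower boundary of the phase it falls in.
    # A vanilla consistency model would map every timestep to 0 instead.
    idx = torch.bucketize(t, edges, right=True) - 1
    return edges[idx.clamp(0, len(edges) - 2)]

edges = phase_edges(num_phases=4)                     # tensor([   0,  250,  500,  750, 1000])
print(phase_target(torch.tensor([100, 600]), edges))  # tensor([  0, 500])
```

During distillation, self-consistency is enforced only toward these phase boundaries, which is what allows deterministic multi-step sampling without re-injecting noise between steps.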

Quick Start & Requirements

  • Install: primarily through the Hugging Face diffusers library (see the sketch after this list).
  • Requirements: Python, PyTorch, and the Hugging Face libraries. A CUDA-capable GPU is recommended for training and efficient inference; no specific CUDA version is stated, but recent versions are generally compatible.
  • Resources: training scripts for PCM-LoRA with Stable Diffusion XL and v1.5 are provided. Training on 8 A100 GPUs is mentioned, though single-GPU training is suggested as viable.
  • Links: Paper, Project Page, Hugging Face Models, Hugging Face Demo.
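
A minimal inference sketch using diffusers with an SDXL base model. The PCM-LoRA repo id and weight filename below are placeholders, not actual names; check the project's Hugging Face page for the real checkpoints. LCMScheduler is used as a stand-in, since the repo may ship its own scheduler:

```python
import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

# Load the SDXL base model in half precision.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Stand-in scheduler; PCM may provide its own few-step scheduler.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Placeholder repo id and filename for the PCM-LoRA weights.
pipe.load_lora_weights("<pcm-lora-repo-id>", weight_name="<pcm_sdxl_lora.safetensors>")

image = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    num_inference_steps=4,  # PCM supports 1, 2, 4, 8, or 16 steps
    guidance_scale=1.0,     # low CFG is typical for distilled few-step models
).images[0]
image.save("pcm_sample.png")
```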

Highlighted Details

  • Achieves good-quality text-conditioned image synthesis in 1, 2, 4, 8, and 16 inference steps.
  • Outperforms current fast-generation models such as SDXL-Turbo, SD-Turbo, SDXL-Lightning, InstaFlow, LCM, and SimpleCTM in the reported benchmarks.
  • Demonstrates better generation diversity compared to HyperSD.
  • Offers PCM-LoRA weights for Stable Diffusion v1.5, SDXL, and Stable Diffusion 3.

Maintenance & Community

  • Updates in July 2024 included the release of training scripts for SDXL and SD3 as well as bug fixes.
  • Primary contact: Fu-Yun Wang (fywang@link.cuhk.edu.hk).

Licensing & Compatibility

  • The README does not explicitly state a license. However, the project is associated with academic research and Hugging Face, suggesting a permissive or research-oriented license. Further clarification is needed for commercial use.

Limitations & Caveats

  • The README notes that the adversarial loss can slightly harm FID scores at NFE >= 4, though it generally yields better visual results.
  • The license status requires verification before commercial use.
Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 23 stars in the last 90 days
