Phased-Consistency-Model by G-U-N

Research paper on improving consistency models for text-to-image generation

Created 1 year ago
495 stars

Top 62.5% on SourcePulse

View on GitHub
Project Summary

Phased Consistency Models (PCMs) enhance consistency models for high-resolution, text-conditioned image generation, addressing limitations of prior methods such as LCM. The project targets researchers and practitioners in generative AI seeking faster, higher-quality image synthesis from text prompts, offering improved flexibility and consistency over existing techniques.

How It Works

PCM tackles limitations in Consistency Models (CMs) and Latent Consistency Models (LCMs) by phasing the ODE trajectory into multiple sub-trajectories. This approach, focused on distillation, simplifies training compared to methods like CTM while mitigating stochasticity error accumulation. By learning from arbitrary pairs along the ODE trajectory, PCM achieves more stable and higher-quality results, particularly in low-step inference regimes.
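The core idea — splitting the ODE trajectory into sub-trajectories, each with its own consistency target — can be illustrated with a minimal sketch. This is not the repository's actual implementation; the schedule values and the `split_into_phases` helper are purely illustrative assumptions.

```python
import numpy as np

def split_into_phases(timesteps, num_phases):
    """Split a discretized ODE trajectory into contiguous sub-trajectories.

    In a phased consistency setup, the consistency mapping is only enforced
    within each phase: every timestep maps to the terminal timestep of its
    own phase rather than all the way to t = 0. (Illustrative sketch, not
    the repository's actual training code.)
    """
    phases = np.array_split(np.asarray(timesteps), num_phases)
    # For each timestep, the consistency target is the last (lowest-noise)
    # timestep of the phase it belongs to.
    targets = {int(t): int(phase[-1]) for phase in phases for t in phase}
    return phases, targets

# A toy 8-step schedule (descending noise levels) split into 4 phases:
schedule = [999, 875, 750, 625, 500, 375, 250, 0]
phases, targets = split_into_phases(schedule, 4)
```

Because each phase has a nearby target, errors do not need to be propagated across the whole trajectory at once, which is one intuition for the improved stability in low-step regimes described above.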

Quick Start & Requirements

  • Install: Primarily via the Hugging Face diffusers library.
  • Requirements: Python, PyTorch, and Hugging Face libraries. A CUDA-capable GPU is recommended for training and efficient inference; no specific CUDA version is stated, but recent versions are generally compatible.
  • Resources: Training scripts for PCM-LoRA with Stable Diffusion XL and v1.5 are provided. Training was done on 8 A100s, though single-GPU training is suggested as viable.
  • Links: Paper, Project Page, Hugging Face Models, Hugging Face Demo.
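In practice, using PCM-LoRA through diffusers typically looks like the sketch below. The repository id and weight file name are placeholders, not the project's actual identifiers — check the project's Hugging Face page for the real ones — and the guidance setting is an assumption to verify against the README.

```python
# Illustrative sketch of attaching a PCM-LoRA to a diffusers pipeline.
# The repo id and weight file name below are PLACEHOLDERS; consult the
# project's Hugging Face page for the actual identifiers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Load the distilled PCM-LoRA weights (placeholder names).
pipe.load_lora_weights("<pcm-lora-repo-id>", weight_name="<pcm-sdxl-lora>.safetensors")

# Few-step inference, e.g. 4 steps as highlighted in the README.
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=0.0,  # distilled models often skip CFG; verify for PCM
).images[0]
image.save("pcm_sample.png")
```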

Highlighted Details

  • Achieves high-quality text-conditioned image synthesis in 1, 2, 4, 8, and 16 inference steps.
  • Outperforms current fast generation models like SDXL-Turbo, SD-Turbo, SDXL-Lightning, InstaFlow, LCM, and SimpleCTM in benchmarks.
  • Demonstrates better generation diversity compared to HyperSD.
  • Offers PCM-LoRA weights for Stable Diffusion v1.5, SDXL, and Stable Diffusion 3.

Maintenance & Community

  • Active development with recent updates in July 2024, including release of training scripts for SDXL and SD3, and bug fixes.
  • Primary contact: Fu-Yun Wang (fywang@link.cuhk.edu.hk).

Licensing & Compatibility

  • The README does not explicitly state a license. However, the project is associated with academic research and Hugging Face, suggesting a permissive or research-oriented license. Further clarification is needed for commercial use.

Limitations & Caveats

  • The README notes that adversarial loss might slightly harm FID scores at NFE >= 4, though it generally yields better visual effects. License status requires verification for commercial applications.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.4% · 4k stars
Image synthesis research paper using a linear diffusion transformer
Created 11 months ago · Updated 5 days ago
Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 10 more.

consistency_models by openai

0.1% · 6k stars
PyTorch code for consistency models research paper
Created 2 years ago · Updated 1 year ago