CFG-Zero-star by WeichenFan

Improved CFG for flow matching models

created 4 months ago
644 stars

Top 52.7% on sourcepulse

Project Summary

CFG-Zero* enhances classifier-free guidance for flow matching models, offering improved sample quality and diversity. It is designed for researchers and practitioners working with generative AI, particularly in text-to-image and text-to-video synthesis. The method aims to provide more stable and higher-fidelity generations.

How It Works

CFG-Zero* introduces two improvements to classifier-free guidance: optimized scaling and zero-initialization. Optimized scaling dynamically adjusts the guidance based on how closely the conditional and unconditional predictions agree, with the aim of preventing over-saturation. Zero-initialization sets the prediction to zero for a specified number of initial sampling steps, which can help with models that have not fully converged.
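The two ideas above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names, the `zero_init_steps` parameter, and the use of flat lists instead of tensors are all hypothetical, and the scale is assumed to be the projection coefficient of the conditional prediction onto the unconditional one, applied to the unconditional branch.

```python
from typing import List, Sequence

Vector = List[float]


def optimized_scale(cond: Sequence[float], uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Assumed scale: <cond, uncond> / ||uncond||^2 (eps avoids division by zero)."""
    dot = sum(c * u for c, u in zip(cond, uncond))
    sq_norm = sum(u * u for u in uncond) + eps
    return dot / sq_norm


def cfg_zero_star(cond: Sequence[float], uncond: Sequence[float],
                  guidance_scale: float, step: int,
                  zero_init_steps: int = 1) -> Vector:
    """Combine conditional and unconditional predictions for one sample.

    Hypothetical helper: a real pipeline would operate on batched tensors.
    """
    # Zero-initialization: return an all-zero prediction for the first steps.
    if step < zero_init_steps:
        return [0.0 for _ in cond]
    # Optimized scaling: rescale the unconditional branch before guiding.
    s = optimized_scale(cond, uncond)
    return [s * u + guidance_scale * (c - s * u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    v_cond, v_uncond = [1.0, 1.0], [1.0, 0.0]
    for step in range(3):
        print(step, cfg_zero_star(v_cond, v_uncond, guidance_scale=4.0, step=step))
```

When the two predictions are parallel, the rescaled unconditional term already matches the conditional one, so the guidance term contributes almost nothing regardless of the guidance scale; only the component of the conditional prediction that the unconditional one does not explain is amplified.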

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment with Python 3.10, and install PyTorch with CUDA 12.4. Then, install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch (v2.5.1 recommended), CUDA 12.4, Python 3.10, and ffmpeg.
  • Local Demo: Run python demo.py.
  • Inference: Example inference scripts are provided for several models, including Wan2.1, Flux, Hunyuan, SD3, Qwen2.5-Omni, EasyControl, CogView4, and HiDream.
  • Resources: Inference examples are shown on an H100 80G GPU.
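
Before running the demo, it can help to confirm that the prerequisites listed above are in place. The following is a small preflight sketch that uses only the Python standard library; the function name `check_prerequisites` is hypothetical and not part of the repository, and the check only detects whether a `torch` package is installed, not its version or CUDA build.

```python
import shutil
import sys
from importlib import util


def check_prerequisites() -> dict:
    """Report whether the listed prerequisites appear to be available."""
    return {
        # The summary recommends Python 3.10.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # ffmpeg must be discoverable on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Detect PyTorch without importing it (import can be slow).
        "torch_installed": util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A failed check is only a hint: other Python or PyTorch versions may still work, since the summary lists v2.5.1 as recommended rather than required.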

Highlighted Details

  • Integrates with popular libraries and models including Diffusers, EasyControl, ComfyUI-KJNodes, SD.Next, Wan2.1, Hunyuan, Flux, SD3, Qwen2.5-Omni, CogView4, and HiDream.
  • Offers both text-to-video and text-to-image generation capabilities.
  • Provides a flexible Python snippet for easy integration into custom flow-matching pipelines.
  • Includes demos for Ghibli-style generation and text-to-image with SD3/SD3.5.

Maintenance & Community

The project is actively updated with new model support and integrations. Community projects that build on the method are highlighted, and links to demos and the project page are provided.

Licensing & Compatibility

Licensed under Apache-2.0, allowing for academic research and commercial usage. The project disclaims responsibility for user-generated content and prohibits certain types of content generation.

Limitations & Caveats

The project's disclaimer notes that models are not trained for realistic representation of people or events, and users are solely liable for their actions and content generation. Certain use cases, such as pornographic or violent content, are prohibited.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 4
  • Star history: 134 stars in the last 90 days
