CFG-Zero-star by WeichenFan

Improved CFG for flow matching models

Created 1 year ago

706 stars

Top 48.5% on SourcePulse

Project Summary

CFG-Zero* enhances classifier-free guidance for flow matching models, offering improved sample quality and diversity. It is designed for researchers and practitioners working with generative AI, particularly in text-to-image and text-to-video synthesis. The method aims to provide more stable and higher-fidelity generations.

How It Works

CFG-Zero* introduces two key improvements to classifier-free guidance: optimized scaling and zero-initialization. Optimized scaling computes a per-sample scale from the similarity (inner product) between the conditional and unconditional velocity predictions and uses it to rescale the unconditional prediction before guidance is applied, aiming to prevent over-saturation. Zero-initialization sets the predicted velocity to zero for a specified number of initial solver steps, which can help models whose early velocity estimates are inaccurate because they have not fully converged.
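The two mechanisms can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the repository's implementation: the function names, the 0-indexed `step` counter, the `zero_init_steps` parameter, and the `eps` stabilizer are hypothetical, and predictions are assumed to be arrays of shape (batch, ...). The scale s* = <v_cond, v_uncond> / ||v_uncond||^2 is the least-squares coefficient that best maps the unconditional prediction onto the conditional one; with s* = 1 the combination reduces to standard CFG.

```python
import numpy as np


def optimized_scale(v_cond: np.ndarray, v_uncond: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-sample scale s* = <v_cond, v_uncond> / ||v_uncond||^2.

    Inputs have shape (batch, ...). The result has shape (batch, 1, ..., 1)
    so it broadcasts against the predictions. `eps` (hypothetical) avoids
    division by zero.
    """
    batch = v_cond.shape[0]
    c = v_cond.reshape(batch, -1)
    u = v_uncond.reshape(batch, -1)
    dot = np.sum(c * u, axis=1)            # inner product per sample
    sq_norm = np.sum(u * u, axis=1) + eps  # squared norm of the unconditional branch
    s = dot / sq_norm
    return s.reshape((batch,) + (1,) * (v_cond.ndim - 1))


def cfg_zero_star(v_cond, v_uncond, guidance_scale, step, zero_init_steps=1):
    """Sketch of a CFG-Zero*-style combination of two velocity predictions.

    For the first `zero_init_steps` solver steps (0-indexed `step`) the
    velocity is zeroed; afterwards the unconditional branch is rescaled
    by s* before the usual guidance extrapolation.
    """
    if step < zero_init_steps:
        return np.zeros_like(v_cond)
    s = optimized_scale(v_cond, v_uncond)
    scaled_uncond = s * v_uncond
    return scaled_uncond + guidance_scale * (v_cond - scaled_uncond)
```

When the two predictions are parallel (for example, v_cond = 2 · v_uncond), s* absorbs the difference and the guidance term nearly vanishes, so the output stays close to v_cond regardless of the guidance scale.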

Quick Start & Requirements

Installation: Clone the repository, create a conda environment with Python 3.10, and install PyTorch with CUDA 12.4. Then, install dependencies via pip install -r requirements.txt.
Prerequisites: PyTorch (v2.5.1 recommended), CUDA 12.4, Python 3.10, and ffmpeg.
Local Demo: Run python demo.py.
Inference: Example inference scripts are provided for models including Wan2.1, Flux, Hunyuan, SD3, Qwen2.5-Omni, EasyControl, Cogview4, and HiDream.
Resources: The inference examples were run on an NVIDIA H100 80GB GPU.
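Before running the demo, a quick sanity check of the environment can catch a mismatched setup early. The helper below is hypothetical (it is not part of the repository) and only inspects what is observable from Python: the interpreter version, whether an `ffmpeg` binary is on `PATH`, and, if PyTorch is importable, its version string, the CUDA version it was built against (`torch.version.cuda`), and whether a GPU is visible.

```python
import shutil
import sys


def check_environment() -> dict:
    """Report whether the listed prerequisites appear to be present."""
    report = {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch  # PyTorch may not be installed yet
    except ImportError:
        return report
    report["torch_version"] = torch.__version__       # e.g. "2.5.1+cu124" for pip wheels
    report["torch_cuda_build"] = torch.version.cuda   # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On a setup matching the prerequisites above, one would expect `python_3_10: True`, a `torch_cuda_build` of `12.4`, and `cuda_available: True`.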

Highlighted Details

Integrates with popular libraries and models including Diffusers, EasyControl, ComfyUI-KJNodes, SD.Next, Wan2.1, Hunyuan, Flux, SD3, Qwen2.5-Omni, Cogview4, and HiDream.
Offers both text-to-video and text-to-image generation capabilities.
Provides a flexible Python snippet for easy integration into custom flow-matching pipelines.
Includes demos for Ghibli-style generation and text-to-image with SD3/SD3.5.
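To illustrate how such a snippet might slot into a custom flow-matching sampler, here is a sketch of a plain Euler loop. It assumes the SD3-style rectified-flow convention (pure noise at t = 1, data at t = 0, update x <- x + (t_next - t) * v); `velocity_fn`, `sample`, `guided_velocity`, and their parameters are hypothetical placeholders for a model's conditional and unconditional forward passes, and the uniform timestep grid stands in for a real model's scheduler.

```python
import numpy as np


def guided_velocity(v_cond, v_uncond, guidance_scale, step, zero_init_steps):
    """CFG-Zero*-style combination (sketch): zero-init, then optimized scaling."""
    if step < zero_init_steps:
        return np.zeros_like(v_cond)
    b = v_cond.shape[0]
    c, u = v_cond.reshape(b, -1), v_uncond.reshape(b, -1)
    s = np.sum(c * u, axis=1) / (np.sum(u * u, axis=1) + 1e-8)
    s = s.reshape((b,) + (1,) * (v_cond.ndim - 1))
    return s * v_uncond + guidance_scale * (v_cond - s * v_uncond)


def sample(velocity_fn, x, num_steps=20, guidance_scale=4.0, zero_init_steps=1):
    """Euler integration of dx/dt = v from t=1 (noise) to t=0 (data).

    `velocity_fn(x, t, conditional)` is a hypothetical stand-in for the
    model's forward pass with and without the text condition.
    """
    timesteps = np.linspace(1.0, 0.0, num_steps + 1)
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        v_c = velocity_fn(x, t, conditional=True)
        v_u = velocity_fn(x, t, conditional=False)
        v = guided_velocity(v_c, v_u, guidance_scale, i, zero_init_steps)
        x = x + (t_next - t) * v  # dt is negative: stepping toward t=0
    return x
```

In a real pipeline the two `velocity_fn` calls are usually batched into a single forward pass, and the timestep schedule comes from the model's own scheduler rather than a uniform grid.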

Maintenance & Community

The project is actively updated with new model support and integrations. Community contributions are highlighted, and links to demos and the project page are provided.

Licensing & Compatibility

Licensed under Apache-2.0, which permits both academic research and commercial use. The project disclaims responsibility for user-generated content and prohibits generating certain types of content.

Limitations & Caveats

The project's disclaimer notes that models are not trained for realistic representation of people or events, and users are solely liable for their actions and content generation. Certain use cases, such as pornographic or violent content, are prohibited.


Explore Similar Projects

qwen2vl-flux by erwold

diffusion-self-distillation by primecai

MiniGPT-5 by eric-ai-lab

Lumina-mGPT-2.0 by Alpha-VLLM

kandinsky-5 by kandinskylab

stable-diffusion-2-gui by qunash

Lumina-T2X by Alpha-VLLM

Awesome-Video-Diffusion-Models by ChenHsing

Pyramid-Flow by jy0205

Text2Video-Zero by Picsart-AI-Research

IP-Adapter by tencent-ailab

generative-models by Stability-AI