CFG-Zero-star by WeichenFan

Improved CFG for flow matching models

Created 6 months ago
667 stars

Top 50.5% on SourcePulse

Project Summary

CFG-Zero* enhances classifier-free guidance for flow matching models, offering improved sample quality and diversity. It is designed for researchers and practitioners working with generative AI, particularly in text-to-image and text-to-video synthesis. The method aims to provide more stable and higher-fidelity generations.

How It Works

CFG-Zero* makes two changes to classifier-free guidance: an optimized scale and zero-initialization. The optimized scale is a per-sample scalar computed at each step from the conditional and unconditional velocity predictions (the projection of the conditional prediction onto the unconditional one). It rescales the unconditional prediction before the usual guidance scale is applied, with the aim of preventing over-saturation. Zero-initialization replaces the predicted velocity with zero for a specified number of initial sampling steps, which can help with models that have not fully converged.
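The two ideas can be sketched in a few lines of plain Python. This is an illustrative sketch and not the repository's code: it works on a single flattened sample stored as a list of floats (real pipelines would use tensors), and the names `optimized_scale`, `cfg_zero_star`, `zero_init_steps`, and `eps` are assumptions made for this example.

```python
from typing import List, Sequence


def optimized_scale(v_cond: Sequence[float], v_uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Scalar s = <v_cond, v_uncond> / ||v_uncond||^2 for one flattened sample.

    eps is a hypothetical stabilizer that avoids division by zero.
    """
    dot = sum(c * u for c, u in zip(v_cond, v_uncond))
    sq_norm = sum(u * u for u in v_uncond) + eps
    return dot / sq_norm


def cfg_zero_star(v_cond: Sequence[float], v_uncond: Sequence[float],
                  guidance_scale: float, step: int,
                  zero_init_steps: int = 1) -> List[float]:
    """Guided velocity for one sampling step (assumed convention: step starts at 0)."""
    # Zero-initialization: return a zero velocity for the first few steps.
    if step < zero_init_steps:
        return [0.0] * len(v_cond)
    # Optimized scale: rescale the unconditional branch, then apply standard CFG.
    s = optimized_scale(v_cond, v_uncond)
    return [s * u + guidance_scale * (c - s * u)
            for c, u in zip(v_cond, v_uncond)]
```

With `zero_init_steps=0` and a scale fixed at 1, the function reduces to ordinary classifier-free guidance, which makes it easy to compare the two on the same model.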

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment with Python 3.10, and install PyTorch with CUDA 12.4. Then, install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch (v2.5.1 recommended), CUDA 12.4, Python 3.10, and ffmpeg.
  • Local Demo: Run python demo.py.
  • Inference: Example inference scripts are provided for various models like Wan2.1, Flux, Hunyuan, SD3, Qwen2.5-Omni, EasyControl, Cogview4, and HiDream.
  • Resources: Inference examples are shown on an H100 80G GPU.
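Before running the demo, the listed prerequisites can be checked with a short preflight script. This is a minimal sketch using only the Python standard library; the function name `check_prerequisites` and the result keys are hypothetical, and it does not verify the PyTorch or CUDA versions.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which of the listed prerequisites appear to be present."""
    return {
        # The summary lists Python 3.10.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Only checks that a torch package is importable, not its version.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # ffmpeg must be discoverable on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```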

Highlighted Details

  • Integrates with popular libraries and models including Diffusers, EasyControl, ComfyUI-KJNodes, SD.Next, Wan2.1, Hunyuan, Flux, SD3, Qwen2.5-Omni, Cogview4, and HiDream.
  • Offers both text-to-video and text-to-image generation capabilities.
  • Provides a flexible Python snippet for easy integration into custom flow-matching pipelines.
  • Includes demos for Ghibli-style generation and text-to-image with SD3/SD3.5.
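To show where a guidance function plugs into a custom flow-matching pipeline, here is a toy Euler sampler. It is a sketch under stated assumptions, not the project's snippet: `euler_sample`, `velocity_fn`, and `guide_fn` are hypothetical names, samples are plain lists of floats, and time is assumed to run from 0 to 1 in uniform steps (real schedulers may use a different direction or spacing).

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(x: Vector,
                 velocity_fn: Callable[[Vector, float, bool], Vector],
                 guide_fn: Callable[[Vector, Vector, int], Vector],
                 num_steps: int) -> Vector:
    """Integrate dx/dt = v with Euler steps, combining both branches via guide_fn.

    velocity_fn(x, t, conditional) stands in for the model call.
    guide_fn(v_cond, v_uncond, step) returns the guided velocity.
    """
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v_cond = velocity_fn(x, t, True)     # prompt-conditioned prediction
        v_uncond = velocity_fn(x, t, False)  # unconditional prediction
        v = guide_fn(v_cond, v_uncond, step)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Any guidance rule with the `guide_fn` signature can be swapped in, so plain classifier-free guidance and a CFG-Zero*-style rule can be compared without touching the sampler.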

Maintenance & Community

The project is actively updated with new model support and integrations. Community works are highlighted, and links to demos and the project page are available.

Licensing & Compatibility

Licensed under Apache-2.0, allowing for academic research and commercial usage. The project disclaims responsibility for user-generated content and prohibits certain types of content generation.

Limitations & Caveats

The project's disclaimer notes that models are not trained for realistic representation of people or events, and users are solely liable for their actions and content generation. Certain use cases, such as pornographic or violent content, are prohibited.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3

Star History

  • 16 stars in the last 30 days
