Pref-GRPO by CodeGoat24

Stable Text-to-Image Reinforcement Learning with Preference Rewards

Created 7 months ago
259 stars

Top 97.7% on SourcePulse

View on GitHub
Project Summary

Pref-GRPO stabilizes reinforcement learning for text-to-image generation by using pairwise preference rewards in place of pointwise scores. It targets researchers and engineers who want to improve alignment and generation quality in diffusion models, and provides a framework for plugging advanced reward models into RL training.

How It Works

The core innovation is Pairwise Preference Reward-based GRPO (Pref-GRPO): reinforcement learning with rewards derived from human or AI preference comparisons between generated samples, rather than independent per-sample scores. It integrates with the UnifiedReward model family (alignment, style, coherence, think, flex, and edit variants) to provide nuanced feedback. The authors report improved training stability and alignment over traditional score-based reward methods, and cite research demonstrating the same approach for LLM alignment.
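To make the pairwise idea concrete, here is a minimal sketch assuming the common recipe of win-rate rewards with group-normalized advantages; function names are illustrative and not the project's actual API:

```python
import itertools
import statistics

def pairwise_win_rate_rewards(images, prefer):
    """Score each image in a rollout group by its win rate over all
    pairwise comparisons, instead of an absolute per-image score.

    `prefer(a, b)` stands in for a preference reward model and returns
    True if it rates `a` over `b`.
    """
    wins = [0] * len(images)
    for i, j in itertools.combinations(range(len(images)), 2):
        if prefer(images[i], images[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    n_comparisons = len(images) - 1  # each image appears in this many pairs
    return [w / n_comparisons for w in wins]

def group_normalized_advantages(rewards):
    """GRPO-style advantage: reward minus group mean, over group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Toy example: "images" are scalar qualities and the judge prefers larger.
group = [0.2, 0.9, 0.5, 0.7]
rewards = pairwise_win_rate_rewards(group, lambda a, b: a > b)
advs = group_normalized_advantages(rewards)
```

Because win rates are bounded and relative to the group, the resulting advantages are less sensitive to reward-model score drift than raw pointwise scores.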

Quick Start & Requirements

Setup involves cloning the repository and managing dependencies via Conda. Key steps include:

  • Creating and activating Python 3.12 Conda environments (PrefGRPO, vllm).
  • Installing fastvideo and open_clip (pip install -e .).
  • Installing vllm (>=0.11.0) and qwen-vl-utils (0.0.14) for reward model support.
  • Downloading specific UnifiedReward model checkpoints from Hugging Face.
  • Training data preprocessing scripts are provided for various models (FLUX, Qwen-Image, Z-Image, etc.).
  • Official leaderboards are available for evaluation.
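The environment-setup steps above might look like the following shell session. This is a hedged sketch: the repository URL, environment names, and package pins are assumptions inferred from the bullets, so defer to the project README for the exact commands.

```shell
# Main training environment (name assumed from the bullets above)
conda create -n PrefGRPO python=3.12 -y
conda activate PrefGRPO

# Clone and install the project in editable mode (URL assumed)
git clone https://github.com/CodeGoat24/Pref-GRPO.git
cd Pref-GRPO
pip install -e .

# open_clip is installed the same way from its own checkout, per the README.

# Separate environment for serving the UnifiedReward models via vLLM
conda create -n vllm python=3.12 -y
conda activate vllm
pip install "vllm>=0.11.0" qwen-vl-utils==0.0.14
```

UnifiedReward checkpoints are then downloaded from Hugging Face as described in the repository.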

Highlighted Details

  • Extensive model support: Integrates with Z-Image, FLUX (1 & 2), Qwen-Image, Wan (2.1 & 2.2), and others.
  • UnifiedReward framework: Supports diverse reward signals including aesthetic, CLIP, HPSv2/v3, PickScore, and specialized UnifiedReward variants for alignment, style, coherence, and editing.
  • Video generation: Extended support for video generation via UnifiedReward-Flex.
  • LLM Alignment: Proven effectiveness in aligning Large Language Models, as per cited research.
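Frameworks that support multiple reward signals typically blend them into one training scalar; a hypothetical sketch of such aggregation (the weights, signal names, and normalization are assumptions, not the project's configuration):

```python
def combine_rewards(scores, weights):
    """Weighted sum of heterogeneous reward signals, assuming each
    signal has already been normalized to a comparable range."""
    assert scores.keys() == weights.keys(), "every signal needs a weight"
    return sum(weights[k] * scores[k] for k in scores)

# Illustrative signal names and weights only.
r = combine_rewards(
    {"aesthetic": 0.8, "clip": 0.6, "pickscore": 0.7},
    {"aesthetic": 0.3, "clip": 0.3, "pickscore": 0.4},
)
```

In practice the weighting is a tuning knob: over-weighting any single signal invites reward hacking against that model.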

Maintenance & Community

The project is actively developed, with recent updates in early 2026 supporting new models and features. Notable contributors include Tongyi Lab and Alibaba Group. Community interaction is primarily through GitHub issues, with direct contact available via Yibin Wang. No dedicated community channels like Discord or Slack are listed.

Licensing & Compatibility

The project's license is not explicitly stated in the provided README. This lack of clear licensing information presents a significant adoption blocker, particularly for commercial use or integration into proprietary systems.

Limitations & Caveats

Setup is complex: it requires multiple Conda environments, an external open_clip checkout, and manual downloads of large model checkpoints. Hardware requirements (e.g., GPU memory, CUDA versions) are not documented, and the absence of a clear license is a critical blocker for adoption.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days

Explore Similar Projects

Starred by Lewis Tunstall (Research Engineer at Hugging Face), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 2 more.

reward-bench by allenai

0.1% · 707 stars
Reward model evaluation tool
Created 2 years ago · Updated 1 month ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

0.0% · 4k stars
RL recipe for reasoning ability in models
Created 1 year ago · Updated 3 months ago