Pref-GRPO by CodeGoat24

Stable Text-to-Image Reinforcement Learning with Preference Rewards

Created 7 months ago
259 stars

Top 97.7% on SourcePulse

View on GitHub
Project Summary

Pref-GRPO stabilizes reinforcement learning for text-to-image generation by using pairwise preference rewards in place of pointwise scores. It targets researchers and engineers who want to improve alignment and generation quality in diffusion models, and provides a framework for plugging advanced reward models into RL training.

How It Works

The core innovation is Pairwise Preference Reward-based GRPO (Pref-GRPO): reinforcement learning with rewards derived from human or AI preference comparisons between generated samples, rather than independent per-sample scores. It integrates with the UnifiedReward model family (alignment, style, coherence, think, flex, and edit variants) to provide nuanced feedback. The authors report improved training stability and alignment over traditional score-based reward methods, and cite research demonstrating the same approach for LLM alignment.
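To make the pairwise idea concrete, here is a minimal sketch assuming the common recipe of win-rate rewards with group-normalized advantages; function names are illustrative and not the project's actual API:

```python
import itertools
import statistics

def pairwise_win_rate_rewards(images, prefer):
    """Score each image in a rollout group by its win rate over all
    pairwise comparisons, instead of an absolute per-image score.

    `prefer(a, b)` stands in for a preference reward model and returns
    True if it rates `a` over `b`.
    """
    wins = [0] * len(images)
    for i, j in itertools.combinations(range(len(images)), 2):
        if prefer(images[i], images[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    n_comparisons = len(images) - 1  # each image appears in this many pairs
    return [w / n_comparisons for w in wins]

def group_normalized_advantages(rewards):
    """GRPO-style advantage: reward minus group mean, over group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Toy example: "images" are scalar qualities and the judge prefers larger.
group = [0.2, 0.9, 0.5, 0.7]
rewards = pairwise_win_rate_rewards(group, lambda a, b: a > b)
advs = group_normalized_advantages(rewards)
```

Because win rates are bounded and relative to the group, the resulting advantages are less sensitive to reward-model score drift than raw pointwise scores.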

Quick Start & Requirements

Setup involves cloning the repository and managing dependencies via Conda. Key steps include:

  • Creating and activating Python 3.12 Conda environments (PrefGRPO, vllm).
  • Installing fastvideo and open_clip (pip install -e .).
  • Installing vllm (>=0.11.0) and qwen-vl-utils (0.0.14) for reward model support.
  • Downloading specific UnifiedReward model checkpoints from Hugging Face.
  • Training data preprocessing scripts are provided for various models (FLUX, Qwen-Image, Z-Image, etc.).
  • Official leaderboards are available for evaluation.
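The environment-setup steps above might look like the following shell session. This is a hedged sketch: the repository URL, environment names, and package pins are assumptions inferred from the bullets, so defer to the project README for the exact commands.

```shell
# Main training environment (name assumed from the bullets above)
conda create -n PrefGRPO python=3.12 -y
conda activate PrefGRPO

# Clone and install the project in editable mode (URL assumed)
git clone https://github.com/CodeGoat24/Pref-GRPO.git
cd Pref-GRPO
pip install -e .

# open_clip is installed the same way from its own checkout, per the README.

# Separate environment for serving the UnifiedReward models via vLLM
conda create -n vllm python=3.12 -y
conda activate vllm
pip install "vllm>=0.11.0" qwen-vl-utils==0.0.14
```

UnifiedReward checkpoints are then downloaded from Hugging Face as described in the repository.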

Highlighted Details

  • Extensive model support: Integrates with Z-Image, FLUX (1 & 2), Qwen-Image, Wan (2.1 & 2.2), and others.
  • UnifiedReward framework: Supports diverse reward signals including aesthetic, CLIP, HPSv2/v3, PickScore, and specialized UnifiedReward variants for alignment, style, coherence, and editing.
  • Video generation: Extended support for video generation via UnifiedReward-Flex.
  • LLM Alignment: Proven effectiveness in aligning Large Language Models, as per cited research.
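Frameworks that support multiple reward signals typically blend them into one training scalar; a hypothetical sketch of such aggregation (the weights, signal names, and normalization are assumptions, not the project's configuration):

```python
def combine_rewards(scores, weights):
    """Weighted sum of heterogeneous reward signals, assuming each
    signal has already been normalized to a comparable range."""
    assert scores.keys() == weights.keys(), "every signal needs a weight"
    return sum(weights[k] * scores[k] for k in scores)

# Illustrative signal names and weights only.
r = combine_rewards(
    {"aesthetic": 0.8, "clip": 0.6, "pickscore": 0.7},
    {"aesthetic": 0.3, "clip": 0.3, "pickscore": 0.4},
)
```

In practice the weighting is a tuning knob: over-weighting any single signal invites reward hacking against that model.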

Maintenance & Community

The project is actively developed, with recent updates in early 2026 supporting new models and features. Notable contributors include Tongyi Lab and Alibaba Group. Community interaction is primarily through GitHub issues, with direct contact available via Yibin Wang. No dedicated community channels like Discord or Slack are listed.

Licensing & Compatibility

The project's license is not explicitly stated in the provided README. This lack of clear licensing information presents a significant adoption blocker, particularly for commercial use or integration into proprietary systems.

Limitations & Caveats

Setup is complex: it requires multiple Conda environments, an external open_clip checkout, and manual downloads of large model checkpoints. Hardware requirements (e.g., GPU memory, CUDA versions) are not documented, and the absence of a clear license is a critical blocker for adoption.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days

Explore Similar Projects

Starred by Lewis Tunstall (Research Engineer at Hugging Face), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 2 more.

reward-bench by allenai

0.1% · 707 stars
Reward model evaluation tool
Created 2 years ago · Updated 1 month ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

0.0% · 4k stars
RL recipe for reasoning ability in models
Created 1 year ago · Updated 3 months ago