CodeGoat24
Stable Text-to-Image Reinforcement Learning with Preference Rewards
Top 97.7% on SourcePulse
Pref-GRPO uses pairwise preference rewards to stabilize reinforcement learning for text-to-image generation. It targets researchers and engineers who want to improve alignment and generation quality in diffusion models. The project provides a framework for plugging reward models into training, with the goal of better model stability and performance.
How It Works
The core method is Pairwise Preference Reward-based GRPO (Pref-GRPO): reinforcement learning in which the rewards are derived from human or AI preferences between generated samples. It integrates with several UnifiedReward models (alignment, style, coherence, think, flex, edit), each providing feedback on a different aspect of the output. The approach is intended to improve training stability and model alignment relative to traditional methods, in line with findings from research on LLM alignment.
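To make the idea of a pairwise preference reward concrete, here is a minimal sketch of one way such a reward could feed a GRPO-style update. It assumes that each sample's reward is its win rate against the other samples generated for the same prompt, and that advantages are the group-normalized rewards. The function names and the `toy_prefer` comparator are hypothetical stand-ins, not the repository's API; in practice the comparator would be a call to a reward model such as UnifiedReward.

```python
import statistics
from typing import Callable, List, Sequence


def win_rates(samples: Sequence[str], prefer: Callable[[str, str], bool]) -> List[float]:
    """Return, for each sample, the fraction of pairwise comparisons it wins."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples per group")
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # prefer(a, b) is True when a is judged better than b.
            if prefer(samples[i], samples[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    return [w / (n - 1) for w in wins]


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style normalization: subtract the group mean, divide by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def toy_prefer(a: str, b: str) -> bool:
    # Hypothetical comparator for illustration only: prefers the longer string.
    return len(a) > len(b)


# Strings stand in for the images generated from one prompt.
group = ["a", "abc", "ab", "abcd"]
rates = win_rates(group, toy_prefer)
advantages = group_advantages(rates)
```

Because a win rate only ranks samples within their own group, it is bounded in [0, 1] regardless of how the underlying reward model scales its scores, which is one plausible reason a pairwise signal would be more stable than raw pointwise scores.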
Quick Start & Requirements
Setup involves cloning the repository and managing dependencies via Conda. Key steps include:
Create the Conda environments (PrefGRPO, vllm).
Install fastvideo and open_clip (pip install -e .).
Install vllm (>=0.11.0) and qwen-vl-utils (0.0.14) for reward model support.
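The version constraints listed for reward model support can be checked before launching anything. The sketch below uses only the Python standard library. It assumes that "0.0.14" is meant as an exact pin and that the check is run inside the environment that hosts the reward model; the `REQUIREMENTS` table and the helper names are illustrative, not part of the project.

```python
import re
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Constraints as summarized above; treating 0.0.14 as an exact pin is an assumption.
REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "vllm": (">=", "0.11.0"),
    "qwen-vl-utils": ("==", "0.0.14"),
}


def parse(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '0.11.0.post1' -> (0, 11, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check(
    requirements: Dict[str, Tuple[str, str]] = REQUIREMENTS,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Map each required package to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, (op, wanted) in requirements.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif op == ">=" and parse(found) >= parse(wanted):
            report[name] = "ok"
        elif op == "==" and parse(found) == parse(wanted):
            report[name] = "ok"
        else:
            report[name] = f"found {found}, want {op}{wanted}"
    return report


if __name__ == "__main__":
    for package, status in check().items():
        print(f"{package}: {status}")
```

The `lookup` parameter exists so the comparison logic can be exercised without the packages installed; the version parser is deliberately simple and ignores pre-release and local version suffixes.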
Maintenance & Community
The project is actively developed, with updates in early 2026 adding support for new models and features. Notable contributors include Tongyi Lab and Alibaba Group. Community interaction happens mainly through GitHub issues, and Yibin Wang is listed as a direct contact. No dedicated community channels such as Discord or Slack are listed.
Licensing & Compatibility
The project's license is not explicitly stated in the provided README. This lack of clear licensing information presents a significant adoption blocker, particularly for commercial use or integration into proprietary systems.
Limitations & Caveats
The setup process is complex, requiring multiple Conda environments, external repository installations (open_clip), and manual download of large model checkpoints. Specific hardware requirements (e.g., GPU memory, CUDA versions) are not detailed. The absence of a clear license is a critical limitation for evaluating adoption.